Model and Data
The QR framework, estimation pipeline, and market impact
The Queue-Reactive Model
The queue-reactive (QR) model treats the limit order book as a continuous-time Markov jump process: the rate of each event (limit order, cancellation, market order) depends on the current state of the book.
In the original formulation of Huang et al. (2015), each queue is treated independently and event intensities are functions of individual queue sizes. We simplify the state conditioning: instead of the full queue vector, we project onto two quantities:
\[ \Phi(\text{LOB}) = (\text{Imb}, n) \]
where \(\text{Imb} = (q_{-1} - q_1)/(q_{-1} + q_1)\) is the volume imbalance at the best level and \(n\) is the bid-ask spread in ticks. This captures most of the bid-ask dependency that matters while keeping estimation tractable.
Sampling loop
In state \(\Phi(\text{LOB})\), the total event rate and per-event probabilities are:
\[ \Lambda = \sum_e \lambda^e(\Phi), \qquad p^e = \frac{\lambda^e(\Phi)}{\Lambda} \]
The simulation draws three independent quantities at each step:
- Which event: \(e^* \sim \text{Categorical}(p^e)\)
- When: \(\Delta t \sim \text{Exp}(\Lambda)\)
- How large: \(v^* \sim p(v \mid \Phi, e^*)\)
Then the event is applied to the book and the state updates.
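The loop above can be sketched as follows — a minimal Python sketch, assuming `lambdas` maps each event type to its rate \(\lambda^e(\Phi)\) and `size_sampler` wraps the conditional size distribution (both names are hypothetical):

```python
import random

def qr_step(state, lambdas, size_sampler, rng: random.Random):
    """One iteration of the QR sampling loop (sketch)."""
    total = sum(lambdas.values())                        # Lambda = sum_e lambda^e
    dt = rng.expovariate(total)                          # when: Exp(Lambda)
    events = list(lambdas)
    event = rng.choices(events,                          # which: Categorical(p^e)
                        weights=[lambdas[e] for e in events], k=1)[0]
    size = size_sampler(state, event)                    # how large: p(v | Phi, e)
    return dt, event, size                               # then apply to the book
```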
Event set
When spread \(= 1\): Add and Cancel at \(q_{\pm 1}, q_{\pm 2}\); Trade at \(q_{\pm 1}\).
When spread \(\geq 2\): only CreateBid and CreateAsk fire (one tick inside the spread), closing the gap before normal activity resumes. This makes sense for large-tick assets since participants rarely cross a spread wider than 1 tick.
Random volumes
The importance of modelling random order sizes in the QR framework was noted by Bodor & Carlier (2025). In our implementation, we sample event sizes from empirical distributions conditioned on \((\Phi, e, q_{\pm i})\), capped at 50 MES. This lets queue depletion arise naturally from a single large trade rather than requiring a dedicated event type.
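A minimal sketch of this mechanism (the PMF dictionary and function names are illustrative, not the pipeline's API):

```python
import random

def sample_size(pmf: dict[int, float], rng: random.Random) -> int:
    """Draw an event size (in MES units) from an empirical PMF capped at 50 MES."""
    sizes = sorted(pmf)
    return rng.choices(sizes, weights=[pmf[s] for s in sizes], k=1)[0]

def apply_trade(queue_size: int, trade_size: int) -> int:
    """Depletion falls out naturally: one large trade can empty the queue."""
    return max(queue_size - trade_size, 0)
```

A queue holding 20 MES hit by a 50 MES trade is emptied outright, with no dedicated depletion event.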
Data
Source
All estimation uses Databento MBP-10 data covering December 2023 to December 2025, for four S&P 500 constituents trading around the $30 mark: PFE, INTC, VZ, and T. The raw MBP-10 feed is passed through a preprocessing step (see src/qr/preprocessing.py) that aggregates trade and create messages, computes queue sizes, and normalises volumes by MES.
To avoid open/close effects, we discard the first and last 30 minutes of each trading day, keeping the 10:00–15:30 ET window.
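The window filter itself is straightforward; a sketch using the standard library, assuming timestamps have already been converted to ET:

```python
from datetime import time

SESSION_START, SESSION_END = time(10, 0), time(15, 30)

def in_session(t: time) -> bool:
    """True if t falls inside the 10:00-15:30 ET window
    (first and last 30 minutes of the trading day dropped)."""
    return SESSION_START <= t <= SESSION_END
```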
Preprocessing
Raw MBP-10 data is preprocessed into the QR format — one row per event with queue sizes, prices, imbalance, spread, and event classification:
```python
from pathlib import Path

from qr.preprocessing import mbp10_to_qr

# One day of raw MBP-10 data
file = Path("data/PFE/mbp-10-raw/2024-06-03.parquet")
df = mbp10_to_qr(file).collect()
```

See the docstring of mbp10_to_qr in src/qr/preprocessing.py for the full output schema.
For data loading we use lobib, an internal tool. You can replace this with whatever data loader fits your setup — mbp10_to_qr expects a standard Databento MBP-10 parquet file as input.
Estimation procedure
All parameters are estimated by maximum likelihood. For each state \(\Phi\):
Event probabilities: empirical frequencies \[ \hat{p}^e(\Phi) = \frac{\#\{k : e_k = e, \Phi_k = \Phi\}}{\#\{k : \Phi_k = \Phi\}} \]
Total intensity: inverse of mean waiting time \[ \hat{\Lambda}(\Phi) = \left(\frac{1}{N_\Phi}\sum_{k \in \mathcal{K}(\Phi)} \Delta t_k\right)^{-1} \]
Size distributions: empirical CDF per \((\Phi, e, q)\) cell
After estimation, all statistics are symmetrised between bid and ask: the probability of a bid event at imbalance \(+x\) is averaged with the corresponding ask event at \(-x\). This eliminates systematic drift and halves the number of parameters.
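The two estimators can be sketched together as a toy version over a list of `(state, event, dt)` tuples (the actual pipeline works over the daily parquet files):

```python
from collections import Counter, defaultdict

def estimate(observations):
    """MLE sketch: per-state event frequencies and total intensity.
    `observations` is an iterable of (state, event, dt), state = (Imb, spread)."""
    event_counts = Counter()      # occurrences of (state, event)
    state_counts = Counter()      # occurrences of each state
    waits = defaultdict(float)    # summed inter-arrival times per state
    for state, event, dt in observations:
        event_counts[(state, event)] += 1
        state_counts[state] += 1
        waits[state] += dt
    p_hat = {k: c / state_counts[k[0]] for k, c in event_counts.items()}
    lambda_hat = {s: state_counts[s] / waits[s] for s in state_counts}  # 1 / mean dt
    return p_hat, lambda_hat
```

Symmetrisation would then average each bid-side entry at imbalance \(+x\) with the matching ask-side entry at \(-x\).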
Output files
The estimation pipeline (run with uv) produces the following files in data/{ticker}/qr_params/:
event_probabilities.csv
Event probabilities \(p^e(\text{Imb}, n)\) for all events.
| Column | Type | Description |
|---|---|---|
| imbalance | float | Imbalance bin (0.0 to 1.0, positive half only — mirrored at load time) |
| spread | int | 1 or 2 |
| event | str | Add, Cancel, Trade, Create_Bid, Create_Ask |
| queue | int | Signed queue number: -2, -1, 1, 2 (0 for creates) |
| side | int | -1 (bid) or 1 (ask) |
| probability | float | Estimated \(\hat{p}^e\) |
delta_t_exponential.csv
Mean inter-arrival time per state, used for exponential \(\Delta t\) sampling.
| Column | Type | Description |
|---|---|---|
| imbalance | float | Imbalance bin |
| spread | int | 1 or 2 |
| average_dt | float | Mean \(\Delta t\) in nanoseconds |
delta_t_gmm.csv
5-component Gaussian mixture parameters for \(\log_{10}(\Delta t)\), per event type.
| Column | Type | Description |
|---|---|---|
| imbalance | float | Imbalance bin |
| spread | int | 1 or 2 |
| event | str | Event type |
| queue | int | Signed queue number |
| side | int | -1 or 1 |
| w_1..w_5 | float | Component weights |
| mu_1..mu_5 | float | Component means (in \(\log_{10}\) ns) |
| sig_1..sig_5 | float | Component standard deviations |
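Sampling \(\Delta t\) from these parameters is a two-step draw (pick a component by weight, then a normal in \(\log_{10}\) space), which can be sketched as:

```python
import random

def sample_dt(w, mu, sig, rng: random.Random) -> float:
    """Draw delta_t (in ns) from a Gaussian mixture fitted to log10(delta_t)."""
    k = rng.choices(range(len(w)), weights=w, k=1)[0]  # pick a component
    return 10.0 ** rng.gauss(mu[k], sig[k])            # back from log10 space
```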
size_distrib.csv
Empirical size probability mass functions per \((\text{Imb}, e, q)\) cell.
| Column | Type | Description |
|---|---|---|
| imbalance | float | Imbalance bin |
| spread | float | 1 or 2 |
| event | str | Event type |
| queue | float | Signed queue number |
| side | float | -1 or 1 |
| 1..50 | float | Probability of size \(v\) (in MES units) |
invariant_distributions_qmax100.csv
Stationary queue size distributions used by the order book to regenerate levels during the clean pass.
| Column | Type | Description |
|---|---|---|
| queue_level | int | Level index (1–4) |
| 0..100 | float | Probability of queue size \(q\) (in MES units) |
params.json
```json
{
  "median_event_sizes": {"1": 200, "2": 200, "3": 150, "4": 100},
  "total_best_quantiles": [17, 25, 35, 51]
}
```

MES per queue level (in shares) and the quantile boundaries for total best volume binning.
Per-day intermediate results are stored as parquet files in data/{ticker}/daily_estimates/, one per trading day.
Biasing Probabilities
The QR model samples the next event from conditional probabilities \(p^e(\Phi)\). We can bias these probabilities to inject external signals into the book dynamics. The mechanism is general: multiply selected event probabilities by a factor and renormalise. The main application is market impact, but the same interface supports any signal (alpha, strategy feedback, etc.).
How it works
Each StateParams holds two probability vectors: base_probs (immutable, estimated from data) and probs (working copy that gets biased before sampling). Given a bias factor \(b\):
- If \(b > 0\): bid (sell) trade probabilities are multiplied by \(e^b\)
- If \(b < 0\): ask (buy) trade probabilities are multiplied by \(e^{-b}\)
- All non-trade events keep their base probabilities
The cumulative distribution is then recomputed for sampling. This is the actual implementation:
```cpp
void bias(double b) {
    total = 0.0;
    for (size_t i = 0; i < events.size(); i++) {
        if (events[i].type == OrderType::Trade) {
            double factor = 1;
            if ((events[i].side == Side::Bid) && (b > 0)) {
                factor = std::exp(b);
            } else if ((events[i].side == Side::Ask) && (b < 0)) {
                factor = std::exp(-b);
            }
            probs[i] = base_probs[i] * factor;
        } else
            probs[i] = base_probs[i];
        total += probs[i];
    }
    cum_probs[0] = probs[0];
    for (size_t i = 1; i < probs.size(); i++)
        cum_probs[i] = cum_probs[i - 1] + probs[i];
}
```

The biasing is single-sided: only the side opposing the signal gets boosted. When \(b > 0\) (e.g. recent net buying), only sell trades become more likely; when \(b < 0\), only buy trades. Non-trade events are unaffected, so the overall event mix stays close to the baseline.
Sampling then draws from the biased distribution:
```cpp
const Event &sample_event(std::mt19937_64 &rng) const {
    double u = std::uniform_real_distribution<>(0, total)(rng);
    auto it = std::lower_bound(cum_probs.begin(), cum_probs.end(), u);
    return events[it - cum_probs.begin()];
}
```

Note that sample_event draws uniformly in \([0, \text{total})\) rather than \([0, 1)\) — the probabilities are not renormalised to sum to 1, but the sampling range adjusts to total, which is equivalent and avoids a division pass.
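For intuition, the same unnormalised-cumulative-sum trick can be written in Python (a sketch, not the production code): drawing \(u \in [0, \text{total})\) against raw cumulative sums gives the same distribution as normalising first.

```python
import bisect
import random

def sample_index(probs: list[float], rng: random.Random) -> int:
    """Draw an index proportionally to unnormalised weights, mirroring
    sample_event: uniform on [0, total), then a lower_bound search."""
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)
    u = rng.uniform(0.0, total)
    return bisect.bisect_left(cum, u)  # Python's lower_bound

rng = random.Random(0)
draws = [sample_index([2.0, 6.0], rng) for _ in range(10_000)]
# index 1 is drawn roughly 75% of the time, as if the weights were [0.25, 0.75]
```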