A reinforcement learning approach to improve the performance of the Avellaneda-Stoikov market-making algorithm (PLOS ONE)


For now, it is essential to know that by using a large κ value you are assuming that the order book is denser, so your optimal spread will have to be smaller, since there is more competition in the market. The paper gives considerable mathematical detail on how this factor is derived by assuming exponential order-arrival rates, and there are many alternative models with varying methodologies for calculating its value. The model was created before Satoshi Nakamoto mined the first Bitcoin block, and before the creation of trading markets that are open 24/7. The risk aversion parameter γ, by contrast, works on the quotes directly: as its value increases, the distance between the mid-price and the reservation price grows whenever the trader's inventory differs from the target.
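The closed-form expressions for the reservation price and optimal spread from the Avellaneda-Stoikov paper can be sketched in a few lines of Python. This is a minimal illustration, not Hummingbot's implementation; the function names are ours and the parameter values below are arbitrary:

```python
import math

def reservation_price(mid_price, q, gamma, sigma, time_left):
    """Reservation price: shifted away from the mid-price when the
    inventory deviation q from the target is non-zero."""
    return mid_price - q * gamma * sigma ** 2 * time_left

def optimal_spread(gamma, sigma, kappa, time_left):
    """Optimal total spread: shrinks as the order-book density
    parameter kappa grows (denser book -> more competition -> tighter quotes)."""
    return gamma * sigma ** 2 * time_left + (2 / gamma) * math.log(1 + gamma / kappa)

# Illustrative values: long 2 units over target, half the trading horizon left.
r = reservation_price(mid_price=100.0, q=2, gamma=0.1, sigma=2.0, time_left=0.5)
spread = optimal_spread(gamma=0.1, sigma=2.0, kappa=1.5, time_left=0.5)
bid, ask = r - spread / 2, r + spread / 2
```

Note how both effects described above fall out of the formulas: a larger κ shrinks the log term, and a positive inventory deviation pushes the reservation price below the mid-price.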


More advanced models have been developed with adverse selection effects and richer market order dynamics; see for example the work of Cartea et al. Guéant et al. have extended and formalized the results of Avellaneda and Stoikov. Another extended market-making model with inventory constraints has been provided by Fodra and Labadie, who consider a general midprice process under linear and exponential utility criteria and find closed-form solutions for the optimal spreads. Cartea and Jaimungal have proposed a solution for including market impact on the midprice and have worked on risk metrics for the high-frequency trading strategies they developed. Moreover, Yang et al. have improved the existing models with the Heston stochastic volatility model, to characterize the volatility of the stock price with price impact, and implemented an approximation method to solve the nonlinear HJB equation.


You will be asked the maximum and minimum spread you want Hummingbot to use in the following two questions. Cryptocurrency markets are open 24/7, so there is no market closing time. On Hummingbot, you choose what the asset inventory target is, and the bot calculates the value of q based on the target inventory percentage you are aiming for. On the other hand, using a smaller κ, you are assuming the order book has low liquidity, and you can use a wider spread. This article will simplify what each of these formulas and values means.
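The idea of deriving q from a target inventory percentage can be sketched as follows. This is our own simplified reading, not Hummingbot's actual code; the function and variable names are hypothetical:

```python
def inventory_deviation(base_amount, base_price, quote_amount, target_base_pct):
    """Return q: how far current base-asset holdings are from the target,
    measured in units of the base asset (positive = over target)."""
    total_value = base_amount * base_price + quote_amount
    target_base_amount = (target_base_pct * total_value) / base_price
    return base_amount - target_base_amount

# Holding 1 BTC at $20,000 plus $20,000 in quote currency, targeting 50% in BTC:
q = inventory_deviation(1.0, 20_000.0, 20_000.0, 0.5)  # -> 0.0 (on target)
```

A positive q (inventory above target) would then pull the reservation price below the mid-price, encouraging sells, and vice versa.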

  • Fortunately, stochastic control theory helps to handle this kind of optimization problem, by seeking an optimal strategy that maximizes the trader's objective function in the face of the dyadic problem posed by high-frequency trading.
  • A single parent individual is selected randomly from the current population, with a selection probability proportional to the Sharpe score it has achieved (thus, higher-scoring individuals have a greater probability of passing on their genes).
  • The 10 generations thus yield a total of 450 individuals, ranked by their Sharpe ratio.
  • The price to pay is diminished nuance in learning from the very large values, in exchange for higher sensitivity to the majority of values, which are much smaller.
  • We show that the optimal full information spreads are biased when the exact market regime is unknown, and the market maker needs to adjust for additional regime uncertainty in terms of P&L sensitivity and observed order flow volatility.
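The fitness-proportional parent selection described in the bullets above can be sketched as a roulette-wheel draw. Shifting the scores so that negative Sharpe ratios remain (barely) selectable is our illustrative choice, not necessarily the paper's:

```python
import random

def select_parent(population, sharpe_scores):
    """Pick one individual with probability proportional to its Sharpe score.
    Scores are shifted to be non-negative so that individuals with negative
    ratios keep a vanishingly small chance of selection."""
    low = min(sharpe_scores)
    weights = [s - low + 1e-6 for s in sharpe_scores]
    return random.choices(population, weights=weights, k=1)[0]

population = ["ind_a", "ind_b", "ind_c"]
parent = select_parent(population, sharpe_scores=[0.8, -0.2, 1.5])
```

Over many draws, the highest-Sharpe individual is selected most often, which is exactly the "greater probability of passing on their genes" behaviour described above.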

In the training phase we fit our two Alpha-AS models with data from a full day of trading. In this, the most time-consuming step of the backtest process, our algorithms learned from their trading environment what AS model parameter values to choose every five seconds of trading (within those 5 seconds; see Section 4.1.3). Consequently, the Alpha-AS agent adapts its bid and ask order prices dynamically, reacting closely (at 5-second steps) to the changing market. This 5-second interval allows the Alpha-AS algorithm to acquire experience trading repeatedly with a given bid and ask price under quasi-current market conditions.


The back-test experiment on China's A-share market shows that IIFI achieves superior performance: stock profitability can be increased by more than 20% over the baseline methods. Meanwhile, the interpretability results show that IIFI can effectively distinguish between important and redundant features by assigning corresponding scores to each feature. As a byproduct of our interpretable methods, the scores over features can be used to further optimize the investment strategy. In this paper, we investigated high-frequency trading strategies for a market maker using mean-reverting stochastic volatility models that involve the influence of both arriving and filled market orders of the underlying asset.

The architecture of the neural network has room for improvement through systematic optimisation of the network's parameters. Characterisation of different market conditions, and specific training under them with appropriate data, can also broaden and improve the agent's strategic repertoire. The agent's action space itself can potentially be enriched profitably, by adding more values for the agent to choose from and making more parameters settable by the agent, beyond the two used in the present study (i.e., risk aversion and skew). In the present study we have simply chosen finite value sets for these two parameters that we deem reasonable for modelling trading strategies of differing levels of risk. This helps to keep the models simple and shortens the training time of the neural network, allowing us to test the idea of combining the Avellaneda-Stoikov procedure with reinforcement learning.


In the framework of the optimal trading strategy for high-frequency trading in a LOB, there have been many papers following the early studies of Grossman and Miller and of Ho and Stoll. Avellaneda and Stoikov revised the study of Ho and Stoll, building a practical model that considers a single dealer trading a single stock while facing stochastic demand modeled by a continuous-time Poisson process. The literature on the optimal market making problem has been burgeoning since 2008 with the work of Avellaneda and Stoikov, inspiring Guilbaud and Pham to derive a model involving limit and market orders with optimal stochastic spreads. Bayraktar and Ludkovski have considered the optimal liquidation problem, where they model order arrivals with intensities depending on the liquidation price.


A weighted average of the values of the two parents' genes is then computed. Mean decrease accuracy (MDA) is a feature-specific estimate of the average decrease in classification accuracy, across the tree ensemble, when the values of the feature are permuted between the samples of a test input set. To obtain MDA values we applied a random forest classifier to the dataset, split into 4 folds.
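The weighted-average crossover step can be sketched as follows. Drawing an independent mixing weight per gene is our illustrative assumption; the genome layout shown is hypothetical:

```python
import random

def crossover(parent_a, parent_b):
    """Child genes are a weighted average of the two parents' genes,
    with a random mixing weight drawn per gene (an illustrative choice)."""
    child = []
    for gene_a, gene_b in zip(parent_a, parent_b):
        w = random.random()
        child.append(w * gene_a + (1 - w) * gene_b)
    return child

# Two hypothetical parent genomes, e.g. (risk aversion gamma, book density kappa):
child = crossover([0.1, 1.5], [0.5, 0.9])
```

Because each weight lies in [0, 1), every child gene is guaranteed to fall between the corresponding parent values, keeping offspring within the explored parameter region.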

We turn next to such an approach, more specifically one based on deep reinforcement learning. The stochastic control problem of optimal market making is among the central problems in quantitative finance. In this paper, a deep reinforcement learning-based controller is trained on a weakly consistent, multivariate Hawkes process-based limit order book simulator to obtain market making controls. The proposed approach leverages the advantages of Monte Carlo backtesting and contributes to the line of research on market making under weakly consistent limit order book models.


It is worth mentioning that the trader changes her qualitative behavior, depending on the liquidation and penalization constants and on her inventory position, as time approaches maturity. On the optimal quotes, it will have just the opposite effect of when k is employed. As it increases, the trader expects the price to move up, so she sends her orders at higher prices to profit from the price increase, which meets our expectation.

Table 2 shows that one or the other of the two Alpha-AS models achieved better Sharpe ratios, that is, better risk-adjusted returns, than all three baseline models on 24 (12+12) of the 30 test days. Furthermore, on 9 of the 12 days for which Alpha-AS-1 had the best Sharpe ratio, Alpha-AS-2 had the second best; conversely, there are 11 instances of Alpha-AS-1 performing second best after Alpha-AS-2. Thus, the Alpha-AS models came 1st and 2nd on 20 out of the 30 test days (67%). The mean and the median of the Sharpe ratio over all test days were better for both Alpha-AS models than for the Gen-AS model, and in turn the Gen-AS model performed significantly better on Sharpe than the two non-AS baselines. The results obtained suggest avenues to explore for further improvement. First, the reward function can be tweaked to penalise drawdowns more directly.
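As a reminder, the Sharpe ratio used to rank the models is simply the mean excess return divided by the standard deviation of returns. A minimal, annualization-free sketch (the return series below is made up for illustration):

```python
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Risk-adjusted return: mean excess return over its standard deviation."""
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Hypothetical per-period returns of a strategy over five steps:
ratio = sharpe_ratio([0.01, -0.005, 0.02, 0.003, -0.001])
```

Two strategies with the same total return can thus rank very differently if one achieves it with far larger swings, which is why the paper compares models on Sharpe rather than raw P&L.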


The prediction DQN receives as input the state-defining features, with their values normalised, and it outputs a value between 0 and 1 for each action. The DQN has two hidden layers, each with 104 neurons, all applying a ReLU activation function. At the start of every 5-second time step, the latest state (as defined in Section 4.1.4) is fed as input to the prediction DQN.
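The network just described is small enough to sketch its forward pass in plain Python. The weights here are random placeholders and the feature/action counts are assumptions (a real implementation would use a deep-learning library and learned weights); the sigmoid on the output layer is what keeps each action's value in the (0, 1) range mentioned above:

```python
import math
import random

random.seed(42)

N_FEATURES, HIDDEN, N_ACTIONS = 10, 104, 5  # feature/action counts are placeholders

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init(n_out, n_in):
    """Small random weights, zero biases (placeholder initialisation)."""
    return ([[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

w1, b1 = init(HIDDEN, N_FEATURES)
w2, b2 = init(HIDDEN, HIDDEN)
w3, b3 = init(N_ACTIONS, HIDDEN)

def predict(state):
    """Two ReLU hidden layers of 104 neurons, sigmoid output per action."""
    h1 = dense(state, w1, b1, relu)
    h2 = dense(h1, w2, b2, relu)
    return dense(h2, w3, b3, sigmoid)

q_values = predict([random.random() for _ in range(N_FEATURES)])
```

At each 5-second step the agent would feed the normalised state through `predict` and act on the resulting per-action values.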

  • We model the market-agent interplay as a Markov Decision Process with initially unknown state transition probabilities and rewards.
  • There is a general predominance of features corresponding to the latest order book movements (i.e., those denoted with low numerals, primarily 0 and 1).
  • Section 5 describes the experimental setup for backtests that were performed on our RL models, the Gen-AS model and two simple baselines.
  • Table 11, which is obtained from all simulations, depicts the results of these two strategies.
