One Trading & AWS enhance crypto trading to offer the lowest market latency
Written by Dr. Stefan Blackwood, Atiek Arian, Boris Litvin, Hani Masri, and Sercan Karaoglu
Latency is one of the sources of competitive edge and a continuous focus for exchanges and Market Makers in capital markets. Improving latency positively impacts the execution of trading strategies, enhancing liquidity and increasing profitability for market venues and participants. AWS possesses advanced technological capabilities that will upgrade One Trading’s network performance through cloud-native colocation.
In this blog, we explore network latency in crypto trading and the advantages that shared Amazon Elastic Compute Cloud (Amazon EC2) cluster placement groups (CPGs) provide to exchanges and Market Makers. One Trading will soon launch a new exchange product (F.A.S.T.) offering native colocation on the AWS Cloud. In the following article, we share representative testing performed in partnership with AWS.
AWS worked with One Trading, pre-launch, to quantify the expected Market Maker experience from the perspective of network latency across a range of potential exchange access topologies. We built a simplified high-frequency trading (HFT) client that implemented a specific strategy and measured round-trip times for simulated orders, including matching engine latency.
According to the AWS test results, One Trading's F.A.S.T. achieved round-trip times below 74 microseconds, and even at very high volumes (400K orders per second) the round trip (create and cancel) completed in 112 microseconds at p95, through next-generation AWS network topologies. One Trading’s game-changing technology provides an average end-to-end trade time of just 112 microseconds, faster than the 126 microseconds achieved on the London Stock Exchange (LSE) Group’s Turquoise, a leading Multilateral Trading Facility (MTF) (LSE, 2022). Further still, One Trading’s risk and matching engine, which operates below 1 microsecond (μs), is 1000x faster than CME’s Global Matching Engine, which operates at 1-3 milliseconds (ms) (Lambert, 2023).
The matching engine's unprecedented 1μs latency remained consistent even during a 24-hour stress test, processing 40 billion orders without any performance degradation.
Our vision at One Trading is to bring the highest standards of the traditional finance world to digital asset trading through technology and regulation. We have spent the last two years pursuing a regulatory strategy that will enable us to develop products in an increasingly competitive industry and build F.A.S.T., a next-generation crypto digital asset trading venue for spot trading and, soon, regulated derivative products.
Dr. Stefan Blackwood, Head of Quant Engineering at One Trading, said of working with AWS specialists during pre-launch activities, “We decided to work closely with AWS, looking to hit the types of latencies usually seen in the market-leading traditional trading venues, but we wanted these to be available for all customer types, whether they are retail or institutional. We embarked on an aggressive plan to bring our round-trip latency to under 200 microseconds.”
Josh Barraclough, CEO of One Trading, said, “Our goal was to build a trading venue that could provide the fastest price discovery and fastest execution, and be able to maintain this without performance degradation whatever the volumes thrown at the venue. Our ambition is to scale this product, using the AWS Cloud, beyond digital assets, and under our new licensing structure we will be able to offer traditional securities for all customers.”
For testing purposes, a variety of network topologies were used to simulate real-world connectivity options that One Trading can provide to its customers for exchange access. Each topology provides a different latency profile and can be considered a different “connectivity tier”. A simulated Market Maker AWS Account was created, within which we deployed the test HFT client. Primary emphasis was placed on optimizing network latency for these tests, so only a single AWS Availability Zone (AZ) was in scope when measuring latency. The same AZ is used across both Exchange and Market Maker accounts for all tests, to prevent additional network latency incurred from crossing AZ boundaries. Resilient architectures would involve the use of multiple AZs with leader-follower Amazon EC2 instances and synchronous or asynchronous replication mechanisms, usually implemented at the application layer. The following diagrams depict a resilient configuration in two AZs.
For this test configuration, an Amazon Virtual Private Cloud (Amazon VPC) peering connection was created between the One Trading VPC and the Market Maker VPC. An Amazon EC2 CPG was then created in the One Trading AWS Account that was shared with the Market Maker AWS Account. Trade engine, order gateway, and matching engine Amazon EC2 instances were launched from both AWS Accounts into this CPG.
Amazon EC2 instances launched into a CPG benefit from greater locality on the underlying physical AWS network within the Availability Zone. Logical connectivity is achieved by using the peering connection and is private with traffic being carried over the AWS network in the AWS Region.
This topology provides for the lowest latency access to the exchange.
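To illustrate how this tier can be provisioned, the following sketch (not One Trading's actual deployment code) uses the AWS SDK for Java v2 to create a cluster placement group in the exchange account, share it with a Market Maker account through AWS RAM, and launch an instance into it. The account ID, group name, AMI, and subnet IDs are hypothetical placeholders.

```java
import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.CreatePlacementGroupRequest;
import software.amazon.awssdk.services.ec2.model.CreatePlacementGroupResponse;
import software.amazon.awssdk.services.ec2.model.InstanceType;
import software.amazon.awssdk.services.ec2.model.Placement;
import software.amazon.awssdk.services.ec2.model.PlacementStrategy;
import software.amazon.awssdk.services.ec2.model.RunInstancesRequest;
import software.amazon.awssdk.services.ram.RamClient;
import software.amazon.awssdk.services.ram.model.CreateResourceShareRequest;

public class SharedClusterPlacementGroup {
    public static void main(String[] args) {
        Ec2Client ec2 = Ec2Client.create();
        RamClient ram = RamClient.create();

        // Exchange account: create a cluster placement group for tightly packed instances.
        CreatePlacementGroupResponse created = ec2.createPlacementGroup(
            CreatePlacementGroupRequest.builder()
                .groupName("exchange-cpg")                    // hypothetical name
                .strategy(PlacementStrategy.CLUSTER)
                .build());

        // Share the placement group with the Market Maker account through AWS RAM.
        ram.createResourceShare(CreateResourceShareRequest.builder()
            .name("exchange-cpg-share")
            .resourceArns(created.placementGroup().groupArn())
            .principals("111122223333")                       // hypothetical Market Maker account ID
            .build());

        // Launch an exchange instance into the group; the Market Maker account would launch
        // its own instances into the shared group as well (typically referencing its group ID).
        ec2.runInstances(RunInstancesRequest.builder()
            .imageId("ami-0123456789abcdef0")                 // hypothetical AMI
            .instanceType(InstanceType.fromValue("c6id.metal"))
            .minCount(1)
            .maxCount(1)
            .subnetId("subnet-0123456789abcdef0")             // hypothetical subnet in the chosen AZ
            .placement(Placement.builder().groupName("exchange-cpg").build())
            .build());
    }
}
```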
This test configuration is similar to the previous one, but without the use of Amazon EC2 CPGs. It relies on the default Amazon EC2 placement strategy, where instances are provisioned randomly with respect to the underlying physical capacity in the Availability Zone, so there is no special provision for network locality between Amazon EC2 instances. Logical connectivity, as before, is private and achieved through Amazon VPC peering.
This topology provides for low, but not the lowest, latency access to the exchange.
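For completeness, a VPC peering connection like the one used in both of the first two topologies can be requested and accepted with the AWS SDK for Java v2 along the following lines. The VPC and account IDs are hypothetical, and in a real setup each account runs its own half of the workflow and then adds routes for the peer CIDR ranges.

```java
import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.AcceptVpcPeeringConnectionRequest;
import software.amazon.awssdk.services.ec2.model.CreateVpcPeeringConnectionRequest;
import software.amazon.awssdk.services.ec2.model.CreateVpcPeeringConnectionResponse;

public class PeerVpcs {
    public static void main(String[] args) {
        Ec2Client ec2 = Ec2Client.create();

        // Request peering from the One Trading VPC to the Market Maker VPC (hypothetical IDs).
        CreateVpcPeeringConnectionResponse peering = ec2.createVpcPeeringConnection(
            CreateVpcPeeringConnectionRequest.builder()
                .vpcId("vpc-0aaa1111bbb22222c")        // One Trading VPC
                .peerVpcId("vpc-0ddd3333eee44444f")    // Market Maker VPC
                .peerOwnerId("111122223333")           // Market Maker account ID
                .build());

        // The Market Maker account accepts the request (shown with the same client for brevity).
        ec2.acceptVpcPeeringConnection(AcceptVpcPeeringConnectionRequest.builder()
            .vpcPeeringConnectionId(peering.vpcPeeringConnection().vpcPeeringConnectionId())
            .build());

        // Route table entries for each VPC's CIDR range must then be added on both sides.
    }
}
```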
For this test configuration, logical connectivity between One Trading and Market Maker VPCs is provided by AWS PrivateLink. PrivateLink is a fully managed service designed to provide connectivity between VPCs at extremely large scales (thousands of VPCs) and introduces additional hops in the network path (endpoints and load balancers).
This network topology continues to provide private connectivity over the AWS network within the AWS Region. PrivateLink requires an Elastic Load Balancing Network Load Balancer inside the One Trading VPC, through which the order gateways are exposed as a service. These gateways receive order flow traffic from PrivateLink endpoints created in the Market Maker VPC.
This topology provides for medium levels of latency access to the exchange.
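As a rough sketch of the moving parts in this tier, the exchange side publishes its Network Load Balancer as a PrivateLink endpoint service, and the Market Maker side creates an interface endpoint to it. The load balancer ARN, VPC, and subnet IDs below are hypothetical, and in practice the two calls are made from different accounts.

```java
import software.amazon.awssdk.services.ec2.Ec2Client;
import software.amazon.awssdk.services.ec2.model.CreateVpcEndpointRequest;
import software.amazon.awssdk.services.ec2.model.CreateVpcEndpointServiceConfigurationRequest;
import software.amazon.awssdk.services.ec2.model.CreateVpcEndpointServiceConfigurationResponse;
import software.amazon.awssdk.services.ec2.model.VpcEndpointType;

public class PrivateLinkAccessTier {
    public static void main(String[] args) {
        Ec2Client ec2 = Ec2Client.create();

        // Exchange account: expose the order gateway NLB as a PrivateLink endpoint service.
        CreateVpcEndpointServiceConfigurationResponse service =
            ec2.createVpcEndpointServiceConfiguration(
                CreateVpcEndpointServiceConfigurationRequest.builder()
                    .networkLoadBalancerArns(
                        "arn:aws:elasticloadbalancing:eu-west-1:999988887777:loadbalancer/net/order-gw/abc123") // hypothetical
                    .acceptanceRequired(true)
                    .build());

        // Market Maker account: create an interface endpoint to that service in its own VPC.
        ec2.createVpcEndpoint(CreateVpcEndpointRequest.builder()
            .vpcEndpointType(VpcEndpointType.INTERFACE)
            .vpcId("vpc-0ddd3333eee44444f")                      // hypothetical Market Maker VPC
            .serviceName(service.serviceConfiguration().serviceName())
            .subnetIds("subnet-0123456789abcdef0")               // hypothetical subnet
            .build());
    }
}
```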
In this test configuration, connectivity is provided through an Amazon CloudFront content delivery network. The CloudFront distribution uses a public-facing Elastic Load Balancer as its origin, which routes traffic to the order gateways in the One Trading VPC. This network topology involves public connectivity and allows for exchange access from both inside and outside the AWS Region.
Market Makers with footprints inside the AWS Region will achieve optimal connectivity, since traffic is routed over the AWS Region and border network and does not egress to the public Internet, even though it uses public IP space. Market Makers external to AWS can access the exchange by using their own Internet connectivity and will be routed to the nearest CloudFront point of presence (PoP).
If trade engines in the Market Maker VPC must be deployed into private subnets, this architecture can be supplemented with managed NAT Gateways in those VPCs. Note that this adds additional latency.
Among all test configurations, this topology provides the highest latency access to the exchange.
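For illustration only, a distribution fronting a public load balancer can be created with the AWS SDK for Java v2 roughly as follows. The origin domain name is a hypothetical load balancer DNS name, and the cache policy referenced is the AWS managed CachingDisabled policy, since exchange traffic should not be cached at the edge.

```java
import software.amazon.awssdk.services.cloudfront.CloudFrontClient;
import software.amazon.awssdk.services.cloudfront.model.CreateDistributionRequest;
import software.amazon.awssdk.services.cloudfront.model.CustomOriginConfig;
import software.amazon.awssdk.services.cloudfront.model.DefaultCacheBehavior;
import software.amazon.awssdk.services.cloudfront.model.DistributionConfig;
import software.amazon.awssdk.services.cloudfront.model.Origin;
import software.amazon.awssdk.services.cloudfront.model.OriginProtocolPolicy;
import software.amazon.awssdk.services.cloudfront.model.Origins;
import software.amazon.awssdk.services.cloudfront.model.ViewerProtocolPolicy;

public class PublicEdgeAccessTier {
    public static void main(String[] args) {
        CloudFrontClient cloudFront = CloudFrontClient.create();

        DistributionConfig config = DistributionConfig.builder()
            .callerReference(String.valueOf(System.currentTimeMillis()))
            .comment("Public exchange access tier (illustrative)")
            .enabled(true)
            .origins(Origins.builder()
                .quantity(1)
                .items(Origin.builder()
                    .id("order-gateway-elb")
                    .domainName("exchange-public-elb.example.com")  // hypothetical ELB DNS name
                    .customOriginConfig(CustomOriginConfig.builder()
                        .httpPort(80)
                        .httpsPort(443)
                        .originProtocolPolicy(OriginProtocolPolicy.HTTPS_ONLY)
                        .build())
                    .build())
                .build())
            .defaultCacheBehavior(DefaultCacheBehavior.builder()
                .targetOriginId("order-gateway-elb")
                .viewerProtocolPolicy(ViewerProtocolPolicy.HTTPS_ONLY)
                .cachePolicyId("4135ea2d-6df8-44a3-9df3-4b5a84be39ad") // AWS managed CachingDisabled policy
                .build())
            .build();

        cloudFront.createDistribution(CreateDistributionRequest.builder()
            .distributionConfig(config)
            .build());
    }
}
```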
We created an HFT client that was deployed on Amazon EC2 instances inside the Market Maker VPC. This client implemented a simplified low-latency trading algorithm that directed order flow across the various network test topologies to the One Trading exchange by executing the following steps:
The following (Figure 5) is a simple sequence diagram illustrating the preceding message flow steps.
Once this order cycle is completed, the entire order flow sequence is repeated. Multiple accounts were created on the exchange, and this process was executed in parallel. The business logic, therefore, tests both sequential and parallel execution on both the trading engine and exchange sides. Parallel execution also allowed for generating different order flow throughputs—low and high rates.
Low-rate configurations generated 10K messages per second, and high-rate configurations 400K messages per second, at an average message payload size of 120 bytes.
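The load-generation and measurement pattern can be sketched as follows: each simulated account runs its own loop that issues a create/cancel round trip, records the elapsed time in an HdrHistogram, and paces itself to a target rate. The sendCreateAndCancel method is a hypothetical stand-in for the real order gateway call; the histogram and percentile reporting mirror how p95 and p99 figures such as those quoted above are typically produced.

```java
import org.HdrHistogram.Histogram;

import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class OrderFlowLoadGenerator {

    // Placeholder for the real order gateway round trip (create + cancel).
    static void sendCreateAndCancel() { /* network I/O happens here in the real client */ }

    public static void main(String[] args) throws InterruptedException {
        int accounts = 8;                       // parallel simulated Market Maker accounts
        long perAccountRate = 50_000;           // messages per second per account (hypothetical split)
        long nanosPerMessage = TimeUnit.SECONDS.toNanos(1) / perAccountRate;

        // Track latencies from 1 nanosecond up to 1 second with 3 significant digits.
        Histogram histogram = new Histogram(TimeUnit.SECONDS.toNanos(1), 3);

        Runnable worker = () -> {
            long next = System.nanoTime();
            while (!Thread.currentThread().isInterrupted()) {
                long start = System.nanoTime();
                sendCreateAndCancel();
                long elapsed = System.nanoTime() - start;
                synchronized (histogram) {      // a real client would use per-thread histograms instead
                    histogram.recordValue(elapsed);
                }
                next += nanosPerMessage;        // pace the loop to the target rate
                LockSupport.parkNanos(next - System.nanoTime());
            }
        };

        Thread[] threads = new Thread[accounts];
        for (int i = 0; i < accounts; i++) {
            threads[i] = new Thread(worker, "order-flow-" + i);
            threads[i].start();
        }

        Thread.sleep(TimeUnit.MINUTES.toMillis(1));
        for (Thread t : threads) t.interrupt();
        for (Thread t : threads) t.join();

        System.out.printf("p50=%dns p95=%dns p99=%dns%n",
            histogram.getValueAtPercentile(50),
            histogram.getValueAtPercentile(95),
            histogram.getValueAtPercentile(99));
    }
}
```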
Testing was conducted on Amazon EC2 c6id.metal instances running Amazon Linux 2023. The following application layer optimizations and techniques were implemented for the HFT client:
1. Thread processor affinity through CPU core pinning: To reduce latency caused by repeatedly copying thread data and instructions into CPU caches when a thread migrates between cores, each thread is pinned to a distinct core. The operating system then ensures a given thread executes only on that core (see the sketch after this list).
2. Composite buffers: The HFT client implements Netty as the underlying application networking framework. Composite buffers in Netty reduce unnecessary object allocations and copy operations when merging multiple frames of data.
3. io_uring: io_uring is an asynchronous I/O interface for the Linux kernel. It implements shared memory ring buffers that provide a queue between the application and kernel space, reducing latency by eliminating additional system calls for application I/O operations.
4. Thread segregation: Threads responsible for network I/O are kept distinct from those that calculate the round-trip latencies and generate histogram data. This single responsibility model prevents latency incurred from business logic impacting order message transmission.
5. Reduce pressure from garbage collection (GC): Various techniques are used, including warming up Java virtual machine (JVM) processes so that hot code paths are JIT-compiled and served from the code cache rather than repeatedly interpreted, regular process restarts, and specific JVM parameters to reduce GC pressure.
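The following condensed sketch shows how the first three techniques can fit together in a Netty-based client. It is not the actual One Trading or test client code: it assumes the OpenHFT thread affinity library and Netty's incubator io_uring transport are on the classpath, and the order gateway host and port are placeholders.

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.CompositeByteBuf;
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.incubator.channel.uring.IOUringEventLoopGroup;
import io.netty.incubator.channel.uring.IOUringSocketChannel;
import net.openhft.affinity.AffinityStrategies;
import net.openhft.affinity.AffinityThreadFactory;

import java.nio.charset.StandardCharsets;

public class LowLatencyClientSketch {
    public static void main(String[] args) throws InterruptedException {
        // (1) Pin event loop threads to dedicated cores and (3) use io_uring for socket I/O.
        EventLoopGroup ioGroup = new IOUringEventLoopGroup(
            1, new AffinityThreadFactory("netty-io", AffinityStrategies.DIFFERENT_CORE));

        Bootstrap bootstrap = new Bootstrap()
            .group(ioGroup)
            .channel(IOUringSocketChannel.class)
            .handler(new ChannelInitializer<Channel>() {
                @Override
                protected void initChannel(Channel ch) {
                    // Order encoding/decoding handlers would be added here.
                }
            });

        Channel channel = bootstrap
            .connect("order-gateway.example.internal", 9000)   // placeholder endpoint
            .sync().channel();

        // (2) Merge a fixed header and an order body without copying, using a composite buffer.
        ByteBuf header = ByteBufAllocator.DEFAULT.buffer()
            .writeBytes("HDR".getBytes(StandardCharsets.US_ASCII));
        ByteBuf body = ByteBufAllocator.DEFAULT.buffer()
            .writeBytes("NEW_ORDER".getBytes(StandardCharsets.US_ASCII));
        CompositeByteBuf frame = ByteBufAllocator.DEFAULT.compositeBuffer();
        frame.addComponents(true, header, body);   // true: advance the writer index over both parts

        channel.writeAndFlush(frame).sync();
        ioGroup.shutdownGracefully();
    }
}
```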
We have made the HFT client available in this GitHub repository, where you can view the code and read more about how these optimizations are implemented.
To maintain a straightforward baseline, many additional stack optimizations typically applied to specific HFT workload types were not used for this testing. These include IRQ handling, CPU P-state and C-state controls, network buffer tuning, kernel bypass, receive side scaling, transmit packet steering, Linux scheduler policies, and AWS Elastic Network Adapter (ENA) tuning.
The tests were performed simultaneously across all test network topologies for a continuous 24-hour period. For the purposes of clarity and greatest utility, round-trip times do not include latency added by HFT client business logic and are, therefore, a clear representation of the network performance cost for each topology. The following table (Figure 6) displays the aggregated results.
The results obtained demonstrate that, for high message rates at p99, access using Amazon VPC peering and shared Amazon EC2 cluster placement groups is 98% faster than access over the Internet through Amazon CloudFront and 41% faster than access using Amazon VPC peering without shared CPGs.
In this blog, we discussed how we worked with AWS during the pre-launch activities for our new F.A.S.T. exchange product. We demonstrated that our HFT client and the One Trading exchange, deployed on Amazon EC2 c6id.metal instances, are capable of scaling to process a large number of orders per second with minimal application contention. The latency observed was introduced by the various test network topologies deployed.
We showed that an architecture using Amazon VPC peering and shared Amazon EC2 cluster placement groups provides the lowest latency connectivity pattern.
While all topologies tested are functionally valid and can be provided by One Trading to different categories of market participants, Amazon VPC peering with shared CPGs provides a materially beneficial access tier for their largest market-making customers.
If you are interested in learning more about One Trading’s new F.A.S.T. exchange product, please contact One Trading. If you want to run similar performance tests by using or modifying the code created for this HFT client, please contact an AWS Representative.
Dr. Stefan Blackwood
An accomplished algorithm developer and Maths PhD known for consistently surpassing leading applications through custom solutions. As an entrepreneurial tech enthusiast, he thrives on innovation, problem-solving and paving the untrodden path in the ever-evolving world of technology.
Atiek Arian
Atiek is a Senior Manager in Solutions Architecture within Global Financial Services at AWS. He has over 20 years of experience architecting and managing network, compute and storage solutions in the Financial Services industry.
Boris Litvin
Boris is a Principal Solutions Architect at AWS, focused on Financial Services industry innovation. Boris joined AWS from the industry, most recently Goldman Sachs, where he held a variety of quantitative roles across Equity, FX, and Interest Rates, and was CEO and founder of a quantitative trading FinTech startup.
Hani Masri
Hani Masri is a Senior Solutions Architect within Global Financial Services at AWS. He supports Financial Services customers in their journey to cloud migration and digital transformation. Hani is passionate about data analytics and has been working in the industry for 10+ years.
Sercan Karaoglu
Sercan Karaoglu is a Senior Solutions Architect at AWS, specializing in capital markets. He is a former data engineer and is passionate about quantitative investment research.