Circumventing the GFW with TLS Record Fragmentation

How Fragmentation Can Be Extended to the TLS Layer

TCP fragmentation has long been known as a viable deep packet inspection (DPI) circumvention technique. However, censors are increasingly aware of this technique. We propose TLS record fragmentation as a new censorship circumvention technique on the TLS layer that functions analogously to TCP fragmentation. Using TLS record fragmentation, we successfully circumvented the DPI of the Great Firewall of China (GFW). We also found that over 90% of TLS servers support this new circumvention technique. To contextualize TLS record fragmentation for future work, we discuss its possibilities and limitations.

0x1703030005436972637517030300056d76656e74170303
  0005696e67207417030300056865204746170303000157

Introduction

In this section, we provide background information on TLS censorship and fragmentation before showing the viability of TLS record fragmentation in later sections.

TLS (Censorship)

The TLS protocol provides confidentiality, authenticity, and integrity for internet traffic in a client-server setting. While TLS can encrypt arbitrary application data, it is most prominently used to encrypt HTTP connections. Google reports that around 93% of Chrome’s connections use HTTPS (HTTP+TLS). The encryption of HTTP makes HTTPS connections resilient to censors’ analyses of fields such as the HTTP Host header. However, before TLS can transmit encrypted application data, it performs a so-called handshake. The first messages of this handshake are unencrypted, and the ClientHello message contains the Server Name Indication (SNI) extension, which mirrors the content of the HTTP Host header. The handshake is depicted below.

A TLS 1.2 handshake with an HTTP GET request.

A TLS 1.2 handshake. Unencrypted messages are marked in blue while encrypted messages are marked in yellow. The SNI extension is visible in the unencrypted ClientHello message while the Host header of the HTTP GET request is encrypted.

Censors around the globe use the SNI extension to censor HTTPS connections. As a countermeasure, the IETF has proposed ESNI and its successor ECH. Both encrypt the SNI extension in the ClientHello message. Unfortunately, the standard is still in the drafting phase, and its adoption is far from widespread. The only website for which we could find a valid ECH configuration is Cloudflare’s designated testing server. The slow adoption of ECH necessitates intermediate solutions for SNI censorship circumvention. One such solution is to split TLS messages across multiple TCP segments, known as TCP fragmentation.
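
To illustrate how little effort it takes to read the SNI from an unfragmented ClientHello, here is a minimal Python sketch. The helper names build_client_hello and extract_sni are hypothetical, and the ClientHello is a stripped-down illustration (zeroed random, a single cipher suite, only the SNI extension), not what a real client sends.

```python
import struct
from typing import Optional


def build_client_hello(hostname: str) -> bytes:
    """Build a stripped-down ClientHello whose only extension is the SNI."""
    name = hostname.encode("ascii")
    # ServerNameList: list length, name type 0 (host_name), name length, name
    sni = struct.pack("!HBH", len(name) + 3, 0, len(name)) + name
    extensions = struct.pack("!HH", 0x0000, len(sni)) + sni
    body = (
        b"\x03\x03"              # legacy_version (TLS 1.2)
        + bytes(32)              # random (zeroed placeholder)
        + b"\x00"                # empty session ID
        + b"\x00\x02\x13\x01"    # one cipher suite (TLS_AES_128_GCM_SHA256)
        + b"\x01\x00"            # null compression only
        + struct.pack("!H", len(extensions)) + extensions
    )
    # Handshake header: type 1 (ClientHello) and a 3-byte length
    return b"\x01" + len(body).to_bytes(3, "big") + body


def extract_sni(client_hello: bytes) -> Optional[str]:
    """Return the hostname from a complete ClientHello, or None if parsing fails."""
    try:
        if client_hello[0] != 0x01:
            return None
        pos = 4 + 2 + 32                       # handshake header, version, random
        pos += 1 + client_hello[pos]           # session ID
        pos += 2 + struct.unpack_from("!H", client_hello, pos)[0]  # cipher suites
        pos += 1 + client_hello[pos]           # compression methods
        end = pos + 2 + struct.unpack_from("!H", client_hello, pos)[0]
        pos += 2
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", client_hello, pos)
            pos += 4
            if ext_type == 0x0000:             # server_name extension
                name_len = struct.unpack_from("!H", client_hello, pos + 3)[0]
                name = client_hello[pos + 5:pos + 5 + name_len]
                return name.decode("ascii") if len(name) == name_len else None
            pos += ext_len
    except (IndexError, struct.error, ValueError):
        pass                                   # truncated or malformed input
    return None


hello = build_client_hello("example.com")
print(extract_sni(hello))        # the full message reveals the hostname
print(extract_sni(hello[:60]))   # a truncated message does not parse
```

A parser like this only works on the complete message. As soon as the ClientHello arrives in pieces, the censor has to buffer and reassemble before it can parse, which is what the fragmentation techniques below exploit.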

TCP Fragmentation

TCP is a stream-based protocol: users and applications write data to an abstract data stream, and TCP translates this stream into actual network packets called TCP segments. Each TCP segment can contain either complete application messages or only part of one. The latter is called TCP fragmentation and is depicted below with an HTTP GET message.

TCP fragmented HTTP GET request.

The left side contains an unfragmented HTTP GET request. The same request is depicted in two TCP segments on the right side. Censors that want to extract the hostname of the website from the fragmented HTTP GET request have to concatenate both fragments.

Interestingly, TCP fragmentation can be used for censorship circumvention because it increases the complexity of traffic analysis. In the above example, a censor has to concatenate both TCP fragments to correctly identify the destination of the GET request. This forces the censor to maintain state and allocate costly memory for every connection it analyzes. These costs caused many censors to ignore TCP fragmentation in the past. Because it proved successful, TCP fragmentation was implemented in various censorship circumvention tools. Recently, though, China’s censor has become more sophisticated and has begun handling TCP fragmentation.
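
As a rough illustration of TCP fragmentation from user space, the following Python sketch writes a payload in two parts. The function names are hypothetical. Disabling Nagle's algorithm with TCP_NODELAY makes it likely, but does not guarantee, that each write leaves the host as its own TCP segment.

```python
import socket


def split_payload(data: bytes, split_at: int) -> tuple:
    """Split data into two non-empty parts at byte offset split_at."""
    if not 0 < split_at < len(data):
        raise ValueError("split_at must fall strictly inside the payload")
    return data[:split_at], data[split_at:]


def send_tcp_fragmented(sock, data: bytes, split_at: int) -> None:
    """Send data with two separate writes so it is likely to span two segments."""
    # Without TCP_NODELAY the kernel may coalesce both writes into one segment.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for chunk in split_payload(data, split_at):
        sock.sendall(chunk)
```

For the HTTP example above, a split offset inside the Host header value would place the two halves of the hostname in different segments.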

TLS Record Fragmentation

While TLS messages can be fragmented over multiple TCP segments, they can also be fragmented on the TLS layer alone. This is possible because TLS itself consists of two layers: the TLS message layer and the TLS record layer. On the TLS record layer, every TLS message is wrapped in a TLS record, which adds a five-byte header (content type, protocol version, and length). Most importantly, a single TLS message can be split across multiple TLS records, resulting in TLS record fragmentation. This is depicted in the figure below.

TLS record fragmented ClientHello message.

The left side depicts a TLS ClientHello message in a complete TLS record and TCP segment. A TLS record fragmented ClientHello message is depicted on the right. Both TLS records are contained in the same TCP segment. A censor that wants to analyze the SNI extension of the fragmented TLS message has to concatenate both TLS records.

In this example, the SNI extension is split across different TLS records. Similar to TCP fragmentation, this forces the censor to maintain state and allocate memory for potential reassembly. To the best of our knowledge, TLS record fragmentation has been proposed for censorship circumvention only once, by Thomas Pornin in 2014. We are not aware of any analyses or implementations of TLS record fragmentation as a censorship circumvention technique. In this blog post, we close this gap and effectively rediscover TLS record fragmentation as a viable censorship circumvention technique.
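
The core operation can be sketched in a few lines of Python. This is an illustrative sketch, not DPYProxy's actual code; fragment_record and reassemble are hypothetical names. The first function splits one record into two, the second does what a censor would have to do to recover the original message.

```python
import struct


def fragment_record(record: bytes, split_at: int) -> bytes:
    """Split one complete TLS record into two records with the same payload."""
    content_type, version, length = struct.unpack("!BHH", record[:5])
    payload = record[5:]
    if length != len(payload):
        raise ValueError("expected exactly one complete TLS record")
    if not 0 < split_at < length:
        raise ValueError("split_at must fall strictly inside the payload")
    parts = (payload[:split_at], payload[split_at:])
    # Each part gets its own five-byte header with an adjusted length field.
    return b"".join(
        struct.pack("!BHH", content_type, version, len(part)) + part
        for part in parts
    )


def reassemble(stream: bytes) -> bytes:
    """Concatenate the payloads of consecutive TLS records."""
    out, pos = bytearray(), 0
    while pos < len(stream):
        _type, _version, length = struct.unpack_from("!BHH", stream, pos)
        out += stream[pos + 5:pos + 5 + length]
        pos += 5 + length
    return bytes(out)


# The hex banner at the top of this post parses as five TLS records.
banner = bytes.fromhex(
    "1703030005436972637517030300056d76656e74170303"
    "0005696e67207417030300056865204746170303000157"
)
print(reassemble(banner))  # b'Circumventing the GFW'
```

Note that the fragmented output is five bytes longer than the input for every record added, while the reassembled payload is unchanged.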

Contributions

Our primary contribution is circumventing China’s censor, the Great Firewall of China (GFW), with TLS record fragmentation. To assess the feasibility of TLS record fragmentation on the internet, we also measured how widely TLS servers support it.

Proof Of Concept

As mentioned, we circumvented the GFW with TLS record fragmentation. To this end, we implemented DPYProxy: a simple Python proxy that applies TLS record fragmentation to all handshake messages passing through it. In addition to TLS record fragmentation, DPYProxy supports TCP fragmentation, both standalone and in combination with TLS record fragmentation. The proxy runs locally and can be set as an HTTP(S) proxy in browsers like Firefox or Chrome. An existing upstream HTTP(S) proxy (needed for IP censorship circumvention) can be provided to DPYProxy, which then routes traffic through it as well. The figure below visualizes both our setup and the behavior of the GFW.

Setup and censor handling of two test vectors.

This figure depicts the setup of our scans for two test vectors. We can see that the GFW intercepts unfragmented TLS ClientHello messages. It ignores TLS record fragmented TLS ClientHello messages. We omitted HTTP CONNECT messages sent to DPYProxy and the HTTP Proxy for improved readability.

We set up DPYProxy on a vantage point in China (AS4837) and let it connect to an HTTP proxy in the DFN, the German national research and education network. From there, we queried https://wikipedia.org/wiki/turtle using curl (curl -Ls --proxy 127.0.0.1:4433 https://wikipedia.org/wiki/turtle) with different settings of DPYProxy. Specifically, we ran DPYProxy with every combination of TCP and TLS record fragmentation enabled or disabled. When combining TCP and TLS record fragmentation, we fit each TLS record into exactly one TCP segment. In all tests, we fragmented the ClientHello message either before or after the SNI extension. We refer to these as “Early Split” and “Late Split” in the table of results below.

Fragmentation   Split   Circumvents Censor
None            -       No
TCP             Early   Yes
TCP             Late    No
TLS             Early   Yes
TLS             Late    Yes
TLS+TCP         Early   Yes
TLS+TCP         Late    Yes
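
The four fragmentation modes in the table can be sketched as a function that decides which byte strings are handed to separate socket writes. This is a hypothetical illustration (plan_writes and split_record are made-up names, not DPYProxy's API). The parameter split_at is an offset into the ClientHello; choosing it before or after the SNI extension corresponds to the Early and Late rows. As with any user-space approach, separate writes only make separate TCP segments likely, not certain.

```python
import struct


def split_record(record: bytes, split_at: int) -> list:
    """Split one TLS record into two records at payload offset split_at."""
    content_type, version, _length = struct.unpack("!BHH", record[:5])
    payload = record[5:]
    parts = [payload[:split_at], payload[split_at:]]
    return [struct.pack("!BHH", content_type, version, len(p)) + p for p in parts]


def plan_writes(record: bytes, split_at: int, tls: bool, tcp: bool) -> list:
    """Return the byte strings to pass to consecutive socket writes."""
    if tls:
        records = split_record(record, split_at)
        # TLS+TCP: one record per write. TLS only: both records in one write.
        return records if tcp else [b"".join(records)]
    if tcp:
        # TCP only: cut the raw bytes; skip the five-byte record header so
        # split_at refers to the same ClientHello offset as in the TLS modes.
        cut = 5 + split_at
        return [record[:cut], record[cut:]]
    return [record]  # no fragmentation
```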


Our results lead to a few interesting conclusions. First, we verified that TCP fragmentation can still circumvent the GFW, but only partially: the GFW censored our connection attempts only when the SNI extension was present in the first TCP segment. Here, we encountered both the primary and secondary censors of the GFW identified by Bock et al. Second, both censors can be circumvented reliably with TLS record fragmentation; it suffices to place any byte of the ClientHello message into a different TLS record. The resulting TLS records can either be contained in a single TCP segment or be split across multiple TCP segments. Overall, we find that the GFW handles TCP fragmentation partially but fails to handle any kind of TLS record fragmentation.

TLS Server Support

To assess the usability of TLS record fragmentation, we also measured TLS servers’ support for it. To this end, we analyzed the domains of the Tranco Top 1M list and all https:// domains from the CitizenLab global list of censored domains. We provide the per-server results of our analysis on GitHub. Below, we summarize our results.

List            Scanned Domains [a]   Support TLS record fragmentation
CitizenLab      1 135                 1 092 (96.21%)
Tranco Top 1M   830 357               766 909 (92.36%)

[a] We excluded domains that are not resolvable, do not complete a TLS handshake, or requested exclusion from our scans in a previous scan.

We found that slightly over 96% of domains from the CitizenLab list support TLS record fragmentation. The share among domains from the Tranco Top 1M list is slightly smaller, at over 92%. Interestingly, TLS record fragmentation enjoys widespread support across all ranks of the Tranco Top 1M list, as can be seen below.

TLS server support for TLS record fragmentation by Tranco rank.

Overall, we determined that TLS record fragmentation is largely supported by TLS servers as of today. This holds for the top TLS servers on the internet as well as censored domains.
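
For readers who want to check a single server themselves, the following Python sketch shows one way such a probe could look. It is not the scanner used for the measurements above, and all function names are hypothetical. It assumes that receiving a ServerHello in response to a fragmented ClientHello indicates support. The ClientHello is produced by Python's ssl module through in-memory BIOs, so no TLS messages have to be crafted by hand.

```python
import socket
import ssl
import struct


def client_hello_record(hostname: str) -> bytes:
    """Let the standard library's TLS stack produce a real ClientHello record."""
    context = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client is now waiting for the server's reply
    return outgoing.read()


def fragment_record(record: bytes, split_at: int) -> bytes:
    """Split one complete TLS record into two records with the same payload."""
    content_type, version, length = struct.unpack("!BHH", record[:5])
    payload = record[5:]
    if length != len(payload) or not 0 < split_at < length:
        raise ValueError("need one complete record and an inner split offset")
    parts = (payload[:split_at], payload[split_at:])
    return b"".join(
        struct.pack("!BHH", content_type, version, len(p)) + p for p in parts
    )


def looks_like_server_hello(reply: bytes) -> bool:
    """Assumption: a handshake record (22) starting with a ServerHello (2) means success."""
    return len(reply) >= 6 and reply[0] == 0x16 and reply[5] == 0x02


def supports_record_fragmentation(hostname: str, port: int = 443,
                                  timeout: float = 5.0) -> bool:
    """Send a fragmented ClientHello and check whether a ServerHello comes back."""
    hello = fragment_record(client_hello_record(hostname), split_at=16)
    reply = b""
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            sock.sendall(hello)
            while len(reply) < 6:
                chunk = sock.recv(6 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        return False  # connection refused, reset, or timed out
    return looks_like_server_hello(reply)
```

The split offset of 16 is arbitrary. A server that answers with an alert, resets the connection, or stays silent is counted as not supporting fragmentation under this sketch's assumption.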

Discussion

The GFW is arguably the most sophisticated censor in the world and is often the first censor to implement new protocol analyses. Since even the GFW does not analyze TLS messages that are fragmented over multiple TLS records, we believe that TLS record fragmentation is likely to circumvent other censors as well. To ascertain its viability around the world, we encourage analyses in other countries.

How Can You Manipulate the TLS Handshake with a Proxy?

One might think that it’s impossible to manipulate TLS traffic as a MitM/proxy server. Fundamentally, that is correct: the TLS protocol authenticates the TLS handshake messages with the Finished message. However, TLS does not authenticate the TLS record headers that wrap unencrypted messages. Record headers are only authenticated for encrypted handshake messages and application data. As we only manipulated the TLS record headers of unencrypted handshake messages, we did not break the TLS handshake in our analyses. Any manipulation of other parts of the handshake, such as the SNI extension, would indeed break authentication.

As another nitty-gritty detail: the additional records advance the implicit record sequence numbers, but this does not break the subsequent authentication of data, because sequence numbers are reset before encryption starts.
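
The following Python sketch illustrates why re-wrapping a handshake message in more records leaves the authenticated data untouched. It hashes only the handshake-layer bytes, as the handshake transcript does, and ignores record headers. The helper names and the placeholder message are hypothetical, and SHA-256 stands in for whichever hash the negotiated cipher suite uses.

```python
import hashlib
import struct


def record(content_type: int, payload: bytes, version: int = 0x0303) -> bytes:
    """Wrap a payload in a five-byte TLS record header."""
    return struct.pack("!BHH", content_type, version, len(payload)) + payload


def handshake_bytes(stream: bytes) -> bytes:
    """Strip record headers and return the concatenated handshake-layer bytes."""
    out, pos = bytearray(), 0
    while pos < len(stream):
        content_type, _version, length = struct.unpack_from("!BHH", stream, pos)
        if content_type == 0x16:  # handshake records only
            out += stream[pos + 5:pos + 5 + length]
        pos += 5 + length
    return bytes(out)


def transcript_hash(stream: bytes) -> str:
    """Hash what the Finished message covers: handshake bytes, not record headers."""
    return hashlib.sha256(handshake_bytes(stream)).hexdigest()


message = b"\x01\x00\x00\x05hello"  # placeholder handshake message (type 1, length 5)
whole = record(0x16, message)
split = record(0x16, message[:3]) + record(0x16, message[3:])
tampered = record(0x16, message.replace(b"hello", b"jello"))

print(whole != split)                                       # True: wire bytes differ
print(transcript_hash(whole) == transcript_hash(split))     # True: transcript is identical
print(transcript_hash(whole) == transcript_hash(tampered))  # False: content changes show up
```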

How Long Will TLS Record Fragmentation Stay Viable?

Currently, we are not sure why TLS record fragmentation works so well against the GFW. We suspect that the GFW can currently hold state only on the TCP layer but not in its DPI of the TLS layer. If that is the case, we expect that the GFW and other censors will need some time until they can reassemble TLS records as well. We are even more optimistic about circumvention techniques that combine alterations on the TLS and TCP layers. For instance, one could fragment a TLS handshake message on both the TLS and TCP layers, send the resulting segments out of order, and inject TLS or TCP packets with a low TTL in between. In the end, we cannot say definitively how long TLS record fragmentation will work against the GFW. We still expect it to remain viable for a non-negligible amount of time, especially as a building block for more sophisticated circumvention techniques.

Can’t the GFW Block TLS Record Fragmentation Completely?

Yes, and no. The GFW could completely block all fragmented TLS messages. However, doing so risks blocking all connections that exhibit naturally occurring TLS record fragmentation. TLS record fragmentation occurs naturally when the size of a TLS message (up to 2^24 - 1 bytes) exceeds the maximum size of a TLS record (2^14 bytes of plaintext). Additionally, the maximum size of TLS records can be lowered with TLS extensions. To make a complete block of TLS record fragmentation less viable, we encourage browser vendors and other TLS clients to incorporate fragmented TLS records in their connection attempts. This might also convince the remaining server owners to start supporting TLS record fragmentation, improving the interoperability of the TLS landscape as a whole.

I Want to Add TLS Record Fragmentation to my DPI Circumvention Tool. What Do I Have to Consider?

As an application layer protocol, TLS can be manipulated without root privileges on the operating system. This makes it possible for TLS clients such as custom browsers to enforce TLS record fragmentation from user space. As TLS record headers can be manipulated as a MitM, it is also possible to implement TLS record fragmentation in DPI-circumventing proxies. DPYProxy does exactly that. Limitations exist for tools such as GoodbyeDPI that manipulate TCP packets directly. For each newly added record, five bytes are inserted into the TCP stream. This leads to a mismatch in TCP sequence numbers between the client and server application. While the TCP sequence numbers can be adjusted accordingly, this has to be done for all subsequent messages in the handshake. Ironically, this forces the circumvention tool to maintain TCP connection state.
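
The bookkeeping that a packet-level tool would need can be sketched as follows. This is a simplified, hypothetical illustration (the class SeqShift is not part of any existing tool). It tracks only the byte offset caused by inserted record headers and ignores retransmissions, selective acknowledgments, and checksum recalculation. The shift applies to segments sent after the insertion point.

```python
RECORD_HEADER_LEN = 5    # bytes added to the stream per extra TLS record
SEQ_MODULUS = 2 ** 32    # TCP sequence numbers wrap around at 32 bits


class SeqShift:
    """Track bytes inserted into the client-to-server stream by an in-path tool."""

    def __init__(self) -> None:
        self.inserted = 0

    def record_added(self, count: int = 1) -> None:
        """Account for newly inserted TLS record headers."""
        self.inserted += RECORD_HEADER_LEN * count

    def client_to_server_seq(self, seq: int) -> int:
        """Shift a client sequence number to the position the server expects."""
        return (seq + self.inserted) % SEQ_MODULUS

    def server_to_client_ack(self, ack: int) -> int:
        """Shift a server acknowledgment back into the client's numbering."""
        return (ack - self.inserted) % SEQ_MODULUS


shift = SeqShift()
shift.record_added()                      # one extra record: five extra bytes
print(shift.client_to_server_seq(1000))   # 1005
print(shift.server_to_client_ack(1005))   # 1000
```

A proxy that terminates the client's TCP connection and opens its own connection to the server avoids this bookkeeping entirely, because each connection keeps its own sequence numbers.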

Conclusion

In this blog post, we extended fragmentation-based censorship circumvention to the TLS layer. We hope to aid both researchers and people affected by censorship with an additional tool in the ongoing struggle against internet censorship. The code of our TLS record fragmentation proxy is accessible on GitHub. Feel free to get in touch for discussions and follow-up work at niklas.niere@upb.de.

Citation

@online{circumventing-the-gfw-with-tls-record-fragmentation,
  author = {Niere, Niklas},
  title = {Circumventing the GFW with TLS Record Fragmentation},
  year = 2023,
  url = {https://upb-syssec.github.io/blog/2023/record-fragmentation/},
  urldate = {2024-04-24}
}