Understanding the 4 Pillars of HTTP/2

Persistent sessions, multiplexed streams, header compression, and prioritization

by Joe Honton

In this episode Ernesto discovers how HTTP/2 reduces the latency of request/response round-trips and increases overall throughput.

Tangled Web Services has just finished decommissioning all of its HTTP/1.1 servers. Everything is now up and running on fast HTTP/2 servers.

Devin and Ken have been running benchmark tests using the h2load test harness to measure HTTP/2 throughput rates. They've reported their findings in other blog posts. Overall, the results were impressive.

"But what makes it so much faster?" Ernesto asked, to no one in particular. He had a few minutes to spare before the morning scrum, so he began reading the IETF docs. (He's that kind of nerd.) It wasn't light reading.

During stand-up, he mentioned to Devin and Ken that he was going to look into speculative push. "I want to see if push can improve our web page loading times," he announced.

"Cool!" said Devin.

"There be dragons!" Ken smirked. (It was September 19th after all.)


The next morning, Ernesto shared what he had learned so far. He launched into a tech soliloquy —

"Here's what makes HTTP/2 so fast.

  • persistent sessions
  • multiplexed streams
  • header compression, and
  • prioritization

"Together, these four features reduce the latency of request/response round-trips.

"Persistent sessions allow the TCP connection between the client and server to remain open even when there is no activity on the wire. This means that the overhead of establishing the connection only needs to be borne by the first request. Subsequent requests do not need to issue a DNS query to resolve the hostname — somewhere around 30 ms savings. They also do not need to go through the TLS key exchange protocol — somewhere around 70 ms savings."

At this point Devin interrupted him. "Hmm. I thought persistent sessions were already available in HTTP/1.1 through its use of keep-alive headers."

"That's true," Ernesto continued, "and it's proven itself to be beneficial. HTTP/2, on the other hand, is persistent by design, removing the need for the extra keep-alive headers."

"OK, got it," said Ken, gesturing with his hands to have him pick up the pace.

Ernesto continued. "Multiplexed streams allow multiple requests to be issued without waiting for their full response. This eliminates the head-of-line (HOL) bottleneck that has been the bane of HTTP/1.1 and which has been the root of so many cherished hacks."

He became a bit more animated at this point. "Guess why we use inline data URLs! Guess why so much effort has been wasted on image sprites! Guess why Browserify and Webpack have such elaborate bundling strategies! Yup. The HOL bottleneck. And every one of those hacks breaks the best performance booster of them all: caching. Need to change one image, or one JavaScript module, or one CSS component — caching goes out the window and the whole bundle needs to be retrieved."
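A sketch of what multiplexing looks like from the client side, again assuming Node's http2 module: every request is fired immediately, and the responses interleave on the wire in whatever order the server sends them. The asset paths are placeholders.

```ts
import * as http2 from 'node:http2';

const session = http2.connect('https://example.com');
const assets = ['/app.js', '/styles.css', '/logo.svg', '/hero.jpg'];

// All four requests are issued immediately; none of them waits for an
// earlier response to finish, and the frames interleave on one connection.
const downloads = assets.map((path) =>
  new Promise<void>((resolve, reject) => {
    const stream = session.request({ ':path': path });
    stream.end();
    stream.resume();                     // discard the body in this sketch
    stream.on('close', () => resolve());
    stream.on('error', reject);
  })
);

await Promise.all(downloads);
session.close();
```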

"Multiplexed streams are brilliant," Devin chimed in. "Sounds like it has the potential to change the way we think about web packaging and delivery."

"For sure," said Ernesto, before launching into full lecture mode. "Header compression reduces communication overhead in three ways: by indexing header names, by reusing unchanged header values, and by packing the exchange into a binary format.

"Indexed header names means that frequently used header names can be represented as a small number, instead of a long string. For example, cookie simply becomes the index number 32.

"Reusing unchanged header values is an optimization where header values are added to a dynamic table during the first request, and referenced by index value when a second request reuses that same value. So for example, if a cookie header value of AB12CD34EF5600AABBCC is sent along with 100 resource requests, only the first request incurs the 20-byte overhead. The other 99 requests simply pass the dynamic table's index number that was established for it.

"Compression using a binary format means that the number 32 can be compressed by stuffing it into just 7 bits and the 20 byte cookie AB12CD34EF5600AABBCC can be squeezed into just 16 bytes. It's a Huffman encoding designed specifically for HTTP headers, where every little bit counts."

"TMI," Ken mumbled, starting to get impatient.

"Finally, prioritization is the scheme which allows the scheduling of responses to be fine tuned. Given that requests are no longer blocked by the head-of-line bottleneck, there can be many requests in-flight simultaneously.

"In order to schedule which of these resources needs to be transmitted earlier, and which can be delayed, HTTP/2 allows them to be prioritized using a weighting mechanism. Each response can be assigned a weight, from 1 (low priority) to 256 (high priority), allowing the order of transmission to be separate and distinct from the sequential order defined by the source document.

"So," asked Devin, "prioritization could allow style sheets and web fonts to be sent early enough to prevent the flash of unstyled text (FOUT)."

"Exactly," Ernesto went on, "and prioritization might also allow scripts to be sent early enough to prevent the user from experiencing 'stalled' applications . . . you know, where the page loads and looks nice, but doesn't respond to user actions because its JavaScript is still in transmission.

"To sum it all up — persistent sessions, multiplexed streams, and header compression all add up to faster page loads. And prioritization fine tunes the transmission schedule to meet website-specific requirements."

"Cool!" Devin was impressed. He liked this kind of detail.

"But what about speculative push?" asked Ken, "Wasn't that what you were going to look into?"

"Aah yes," Ernesto paused, "that's a whole nother thing." He was stalling for time, "I'm going to work on that today."

Ernesto wasn't out of the woods yet. But at least he was on firm footing.


No minifig characters were harmed in the production of this Tangled Web Services episode.

Follow the adventures of Antoní, Bjørne, Clarissa, Devin, Ernesto, Ivana, Ken and the gang as Tangled Web Services boldly goes where tech has gone before.

