Understanding the 4 Pillars of HTTP/2
Persistent sessions, multiplexed streams, header compression, and prioritization
by Joe Honton
In this episode Ernesto discovers how HTTP/2 reduces the latency of request/response round-trips and increases overall throughput.
Tangled Web Services has just finished decommissioning all of its HTTP/1.1 servers. Everything is now up and running on fast HTTP/2 servers.
Devin and Ken have been running benchmark tests using the h2load test harness in order to measure HTTP/2 throughput rates. They've reported their findings in other blog posts. Overall, the results were impressive.
"But what makes it so much faster?" Ernesto asked, to no one in particular. He had a few minutes to spare before the morning scrum, so he began reading the IETF docs. (He's that kind of nerd.) It wasn't light reading.
During stand-up, he mentioned to Devin and Ken that he was going to look into speculative push. "I want to see if push can improve our web page loading times," he announced.
"Cool!" said Devin.
"There be dragons!" Ken smirked. (It was September 19th after all.)
The next morning, Ernesto shared what he had learned so far. He launched into a tech soliloquy —
"Here's what makes HTTP/2 so fast.
- persistent sessions
- multiplexed streams
- header compression, and
- prioritization.
"Together, these four features reduce the latency of request/response round-trips.
"Persistent sessions allow the TCP connection between the client and server to remain open even when there is no activity on the wire. This means that the overhead of establishing the connection only needs to be borne by the first request. Subsequent requests do not need to issue a DNS query to resolve the hostname — somewhere around 30 ms savings. They also do not need to go through the TLS key exchange protocol — somewhere around 70 ms savings."
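Ernesto's back-of-the-envelope numbers add up quickly across a burst of requests. Here's a sketch of the arithmetic; the 30 ms and 70 ms figures are his rough estimates (not measurements), and `setup_overhead_ms` is an illustrative helper, not a real API:

```python
# Rough estimates quoted above, not measured values.
DNS_LOOKUP_MS = 30
TLS_HANDSHAKE_MS = 70

def setup_overhead_ms(num_requests: int, persistent: bool) -> int:
    """Total connection-setup cost for a burst of requests."""
    per_connection = DNS_LOOKUP_MS + TLS_HANDSHAKE_MS
    # With a persistent session, only the first request pays the setup cost;
    # without one, every request opens (and pays for) its own connection.
    connections = 1 if persistent else num_requests
    return connections * per_connection

print(setup_overhead_ms(100, persistent=False))  # 10000 ms
print(setup_overhead_ms(100, persistent=True))   # 100 ms
```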
At this point Devin interrupted him. "Hmm. I thought persistent sessions were already available in HTTP/1.1 through its use of the keep-alive header."

"That's true," Ernesto continued, "and it's proven itself to be beneficial. HTTP/2, on the other hand, is persistent by design, removing the need for the extra keep-alive header."
"OK, got it," said Ken, gesturing with his hands to have him pick up the pace.
Ernesto continued. "Multiplexed streams allow multiple requests to be issued without waiting for their full response. This eliminates the head-of-line (HOL) bottleneck that has been the bane of HTTP/1.1 and which has been the root of so many cherished hacks."
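The head-of-line effect can be caricatured in a few lines: when responses must complete one at a time, the total wait is the sum of their times, while multiplexed streams let the slowest response dominate. This is a toy model that ignores bandwidth sharing, not a real benchmark:

```python
# Hypothetical response times for four resources on one page.
response_times_ms = [120, 80, 200, 50]

# Head-of-line blocking: each response waits for the one ahead of it.
serialized_ms = sum(response_times_ms)

# Multiplexed streams: all requests are in flight at once, so the
# total wait is bounded by the slowest response (in this toy model).
multiplexed_ms = max(response_times_ms)

print(serialized_ms, multiplexed_ms)  # 450 200
```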
"Multiplexed streams are brilliant," Devin chimed in. "Sounds like it has the potential to change the way we think about web packaging and delivery."
"For sure," said Ernesto, before launching into full lecture mode. "Header compression reduces communication overhead in three ways: by indexing header names, by reusing unchanged header values, and by packing the exchange into a binary format.
"Indexing header names means that frequently used header names can be represented as a small number, instead of a long string. For example, cookie simply becomes the index number 32.
"Reusing unchanged header values is an optimization where header values are added to a dynamic table during the first request, and referenced by index value when a second request reuses that same value. So for example, if a cookie header value of AB12CD34EF5600AABBCC is sent along with 100 resource requests, only the first request incurs the 20-byte overhead. The other 99 requests simply pass the dynamic table's index number that was established for it.
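That reuse can be sketched with a plain Python list standing in for the dynamic table. The `encode` helper and its return values are illustrative only, not the real HPACK wire format:

```python
# Toy stand-in for HPACK's dynamic table: a list of (name, value) pairs.
dynamic_table: list[tuple[str, str]] = []

def encode(name: str, value: str):
    """Return an index if (name, value) was seen before, else store it."""
    entry = (name, value)
    if entry in dynamic_table:
        # Repeat request: send only the table index, not the full value.
        return ("indexed", dynamic_table.index(entry))
    # First request: send the literal value and remember it for next time.
    dynamic_table.append(entry)
    return ("literal", entry)

first = encode("cookie", "AB12CD34EF5600AABBCC")   # full 20-byte value sent
repeat = encode("cookie", "AB12CD34EF5600AABBCC")  # just an index sent
print(first, repeat)
```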
"Compression using a binary format means that the number 32 can be compressed by stuffing it into just 7 bits, and the 20-byte cookie value AB12CD34EF5600AABBCC can be squeezed into just 16 bytes. It's a Huffman encoding designed specifically for HTTP headers, where every little bit counts."
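The 7-bit claim comes from HPACK's prefix integer encoding (RFC 7541, section 5.1): a value smaller than 2^N − 1 is packed directly into the N-bit prefix of a single byte, and larger values spill into continuation octets. A sketch of that encoding:

```python
def encode_int(value: int, prefix_bits: int) -> bytes:
    """HPACK-style prefix integer encoding (RFC 7541 section 5.1)."""
    limit = (1 << prefix_bits) - 1
    if value < limit:
        return bytes([value])          # whole number fits in the prefix
    out = [limit]                      # prefix saturated; continue in octets
    value -= limit
    while value >= 128:
        out.append((value % 128) + 128)  # 7 payload bits + continuation bit
        value //= 128
    out.append(value)
    return bytes(out)

# Index 32 fits below 2^7 - 1, so it occupies a single byte.
print(len(encode_int(32, 7)))   # 1
# The RFC's worked example: 1337 with a 5-bit prefix takes three bytes.
print(list(encode_int(1337, 5)))  # [31, 154, 10]
```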
"TMI," Ken mumbled, starting to get impatient.
"Finally, prioritization is the scheme that allows the scheduling of responses to be fine-tuned. Given that requests are no longer blocked by the head-of-line bottleneck, there can be many requests in flight simultaneously.
"In order to schedule which of these resources needs to be transmitted earlier, and which can be delayed, HTTP/2 allows them to be prioritized using a weighting mechanism. Each response can be assigned a weight, from 1 (low priority) to 256 (high priority), allowing the order of transmission to be separate and distinct from the sequential order defined by the source document.
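One way to picture those weights is as a proportional split of bandwidth among streams competing at the same level. The `bandwidth_shares` helper is illustrative; real HTTP/2 schedulers also account for dependencies and are free to approximate:

```python
def bandwidth_shares(weights: dict[str, int]) -> dict[str, float]:
    """Proportional share of bandwidth for each stream, by weight."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical page: the style sheet is weighted far above the hero image,
# so it gets the lion's share of the pipe while both are in flight.
shares = bandwidth_shares({"style.css": 256, "hero.jpg": 64})
print(shares)  # style.css gets 0.8 of the bandwidth, hero.jpg gets 0.2
```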
"So," asked Devin, "prioritization could allow style sheets and web fonts to be sent early enough to prevent the flash of unstyled text (FOUT)?"
"To sum it all up — persistent sessions, multiplexed streams, and header compression all add up to faster page loads. And prioritization fine-tunes the transmission schedule to meet website-specific requirements."
"Cool!" Devin was impressed. He liked this kind of detail.
"But what about speculative push?" asked Ken. "Wasn't that what you were going to look into?"
"Aah yes," Ernesto paused, "that's a whole nother thing." He was stalling for time. "I'm going to work on that today."
Ernesto wasn't out of the woods yet. But at least he was on firm footing.
No minifig characters were harmed in the production of this Tangled Web Services episode.
Follow the adventures of Antoní, Bjørne, Clarissa, Devin, Ernesto, Ivana, Ken and the gang as Tangled Web Services boldly goes where tech has gone before.