Speculative Push with HTTP/2
No hype, just the gory details
by Joe Honton
In this episode Bjørne traces the path of a web page request and its dependencies when HTTP/2 speculative push is enabled. Caution: this article may cause premature aging, white hair, and memory loss.
Full disclosure: Joe Honton is the founder of Read Write Tools and the author of Read Write Serve, the HTTP/2 server used in this article to explain the inner workings of speculative push.
Devin and Ken have been charting a course forward with HTTP/2, performing benchmark tests to see how fast they could make things go on cheap commodity servers. Only one thing was still eluding them: speculative push (SP) technology.
Ernesto was a bit envious of the impressive results being reported by Devin and Ken. He decided to enter the fray. But he knew that implementing speculative push would not be easy — others had been there and gotten mired in gory implementation details.
So he started off slowly, studying in detail exactly how HTTP/2 is optimized to provide faster page loads. He shared his findings in Discover the 4 pillars of HTTP/2. His final summary: persistent sessions, multiplexed streams, and header compression contribute to faster page loads, while prioritization fine-tunes the transmission schedule to meet website-specific requirements.
But speculative push was still a big gap in his knowledge base. Here's what he knew for sure:
Most of the time, in a client/server architecture, the client is responsible for initiating a request, and the server is responsible for fulfilling the request. This is a pull request. But with HTTP/2, once a request has been initiated by the client, the server can take matters into its own hands and initiate the transfer of additional resources without waiting for the client's formal request. This is a push request.
A push request still adheres to the full HTTP request/response protocol with its familiar set of headers for content negotiation, content encoding, caching, etc. But a server both initiates and fulfills a push request, transmitting the response to the client over the persistent session that has already been established.
It does this by opening a new stream, which is interleaved with the existing source document stream. A single session can have 100 or more concurrent streams (with the actual limit being negotiated by the two parties). This concurrency means that an entire web page and all its dependent resources can be on the wire simultaneously.
Of course, the public network doesn't have unlimited bandwidth, so the throughput for all those resources has a finite limit. Concurrency simply allows for fewer wasted frames. Keeping the network saturated means that this finite limit can actually be reached, and maximum server efficiency achieved. In one sense, speculative push is all about keeping the pipeline full.
So that's the basic concept.
But beyond this, Ernesto was getting all bogged down with sessions, streams, frames, and the TCP protocol they were all running over. He decided to reach out to anyone who had been there and had made it work.
He tweeted out a plea for help. "Anyone had success with #HTTP2 speculative push? I mean real success. Not just hello world."
Then he remembered the name of someone from last year's bootcamp and DM'd him. "Hey Bjørne, did I remember correctly that you've actually gotten HTTP/2 push working?"
Bjørne's answer came quickly. "Yes. Check out the pages on https://ecmascript.engineer and you can see it in action. Just open up the browser inspector and check the network tab. You'll see 'Push' next to all the resources that the server sent unsolicited."
"Nice!" replied Ernesto, "Was it hard to get it working?"
"Initially, yes. But not anymore," answered Bjørne. "Now it's simply a matter of turning it on in the RWSERVE configuration. Here's what my config looks like:"
push `*.css` *weight=64
push `*.js` *weight=128
push `*.woff2` *weight=32
Ernesto mocked it up on his test server. Lo and behold, it worked for him too! Simple.
But he was puzzled. So he pestered Bjørne again. "How come I didn't need to use any link rel='preload' statements? And what prevents the browser from separately requesting the same resources? And what's the purpose of the rw-pushtag cookie? And how . . ."
"Whoa," Bjørne stopped him midway through, "slow down pardner! I've been meaning to write up what I've learned. I suppose now is as good a time as any."
HTTP/2 PUSH, in a nutshell
Later that day, Ernesto received a long post from Bjørne, explaining exactly what was going on. Here's Bjørne's explanation in a nutshell:
The server's work progresses in three stages:
- The automatic preload detection (APD) algorithm determines which resources are candidates for PUSH.
- The server opens concurrent streams, one for each resource candidate that is approved, immediately sending bytes to the browser without waiting for a response.
- The server calculates a pushtag for each PUSHed resource, sending it as a cookie for the browser's safekeeping.
The browser's complementary work occurs in three separate stages:
- Provisionally PUSHed resources are saved to the browser's push-cache.
- The browser builds a resource manifest (of all the resources it needs to fully render the page) from the original document's DOM.
- The browser claims any resource that is available in the push-cache, or fetches any resource that is not available using its traditional request/response algorithm.
Automatic Preload Detection
The RWSERVE HTTP/2 Server has automatic preload detection (APD) for HTML documents encoded with BLUEPHRASE notation. That means you don't have to comb through your entire website to find which resources to target for SP. The server does it for you, just-in-time. And it caches the results.
Here's how APD works. When a document is requested the first time, the server parses and examines it for elements that reference resources using a href attribute. That includes these HTML tags:
Each of the resources referenced in those tags is examined to see if it is a local resource or an external resource. The external resource references are ignored, but the local references are captured and placed in a resource map associated with the source document. The resource map is then saved to a private server cache for use by all subsequent requests for the source document, regardless of who issues the request.
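The resource-map step can be sketched roughly as follows. This is not RWSERVE's code: it parses plain HTML rather than BLUEPHRASE, and the tag set, the inclusion of src alongside href, and the names ResourceCollector, is_local and build_resource_map are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: which tags count as resource references is a guess for this sketch.
RESOURCE_TAGS = {"link", "script", "img"}


class ResourceCollector(HTMLParser):
    """Collects href/src values from the start tags listed in RESOURCE_TAGS."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        if tag not in RESOURCE_TAGS:
            return
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.references.append(value)


def is_local(url, site_host):
    """Local means: no host and no scheme, or a host equal to our own."""
    parts = urlparse(url)
    if parts.netloc:
        return parts.netloc == site_host
    return parts.scheme == ""


def build_resource_map(html, site_host):
    """Return the local references of a document, in document order."""
    collector = ResourceCollector()
    collector.feed(html)
    return [url for url in collector.references if is_local(url, site_host)]
```

In a real server the returned list would be cached per source document, so the parse happens only on the first request.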
To get things going, the APD algorithm examines the resource map to determine which resources are candidates for SP. Then it looks at the server's rules, as configured by the webmaster, to filter the candidates. For example, take a look at the configuration snippet I sent to you, and look at its
With that setup, the filtering step will retain any candidates that it found for style sheets, scripts and fonts, but it will discard any candidates that it found for images, audios and videos, because there are no configuration rules defined that match files with an extension of
Part of the filtering process is to remove duplicates. This sometimes occurs with images that appear in more than one place in the document — there's no point in pushing the same resource twice.
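A minimal sketch of the filtering and de-duplication steps, assuming the three push rules from Bjørne's configuration. How RWSERVE actually matches its patterns is not described here, so matching on the file name with shell-style globs, and the names PUSH_RULES, match_weight and filter_candidates, are hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the three push rules shown earlier: pattern -> weight.
PUSH_RULES = {"*.css": 64, "*.js": 128, "*.woff2": 32}


def match_weight(path, rules=PUSH_RULES):
    """Return the weight of the first rule matching the file name, else None."""
    filename = path.rsplit("/", 1)[-1]
    for pattern, weight in rules.items():
        if fnmatchcase(filename, pattern):
            return weight
    return None


def filter_candidates(resource_map, rules=PUSH_RULES):
    """Keep each resource that matches a rule, once, as (path, weight)."""
    seen = set()
    approved = []
    for path in resource_map:
        weight = match_weight(path, rules)
        if weight is None or path in seen:
            continue  # no matching rule, or a duplicate reference
        seen.add(path)
        approved.append((path, weight))
    return approved
```

With these rules an image such as /img/hero.png matches nothing and is dropped, while a style sheet referenced twice is kept only once.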
Next, the APD algorithm will order the candidates by weight, with larger weights coming first. So, even though CSS resources were found in
<link> tags located in the document's
(weight=128), style sheets next
(weight=64), and fonts last
Now comes the clever part. There's no point in pushing a resource that the client already has. So the APD algorithm computes a pushtag for each candidate, and compares it to all of the pushtags that have already been sent to the current client. If there's a match, the candidate is discarded. Pushtags are like ETags: they uniquely identify a resource and its version.
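The ordering and pushtag comparison might look something like this. The real pushtag format is not given, so a short SHA-256 digest of the path plus a version marker stands in for it; make_pushtag and plan_pushes are hypothetical names.

```python
import hashlib


def make_pushtag(path, version):
    """Hypothetical pushtag: a short digest of the path plus a version marker."""
    digest = hashlib.sha256(f"{path}|{version}".encode("utf-8")).hexdigest()
    return digest[:8]


def plan_pushes(candidates, versions, already_sent):
    """Order candidates by descending weight and drop those the client holds.

    candidates   -- list of (path, weight) tuples
    versions     -- dict of path -> version marker (e.g. a modification time)
    already_sent -- set of pushtags previously sent to this client
    """
    ordered = sorted(candidates, key=lambda item: item[1], reverse=True)
    plan = []
    for path, weight in ordered:
        tag = make_pushtag(path, versions[path])
        if tag not in already_sent:
            plan.append((path, weight, tag))
    return plan
```

Because the version marker feeds into the tag, a changed file produces a new pushtag and becomes eligible for push again, much as a changed ETag invalidates a cached copy.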
The Big Push
With all of the preliminaries out of the way, the server is ready to go. Note that all of this will be happening prior to sending the originally requested source document down the wire.
Using the currently opened stream as a base, the server creates a series of push streams, one for each approved candidate, in descending weighted order. Each push stream's priority flag is set using the weight configured by the webmaster.
Appropriate HTTP response headers are generated for each approved candidate, including
Now, in sequence, each resource file is read from the server's file system and, together with the generated headers, is sent over the associated push stream. The server does not wait for the client to respond, because no response is expected. It simply starts pushing one resource after another until everything is on the wire.
While all of this is happening, the original source document is still pending. This is important. All of the resources that are to be pushed must be in transit before the source document is sent. This prevents the browser from prematurely parsing the document and requesting resources that the server still intends to push.
One final housekeeping chore still needs to be taken care of: keeping track of which resources have been pushed. The server creates a cookie value as an encoded string of pushtags, and adds a set-cookie header to the source document's pending response. By doing this, the browser keeps track of which resources it has received via speculative push. On subsequent requests for other web pages, the browser can then inform the server of which resources to ignore. The server examines the incoming cookie, and the APD algorithm discards push candidates that it has already pushed on a prior request.
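Here is one way the cookie round trip could be sketched. The cookie name rw-pushtag comes from the conversation above, but the encoding (tags joined with a dot), the Path attribute, and the function names are assumptions, since the actual format is not spelled out.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "rw-pushtag"  # cookie name mentioned in the article
SEPARATOR = "."             # assumption: the real encoding is not documented here


def encode_pushtags(previous, pushed_now):
    """Merge tags the client already holds with those pushed on this request."""
    merged = list(dict.fromkeys(list(previous) + list(pushed_now)))
    return SEPARATOR.join(merged)


def set_cookie_header(previous, pushed_now):
    """Build the value of a set-cookie response header (hypothetical attributes)."""
    return f"{COOKIE_NAME}={encode_pushtags(previous, pushed_now)}; Path=/"


def read_pushtags(cookie_header):
    """Extract the tags from the value of an incoming cookie request header."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    if COOKIE_NAME not in jar:
        return []
    value = jar[COOKIE_NAME].value
    return value.split(SEPARATOR) if value else []
```

The tags returned by read_pushtags are what the server would compare against when deciding which candidates to skip on the next page request.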
Finally, and only after all the push streams have initiated their responses, the server sends the source document's response, with the pushtag cookie and the file's contents.
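The ordering constraint is the important part, so here is a deliberately tiny sketch of it: every push is started before the document response, and the document carries the cookie. It models the schedule only, not real HTTP/2 frames; response_schedule and the (path, weight, tag) tuple shape are hypothetical.

```python
def response_schedule(document_path, push_plan, cookie_value):
    """Return the order in which the server starts each transmission.

    push_plan    -- list of (path, weight, tag) tuples, already weight-ordered
    cookie_value -- encoded pushtag string for the set-cookie header
    """
    schedule = []
    for path, weight, _tag in push_plan:
        # Each approved candidate gets its own push stream, started immediately.
        schedule.append(("push", path, {"weight": weight}))
    # The source document goes last, carrying the cookie if anything was pushed.
    headers = {"set-cookie": f"rw-pushtag={cookie_value}"} if push_plan else {}
    schedule.append(("document", document_path, headers))
    return schedule
```

Holding the document until the end is what keeps the browser from discovering a resource in the HTML and requesting it before the push for it has begun.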
A few caveats:
- Only resources that are actually used by the source document are of any value. All others are discarded and don't make it into the browser's cache. So pushing resources in anticipation of using them on future pages won't work.
- Only resources from the same domain can be pushed. So, for example, you can't optimize links to Google Analytics or Twitter feeds or other third-party mashups.
- Finally, only push a resource once. If you accidentally push the same resource twice, the browser will reset the stream and throw away whatever it has so far.
The Browser's Role in All of This
All of this server processing occurs within the initial request from the browser. No additional back and forth occurs. One request / multiple responses. Here's what it looks like from the browser's perspective:
A user requests a source document, using a standard GET method, without any special headers. The rw-pushtag cookie, if present, is sent along with the initial source document request.
While awaiting the server's response for the source document itself, the browser begins to receive data over the incoming push streams opened by the server. Those data streams are provisionally placed in the "unclaimed push streams container" a.k.a. the push-cache.
Once all of the push streams have been fulfilled and closed, the source document data, which has been pending on the original request stream, is finalized.
At this point the browser, which may have already begun tokenizing the source document while the push streams are arriving, finishes its HTML tokenization. It is now ready to build a resource manifest, from the HTML DOM tree, of all the document resources that are needed by the page.
For each reference in the resource manifest, the browser checks the push-cache to see whether there is a provisional resource with the same name and the same content-type. If there is, it is "claimed".
When a document resource is not found in the push-cache, the browser follows its standard fulfillment rules: checking its persistent http-cache and, if found but expired, issuing a conditional request or, if missing, issuing an unconditional request to the server.
On the other hand, for each claimed resource, the browser creates a reference to it in the browser's memory-cache. It then examines the etag headers sent along with the pushed resource, and copies the resource to the persistent http-cache when those instructions say to do so.
Once all of the document resources listed in the manifest have been examined, and either claimed or fulfilled, control is passed to the browser's renderer. At this point the push-cache has completed its task and is no longer needed. The browser discards any unclaimed resources that the server might have erroneously pushed.
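The browser's claim-or-fetch decision can be modeled in a few lines. This is a simplified illustration, not how any particular browser is implemented: the caches are plain dictionaries, and resolve_resources and its action strings are made up for the sketch.

```python
def resolve_resources(manifest, push_cache, http_cache):
    """Decide how each resource in the manifest is obtained.

    manifest   -- list of (url, content_type) pairs the page needs
    push_cache -- dict of url -> content_type for provisionally pushed resources
    http_cache -- dict of url -> "fresh" or "expired"
    Returns (decisions, discarded): url -> action, plus unclaimed pushes.
    """
    decisions = {}
    claimed = set()
    for url, content_type in manifest:
        if push_cache.get(url) == content_type:
            decisions[url] = "claim pushed resource"
            claimed.add(url)
        elif http_cache.get(url) == "fresh":
            decisions[url] = "use http-cache"
        elif http_cache.get(url) == "expired":
            decisions[url] = "conditional request"
        else:
            decisions[url] = "unconditional request"
    # Anything pushed but never claimed is thrown away with the push-cache.
    discarded = sorted(set(push_cache) - claimed)
    return decisions, discarded
```

A pushed resource whose name or content-type does not match anything in the manifest ends up in the discarded list, which is why pushing for future pages gains nothing.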
The only other housekeeping chore is for the browser to honor the set-cookie: pushtag header sent along with the server's response to the original document. It is processed following the standard browser behavior for cookies, and sent along with all future requests to the same domain, expiring according to standard cookie expiration rules.
If the server's APD algorithm has correctly identified all of the resources needed by the source document, then all of this processing will have occurred within a single request/response cycle. If anything goes haywire with that request, the browser will fall back to its standard fetching and rendering process.
Wrapping it up
Ernesto was blown away by the write-up. "Wow. This is way more than I expected. Thanks, Bjørne!"
"NP," Bjørne shot back.
"BTW, how good is the APD algorithm? It looks like it doesn't pick up CSS-defined background images or @font-face rules."
"That's correct," Bjørne replied. "But if you add preload statements to your document's head, you can get those pushed as well. Something like this:"
<link href='image/hero.png' rel='preload' as='image' />
<link href='fonts/neue.woff2' rel='preload' as='font' crossorigin />
"¡Muchas gracias amigo!"
"Vær så god."
Ernesto was starting to feel a little more confident about HTTP/2 speculative push. Still, he wanted to measure things to be sure it was the right thing to do.
No minifig characters were harmed in the production of this Tangled Web Services episode.
Follow the adventures of Antoní, Bjørne, Clarissa, Devin, Ernesto, Ivana, Ken and the gang as Tangled Web Services boldly goes where tech has gone before.