How the ALPN Protocol Works and Why You Need It

Old school HTTP is the web crawler industry's dirty little secret

by Joe Honton Sep 1, 2019

In this episode Ken announces full support for HTTP/2 with automatic HTTP/1.1 fallback.

Everyone knew that Ken was all fired up about something when he finally walked into the office at 11:30 a.m. But since he had missed the daily scrum (again!), he didn't have the built-in audience he so much craved.

He sauntered over to the kitchen's espresso machine, but nobody was around . . . it seems they had all gone to Starbucks instead . . . then back to the whiteboard, his favorite place for prognostications . . . nobody there either.

Finally he spied Devin, in a corner pocket, staring deeply into his laptop.

"What are you doing over here?"

"Hey, Ken," Devin barely looked up, "Antoní's been amping up the volume all week. I've got to get this Dockerfile finished by the end of the day."

Devin was emanating obvious do-not-disturb signals, but Ken ignored them and plowed ahead anyway. "So I got ALPN up and running on our servers last night."

"Cool." That was Devin's answer to everything these days. "So that's why you missed scrum, and I got stuck with this?"

Ken knew that Devin got irritable when the going got tough, but he wasn't going to be deflated by off-hand remarks.

"Do you know what ALPN is?"

"Uhmm, that would be Application Layer Protocol Negotiation," Devin rattled off, "but I'm sure you're going to tell me all about it."

"Amazing! You're the master of acronyms around here.

"So ALPN is part of the TLS handshake that occurs when a browser first connects to a server. The browser advertises what protocols it's ready to accept, and the server responds with the protocol it prefers. That, in a nutshell, is the negotiation. Once the handshake is complete and the peers have agreed on other things like cipher suites and cryptographic keys, the classic HTTP request/response cycle can begin.

"The important part is the order of precedence. If the user-agent — that is, the browser, crawler, or postman tool — says that it wants to use HTTP/2, and if the server can handle it, then they'll use HTTP/2.

"And if the user-agent advertises that it is ready to use either HTTP/2 or HTTP/1.1, then the server can choose whichever it prefers. For us, since we deploy our websites with the rwserve HTTP/2 Server, our negotiated response will obviously be HTTP/2.

"But the tricky part is when the user-agent only advertises that it's able to use HTTP/1.1. In that situation, an HTTP/2-only server has to respond with status code 505, 'HTTP Version Not Supported'. That's the situation we were in until yesterday."
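The selection rule Ken describes can be sketched in a few lines of Python. This is an illustration only, not rwserve's actual code: the function name select_alpn and the two preference lists are hypothetical, although "h2" and "http/1.1" are the registered ALPN identifiers for HTTP/2 over TLS and for HTTP/1.1. The sketch assumes the server's preference order wins whenever the lists overlap.

```python
from typing import Optional, Sequence

# Hypothetical server configurations, most-preferred protocol first.
H2_ONLY = ["h2"]
H2_WITH_FALLBACK = ["h2", "http/1.1"]


def select_alpn(client_offers: Sequence[str],
                server_prefs: Sequence[str]) -> Optional[str]:
    """Return the first server-preferred protocol the client also offered.

    Returns None when the two lists have nothing in common; what the
    server does at that point is outside the scope of this sketch.
    """
    offered = set(client_offers)
    for proto in server_prefs:
        if proto in offered:
            return proto
    return None


# A browser offering both protocols gets the server's first choice.
browser_result = select_alpn(["h2", "http/1.1"], H2_WITH_FALLBACK)

# An HTTP/1.1-only crawler falls back to HTTP/1.1 when the server allows it.
crawler_result = select_alpn(["http/1.1"], H2_WITH_FALLBACK)

# The same crawler gets no match from a server that only speaks HTTP/2.
stranded_result = select_alpn(["http/1.1"], H2_ONLY)
```

The last case, where nothing matches, is the one Ken's servers were stuck in before the upgrade.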

Devin was following the gist of it up to this point, so he engaged. "But I thought all of the major browsers have had HTTP/2 support for a while now. Why would we ever need to support HTTP/1.1? Shouldn't we be moving the needle forward?"

"You would think so, right? All the desktop browsers, Chrome, Firefox, IE, Edge, Safari, Opera — and most of their cell-phone cousins — have had support for HTTP/2 for a while now.

"But there's a dirty little industry secret that no one's talking about. Web crawlers! It turns out that most crawlers are still using HTTP/1.1, the most notable one being Googlebot, which is surprising to me since Google was the brains behind HTTP/2 in the first place.

"So what gives? Well, recall that one of the big advantages of HTTP/2 is multiplexing many requests over a single persistent connection. It's a real keeper for a browser that accesses documents, and all their associated resources, in one short burst. But crawlers use a very different access strategy. They tend to access one resource at a time, with long delays between each request. Their strategy is based on the premise that web administrators would ban them if they gobbled up all the server's resources in quick succession. So as a courtesy, crawlers throttle their requests, and that makes HTTP/2 much less advantageous."
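For anyone who wants to see what a server negotiates with a crawler-style client, here is a minimal sketch using Python's standard ssl module. The helper names build_context and negotiate are hypothetical, and negotiate opens a real network connection, so no expected output is shown.

```python
import socket
import ssl
from typing import Optional, Sequence


def build_context(offers: Sequence[str]) -> ssl.SSLContext:
    """Create a client TLS context that advertises the given ALPN protocols."""
    ctx = ssl.create_default_context()
    ctx.set_alpn_protocols(list(offers))
    return ctx


def negotiate(host: str, offers: Sequence[str], port: int = 443,
              timeout: float = 5.0) -> Optional[str]:
    """Connect to host and return the protocol chosen during the handshake.

    Returns None if the server did not select any ALPN protocol.
    """
    ctx = build_context(offers)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()
```

Calling negotiate with a hostname of your own and the single offer "http/1.1" mimics a crawler that has not moved to HTTP/2. Calling it again with both "h2" and "http/1.1" mimics a modern browser.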

And with that burst of enthusiasm spent, Ken moved on. "So, who's ready for a pint of Rock City Brewing's finest IPA?" (It had barely been 20 minutes since he arrived, but now was not the time to get started on messy new problems.)

Ken arrived at RCB first and got started, while Devin lagged behind for a few extra minutes.

When he arrived, Devin began, "I took a quick look at the rwserve online docs before coming over and I didn't see any configuration settings for ALPN. Was it hard to get it working?"

"Nope. There's nothing to configure, it's all built in and automatic. I just had to download the latest version and goose the server."

"Goose the server?"

"You know: systemctl restart rwserve."

"Where do you come up with these sayings!" Then, after a pause, "But I thought you were up late last night configuring it?"

Ken faltered for a moment. He knew he was busted. Then, recovering, "Oh, I was working on a Kubernetes script."

No minifig characters were harmed in the production of this Tangled Web Services episode.

Follow the adventures of Antoní, Bjørne, Clarissa, Devin, Ernesto, Ivana, Ken and the gang as Tangled Web Services boldly goes where tech has gone before.
