This post was originally published on the Washington Post’s Developer Blog.
In June of this year, the Post announced that we were switching our site to be HTTPS by default, starting with the homepage, the National Security section and the technology blog, The Switch. Since then, we’ve moved over all remaining sections and parts of our site. Today, more than 99% of our traffic is redirected to HTTPS.
The launch was the culmination of about 10 months of effort, touching not just every technical team in our engineering department but all divisions of the Post.
The challenges we faced in this migration weren’t strictly technical in nature. We knew exactly what we had to do at each step, but we faced challenges in the following areas: infrastructure, advertising, and the newsroom.
Our users never actually reach the servers that render our website. Instead, they reach us through a Content Delivery Network, or CDN.
Our previous CDN wanted a significant amount of money to upgrade to a version of their product that allowed us to use HTTPS. This was in addition to the cost of using HTTPS for our site with their network. From the many other media organizations we’ve talked to, the situation has been similar: the CDN and related infrastructure are often the first hurdle that many encounter, and the list price for serving their website over HTTPS is sometimes staggering. While many wish they could move, the CDN’s unique position and the years of logic that have often been baked into specific configurations and settings make them reluctant to switch just for HTTPS.
At the Post we’ve developed a great relationship with our CDN, Instart Logic, that has allowed us to not only enable HTTPS on our site but also use our own custom EV certificate[1]. We used Instart Logic to set up a long-running “staging” version of our site, allowing us to test a website that was all HTTPS internally for months before our final rollout. We’ll talk more about how and why we used this strategy in subsequent posts, but suffice it to say our move to HTTPS wouldn’t have been as easy as it was from an infrastructure perspective if we had any other CDN vendor.
Thankfully, the situation in the CDN market is changing rapidly. CloudFlare famously gives its customers free HTTPS, and has been doing so since September of 2014[2]. Akamai has partnered with Let’s Encrypt to make getting a certificate and enabling it on their network easier for their customers.
Ask any developer at a major media organization what the biggest hurdle to HTTPS adoption is, and the answer is always going to be advertising. However, unless you understand the ins and outs of how digital advertising is implemented, it’s difficult to see why this presents a challenge.
A portion of the advertising that appears on the Washington Post is sold by our own sales representatives, a portion comes from popular ad exchange networks, like Google’s AdX and PubMatic, and a portion is so-called “house ads”, which are for our own products and services. The content that makes up these ads (often called “creative”) also has to be HTTPS compliant for our site not to throw mixed content warnings. This presents different challenges based on the source.
For advertising that we’ve sold, the creative often comes from a third-party agency, and although we can test this creative, we have little control over what goes into it. Moreover, many organizations change creative frequently over the life of a particular campaign or ad buy. So creative that has been validated for HTTPS compliance before it runs might start throwing mixed content warnings partway through a campaign.
For the ads that come from ad exchanges, what’s inside is even more of a black box—we have no idea what resources they will include, and no way of preventing specific resources. Rene Ritchie at iMore described the situation well in a recent post on ad blockers (another interesting subject, but not today’s topic):
We have no ability to screen ad exchange ads ahead of time; we get what they give us. We can and have set policies, for example, to disallow autoplay video or audio ads. But we get them anyway, even from Google. Whether advertisers make mistakes or try to sneak around the restrictions and don’t get caught, we can’t tell. It happens, though, all the time.
When bad ads appear, we report them and ask that they be disabled. Since different people in different geographies see different ads, it can be a challenge to identify them, and it can take a while to get them pulled. It’s a horrible process for everyone involved.
While we’ll go into more detail about how we planned for and measured the impact of advertising on HTTPS in a future post, we thought it would be good to discuss techniques we’ve used to help detect violations stemming from creative.
First, we have a series of Selenium tests that are constantly running through a set of the most popular URLs on our site (according to Chartbeat), and loading them to test for any mixed content warnings. If one is found, we send an alert to our engineering operations and advertising teams with the details of the mixed content warnings.
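As a rough illustration of what such a check looks for, here is a minimal sketch in Python. It is not the Selenium suite described above: it statically scans a page’s HTML for subresources requested over plain HTTP, which is a simpler approximation that will miss anything injected by scripts at runtime (as most ad creative is). The names `MixedContentScanner` and `find_mixed_content` are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose URL attribute triggers a subresource fetch (ordinary <a> links do not).
SUBRESOURCE_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "video": "src",
    "audio": "src",
    "source": "src",
    "embed": "src",
    "object": "data",
    "link": "href",
}


class MixedContentScanner(HTMLParser):
    """Collects (tag, url) pairs for subresources loaded over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attr_name = SUBRESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        attr_map = dict(attrs)
        # Only stylesheet links are fetched as subresources; skip canonical, etc.
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower().split():
            return
        url = attr_map.get(attr_name) or ""
        if url.lower().startswith("http://"):
            self.insecure.append((tag, url))


def find_mixed_content(html):
    """Return a list of (tag, url) pairs that would load over HTTP."""
    scanner = MixedContentScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.insecure
```

A real browser-driven test catches far more than this, since it sees what the page actually requests after scripts run, but the static version is cheap enough to run on every rendered template.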
Second, we’ve just started using the Content-Security-Policy-Report-Only header[3], and sending the reports to Sentry, a great error tracking service that we use for a lot of our back-end apps. This approach allows us to get real reports of mixed content warnings for our users, which helps debug the wider variety of advertising that appears on the site.
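For readers who haven’t worked with report-only policies, the sketch below shows the two halves of the mechanism in Python: building the header, and pulling the useful fields out of the JSON report that browsers POST back. The policy string is the one quoted in the footnote; the `report-uri` endpoint, `report_only_header` and `summarize_report` are assumptions for illustration, and this is not how Sentry itself ingests reports.

```python
import json
from urllib.parse import urlsplit

# Policy value quoted in the post's footnote.
POLICY = (
    "default-src https: 'unsafe-inline' 'unsafe-eval'; "
    "font-src https: data:; img-src https: data:"
)
# Hypothetical collector URL; substitute whatever endpoint receives the reports.
REPORT_URI = "https://reports.example.com/csp"


def report_only_header(policy=POLICY, report_uri=REPORT_URI):
    """Return the (name, value) pair for a report-only CSP header."""
    return (
        "Content-Security-Policy-Report-Only",
        f"{policy}; report-uri {report_uri}",
    )


def summarize_report(body):
    """Extract the fields useful for tracing a violation from a CSP report body.

    Browsers send a JSON object with a single "csp-report" key.
    """
    report = json.loads(body).get("csp-report", {})
    blocked = report.get("blocked-uri", "")
    return {
        "page": report.get("document-uri", ""),
        "blocked": blocked,
        "blocked_host": urlsplit(blocked).hostname or "",
        "directive": report.get("effective-directive")
        or report.get("violated-directive", ""),
        "insecure": blocked.lower().startswith("http://"),
    }
```

Because the header is report-only, nothing is blocked for the reader; grouping the summaries by `blocked_host` is one way to see which ad sources generate the most violations.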
While all major social sites like Facebook, Twitter, Instagram and YouTube have been producing HTTPS-compliant embed codes for a while, a large number of sites, like MLB.tv and Comedy Central, still aren’t. Even for sites that are compatible, HTTPS is often not the default, so some rewriting of the embed code is needed to create a tag that won’t produce mixed content warnings. Finally, some sites have what looks like an HTTPS-compatible embed code that actually contains mixed content, like EllenTube[4].
In general, we pride ourselves on the flexibility of our tools, which includes the ability to add arbitrary embed tags. In a way, the move to HTTPS violated that general principle: we had no choice but to start restricting the embeds that authors could use.
Thankfully, we were at an advantage when it came to combating embeds because of our site’s architecture. Rather than an integrated CMS solution, like WordPress, directly serving traffic, our site is rendered using a service oriented architecture approach. We have an application, internally called StoryAdapter, that takes content from our legacy CMS as well as WordPress, massages it into a common format, and presents it to our rendering application that delivers it to users. This gave us a single point of attack to root out embeds that were not HTTPS compatible, and replace them with HTTPS compatible ones when possible. You can see a snippet of the code we used to do this here.
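To make the idea concrete, here is a minimal sketch of that kind of rewrite in Python. It is an illustration under stated assumptions, not StoryAdapter’s actual code: the allowlist `HTTPS_CAPABLE_HOSTS`, its entries, and the function `rewrite_embed` are all hypothetical, and a production version would want a real HTML parser rather than a regular expression.

```python
import re

# Hypothetical allowlist of embed hosts assumed to serve the same content over HTTPS.
HTTPS_CAPABLE_HOSTS = {"www.youtube.com", "embed.example.com"}

# Captures everything up to the opening quote of src, then the host after http://.
SRC_RE = re.compile(
    r"""(<(?:iframe|script|embed|img)\b[^>]*?\bsrc\s*=\s*["'])http://([^/"'\s>]+)""",
    re.IGNORECASE,
)


def rewrite_embed(html):
    """Upgrade http:// embed sources to https:// for allowlisted hosts.

    Returns (rewritten_html, unsupported_hosts). Sources on hosts outside the
    allowlist are left untouched and reported so the embed can be rejected.
    """
    unsupported = []

    def upgrade(match):
        prefix, host = match.group(1), match.group(2)
        if host.lower() in HTTPS_CAPABLE_HOSTS:
            return f"{prefix}https://{host}"
        unsupported.append(host)
        return match.group(0)

    return SRC_RE.sub(upgrade, html), unsupported
```

Returning the unsupported hosts separately, rather than silently dropping the embed, leaves the decision about what to show the author to the calling code.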
Special thanks to Greg Franczyk for reviewing drafts of this post, and to Stephanie Clark, Christopher Kankel, Devin Castro, Amanda Hicks, and Matt Pierce for their tireless efforts to finally make HTTPS a reality.
1. Because who doesn’t like an extra green bar?
2. Although they do charge more for custom certificates, which makes sense.
3. With the following value, in case anyone is interested: default-src https: 'unsafe-inline' 'unsafe-eval'; font-src https: data:; img-src https: data:;
4. No, I didn’t know she had her own video website either.