October 18, 2016
On October 13, 2016 around 11:10am GMT, users visiting websites using GlobalSign TLS certificates, including some hosted by Fastly, started experiencing TLS certificate validation errors. This issue was caused by incorrect certificate revocation information published by our certificate vendor, GlobalSign.
This security advisory describes the root cause of this issue and the actions Fastly has taken to limit customer impact.
Some website visitors may have seen certificate validation errors, such as “NET::ERR_CERT_REVOKED”, when accessing websites using GlobalSign certificates.
This incident affected Fastly customers because several of our TLS options use certificates that Fastly procures on the customer's behalf from our TLS certificate authority vendor, GlobalSign. For instance, customers using Fastly shared certificates automatically use certificates issued through GlobalSign. Customers who use our Customer Certificate Hosting or SNI customer certificate hosting with certificates issued by a vendor other than GlobalSign were not affected by the issue.
Errors would have been inconsistent and affected only a small number of users, namely those who received incorrect OCSP responses. The issue may have persisted for up to four days for users whose local machine cached the OCSP response. Fastly reviewed its traffic levels at the time of the incident and did not observe a notable decrease in traffic across its customers.
Root cause
Around October 7, 2016, Certificate Authority GlobalSign revoked a cross-certificate and published a Certificate Revocation List (CRL) listing this revocation. Due to a technical error on their end in compiling Online Certificate Status Protocol (OCSP) responses, starting October 13, 2016, their OCSP responder returned inaccurate responses for a number of intermediate certificates. This led some browsers to infer that several intermediate certificates issued by GlobalSign had been revoked.
This information was propagated through GlobalSign's OCSP responders and, in some cases, cached by an intermediate, non-Fastly CDN and on client systems.
These inaccurate OCSP responses led some web users to experience certificate validation errors from websites leveraging GlobalSign certificates, including many Fastly-hosted services. The errors either required users to accept certificate failures before they could access the target website or prevented site access altogether.
Errors would have been inconsistent and limited to a subset of users, as not all browsers validate OCSP prior to allowing access to a website, and the issue did not affect other mechanisms to signal validity, such as Certificate Revocation Lists (CRLs).
Our mitigation and response
Once Fastly was informed of the issue, we contacted GlobalSign. GlobalSign investigated, determined the root cause, and addressed the issue by removing the incorrect OCSP responses.
However, by that time incorrect responses were cached at a number of levels, including the local OCSP cache that is part of the client operating system. Responses are commonly stored until the end of their validity period; GlobalSign issues OCSP responses with a validity of four days, which means that a client that has received one will treat it as valid for a full four days.
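The effect of the four-day validity window on a client cache can be sketched as follows. This is a minimal illustration, not the logic of any particular operating system: the function name is hypothetical, and the timestamp simply reuses the approximate start of the incident as a stand-in for the thisUpdate value of a real response, which this advisory does not state.

```python
from datetime import datetime, timedelta, timezone

# Validity period of GlobalSign OCSP responses, as stated in this advisory.
OCSP_VALIDITY = timedelta(days=4)


def cached_response_usable(this_update, next_update, now):
    """Return True while a cached OCSP response is inside its validity window."""
    return this_update <= now < next_update


# Illustrative timestamp only (assumption): a response produced at the
# approximate start of the incident.
this_update = datetime(2016, 10, 13, 11, 10, tzinfo=timezone.utc)
next_update = this_update + OCSP_VALIDITY

one_day_later = this_update + timedelta(days=1)
five_days_later = this_update + timedelta(days=5)

# A client that cached the incorrect response keeps trusting it for the
# whole window, regardless of what the responder serves in the meantime.
print(next_update.isoformat())
print(cached_response_usable(this_update, next_update, one_day_later))
print(cached_response_usable(this_update, next_update, five_days_later))
```

Under these assumptions the cached entry is still used one day later and is only discarded once the four-day window has passed, which is why a fix at the responder does not immediately help a client that already holds a bad response.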
GlobalSign’s OCSP responder used a CDN other than Fastly, which may have cached responses and caused the responder to keep returning failures even after the root cause was addressed. In addition, due to the caching behavior of client operating systems, some client machines that had received an incorrect OCSP response while accessing a website continued to cache that response. As a result, clients on those machines could not access affected websites even after GlobalSign had addressed the issue.
While users can flush the OCSP cache on their machine manually, and GlobalSign published guidance on how to do so, this went beyond the technical capability of most end users. In addition, the workarounds provided by GlobalSign were not effective in all situations.
Fastly did not consider this guidance a sufficient mitigation to pass along to our customers’ end users. After careful investigation, Fastly offered customers another option to address issues:
Fastly was unable to take action independently from our customers to address the root cause, as we recognize that some customers leverage certificate pinning (see “More information,” below) in their client application. Due to this, we could not immediately roll existing customers to a new intermediate certificate without customer acknowledgement.
Revocation errors should disappear after the expiry of the OCSP response lifetime, four days after the original incident. Customers who migrated to the updated maps provided by Fastly would have seen the issue mitigated shortly after their move.
Fastly recognizes that customers rely on third-party certificate authorities as well as on the CDN to successfully accept and deliver user traffic. As an outcome of this event, we are working on the following remediation and mitigation steps:
We are working with our vendor GlobalSign to ensure plans are put in place to mitigate future events related to certificate issuance and revocation.
Background on certificate validation and revocation checking
When a browser connects to a website and evaluates an X.509 certificate, the browser typically wants to ensure that the certificate is still valid. In order to support this, X.509 allows the certificate authority to confirm validity in a number of different ways. We include a reference explanation below, as it helps explain why specific clients were or were not affected by issues resulting from this incident. In principle, most issues would have been seen by users whose browser or operating system performed interactive OCSP requests:
Browsers may, in addition to validity checking, also check whether a certificate is valid for a specific site through Public Key Pinning. With this mechanism the client application, whether an app or a browser, checks whether the root, intermediate, or end-entity certificate of a service is one it expects. Pins can be hardcoded in the client, or distributed through the Public Key Pinning Extension for HTTP (HPKP). Pinning is a valuable and commonly used security feature that reduces the risk of a Certificate Authority being subverted to issue an otherwise valid certificate for a service. However, pinning may also reduce the flexibility of a website to rapidly move to another certificate hierarchy. In this incident, the possible use of pinning by our customers limited Fastly’s ability to automatically and transparently migrate all customers to another hierarchy.
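A rough sketch of why pinning constrains a move to another intermediate is shown below. It computes pins in the style used by HPKP (the Base64 encoding of the SHA-256 digest of a certificate's SubjectPublicKeyInfo) and checks a chain against a pin set. The function names are hypothetical, and the byte strings are placeholders standing in for real DER-encoded public keys, not actual GlobalSign or Fastly key material.

```python
import base64
import hashlib


def spki_pin(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of a DER-encoded SubjectPublicKeyInfo."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def chain_matches_pins(chain_spkis, pinned) -> bool:
    """A chain passes if the key of at least one certificate matches a pin."""
    return any(spki_pin(spki) in pinned for spki in chain_spkis)


# Placeholder values (assumption): real pins are computed over the DER
# bytes of each certificate's public key.
LEAF = b"hypothetical-leaf-spki"
OLD_INTERMEDIATE = b"hypothetical-old-intermediate-spki"
NEW_INTERMEDIATE = b"hypothetical-new-intermediate-spki"
ROOT = b"hypothetical-root-spki"

# A client application that pinned only the old intermediate.
pins = {spki_pin(OLD_INTERMEDIATE)}

old_chain_ok = chain_matches_pins([LEAF, OLD_INTERMEDIATE, ROOT], pins)
new_chain_ok = chain_matches_pins([LEAF, NEW_INTERMEDIATE, ROOT], pins)

print(old_chain_ok)
print(new_chain_ok)
```

In this sketch the chain through the old intermediate passes and the chain through the new intermediate fails, so a client pinned in this way would reject the service after a silent migration. A client that had pinned the root instead would accept both chains, which is one reason the impact of a migration depends on each customer's pinning choices.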
GlobalSign Incident Report
GlobalSign has published an incident report with information on their incident response at https://www.globalsign.com/en/customer-revocation-error/. This document contains further information on the steps taken by GlobalSign to prevent recurrence of this type of incident.