

Flagsmith is a feature flag and remote config service with a difference. This commercial open-source software (COSS) platform allows clients to release features with confidence and manage flags across web, mobile, and server-side applications – via hosted API, private cloud, or on-premises.

flagsmith.com
Industry: Software Development
Location: London, United Kingdom
Customer since: 2023


Favorite features
Content Delivery Network (CDN)
SSE (server sent events)
Load balancer

Flagsmith rocks real-time updates with Fastly’s secret fix


Formed in 2019, after starting out as an open-source project in 2017, Flagsmith fulfills founder Ben Rometsch’s vision of making feature flagging more accessible.


It’s an open-source platform that allows customers to release features and manage flags quickly and efficiently across a range of platforms. As Ben explains, “I had worked with lots of organizations doing cloud transformations, building APIs, and helping them deploy and deliver applications. I saw a common problem where they were doing sprint-based development and trying to deploy their code once every couple of weeks, once every four weeks, or whatever, with multiple teams working interdependently.


There’s a lot of pressure with multiple teams trying to get their tests to pass and their applications to talk to each other and work in a consistent way. Over and over again, one or more teams wouldn’t be ready, and this would push the release to the next sprint. This causes blockages, the whole platform gets delayed, and all of a sudden you’ve got twice as much code going live. It’s a complicated problem that’s process and people related.”


Feature flags felt like the ideal answer. That’s because they allow teams to release code when they want to (and when they’re able to), rather than rushing when a release is scheduled. They’re a godsend for companies with teams running legacy stacks, lacking good test coverage, and lagging in interdependent processes – and the problem-solving foundation of Flagsmith’s accessible, open-source feature flagging platform.


Fast-forward a few years and Flagsmith is forging ahead, with a growing team of 15 running an API handling close to 2 billion requests a month (and counting).


However, success comes at a cost: the team regularly has to scale out its API to handle 5X traffic bursts – which, depending on customer needs, can equate to over 4k requests per second. So how could Flagsmith’s infrastructure cope with such dramatic bursts in traffic load, sometimes arriving over just 10 seconds?


They needed a way to update client SDKs in real time when the value of a flag changed, without resorting to onerous tactics like long polling. The search was on.


Horrendously bursty traffic




Flagsmith’s setup evolved alongside its success. Originally a single API, hosted on a single physical server in a data center, the platform is now split into two parts: the core management API and the edge API serving flags to customer applications. These two APIs have very different requirements and traffic patterns. The core API is very predictable, doesn’t have low-latency requirements, and doesn’t need a multi-region presence – by contrast, Flagsmith’s current edge API runs in 8 global regions serving flags to millions of clients.


As Ben explains: “If you’re integrating the Flagsmith SDK into a mobile app, for example, you want to make sure someone in Sydney gets their flags ASAP. Our edge API gets billions of requests a month and our core API is in the low millions, so there’s something like a 1000:1 ratio between the traffic.”


But to meet the demands of global low-latency replication, Flagsmith was spending too much time dealing with the scaling issues of running a classic monolithic infrastructure. And this was compounded by the demands of what Ben describes as “horrendously bursty traffic” – caused by an event like a large client sending a push notification to all their users within a few seconds. The team received a tsunami of API calls from client SDKs triggered by the notification, and the surge placed unreasonable demands on the infrastructure.


This was the crux of the issue. And customers started to demand a real-time feature: with an app open, an engineer or product owner could toggle a flag, the app’s real-time connection would receive the message, and the flags would be updated straight away.


Nice idea. But it presented a big problem, because Flagsmith had just moved to a serverless architecture for this part of the API.


SSE – Fastly’s ‘secret’ fix


Flagsmith wanted a solution with a couple of interesting requirements:



  • It needed to scale forever. In other words, to the point where theoretically no other solution would ever be needed again.

  • It needed to be cost-effective. Real-time data features are typically used by betting or trading applications with an obvious need for real-time streaming; for those types of sites it’s a core UX requirement, which justifies the data and the cost.


However, this wasn’t the case with Flagsmith. That’s because it only needed to send one piece of information to the SDK to say the flags had changed, and to trigger a flag update in the app. This made the cost of a WebSocket-based streaming service exorbitantly high compared to the tiny amount of data required and the value it would bring to customers.


This is when Ben stumbled upon a Fastly blog post explaining an oft-overlooked capability of the Fastly CDN: SSE (server-sent events). After more research (and discussion with Fastly experts), Ben and Flagsmith Senior Backend Engineer Gagan Trivedi realized SSE provided an elegant and economical solution – crucially, one that satisfied client demand for real-time functionality without the eye-watering cost.
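
From the client’s point of view, the pattern is deliberately simple. The sketch below is illustrative only, not Flagsmith’s actual SDK code: the endpoint URLs, event name, and header are hypothetical placeholders. The stream carries nothing but a tiny “something changed” signal, after which the client fetches its flags the normal way.

```typescript
// Minimal sketch (hypothetical endpoints and names, not Flagsmith's SDK code):
// an SSE stream delivers a tiny "flags changed" signal, and the client then
// refetches its flags over the regular edge API.
const environmentKey = "YOUR_ENVIRONMENT_KEY"; // placeholder

// EventSource is the browser's built-in SSE client; it reconnects automatically.
const stream = new EventSource(
  `https://realtime.example.com/sse/environments/${environmentKey}/stream`
);

stream.addEventListener("environment_updated", async () => {
  // The stream only says "something changed" – the flag values themselves
  // come from a normal edge API request.
  const res = await fetch("https://edge.example.com/api/v1/flags/", {
    headers: { "X-Environment-Key": environmentKey }, // hypothetical header
  });
  const flags = await res.json();
  console.log("Flags refreshed", flags);
});

stream.onerror = () => {
  console.warn("SSE connection interrupted – EventSource will retry");
};
```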


For Gagan, the crucial Fastly SSE feature is request collapsing, which lets a second request for a URL that’s already being fetched from origin join the in-flight origin request, avoiding a second origin fetch for the same URL. This prevents the “cache stampede” issue for normal requests, but crucially, it also acts to fan out one stream to multiple users.


As Gagan explains, request collapsing is exactly what Flagsmith required, because “the unique thing about us is that we have very few environments compared to connections. For instance, we could have one environment that, let’s say, is used by a web app that pushes notifications to five million users. After a request collapses, that can equate to just one connection to us, so we don’t have to worry about the scaling part because in our infrastructure, we offload that to the service handling the fan out pattern, which is Fastly.”
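
To make that fan-out concrete, here is a minimal origin-side sketch using Node’s built-in http module. It is not Flagsmith’s implementation – the paths, names, and notify function are assumptions – but it shows why request collapsing matters: with the CDN collapsing identical stream requests, the origin only ever sees a handful of connections, however many end users are listening.

```typescript
// Hypothetical origin sketch (not Flagsmith's code): one SSE stream per
// environment. A CDN with request collapsing can join many identical client
// requests onto these few origin connections and fan the events out.
import { createServer, type ServerResponse } from "node:http";

// With request collapsing in front, each set typically holds a handful of
// CDN connections rather than millions of end-user connections.
const listeners = new Map<string, Set<ServerResponse>>();

createServer((req, res) => {
  const match = req.url?.match(/^\/sse\/environments\/([^/]+)\/stream$/);
  if (!match) {
    res.statusCode = 404;
    res.end();
    return;
  }
  const env = match[1];

  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-store",
    Connection: "keep-alive",
  });

  // Register this response so a flag change can be broadcast to it.
  const set = listeners.get(env) ?? new Set<ServerResponse>();
  set.add(res);
  listeners.set(env, set);
  req.on("close", () => set.delete(res));
}).listen(8080);

// Called (hypothetically) by the management API when someone toggles a flag:
// one tiny event tells every listener to refetch its flags.
export function notifyEnvironmentUpdated(env: string): void {
  for (const res of listeners.get(env) ?? []) {
    res.write("event: environment_updated\ndata: {}\n\n");
  }
}
```

Each event is only a few bytes, which is what keeps the approach economical: the heavy lifting of delivering it to millions of clients stays at the edge.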


Fantastic functionality (plus a load off the server’s shoulders)


Supercharging content delivery is a more typical use case for Fastly CDN. But Flagsmith’s unique harnessing of its SSE capability further underlines the product’s quality, flexibility, and versatility.


After implementation, Flagsmith was relieved to have found an economical solution to meet client demands and enhance UX: practical and powerful real-time flag updates which weren’t prohibitively expensive.


Request collapsing means Fastly handles the vast majority of requests (upper graph) compared to what actually reaches the Flagsmith origin servers (lower graph). As you can see, the results are striking.


Graph: requests handled by Fastly


Graph: requests reaching the Flagsmith origin


Ben’s more than happy with the results: “We didn’t have this entire feature. We didn’t have real-time flag updates, so it’s a completely new piece of functionality we were able to provide our customers.”


And he also confirms Fastly was “super-responsive working through what’s an unusual way of implementing the platform. Sometimes it’s hard to get the ear of larger providers because they think ‘this isn’t a big enough account – just go and look at the docs’. But Fastly was super-helpful in terms of the technical implementation and helping us bend the platform in ways you wouldn’t expect”.




After such a satisfying outcome, real-time flag updates might signal the start of a deeper relationship with Fastly. “The experience we’ve had with the organization in terms of people and technology has been great,” says Ben. “So I would not be surprised if we’re having a conversation at some point about migrating the compute workload to Fastly too.”


However the story unfolds, we’re happy to fly with Flagsmith.


“After request collapsing, we don’t have to worry about the scaling part because in our infrastructure we offload that to the service handling the fan out pattern, which is Fastly.”

Gagan Trivedi
Senior Backend Engineer, Flagsmith



“We didn’t have real-time flag updates, so it’s a completely new piece of functionality we were able to provide our customers. Fastly was super-helpful in terms of the technical implementation and helping us bend the platform in ways you wouldn’t expect.”

Ben Rometsch
Co-Founder & CEO, Flagsmith
