Using Circuit Breaker Feature Flags
No web application is an island. Modern web applications are an amalgamation of services, and any single one of those services can need maintenance or be offline at any given moment. In some cases, a degraded service can take down the entire application or create ripple effects in terms of customer satisfaction, increased support requests, or a variety of other challenges. With carefully considered “circuit breakers” in place, however, that can change.
What is a circuit breaker?
Circuit breaker flags in web applications aren’t all that much different from the standard electric circuit breaker that protects devices so they don’t receive too much current and potentially start a fire. The main difference in web applications is that they aren’t limited to safety-centric scenarios. They can also be used to temporarily disable specific areas of functionality for a variety of purposes and even help mitigate spammers, bots, or standard traffic surges.
Unlike temporary feature flags that are removed after a feature has been fully rolled out, circuit breaker flags tend to be long-lived, if not essentially permanent. Since the services behind circuit breakers represent core functionality, the circuit breakers serve as a devops tool to help mitigate issues and provide a better user experience during downtime or degraded performance.
Why use circuit breakers?
Like many devops tools, circuit breakers make it less painful to handle unexpected events that lead to degraded performance or downtime. By designing an application with circuit breakers for the various services, it’s easier to proactively keep customers informed and ensure they aren’t unceremoniously dumped onto an empty page, or worse, a less-than-friendly generic error page.
With circuit breakers and the relevant interface elements, teams can reduce support requests by informing customers of degraded performance or downtime right where they are. Instead of wondering if a feature is broken or if they should file a bug report, they can stay focused on their task without worrying about whether the team is working on it.
Similarly, circuit breakers can streamline maintenance by disabling only the affected services with the hope that it can prevent more significant downtime in the larger application.
- Reduced support requests. With proactive messaging and explanations, users don't need to contact support.
- Better user experience. End users can receive explanations and answers instead of encountering generic error pages.
- Easier maintenance. Temporarily disabling a feature in order to perform maintenance or updates to that specific feature reduces the chances of conflicts or downstream effects.
- Less overall downtime. By catching and blocking problems caused by one feature, we increase the chances of keeping that feature from causing problems that could take down the entire application.
With circuit breakers in place, teams can disable specific services and design the interface to gracefully handle the downtime. In the case of search, the search fields could be disabled and display a message proactively explaining the downtime as well as providing an estimate on when it will return.
And circuit breakers don’t have to be limited to external services. They can be used for any type of maintenance or safety valve. Migrating or upgrading a database? Use flags to temporarily put the primary database into read-only mode during the cutover. Traffic surge affecting a non-critical service? Use a flag to disable it so it can’t cause problems for the more critical services. Spammers hammering a registration page? Temporarily disable registration or enable extra layers of verification for new accounts.
Where can circuit breakers help?
In the case of web applications, we can use circuit breakers to help manage a few different common challenges without needing to roll back or re-deploy. With Flipper, you can even toggle the flags from a mobile device if necessary. When adding a circuit breaker, we have two primary goals: protect the larger system from smaller failures and provide a more graceful experience for end users who would otherwise encounter the problem. Any scenario that can benefit from one or both of those goals is a good candidate for a circuit breaker.
In some cases, circuit breakers can be used to completely disable a service, but they can also be designed to throttle a service so it still works, just not as quickly as it normally would. Let’s take a look at some specific scenarios and example solutions using circuit breakers.
1. Mitigate External Provider Downtime
With circuit breakers, teams have an instant way to temporarily disable functionality affected by external providers. For features like search or third-party logins, if the vendor is offline, the functionality can be disabled gracefully. In other cases like webhooks or polling, if the provider goes offline, the webhooks and polling can be paused until the provider is back online.
Imagine an application’s search functionality depends on a third party provider, and that provider has a one-hour maintenance window planned where search temporarily won’t work. Without circuit breakers, when that provider starts their maintenance, the search box will still be visible in the interface, and people will still try to search.
They may end up seeing an error page or an empty results page, and in many cases, that can mean an increase in support requests or even some lost trust or perception that the service is unreliable.
Similarly, for apps that rely on third-party chat or support widgets, if the service goes offline, a circuit breaker could disable the interface elements for those tools and instead focus on directing people to send emails the old-fashioned way as a fallback for anyone that needs help.
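The search scenario above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not Flipper's implementation: the `FLAGS` hash stands in for a flag store (with Flipper the check would be `Flipper.enabled?(:search)`), and the `search` helper, its return shape, and the message text are all hypothetical.

```ruby
# Stand-in for a flag store. With Flipper the check would be
# Flipper.enabled?(:search); this hash only keeps the sketch dependency-free.
FLAGS = { search: true }

def flag_enabled?(name)
  FLAGS.fetch(name, false)
end

# Hypothetical search entry point. `provider` is any callable that
# performs the real third-party search.
def search(query, provider:)
  unless flag_enabled?(:search)
    # Circuit open: skip the provider entirely and explain why.
    return {
      status: :unavailable,
      message: "Search is down for maintenance. Please check back shortly."
    }
  end

  { status: :ok, results: provider.call(query) }
end

fake_provider = ->(query) { ["result for #{query}"] }

search("flags", provider: fake_provider) # provider is called
FLAGS[:search] = false
search("flags", provider: fake_provider) # provider is never called
```

Because the check happens before the provider is called, flipping the flag also stops the application from sending requests to a vendor that is already struggling.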
2. Fail & Troubleshoot Gracefully
In addition to providing insulation from external service issues, circuit breakers can also make it easier to troubleshoot. An internal service can be temporarily disabled in production for most users while remaining enabled for internal team members, so they can still investigate and troubleshoot in production if necessary.
For example, if an application has a collection of standardized reports that can be run and cached in the background but also has more performance-intensive ad hoc reporting that’s creating problems, then the ad hoc reporting could be disabled for all but the developers working on the problem. That way, they can still run ad hoc reports in production and follow the logs without having to leave the feature enabled for everyone.
In the meantime, while the ad hoc reporting is offline, end users receive a status message that lets them know what’s happening and that the team is already working to fix the problem. It provides a better experience for both the team fixing the problem and the customers who are inconvenienced by it.
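Here is a rough sketch of the "off for everyone except the developers" idea. Flipper supports this with per-actor checks such as `Flipper.enable_actor(:feature, user)` and `Flipper.enabled?(:feature, user)`; the `ActorFlag` class below is only a hypothetical stand-in so the example runs without any gems, and the `User` struct, flag constant, and messages are assumptions.

```ruby
require "set"

User = Struct.new(:id, :name)

# Hypothetical stand-in for a flag that can be turned off globally
# while staying on for specific users.
class ActorFlag
  def initialize(enabled_for_all: true)
    @enabled_for_all = enabled_for_all
    @actor_ids = Set.new
  end

  def disable_for_all
    @enabled_for_all = false
  end

  def enable_for(user)
    @actor_ids.add(user.id)
  end

  def enabled?(user)
    @enabled_for_all || @actor_ids.include?(user.id)
  end
end

AD_HOC_REPORTS = ActorFlag.new

def run_ad_hoc_report(user)
  unless AD_HOC_REPORTS.enabled?(user)
    return "Ad hoc reports are temporarily unavailable while we fix a performance issue."
  end

  "Report generated for #{user.name}"
end

developer = User.new(1, "Dev")
customer  = User.new(2, "Customer")

AD_HOC_REPORTS.disable_for_all       # trip the breaker for everyone...
AD_HOC_REPORTS.enable_for(developer) # ...except the person investigating

run_ad_hoc_report(developer) # still runs the report
run_ad_hoc_report(customer)  # returns the status message
```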
3. Mitigate Downtime Caused by Non-critical Services
In the case of the previous example, being able to disable the ad hoc reporting could also play a role in mitigating larger downtime. If the reporting problem is database-related, that could create ripple effects that cause problems for other services. But by temporarily disabling the problematic service, the team gets some breathing room to address the issue.
The same idea can be applied to areas like logging as well. If a surge in traffic is creating issues related to logging, being able to temporarily throttle or delay logging can be an almost invisible change that also creates some headroom until the larger problem can be solved. This can work with background jobs as well. If the server running background jobs is struggling to keep up, jobs can be processed on a delay, or systems that spin up a lot of non-critical jobs can let those jobs linger for a bit without being processed.
if Flipper.enabled?(:throttle_logging) # illustrative flag name
  # Only log 10% of requests
else
  # Log 100% of requests
end
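To make the logging throttle more concrete, here is a small sketch of a logger that samples roughly 10% of messages while a flag is on. The `SampledLogger` class and its parameters are hypothetical, and the flag is passed in as a callable so the example stays self-contained. Flipper also has a percentage-of-time gate (`Flipper.enable_percentage_of_time`) that may cover a similar need without custom sampling code.

```ruby
# Hypothetical logger that keeps every message normally, but only a sample
# of messages while the throttle flag is on.
class SampledLogger
  attr_reader :lines

  def initialize(throttled:, sample_rate: 0.1, rng: Random.new)
    @throttled = throttled     # callable returning true while throttling
    @sample_rate = sample_rate # fraction of messages kept when throttled
    @rng = rng
    @lines = []
  end

  # Returns true if the message was recorded, false if it was dropped.
  def log(message)
    return false if @throttled.call && @rng.rand >= @sample_rate

    @lines << message
    true
  end
end

throttle_flag = { on: false }
logger = SampledLogger.new(throttled: -> { throttle_flag[:on] })

logger.log("GET /reports") # always recorded while the flag is off
throttle_flag[:on] = true
logger.log("GET /reports") # now recorded roughly 10% of the time
```

Injecting the random number generator is a design choice that keeps the sampling testable; in production the default `Random.new` would be used.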
Alternatively, in cases like e-commerce, if there’s a problem processing a specific form of payment, a circuit breaker could temporarily disable that form of payment in order to avoid needing to completely disable checkout. It might temporarily affect sales, but it would minimize the amount of lost sales by ensuring the other payment methods stay available.
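One way to picture the payment example is a flag per payment method, with checkout staying open as long as at least one method is enabled. The method names, the plain hash used as a flag store, and both helpers below are hypothetical.

```ruby
# Hypothetical per-method flags; each could be its own feature flag.
PAYMENT_FLAGS = { card: true, paypal: true, bank_transfer: true }

# Payment methods whose circuit breaker is currently closed (enabled).
def available_payment_methods(flags)
  flags.select { |_method, enabled| enabled }.keys
end

# Checkout stays open as long as at least one method is available.
def checkout_open?(flags)
  available_payment_methods(flags).any?
end

PAYMENT_FLAGS[:paypal] = false           # trip the breaker for one provider
available_payment_methods(PAYMENT_FLAGS) # the other methods remain
checkout_open?(PAYMENT_FLAGS)            # checkout is still available
```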
4. Lower Risk for Migrations & Maintenance Mode
Sometimes when making large changes to the user-facing portion of a web application, some level of downtime is borderline inevitable. But what if the API interface could still work even if the web interface is in maintenance mode? It's not ideal, but it sure beats taking the whole application offline.
if Flipper.enabled?(:web_maintenance_mode) # illustrative flag name
  # Present a helpful message about the web interface maintenance,
  # and clarify that API and other requests are unaffected.
else
  # Handle the request like normal.
end
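As a rough sketch of how this could be wired up, the class below follows the Rack middleware convention (an object that responds to `call(env)` and returns a status, headers, and body) without requiring the rack gem. The class name, the `/api` prefix, and the flag callable are all assumptions made for illustration.

```ruby
# Rack-style middleware: returns a maintenance response for web requests
# while the flag is on, and passes API requests through untouched.
class WebMaintenanceMode
  def initialize(app, maintenance_on:, api_prefix: "/api")
    @app = app
    @maintenance_on = maintenance_on # callable returning true or false
    @api_prefix = api_prefix
  end

  def call(env)
    path = env.fetch("PATH_INFO", "/")

    if @maintenance_on.call && !api_request?(path)
      body = "The web interface is down for maintenance. API requests are unaffected."
      return [503, { "content-type" => "text/plain" }, [body]]
    end

    # Handle the request like normal.
    @app.call(env)
  end

  private

  def api_request?(path)
    path == @api_prefix || path.start_with?("#{@api_prefix}/")
  end
end

app = ->(_env) { [200, { "content-type" => "text/plain" }, ["OK"]] }
stack = WebMaintenanceMode.new(app, maintenance_on: -> { true })

stack.call("PATH_INFO" => "/dashboard")    # maintenance response
stack.call("PATH_INFO" => "/api/v1/items") # passed through to the app
```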
Circuit breaker flags can also help with planned maintenance, migrations, or upgrades. They can provide a way to disable individual features during maintenance while leaving other functionality unaffected.
For example, if you're making significant updates to your search indexing, you can temporarily disable the search interface while the rest of the application can still be fully usable. So instead of fielding endless support questions or taking your application offline, you can temporarily turn off search while leaving everything else unaffected.
if Flipper.enabled?(:search, current_user)
  # Enable Search UI in Views
else
  # Disable Search UI in Views & Show Maintenance Message
end
Similarly, during a database upgrade, a circuit breaker could be used to temporarily disable writes so that all of the data is still available to read, and end users are only partially inconvenienced instead of completely inconvenienced.
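A read-only switch like this can be sketched with a small guard in front of writes. The `Store` class, the error name, and the flag callable below are hypothetical; in a real application the guard would more likely sit in the data access layer or a controller filter.

```ruby
class ReadOnlyModeError < StandardError; end

# Hypothetical in-memory store that refuses writes while the flag is on.
class Store
  def initialize(read_only: -> { false })
    @read_only = read_only # callable returning true during the cutover
    @rows = {}
  end

  def read(key)
    @rows[key]
  end

  def write(key, value)
    if @read_only.call
      raise ReadOnlyModeError, "Writes are paused during a database upgrade."
    end

    @rows[key] = value
  end
end

read_only_flag = { on: false }
store = Store.new(read_only: -> { read_only_flag[:on] })

store.write(:plan, "pro")  # succeeds while the flag is off
read_only_flag[:on] = true
store.read(:plan)          # reads keep working
begin
  store.write(:plan, "team")
rescue ReadOnlyModeError => e
  e.message                # show this to the user instead of a generic error
end
```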
5. Protect Services from Bots and Spammers
Another common use for circuit breakers is protecting public-facing web forms like registration or support. At the simplest level, the forms can be completely disabled until the attack subsides. Or, in the case of contact forms, the form could automatically be replaced with an email address or social media handles so legitimate users still have options for support.
<% if Flipper.enabled?(:registration) %>
  <%# Show the form %>
<% else %>
  <%# Hide the form and display a message %>
<% end %>
Alternatively, if a registration form is being hammered and creating junk accounts, instead of fully disabling the form, a circuit breaker could enable stricter verification steps during the attack in order to stem the tide while still letting legitimate traffic through, albeit with slightly more hassle.
<% if Flipper.enabled?(:registration) %>
  <%# Show the form %>
<% elsif Flipper.enabled?(:registration_protection) %>
  <%# Require additional verification steps %>
<% else %>
  <%# Hide the form and display a message %>
<% end %>
Fully disabling registration or public forms temporarily is certainly an option of last resort, but in some cases, it's the only thing that works when playing cat-and-mouse with bots or spammers. Even if it's only used to disable registration long enough to implement better spam controls, it can help. It's never any fun to manage a batch of spam while also trying to fix the underlying problem.
We don't want a hammer and nail problem where everything we see looks like an opportunity to use feature flags, but we have endless scenarios where judicious feature flags can play a significant role. Circuit breakers provide a powerful way to help keep a large application running for the vast majority of users while minimizing the inconvenience to a subset of users.
And being able to enable and disable those features from mobile devices can be the difference between hours of review and cleanup and simply hitting pause to buy time to fix the problem. Better to briefly disable one feature than let a problem spiral out of control until it brings down the entire application.