Accelerating Development with Feature Flags
We considered a few options, including other commercial solutions as well as building our own from scratch. Ultimately we decided to use AWS AppConfig, for a number of reasons:
Frontend and Backend Support
The Subskribe application uses Node.js, TypeScript, React, and GraphQL for its frontend and Java for its backend. Our approach therefore needs to support querying whether a flag is enabled both from TypeScript hosted in Node.js and from Java code running in the JVM.
Although AWS AppConfig exposes an API that can be called from Node.js and TypeScript, we want our frontend and backend to have a consistent view of which features are enabled. For that reason, we decided to expose a GraphQL query from our Java backend that returns the flag values the backend holds. The backend alone handles all queries to the AppConfig API.
Data Fetching and Caching
Feature Flag settings are stored both locally, in configuration files deployed with the Subskribe application, and remotely, in the AWS AppConfig service. This lets developers change settings locally without depending on AWS. However, if a feature has a setting in both a local configuration file and AWS AppConfig, the AWS setting takes precedence, whether it enables or disables the feature. This lets us toggle a feature on or off regardless of which configuration settings happened to be deployed with the application.
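The precedence rule amounts to a map merge in which remote entries overwrite local ones. The sketch below is a minimal illustration, assuming flags are represented as name-to-boolean maps; the class name `FlagMerge`, the method `merge`, and the flag names `newInvoiceUi` and `bulkImport` are hypothetical and not taken from the Subskribe codebase.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating AWS-over-local precedence for flag values. */
public final class FlagMerge {

    private FlagMerge() {}

    /**
     * Combines locally configured flags with flags fetched from AWS AppConfig.
     * Any flag present in the remote map overrides the local value, whether
     * the remote value is true or false.
     */
    public static Map<String, Boolean> merge(Map<String, Boolean> local,
                                             Map<String, Boolean> remote) {
        Map<String, Boolean> merged = new HashMap<>(local); // start from local defaults
        merged.putAll(remote);                              // remote entries win on conflict
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical flag names, for illustration only.
        Map<String, Boolean> local = Map.of("newInvoiceUi", true, "bulkImport", false);
        Map<String, Boolean> remote = Map.of("newInvoiceUi", false);
        Map<String, Boolean> merged = merge(local, remote);
        System.out.println(merged.get("newInvoiceUi")); // false: AWS setting takes precedence
        System.out.println(merged.get("bulkImport"));   // false: only set locally
    }
}
```

Flags that appear only in the local file keep their local value, which is what allows developers to experiment locally without touching AWS.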
At application startup, the Feature Flag configuration must be read successfully from AWS; the application does not fall back to the local configuration files. Once the application is up and running, however, if a call to AWS to fetch the Feature Flag configuration fails (after some retries), we simply log an error and return the previously cached value. Subsequent queries of the flag values from Java or UI code trigger new attempts to fetch the configuration from AWS.
Why do we require the flag values stored in AWS to be read successfully on startup instead of falling back to the config files? Because we view AWS AppConfig as the source of truth for this data. If we relied on the local config file values whenever downloads from AWS failed, we would either need to keep the local config values continually updated (which would defeat the purpose of dynamic flag settings) or live with an inconsistent set of flags whenever we had a hiccup contacting AWS on application startup.
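The split between strict startup and lenient runtime behavior could be sketched as follows. This is an illustration under stated assumptions, not Subskribe's implementation: `RemoteFlagSource` is a hypothetical stand-in for the call to AWS, and the class name `CachedFlagStore` and the retry count of three are likewise made up for the example.

```java
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical flag cache: strict at startup, lenient once running. */
public class CachedFlagStore {

    /** Hypothetical stand-in for the AWS fetch; throws on failure. */
    public interface RemoteFlagSource {
        Map<String, Boolean> fetch() throws Exception;
    }

    private static final Logger LOG = Logger.getLogger(CachedFlagStore.class.getName());
    private static final int MAX_ATTEMPTS = 3; // illustrative retry budget

    private final RemoteFlagSource source;
    private volatile Map<String, Boolean> cached;

    /** Startup: throws if AWS cannot be read after retries; no fallback to local files. */
    public CachedFlagStore(RemoteFlagSource source) throws Exception {
        this.source = source;
        this.cached = fetchWithRetries();
    }

    /** Runtime: tries to refresh on each query, but serves the last good values on failure. */
    public Map<String, Boolean> current() {
        try {
            cached = fetchWithRetries();
        } catch (Exception e) {
            LOG.severe("Feature flag refresh failed; serving cached values: " + e.getMessage());
        }
        return cached;
    }

    private Map<String, Boolean> fetchWithRetries() throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return Map.copyOf(source.fetch()); // immutable snapshot of the remote flags
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed
    }
}
```

Refreshing on every query keeps the sketch short; a production version would more likely refresh only after a polling interval has elapsed.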
Although the AppConfig library has an easy-to-use API, we wanted something even simpler for our backend and frontend developers to query. We therefore built a very simple Features class that provides an interface for querying whether a specific feature is enabled, abstracting away both calls to AWS and lookups in any internal configuration.
As an implementation detail, we created a wrapper class around AWS’s AppConfigDataClient that abstracts away the loading, caching, and fetching described above.
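One detail such a wrapper has to handle is the AppConfig Data session model: a session is started once to obtain an initial configuration token, each poll consumes a token and returns the next one, and the returned configuration may be empty when nothing has changed since the last poll. To stay self-contained, the sketch below models those two calls with a hypothetical `ConfigDataApi` interface; with the AWS SDK for Java v2 they would correspond to `AppConfigDataClient.startConfigurationSession` and `getLatestConfiguration`. The class and record names are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical wrapper showing the AppConfig Data session/token polling pattern. */
public class AppConfigPoller {

    /** Result of one poll: the next token plus configuration bytes (empty if unchanged). */
    public record PollResult(String nextToken, byte[] configuration) {}

    /** Hypothetical stand-in for StartConfigurationSession / GetLatestConfiguration. */
    public interface ConfigDataApi {
        String startSession();              // returns the initial configuration token
        PollResult getLatest(String token); // consumes a token, returns the next one
    }

    private final ConfigDataApi api;
    private String token;
    private String latestConfig; // last non-empty configuration received

    public AppConfigPoller(ConfigDataApi api) {
        this.api = api;
        this.token = api.startSession();
    }

    /** Polls once and returns the most recent configuration, reusing the cache if unchanged. */
    public synchronized Optional<String> poll() {
        PollResult result = api.getLatest(token);
        token = result.nextToken(); // tokens are single-use, so always advance
        if (result.configuration().length > 0) {
            latestConfig = new String(result.configuration(), StandardCharsets.UTF_8);
        }
        return Optional.ofNullable(latestConfig);
    }
}
```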
To query from application Java code:
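A minimal sketch of what such a facade and a call site might look like, assuming a hypothetical `isEnabled(String)` method and a `Supplier` that yields the current flag values (for example, from the cached AppConfig wrapper). The flag name `newInvoiceUi` is also hypothetical, and treating unknown flags as disabled is an assumption of this sketch.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical facade: the single place application code asks whether a feature is on. */
public final class Features {

    private final Supplier<Map<String, Boolean>> flags; // e.g. backed by the AppConfig wrapper

    public Features(Supplier<Map<String, Boolean>> flags) {
        this.flags = flags;
    }

    /** Missing or null flags are treated as disabled. */
    public boolean isEnabled(String featureName) {
        return Boolean.TRUE.equals(flags.get().get(featureName));
    }

    public static void main(String[] args) {
        Features features = new Features(() -> Map.of("newInvoiceUi", true));
        if (features.isEnabled("newInvoiceUi")) {
            System.out.println("render new invoice UI");
        } else {
            System.out.println("render legacy invoice UI");
        }
    }
}
```

Keeping callers down to a single boolean check means feature-gated code does not need to know whether a value came from AWS or from a local file.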
As noted above, we created a GraphQL query so these values can also be retrieved by our UI. The query has a simple definition which can be called using your favorite GraphQL client library:
which returns a boolean value.
The block diagram below provides an overview of the architecture we use in the Subskribe application.
Deploying Feature Flag Updates
We ended up building simple YAML files, whose format looks like this:
These files are stored in GitHub. A separate deployment pipeline listens to GitHub for updates to these YAML files; when a change is committed, it makes AWS API calls (via the AWS CLI) that push the changes to the appropriate environment in AWS.
The following diagram illustrates our different deployment flows.
While our code deployment pipeline can take anywhere from minutes to hours to complete (depending on various factors), our Feature Flag deployments complete within a few seconds.
We have been working with Feature Flags for a while now. They have reduced the number of deployment issues we see when releasing new functionality. When we have found a problem with a feature in one of our deployment environments, we have been able to quickly disable the offending functionality without needing to roll back or redeploy our code.