Leveraging Postgres Advisory Locks for Distributed Consensus
At Subskribe, we solve pretty hard engineering problems. One such hard problem is mutual exclusion of invoice generation. This blog post is the story of how we achieved this without spending a lot of engineering hours by utilizing the technologies at our disposal.
If there is anything that I have learned over the past 15 years of my engineering career, it is that distributed computing is not an easy problem to tackle.
We build mission critical business logic. The business logic gets quite complex, and one critical area of this business logic is invoice generation. There are several constraints on invoice generation, and one such constraint is, "at any given time you should generate only one invoice for a given subscription."
Let us see how this constraint can be violated in a distributed scenario. Imagine two machines M1 and M2 both get instructions to generate an invoice for a subscription S1. M1 wants to generate a usage-related invoice for S1, but M2 wants to generate a recurring-charges-related invoice for S1.
If M1 is not aware of M2 and cannot coordinate, they may generate an Invoice for S1 at the same time, and—voilà—we have violated our constraint, "at any given time you should generate only one invoice for a given subscription."
Advisory Locks API/Contract
At Subskribe, we only use the optimistic variant (try to acquire lock and fail) of the advisory locks. Pessimistic locking (try to acquire lock but wait until you can or timeout) is, in general, not a good pattern, and we haven’t seen much use for it in our engineering needs.
How Did It Solve the Problem?
At last, we can see how all of the theories above can actually solve the problem.
Subscription is a first-class entity on our platform. When executing a sales order in our platform it creates the subscription object with all future invoice generation hinged off of this subscription entity. The subscription entity is modified whenever we create new orders and amendments with several background jobs dependent on those mutations.
So we did not want to lock the subscription because then other operations on subscription would also have to wait. Invoice generation was a specific case where mutual exclusion was required, while the rest of the operations can proceed on a subscription.
The insight we had was to lock the subscription ID for the invoice generation context. Simply put, by locking the string “invoice_gen/SUB-1234”, then two competing processes could coordinate on this key and safeguard the invoice generation critical section.
Here are the steps in execution:
- Time T1: M1 (machine one) tries to acquire a transaction lock on “invoice_gen/SUB-1234”, and successfully acquires the lock.
- Time T2: M1 proceeds to generate an invoice for S1.
- Time T2: M2 (machine two) tries to acquire a lock on “invoice_gen/SUB-1234” and FAILS because the lock is held by M1.
- Time T3: M2 optimistically fails and will possibly try later to acquire locks.
- Time T4: M1 finishes invoice generation for S1 and commits the transaction.
- Time T5: Postgres releases the advisory transaction lock on “invoice_gen/SUB-1234” because the transaction is over (application need not release the lock).
So what did we achieve? We managed to achieve distributed mutual exclusion using Postgres advisory locks using only an arbitrary key (which is not even stored in the database).
Locking String Vs. Number
The reason we lock a string like “invoice_gen/SUB-1234” is that is what is available from the application layer and corresponds to the entity key/identifier we are locking. The advisory locks, however, use a bigint key as the argument (we are locking an integer not a string).
Show Me the Code
This is all well and good, but can you show me the code? Here is the entire listing of the helper class we wrote to make use of the Postgres advisory locks:
Upon completion of this project, we have several takeaways:
- You do not always need expensive consensus management systems to solve problems like Distributed Mutual Execution. Just looking a little further into existing and available tools may lead to quicker working solutions and avoid costly delays.
- Postgres is a very versatile database and has a lot of bells and whistles that can prove useful.
- Defining the problem you want to solve clearly can lead to creative, focused solutions.
- We had to write very minimal code (the code in the post above) to achieve mutual distributed exclusion, which was a big win in terms of time to production.