Thanks @high-optician-2097 for the feedback 🙏
Of course it's specific, but I was wondering if there was some tweaking that could be done in order to smooth the deployment 😉
We are using postgresql as a DB for Hydra.
The strategy we came up with so far:
• As you say, dry-run the migration
◦ See how long it takes on our prod DB
◦ If everything still works as expected
◦ But in an isolated env
• Once we have this is done and we have a good understanding of the time taken and the specifics:
◦ Clone the DB in prod
◦ Spin up Hydra v2 servers pointing to the cloned DB + the migration job on the cloned DB
◦ Once the migration is done, switch routing from the v1 servers to the v2 servers
◦ And later, clean up v1 servers and the prior DB
This should avoid downtime, with the trade-off that users that received Hydra OAuth2 tokens in mean time will have to reconnect.
Therefore we are planning to do this when we expect low traffic, so minimize the degraded UX.
Since the migration cannot be "undone", our fear is that the actual production env ends up in a state where we cannot come back to the prior one quickly enough (restoring a backup + rolling back instances to v1).