I only posted this here after about two weeks of debugging, and I figured out the answer only a few hours later! 🎉
The most common cause is not providing a cipher secret that can be used to generate the code. I had actually run into this about two years ago and forgotten the solution.
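For anyone landing here with the more common cause, this is roughly what setting a cipher secret looks like in the Kratos config file. It is only a sketch: it assumes the `secrets.cipher` key layout from the Kratos configuration reference, and the value is a placeholder, so check the reference for your version (including any length requirement on the secret) before relying on it.

```yaml
# Sketch of the relevant part of kratos.yml (assumed layout, placeholder value).
secrets:
  cipher:
    - REPLACE-WITH-YOUR-OWN-CIPHER-SECRET
```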
My case was more of an edge case, but I will document it here for posterity. My app was previously hosted at app.appname.com (not the real address) and is now hosted at app.companyname.com. To preserve my users’ accounts, I pointed both deployments, the old one at app.appname.com and the new one at app.companyname.com, at the same PostgreSQL database. Here’s where things get funky. The code is generated and written to the database, and then a courier worker sends out the emails. Since both deployments were pointed at the same database, a flow started from the first deployment would sometimes be picked up by the second deployment’s courier service! I did not dive into the code to figure out how this is handled, but I assume some mechanism detects the discrepancy and drops the code from the email for security reasons. Shutting down the first deployment’s Kratos courier service resolved the issue!
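To make the race easier to picture, here is a toy Python model of two couriers draining one shared message queue. It is not Kratos code and says nothing about how Kratos actually claims messages. The function names are hypothetical, round-robin polling stands in for real timing, and the choice of which deployment starts the flows is arbitrary.

```python
from collections import deque
from itertools import cycle

OLD = "app.appname.com"      # placeholder hostnames from the post
NEW = "app.companyname.com"


def drain_shared_queue(flow_origins, couriers):
    """Hand each queued message to whichever courier polls next.

    Round-robin polling is an assumption that stands in for real timing.
    Returns a list of (origin_deployment, courier_deployment) pairs.
    """
    pending = deque(flow_origins)
    poll_order = cycle(couriers)
    handled = []
    while pending:
        courier = next(poll_order)
        origin = pending.popleft()
        handled.append((origin, courier))
    return handled


def mismatches(handled):
    """Messages sent by a courier from a different deployment than the flow's."""
    return [(origin, courier) for origin, courier in handled if origin != courier]


flows = [NEW, NEW, NEW, NEW]  # four flows started on one deployment

both_couriers = drain_shared_queue(flows, [OLD, NEW])
one_courier = drain_shared_queue(flows, [NEW])

print(len(mismatches(both_couriers)), "mismatched with two couriers")
print(len(mismatches(one_courier)), "mismatched with one courier")
```

With two couriers polling the same queue, some messages end up handled by the "wrong" deployment. With only one courier left running, every message is handled by the deployment that created it, which matches the fix described above.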