wonderful-mechanic-93160
03/04/2024, 10:18 AM
`return_to` parameter and the resulting URL encoding. Here's a quick overview of the relevant parts of our setup:
• Multiple Kubernetes clusters (most of them at version `v1.27.3`)
• Traefik as ingress controller
• Ory Kratos (version `1.0.0` via Helm chart version `0.35.0`) that interacts with providers like Azure AD (now called _Entra ID_… because they like to confuse everyone)
• Ory Oathkeeper (version `0.46.6` via Helm chart version `0.35.0`) that only interacts with the Ory Kratos workloads.
• A _Traefik_ [`ForwardAuth`](https://doc.traefik.io/traefik/middlewares/http/forwardauth) middleware that basically passes every external request coming into the cluster (e.g. through _Traefik_ ingresses) to Ory Oathkeeper (as described in the official Ory documentation).
In the logs we see that the `X-Forwarded-URI` header contains the correct value, which is passed by Traefik through the middleware to Ory Oathkeeper. The `path` field, however, for whatever reason suddenly contains a value where only the `?` is encoded (`%3F`), while other "encodable" characters like `=` are left untouched. This is the exact opposite of what should happen: the `?` character is safe to pass, and it even has to stay unencoded because it separates the path from the query string.
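To illustrate the symptom, here is a minimal Go sketch that produces exactly this pattern. It is only an assumption about the kind of code path that could be involved, not a confirmed excerpt from Ory Oathkeeper, and `naiveURL` is a hypothetical helper: when the whole forwarded URI (path plus query) is assigned to the `Path` field of a `net/url.URL`, Go's path escaping encodes `?` but leaves `=` alone.

```go
package main

import (
	"fmt"
	"net/url"
)

// naiveURL is a hypothetical helper (not taken from Oathkeeper). It copies the
// complete X-Forwarded-URI value, including the query string, into URL.Path.
func naiveURL(host, forwardedURI string) string {
	u := url.URL{Scheme: "https", Host: host, Path: forwardedURI}
	// String() escapes Path as a path: "?" becomes "%3F", "=" stays as-is.
	return u.String()
}

func main() {
	fmt.Println(naiveURL("example.org", "/foo/bar?hello=world"))
	// prints: https://example.org/foo/bar%3Fhello=world

	// Feeding the already-escaped value in again escapes the "%" as well.
	fmt.Println(naiveURL("example.org", "/foo/bar%3Fhello=world"))
	// prints: https://example.org/foo/bar%253Fhello=world
}
```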
The result is that this encoded `?` is then encoded a second time (`%253F`) later on, when the full `return_to` parameter value, so basically the complete URL, is encoded. In the end the browser is redirected to a URL where the `?` character is still encoded (`%3F`), which makes it impossible for applications to parse it. An application that provides an HTTP REST API, for example, has no mapping for such a request because the `%3F` is interpreted as part of the path, not as the start of the URL parameters.
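The double encoding can be reproduced with the Go standard library alone. The sketch below assumes a hypothetical login URL (`https://example.org/login`) and a hypothetical helper `buildLoginRedirect`; it is not our actual configuration or Oathkeeper's code. It shows that once the `?` is already `%3F`, query-encoding the `return_to` value turns it into `%253F`, and decoding once only gets back to the broken `%3F` form.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildLoginRedirect is a hypothetical helper that appends the original URL
// as a query-encoded return_to parameter to a login URL.
func buildLoginRedirect(loginURL, returnTo string) string {
	q := url.Values{}
	q.Set("return_to", returnTo)
	return loginURL + "?" + q.Encode()
}

func main() {
	// The URL as it looks after the "?" was wrongly encoded once.
	broken := "https://example.org/foo/bar%3Fhello=world"

	redirect := buildLoginRedirect("https://example.org/login", broken)
	fmt.Println(redirect)
	// prints: https://example.org/login?return_to=https%3A%2F%2Fexample.org%2Ffoo%2Fbar%253Fhello%3Dworld

	// Decoding return_to once yields the broken URL again, not the original.
	u, err := url.Parse(redirect)
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Query().Get("return_to"))
	// prints: https://example.org/foo/bar%3Fhello=world
}
```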
Our Ory Oathkeeper configuration does not modify the parameters in any way. It only uses basic authentication flows and methods without mutators (at least not for the routes that suffer from this problem).
In short, the flow is the following, using the <http://example.org|example.org> domain as an example:
1. An unauthenticated external request is sent to an Ory Oathkeeper protected route (assuming the whole <http://example.org|example.org> domain only allows authenticated requests): <https://example.org/foo/bar?hello=world>
2. The _Traefik_ ingress matches the request and applies the `ForwardAuth` middleware that passes it to Ory Oathkeeper.
    • in the logs of Traefik the `RequestPath` field contains `/foo/bar?hello=world`
    • the `X-Forwarded-URI` header is set to the exact same value and passed to Ory Oathkeeper
3. Ory Oathkeeper logs that the `X-Forwarded-URI` header is `/foo/bar?hello=world`.
4. Ory Oathkeeper logs an error: "Access credentials are invalid", which is expected because the request is not authenticated.
    • the `http_url` field in this (JSON) log contains the correct value <https://example.org/foo/bar>
5. In the next Ory Oathkeeper log the `X-Forwarded-URI` header is still `/foo/bar?hello=world`, but the `path` field is suddenly set to `/foo/bar%3Fhello=world`! This is not expected at all since the `?` is "safe" and must not be encoded!
6. Some log lines later (the logging for "Access request granted"), after the user authenticated against Azure AD through our own web UI (that communicates with the Ory stack), the `http_url` field is set to <https://example.org/foo/bar%253Fhello%3Dworld>, so the `?` character is now encoded twice!
7. The final redirect URL for the browser now contains the double-encoded `?` character which, when decoded again, results in the invalid URL <https://example.org/foo/bar%3Fhello%3Dworld>.
So in short the URL changes from <https://example.org/foo/bar?hello=world>
→ <https://example.org/foo/bar%3Fhello=world>
→ <https://example.org/foo/bar%253Fhello=world>
→ <https://example.org/foo/bar%253Fhello=world>
→ <https://example.org/foo/bar%3Fhello%3Dworld>.
I know that all of this is hard to understand in text form and also hard for me to explain in detail, but I am still working on a way to create an SSCCE with a local `kind` Kubernetes cluster (debugging in the actual clusters disturbs the workflow of our teams, even in pure development clusters). The plan was to compile Ory Oathkeeper manually to try to prove that this is a bug in the code.
I delved deeply into the Ory GitHub repositories and found issue #1003, which I am fairly confident is related to our problem. It was resolved in PR #1025, but as far as I can see only in the `api/decision.go` file (for the `/decision` HTTP endpoint). That file hasn't changed since then up to version `0.46.6`, but the `pipeline/errors/error_redirect.go` file was not adjusted. I'm not that familiar with the code yet (I planned to contribute to the project in the future), but my guess is that this is the bug that causes the problem we are currently facing. I'm not 100% sure yet, because other users should have noticed this already: it "breaks" all of our redirects and results in error pages because the applications can't handle these requests (not only our products, even Grafana throws errors). Still, I haven't found any other reports yet.
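For comparison, this is roughly how I would expect a forwarded URI to be handled so that the query string survives. It is a sketch under my own assumptions, not the change from PR #1025 and not the content of `error_redirect.go`; `urlFromForwardedURI` is a hypothetical name. The idea is to parse the `X-Forwarded-URI` value first and set the path and the query separately instead of putting everything into `Path`.

```go
package main

import (
	"fmt"
	"net/url"
)

// urlFromForwardedURI is a hypothetical helper. It parses the forwarded
// request URI and keeps path and query in their own URL fields.
func urlFromForwardedURI(host, forwardedURI string) (string, error) {
	parsed, err := url.ParseRequestURI(forwardedURI)
	if err != nil {
		return "", err
	}
	u := url.URL{
		Scheme:   "https",
		Host:     host,
		Path:     parsed.Path,     // "/foo/bar"
		RawPath:  parsed.RawPath,  // preserves any original path escaping
		RawQuery: parsed.RawQuery, // "hello=world"
	}
	return u.String(), nil
}

func main() {
	out, err := urlFromForwardedURI("example.org", "/foo/bar?hello=world")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// prints: https://example.org/foo/bar?hello=world
}
```

With the query kept in `RawQuery`, a later encoding of the whole URL as `return_to` decodes back to the original request URL.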
My hope for this wall-of-text™️ is that maybe someone has an idea how to solve this (feel free to shout at me how stupid I am if you know where we did something wrong 😄), or that we find others who have faced, or are still facing, the same problem.