# talk-oathkeeper
g
Hey Ory team šŸ‘‹ Yesterday, we bumped our self-hosted Oathkeeper instances from `v0.38.17` to `v0.40.0`. Since this upgrade, we observe a significant rise in latency: our p95 went from 20ms to 33ms. We use 3 rules, in decreasing order of number of calls: `oauth2_introspection`, `cookie_session` and `bearer_token`. Any idea what introduces this, and what we can do about it? šŸ™‚
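For context, these are plain Oathkeeper access rules; here is a stripped-down sketch of one of them (ids, URLs and methods are made up, only the authenticator handler names match what we actually run):

```yaml
# Illustrative access rule; only the authenticator handler name reflects
# the real setup, everything else is a placeholder.
- id: api-traffic
  match:
    url: https://api.example.com/<.*>
    methods: [GET, POST]
  authenticators:
    - handler: oauth2_introspection   # the other two rules use cookie_session / bearer_token here
  authorizer:
    handler: allow
  mutators:
    - handler: noop
```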
We are aware that tracing was enabled on `cookie_session` and `bearer_token`, which is one suspect for this latency increase. And then there is the "improvements to cache logic and request detection" of v0.39.0 that could also have something to do with this šŸ™‚
s
hard to tell just from this chart... I'd recommend trying with tracing enabled and disabled, and maybe comparing the latency per endpoint?
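for the A/B comparison, disabling tracing globally (e.g. by commenting out the `tracing` block of the config) should be enough — roughly like this, assuming the pre-0.40 Jaeger-style schema; the exact keys may differ on 0.40:

```yaml
# Rough sketch, assuming the Jaeger-style tracing schema used before 0.40;
# removing or commenting out the whole block disables span emission entirely.
tracing:
  provider: jaeger
  service_name: ORY Oathkeeper
  providers:
    jaeger:
      local_agent_address: jaeger-agent.monitoring:6831   # placeholder address
      sampling:
        type: const
        value: 1
```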
g
Of course, the chart is mostly there to show "we observe a higher latency, something is fishy". I'll explore a bit more what we have on our side. Did you have any other feedback, or did you see something similar on your end? šŸ¤”
s
nothing I know of...
g
It seems that the instances require more CPU than before to function correctly under our current load, and in our kube cluster they have been getting throttled since we upgraded versions.
Here is a screenshot of the CPU usage, showing that we see at least a 10x increase in CPU usage 😱
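For reference, the throttling itself is just the container hitting its CPU limit (CFS throttling kicks in once usage reaches the limit), so this is the kind of deployment fragment that controls it — names and values here are illustrative, not our real ones:

```yaml
# Illustrative Kubernetes fragment (names/values are placeholders):
# the CPU limit is what turns "needs more CPU" into throttling,
# and hence into higher response times under load.
spec:
  containers:
    - name: oathkeeper
      resources:
        requests:
          cpu: 500m
        limits:
          cpu: "1"
```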
s
hm, this looks worrying indeed. Any ideas @high-optician-2097? I would not know where to start tbh...
g
Perhaps! What would be an easy way to measure a potential CPU load regression on oathkeeper? šŸ¤” In the meantime, we are in the process of rolling back to v0.39.4 for now.
We can confirm on our side that v0.39.4 shows the previous performance:
• CPU load significantly decreased again (roughly divided by 10)
• Therefore no more CPU throttling, and the response times went back down as well

The regression was introduced in v0.40.0, with a fair chance that the cause is https://github.com/ory/oathkeeper/pull/999