# talk-keto
h
Hi all, I’ve got a question on what I imagine is a common use case: how to efficiently filter result sets from existing data sources using Keto.
For example, say:
• I have a database containing text documents, and I’m using Keto to store document/user permissions
• Both data sets could be large
When a user conducts a search, I could:
• Search across the document database for the matching set, then iterate through that set, checking with Keto document by document whether the user has access, to produce the filtered set
• Or do it the other way around: ask Keto for the list of all documents the user has access to, then apply that list as a filter in a database query along with the user's requested search
But given that the result sets returned from Keto and from the document search could both be large, the cross-service filtering looks like it’ll have performance issues either way.
Are there any best-practice / standard approaches with Keto covering this case?
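The two options above can be sketched side by side. This is a minimal, self-contained illustration, not Keto's API: `FakeKeto` is a hypothetical in-memory stand-in for the permission service (its `check` and `list_objects` methods are assumptions, not real Keto client calls), and an in-memory SQLite table plays the document database. The counters make the trade-off visible: option 1 costs one permission check per search hit, while option 2 costs one listing call but pushes the whole ID list into the SQL query.

```python
import sqlite3


class FakeKeto:
    """Hypothetical in-memory stand-in for Keto; not the real client API."""

    def __init__(self, grants):
        # grants: set of (user, document_id, relation) tuples
        self.grants = set(grants)
        self.check_calls = 0
        self.list_calls = 0

    def check(self, user, doc_id, relation):
        self.check_calls += 1
        return (user, doc_id, relation) in self.grants

    def list_objects(self, user, relation):
        self.list_calls += 1
        return sorted(d for (u, d, r) in self.grants if u == user and r == relation)


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT)")
    db.executemany(
        "INSERT INTO documents (id, text) VALUES (?, ?)",
        [(1, "dogs are great"), (2, "dogs bark"), (3, "cats purr"), (4, "dogs dig")],
    )
    return db


def search_then_check(db, keto, user, term):
    """Option 1: search first, then check every hit with the permission service."""
    rows = db.execute(
        "SELECT id FROM documents WHERE text LIKE ? ORDER BY id", (f"%{term}%",)
    ).fetchall()
    return [doc_id for (doc_id,) in rows if keto.check(user, doc_id, "read")]


def list_then_search(db, keto, user, term):
    """Option 2: list everything the user can read, then filter in the database."""
    allowed = keto.list_objects(user, "read")
    if not allowed:
        return []
    # Only the placeholders are built dynamically; the IDs are bound parameters.
    placeholders = ",".join("?" for _ in allowed)
    rows = db.execute(
        f"SELECT id FROM documents WHERE text LIKE ? AND id IN ({placeholders}) ORDER BY id",
        (f"%{term}%", *allowed),
    ).fetchall()
    return [doc_id for (doc_id,) in rows]
```

With read grants for `alice` on documents 1, 3 and 4, both functions return `[1, 4]` for the search term `"dogs"`: option 1 makes three checks (one per matching document) and option 2 makes a single listing call. In a real system, option 2's `IN (...)` list grows with everything the user can read, and SQLite caps the number of bound parameters per statement, which is one concrete form of the performance concern raised above.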
p
This is the same issue that I’m wrestling with for a design. It feels to me like “the main issue” with a permission microservice…
I don’t have definitive answers yet, but here are a few ideas that I’m mulling over:
1. You need “perimeters” by which to limit the result sets from your database. You need some filters at the database level to limit results there, and then you perform more fine-grained checks against Keto.
2. If you have “layered” permissions, you may need entirely different code paths for queries. The idea is that a subject will have access either to all the resources in a container or to relatively few resources in a container. E.g. if you do a permission check and the user is the owner of the container, you don’t need to perform any more permission checks on individual resources; you can just query the database and let it do the heavy lifting. If, on the other hand, the user is not the owner of the container, you first get the resources they can access from Keto and then pass that list into your database query as a filter.
All very abstract, I know, and it’s still formulating in my 🧠, but that’s the direction I’m going. I’m experimenting today with figuring out how to “get all the resources a subject can access” from Keto.
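The layered idea in point 2 (a coarse container check that picks one of two query paths) can be sketched as follows. Everything here is a hypothetical stand-in: `FakeKeto`, `is_owner` and `readable_documents` are invented for illustration and are not Keto client calls, and SQLite replaces the real document store. How to actually obtain the “readable documents” list from Keto is the open question in the message above, so this sketch simply assumes such a lookup exists.

```python
import sqlite3


class FakeKeto:
    """Hypothetical in-memory permission store standing in for Keto."""

    def __init__(self, owners, grants):
        self.owners = set(owners)  # (user, container_id) pairs
        self.grants = set(grants)  # (user, document_id) pairs with read access
        self.list_calls = 0

    def is_owner(self, user, container_id):
        return (user, container_id) in self.owners

    def readable_documents(self, user):
        self.list_calls += 1
        return sorted(doc for (u, doc) in self.grants if u == user)


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, container_id TEXT)")
    db.executemany(
        "INSERT INTO documents (id, container_id) VALUES (?, ?)",
        [(1, "team-a"), (2, "team-a"), (3, "team-a"), (4, "team-b")],
    )
    return db


def documents_in_container(db, keto, user, container_id):
    # Coarse check first: an owner sees everything in the container,
    # so the database alone can answer the query.
    if keto.is_owner(user, container_id):
        rows = db.execute(
            "SELECT id FROM documents WHERE container_id = ? ORDER BY id",
            (container_id,),
        ).fetchall()
        return [doc_id for (doc_id,) in rows]

    # Otherwise the user is expected to see relatively few documents:
    # fetch that short list from the permission service and push it into SQL.
    allowed = keto.readable_documents(user)
    if not allowed:
        return []
    placeholders = ",".join("?" for _ in allowed)
    rows = db.execute(
        f"SELECT id FROM documents WHERE container_id = ? AND id IN ({placeholders}) ORDER BY id",
        (container_id, *allowed),
    ).fetchall()
    return [doc_id for (doc_id,) in rows]
```

With `alice` owning `team-a` and `bob` holding direct grants on documents 2 and 4, `alice` gets `[1, 2, 3]` without any listing call, while `bob` gets `[2]` after one listing call.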
h
Yes, I’d been thinking of some form of domain-specific denormalization to aid performance in the high-volume cases. For example:
• A field to indicate whether the document is ‘public’, assuming you have a large volume of public documents
• A field indicating the document owner
Used as such:
• Query:
SELECT * FROM documents d WHERE d.text = 'something super important'
• Filter:
For each result: filter { document.isPublic || document.ownerId == 1234 || keto.checkPermission(user, document, "read") }
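The filter above could look like this in Python. It assumes a hypothetical `Document` record carrying the denormalized `is_public` and `owner_id` fields, and a hypothetical `FakeKeto.check_permission` standing in for a real Keto check. The ordering matters: Python's `or` short-circuits, so the remote check only runs for documents that the cheap local fields don't already grant.

```python
from dataclasses import dataclass


@dataclass
class Document:
    id: int
    is_public: bool
    owner_id: int


class FakeKeto:
    """Hypothetical stand-in for a Keto permission check; counts calls."""

    def __init__(self, grants):
        self.grants = set(grants)  # (user_id, document_id, relation)
        self.calls = 0

    def check_permission(self, user_id, document_id, relation):
        self.calls += 1
        return (user_id, document_id, relation) in self.grants


def filter_readable(results, user_id, keto):
    """Keep documents the user may read, trying the cheap denormalized fields first."""
    return [
        doc
        for doc in results
        # `or` short-circuits, so Keto is only asked when neither
        # the public flag nor the owner column already grants access.
        if doc.is_public
        or doc.owner_id == user_id
        or keto.check_permission(user_id, doc.id, "read")
    ]
```

For four results where one is public, one is owned by user 1234 and two are neither, only two permission checks are made.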
As for the layered approach, I’d mulled that too. Say the user is searching within a ‘Collection’ of documents, and the Collection in Keto grants the requested permission to the given user for its constituent Documents; then filtering in the database for Documents that are members of the given Collection is sufficient. So, in approximate practice:
• User requests read permission for documents in the ‘dogs’ Collection
• App runs:
if keto.checkPermission(user, <dogs-collection-id>, "read") {
    // the Collection grants access to all its Documents, so the database can filter alone
    return SELECT * FROM documents d WHERE d.parent_collection = <dogs-collection-id>
} else {
    // default filtering
}
Again, this is denormalization: replicating the collection/document hierarchy in both Keto and the database. However you cut it, it leaks permission logic outside of Keto, which irks, but I have a feeling pragmatism trumps. Pretty early thinking though; I too am formulating!
p
My intent is to use self-hosted Ory Keto, so I’m inclined to just poke around the database and see if I can muster up some queries myself. I agree that leaking permissions is irksome, so I’d like to avoid it if possible, but I’m not sure it is.