Storing dependency trees as materialized paths in ES, and searching them with regex

Hello all, I'd like to share what I'm doing and hopefully get some feedback on whether I'm on the right track.

We process some text and, for each sentence, we get a dependency tree representing the sentence structure, like the one in the image below (the parser used is spaCy):

Starting from the root, we can follow different paths:
pushes|dobj|the nail
pushes|nsubj|the hammer
pushes|prep|in|pobj|the wall
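
To make the path extraction concrete, here is a minimal sketch of how the root-to-leaf paths above could be produced. The tree representation (a tuple of node text plus a list of (relation, child) pairs) and the function name materialized_paths are assumptions made for illustration; they are not what spaCy returns directly.

```python
def materialized_paths(tree, sep="|"):
    """Return one string per root-to-leaf path of a dependency tree.

    Assumed (hypothetical) tree shape: (node_text, [(relation, child_tree), ...]).
    """
    paths = []

    def walk(node, prefix):
        text, children = node
        current = prefix + [text]
        if not children:
            # Leaf reached: emit the full path from the root.
            paths.append(sep.join(current))
        for relation, child in children:
            # The relation label sits between the parent and the child.
            walk(child, current + [relation])

    walk(tree, [])
    return paths


# The example sentence from this post, written in the assumed shape.
tree = ("pushes", [
    ("dobj", ("the nail", [])),
    ("nsubj", ("the hammer", [])),
    ("prep", ("in", [("pobj", ("the wall", []))])),
])

paths = materialized_paths(tree)
for p in paths:
    print(p)
```

Note that a node text containing the separator character would make the paths ambiguous, so it may be worth escaping or replacing it before joining.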

In order to save the tree in ES, we store each path as a string (this technique is called materialized paths):

path1 : "pushes|dobj|the nail",
path2 : "pushes|nsubj|the hammer",
path3 : "pushes|prep|in|pobj|the wall"
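
One possible way to lay this out in ES, sketched below, is a single keyword field holding all the paths of a sentence as an array, instead of numbered fields like path1, path2, and so on, since the number of paths varies per sentence. The field names sentence_id and paths and the helper sentence_doc are assumptions for illustration. The keyword type matters here because it keeps each path as one unanalyzed term, which is what a regexp query is matched against.

```python
import json

# Assumed mapping: every path is stored as a single unanalyzed term.
index_body = {
    "mappings": {
        "properties": {
            "sentence_id": {"type": "keyword"},
            "paths": {"type": "keyword"},
        }
    }
}


def sentence_doc(sentence_id, paths):
    """Build one document per sentence, with all its paths in one array field."""
    return {"sentence_id": sentence_id, "paths": list(paths)}


doc = sentence_doc("s1", [
    "pushes|dobj|the nail",
    "pushes|nsubj|the hammer",
    "pushes|prep|in|pobj|the wall",
])

print(json.dumps(index_body, indent=2))
print(json.dumps(doc, indent=2))
```

With this layout a query matches the whole sentence document if any of its paths matches. If you need to get back the individual matching path, one document per path may be the better fit.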

The total number of paths will be huge, since we have millions of sentences.

Then, we need to perform queries like the following:

|nsubj| -> returns all the paths that contain two nodes connected by an nsubj relation
|dobj|* -> returns all the paths that have children connected by a |dobj| relation

and so forth. We're going to execute these queries with regular expressions.
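
As a rough sketch of what such a query could look like: the block below checks the nsubj pattern locally with Python's re module and then builds a request body for the ES regexp query. The field name paths and both helper names are assumptions. Two details are worth keeping in mind: "|" is the alternation operator in regular expressions, so it has to be escaped, and the ES regexp query uses Lucene syntax, which must match the entire term, hence the leading and trailing ".*".

```python
import json
import re

paths = [
    "pushes|dobj|the nail",
    "pushes|nsubj|the hammer",
    "pushes|prep|in|pobj|the wall",
]


def contains_relation(path, relation):
    """Local check: does the path contain |relation| between two nodes?"""
    pattern = r"\|" + re.escape(relation) + r"\|"
    return re.search(pattern, path) is not None


def regexp_query(field, relation):
    """Build a regexp query body; 'field' is assumed to be a keyword field.

    'relation' is assumed to be a plain label such as nsubj, with no
    regex metacharacters in it.
    """
    # Lucene regexps are anchored to the whole term, so wrap with .*
    pattern = r".*\|" + relation + r"\|.*"
    return {"query": {"regexp": {field: {"value": pattern}}}}


nsubj_hits = [p for p in paths if contains_relation(p, "nsubj")]
body = regexp_query("paths", "nsubj")

print(nsubj_hits)
print(json.dumps(body))
```

A pattern that starts with ".*" cannot use the term prefix to narrow the search, so it is likely to be slow on millions of paths. It would be worth benchmarking this before committing to the design.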

I'm writing here to ask whether you think this is a feasible approach and whether you have any suggestions. Thank you very much!
