Hello all, I'd like to share what I'm doing and hopefully get feedback to know if I'm on the right way.
We process some text, and for each sentence, we get a tree representing the sentence structure, like the image below (the parser used is Spacy):
Starting from the root, we can follow different paths:
In order to save the tree in ES, we store each path as a string (this technique is called materialized paths):
path1 : "pushes|dobj|the nail",
path2 : "pushes|nsubj|the hammer",
path3 : "pushes|prep|in|pobj|the wall"
The total number of these paths will be huge (millions of sentences)
Then, we need to perform queries like the following:
|nsubj| -> returns all the paths that contain two nodes connected by a nsubj relation
|dobj|* -> returns all the paths that have children connected by a |dobj| relation
and so forth. We're going to execute these queries with regular expressions.
I'm writing here to ask if you think that this is a feasible approach and if you have some suggestions. Thank you very much!