Storing a tree of documents


(Geoffrey On Rails) #1

I'd like to store elements of a grid made up of physical entities (each mapped to an individual document in my model). Those entities follow a tree structure (A is a parent of B, which is a parent of C...) that can be very deep (let's say hundreds of levels). Fully denormalizing would therefore be cost-prohibitive.
I've read about the parent join as well as terms queries. Assuming I'd like to get the parent/children (potentially the list of ancestors/descendants) of a given entity or set of entities, what should I use?
I am able to do it with a terms query, by storing the ID of the parent (or of all the ancestors) in each document and making a direct request or a terms lookup request to find either the parent or the children.

{
"query" : {
    "terms" : {
        "_id" : {
            "index" : "entities",
            "type" : "_doc",
            "id" : "MyID",
            "path" : "parent"
            }
        }
    }
}
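
For readers unfamiliar with terms lookup, here is a minimal plain-Python sketch of what the request above does conceptually. It does not call Elasticsearch: the `ENTITIES` dict, its sample documents, and all helper names are made up for illustration, and it assumes each document stores only its direct parent's ID in `parent`.

```python
# Hypothetical in-memory stand-in for the "entities" index: _id -> _source.
ENTITIES = {
    "A": {"parent": None},
    "B": {"parent": "A"},
    "C": {"parent": "B"},
}

def terms_lookup(index, doc_id, path):
    """Read the value(s) stored at `path` in one document, as a list."""
    value = index[doc_id].get(path)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def find_by_id(index, ids):
    """Return the IDs of the documents whose _id is one of `ids`."""
    wanted = set(ids)
    return sorted(doc_id for doc_id in index if doc_id in wanted)

def parent_of(index, doc_id):
    # Same idea as: terms on _id, with the values looked up from doc_id's "parent".
    return find_by_id(index, terms_lookup(index, doc_id, "parent"))

def children_of(index, doc_id):
    # The reverse direction is a plain match on the stored "parent" value.
    return sorted(d for d, src in index.items() if src.get("parent") == doc_id)

def ancestors_of(index, doc_id):
    # With only a parent pointer, every extra level costs one more lookup.
    chain = []
    current = index[doc_id].get("parent")
    while current is not None:
        chain.append(current)
        current = index[current].get("parent")
    return chain

print(parent_of(ENTITIES, "C"))
print(children_of(ENTITIES, "A"))
print(ancestors_of(ENTITIES, "C"))
```

In this sketch the ancestor walk needs one lookup per level, which is presumably why storing all ancestors in each document becomes attractive for deep trees.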

I would be able to do it by defining a join query as well, but I've seen that it is not recommended. However, I don't really get why, or what the use case for those joins is if it's possible to do the same thing so simply with a terms query.

So my question is more about the impact of each solution (performance-wise and in terms of querying capability), as I don't know enough about Elasticsearch's internals to compare the two options.

Thanks in advance.


(Magnus Kessler) #2

I would encode the path from the root of the tree into the documents and then use the path-hierarchy tokenizer to index the path field. This way you can retrieve sub-trees easily.

DELETE my_index

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "path": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT my_index/doc/1
{
  "path": "/one/two/three"
}

PUT my_index/doc/2
{
  "path": "/one/two/three/four"
}

PUT my_index/doc/3
{
  "path": "/one/two/eight/nine"
}

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "path": "/one/two"
        }
      }
    }
  }
}
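
To see why the term filter above returns whole sub-trees, here is a rough plain-Python approximation of what the path_hierarchy tokenizer emits with its default `/` delimiter. This is a simulation for illustration only, not Elasticsearch code: `path_hierarchy_tokens`, `term_filter`, and `DOCS` are hypothetical names, and edge cases such as repeated delimiters are not modelled faithfully.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Approximate path_hierarchy: one token per prefix of the path.

    "/one/two/three" -> ["/one", "/one/two", "/one/two/three"]
    """
    prefix = delimiter if path.startswith(delimiter) else ""
    parts = [p for p in path.split(delimiter) if p]
    return [prefix + delimiter.join(parts[: i + 1]) for i in range(len(parts))]

# The three documents indexed above, as _id -> path.
DOCS = {
    "1": "/one/two/three",
    "2": "/one/two/three/four",
    "3": "/one/two/eight/nine",
}

def term_filter(docs, value):
    """Return IDs of documents that have `value` among their indexed tokens."""
    return sorted(
        doc_id for doc_id, path in docs.items()
        if value in path_hierarchy_tokens(path)
    )

print(path_hierarchy_tokens("/one/two/three"))
print(term_filter(DOCS, "/one/two"))
print(term_filter(DOCS, "/one/two/three"))
```

Because every prefix of a path is indexed as its own token, an exact term match on `/one/two` hits every document at or below that node, while a partial segment such as `/one/tw` matches nothing.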

(Geoffrey On Rails) #3

Thanks a lot for your answer, I didn't know about that specific feature. Let's add a few documents:

PUT my_index/doc/4
{
  "path": "/one/two"
}

PUT my_index/doc/5
{
  "path": "/one"
}

I am able to retrieve all the documents below document 4 with a request equivalent to the one you provided:

{
  "query": {
    "bool": {
      "filter": {
        "terms": {
            "path" : {
                "index" : "my_index",
                "type" : "doc",
                "id" : "4",
                "path" : "path"
            }
        }
      }
    }
  }
}

However, I struggle to get the documents above document 4 (i.e. document 5, or documents 4 and 5). How would you proceed? (We may assume that the path is composed of the document IDs if that helps, as that is true in my case.)


(Magnus Kessler) #4

Your last query is an example of a terms lookup query, where the actual value is retrieved by querying the content of a document in an Elasticsearch index.

If you wanted to retrieve all documents where the path starts with /one, you'd simply use

GET my_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "path": "/one"
        }
      }
    }
  }
}

Note that in this case you just pass the value (or, if using a terms query, an array of values). Your example was reading the value /one/two from the path field in document 4.


(Geoffrey On Rails) #5

I understand what that query does. I used the terms lookup because in my case I will know the ID but not necessarily the full path.
Assuming I have a Country/Region/City/Street kind of tree, I would know only the city name (hence the lookup) and then be able to fetch all the streets within the city, which is the sub-tree. So far so good.
But knowing that city, I'd also like a different query to directly access the documents representing the Region and the Country. This is where I struggle: how do I get only the ancestors (the branch from the root up to the city) but not the descendants?

In my example above, this translates to: how do I return only documents 4 and 5?
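
This question is left open in the thread. One possible direction, offered as an unconfirmed sketch: the prefixes that path_hierarchy would produce for a node's own path are exactly the paths of that node and its ancestors, so matching documents whose complete path equals one of those prefixes returns the branch without any descendants. The Python below only simulates this in memory; `path_prefixes`, `ancestors_or_self`, `ancestors`, and `DOCS` are hypothetical names, and in Elasticsearch this would presumably require an un-tokenized copy of `path` to compare against, which is an assumption the thread does not cover.

```python
def path_prefixes(path, delimiter="/"):
    """All prefixes of a path, approximating path_hierarchy tokens."""
    prefix = delimiter if path.startswith(delimiter) else ""
    parts = [p for p in path.split(delimiter) if p]
    return [prefix + delimiter.join(parts[: i + 1]) for i in range(len(parts))]

# The five documents from this thread, as _id -> path.
DOCS = {
    "1": "/one/two/three",
    "2": "/one/two/three/four",
    "3": "/one/two/eight/nine",
    "4": "/one/two",
    "5": "/one",
}

def ancestors_or_self(docs, doc_id):
    """IDs of documents whose full path is a prefix of doc_id's path."""
    wanted = set(path_prefixes(docs[doc_id]))
    # Compare the whole path, not its tokens, so descendants are excluded.
    return sorted(d for d, path in docs.items() if path in wanted)

def ancestors(docs, doc_id):
    """Same as ancestors_or_self, without the starting document."""
    return [d for d in ancestors_or_self(docs, doc_id) if d != doc_id]

print(ancestors_or_self(DOCS, "4"))
print(ancestors(DOCS, "4"))
```

The design choice here is to compare against the untouched path value rather than the tokenized one: matching on tokens finds everything below a node, while matching whole paths against a node's prefixes finds everything above it.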

