I am trying to implement a custom relevance algorithm with ElasticSearch
using a parent/child document relationship. I understand that classic
relational database joins are not possible, but there is some limited join
functionality using parent/child or nested documents. I would like to
calculate a relevance score for the parent document from its matching child
documents using a custom script or similar. Simplified, the mapping types are
defined as follows:
{
"my_index": {
"my_item": {
"properties" : {
"url": {"type": "string", "index" : "not_analyzed"}
}
},
"relevance": {
"_parent": {"type": "my_item"},
"properties" : {
"search_term": {"type": "string", "index" : "not_analyzed"},
"score_data": {"type": "object", "index" : "no"}
}
}
}
}
Each item has a number of pre-computed relevance entities that increase its
relevance when they match one or more of the query terms. An example query
could be:
{
"query": {
"has_child":{
"type":"relevance",
"query":{
"terms":{
"search_term":["term_1", "term2", "term3"],
"minimum_match": 1
}
}
}
}
}
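For reference, this is roughly how I build that request body from an arbitrary list of terms. It is only a sketch: the helper name `build_has_child_query` is my own, and it does nothing beyond reproducing the JSON structure shown above.

```python
import json


def build_has_child_query(terms, child_type="relevance", field="search_term"):
    """Return a has_child request body matching children on any of `terms`.

    The helper name and its defaults are illustrative; the structure mirrors
    the example query in this post.
    """
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {
                    "terms": {
                        field: list(terms),
                        "minimum_match": 1,
                    }
                },
            }
        }
    }


body = build_has_child_query(["term_1", "term2", "term3"])
print(json.dumps(body, indent=2))
```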
I would like to sort the matching parent items according to a custom
relevance formula that uses the score data in the matching child set for
every parent found.
Alternatively, the children could be searched directly, e.g.:
{
"fields" : ["_parent", "search_term", "score_data"],
"query":{
"terms": {
"search_term":["term_1", "term2", "term3"],
"minimum_match": 1
}
}
}
This would require sorting and returning the distinct parent list based on
the matching child set, using my custom formula (script).
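To make the intent concrete, here is a rough Python sketch of the client-side fallback I would like to avoid: group the matching children by parent and apply a formula to each group. Everything specific in it is an assumption for illustration only: the flattened hit shape (one dict per child with `_parent` and `score_data`), the `weight` key inside `score_data`, and the summing formula are all made up, not my real scoring.

```python
from collections import defaultdict


def rank_parents(child_hits, formula):
    """Group child hits by parent id and rank the parents by `formula`.

    `child_hits` is assumed to be a list of dicts with "_parent" and
    "score_data" keys (a simplified stand-in for the search response).
    `formula` receives the list of score_data dicts for one parent.
    """
    by_parent = defaultdict(list)
    for hit in child_hits:
        by_parent[hit["_parent"]].append(hit["score_data"])

    scored = [(parent_id, formula(group)) for parent_id, group in by_parent.items()]
    # Highest relevance first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def sum_of_weights(score_data_list):
    # Hypothetical formula: add up an assumed "weight" value per child.
    return sum(data.get("weight", 0.0) for data in score_data_list)


example_hits = [
    {"_parent": "item_a", "search_term": "term_1", "score_data": {"weight": 1.0}},
    {"_parent": "item_b", "search_term": "term2", "score_data": {"weight": 3.0}},
    {"_parent": "item_a", "search_term": "term3", "score_data": {"weight": 1.5}},
]

ranked = rank_parents(example_hits, sum_of_weights)
print(ranked)
```

Doing this on the client means pulling every matching child over the wire, which is why I am hoping the same grouping and scoring can happen inside ElasticSearch.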
It is unclear to me whether ElasticSearch automatically keeps parents and
children on the same shard, but that seems to be the most sensible choice;
otherwise I cannot compute the parent relevance correctly. I also let
ElasticSearch generate the identifiers when the data is indexed.
I would really appreciate any ideas on how to solve this relevance problem
efficiently with ElasticSearch. Cheers.