I'm encountering a difficulty related to more_like_this with nested properties. What I would like to do is find similar parent documents where similarity is based on having many similar children. This seems like a standard sort of thing, but my google-fu has failed me.
Allow me to explain. Suppose I have an index set up like this:
{
  "my_index": {
    "mappings": {
      "gizmo": {
        "properties": {
          "some_property": {"type": "text"},
          "doodads": {
            "type": "nested",
            "properties": {
              "prop1": {"type": "text"},
              "prop2": {"type": "text"}
            }
          }
        }
      }
    }
  }
}
Given a gizmo, I would like to find other gizmos that have similar doodads. Doodads are similar if their prop1 and their prop2 are similar. My first approach was the following:
{
  "query": {
    "nested": {
      "path": "doodads",
      "query": {
        "more_like_this": {
          "fields": ["doodads.prop1", "doodads.prop2"],
          "like": {
            "_index": "my_index",
            "_id": 1
          }
        }
      }
    }
  }
}
That didn't do at all what I wanted. It threw all of the prop1's and prop2's from every child of gizmo 1 into one big bag and compared it against each of the children of every document in my index (again, throwing their prop1 and prop2 into a bag). That's no good since there is no longer a distinction between prop1 and prop2, let alone between the individual child documents of gizmo 1.
Take two was this:
{
  "query": {
    "nested": {
      "path": "doodads",
      "query": {
        "bool": {
          "should": [
            {
              "more_like_this": {
                "fields": ["doodads.prop1"],
                "like": {
                  "_index": "my_index",
                  "_id": 1
                }
              }
            },
            {
              "more_like_this": {
                "fields": ["doodads.prop2"],
                "like": {
                  "_index": "my_index",
                  "_id": 1
                }
              }
            }
          ],
          "minimum_should_match": 2
        }
      }
    }
  }
}
Ok, this is better because at least prop1's are getting compared to prop1's and prop2's are getting compared to prop2's, but it still throws all the prop1's from all the doodads of gizmo 1 into a bag and all the prop2's into a second, independent bag, losing the information of which prop1 and prop2 went together in the same doodad.
Effectively, this query is a big AND-of-ORs:
prop1 like (... OR ... OR ... OR ...) AND prop2 like (... OR ... OR ... OR ...)
What I am looking for is the OR-of-ANDs:
(prop1 like ... AND prop2 like ...) OR (prop1 like ... AND prop2 like ...) OR ...
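To make the difference between the two shapes concrete, here is a toy sketch in Python. It is only an illustration: "like" is replaced by exact string equality, and the sample doodads and the function names and_of_ors / or_of_ands are made up for this example, not anything from Elasticsearch.

```python
# Toy model of the two query shapes. "like" is simplified to exact equality,
# which is an assumption made purely to keep the illustration small.

# Hypothetical doodads of the source gizmo (gizmo 1).
source_doodads = [
    {"prop1": "red", "prop2": "round"},
    {"prop1": "blue", "prop2": "square"},
]


def and_of_ors(candidate, source):
    """What take two effectively does: two independent bags of values."""
    prop1_bag = {d["prop1"] for d in source}
    prop2_bag = {d["prop2"] for d in source}
    return candidate["prop1"] in prop1_bag and candidate["prop2"] in prop2_bag


def or_of_ands(candidate, source):
    """What I want: prop1 and prop2 must match the SAME source doodad."""
    return any(
        candidate["prop1"] == d["prop1"] and candidate["prop2"] == d["prop2"]
        for d in source
    )


# A candidate doodad that mixes prop1 of one source doodad with prop2 of another.
mixed = {"prop1": "red", "prop2": "square"}

print(and_of_ors(mixed, source_doodads))  # matches, although no source doodad looks like this
print(or_of_ands(mixed, source_doodads))  # does not match, which is the behavior I am after
```

The mixed doodad is exactly the kind of false positive I get from take two: its prop1 comes from one doodad of gizmo 1 and its prop2 from another.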
Is this even possible to do without querying once for every child document of gizmo 1 and then aggregating and ranking results client-side? There could be a lot of doodads, so this could turn into a lot of queries. Also, if some doodad shows up in a bunch of gizmos, this could crush the client since it would have to hold all those results. If it is possible, is it possible in a way that also allows me to use some_property in the similarity metric?
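The closest thing I can sketch myself is a single request with one nested clause per doodad of gizmo 1, built client-side after fetching gizmo 1. The Python below only builds the request body as a dict; it does not talk to a cluster. The function name build_or_of_ands_query, the sample values, and the choices of "score_mode": "max", "min_term_freq": 1 and "min_doc_freq": 1 are my own assumptions, and I have not verified that the resulting scores behave the way I want.

```python
import json


def build_or_of_ands_query(doodads, some_property_text=None):
    """Build one request body with a nested clause per source doodad.

    doodads: list of dicts with "prop1" and "prop2" text, fetched beforehand
             from the source gizmo (assumption: they fit in one request).
    some_property_text: optional text of the source gizmo's some_property.
    """

    def mlt(field, text):
        # Lowered frequency thresholds are an assumption for short texts.
        return {
            "more_like_this": {
                "fields": [field],
                "like": text,
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }

    # One nested clause per source doodad: prop1 AND prop2 must both be
    # similar within the same child document.
    per_doodad = [
        {
            "nested": {
                "path": "doodads",
                "score_mode": "max",
                "query": {
                    "bool": {
                        "must": [
                            mlt("doodads.prop1", d["prop1"]),
                            mlt("doodads.prop2", d["prop2"]),
                        ]
                    }
                },
            }
        }
        for d in doodads
    ]

    # OR over the per-doodad clauses; at least one has to match.
    outer = {
        "must": [{"bool": {"should": per_doodad, "minimum_should_match": 1}}]
    }
    # some_property only contributes to the score; it is not required.
    if some_property_text:
        outer["should"] = [mlt("some_property", some_property_text)]

    return {"query": {"bool": outer}}


if __name__ == "__main__":
    body = build_or_of_ands_query(
        [
            {"prop1": "red shiny", "prop2": "round small"},
            {"prop1": "blue matte", "prop2": "square large"},
        ],
        some_property_text="example gizmo description",
    )
    print(json.dumps(body, indent=2))
```

As far as I understand, the scores of matching should clauses are added up, so a gizmo matching many of these nested clauses ought to rank higher, which is roughly the "many similar children" behavior I want. What worries me is that the request grows with the number of doodads and could run into the cluster's clause limits, so I would still like to know whether there is a built-in way to do this.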