Search elements of an array in nested documents

Suppose I have an index which looks like this:

# dummy index
PUT nested
{
  "mappings": {
    "properties": {
      "address": {
        "type": "nested",
        "properties": {
          "city": {
            "type": "text"
          },
          "street": {
            "type": "text"
          }
        }
      }
    }
  }
}

and some documents in there

# dummy data
POST _bulk
{"index": {"_index": "nested", "_id": "1"}}
{"address":[{"city": "Berlin", "street": "Goethestrasse"}]}
{"index": {"_index": "nested", "_id": "2"}}
{"address":[{"city": "Berlin", "street": "Schillerallee"}]}
{"index": {"_index": "nested", "_id": "3"}}
{"address":[{"city": "Frankfurt", "street": "Goethestrasse"}]}
{"index": {"_index": "nested", "_id": "4"}}
{"address":[{"city": "Frankfurt", "street": "Schillerallee"}]}
{"index": {"_index": "nested", "_id": "5"}}
{"address":[{"city": "Berlin", "street": "Goethestrasse"}, {"city": "Frankfurt", "street": "Goethestrasse"}]}
{"index": {"_index": "nested", "_id": "6"}}
{"address":[{"city": "Berlin", "street": "Goethestrasse"}, {"city": "Frankfurt", "street": "Schillerallee"}]}
{"index": {"_index": "nested", "_id": "7"}}
{"address":[{"city": "Berlin", "street": "Schillerallee"}, {"city": "Frankfurt", "street": "Goethestrasse"}]}
{"index": {"_index": "nested", "_id": "8"}}
{"address":[{"city": "Berlin", "street": "Schillerallee"}, {"city": "Frankfurt", "street": "Schillerallee"}]}

How can I use a nested query on this index to find all documents with one or more addresses from an array of addresses? So if I'm inputting

{"address":[{"city": "Berlin", "street": "Goethestrasse"}, {"city": "Frankfurt", "street": "Schillerallee"}]}

I'm looking for docs with addresses that match either or both of the input addresses: 1, 4 and 6 (if scoring is possible, obviously 6 would be ranked higher). One approach would be to use a query builder, but if a solution with mustache templates is possible, we'd prefer that.

Edit: It seems this is very similar to the topic "Matching by array elements".

Maybe you can share the query that you tried? A regular nested query combined with a bool query sounds like it should work: the bool query contains a should clause, and each entry in that should clause is another bool query with two must clauses.

query:
  nested:
    path: address
    query:
      bool:
        should:
          - bool:
              must:
                - match: address.city -> Berlin
                - match: address.street -> Goethestrasse
          - bool:
              must:
                - match: address.city -> Frankfurt
                - match: address.street -> Schillerallee

Hope this makes sense.
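
For reference, the outline above could be written out as a full request roughly as follows. This is only a sketch against the example index from the first post; `score_mode: sum` and `minimum_should_match: 1` are additions that are not part of the outline. With `sum`, the scores of all matching nested addresses are added up, so a document that matches both input addresses (such as 6) should rank above documents that match only one.

```console
GET nested/_search
{
  "query": {
    "nested": {
      "path": "address",
      "score_mode": "sum",
      "query": {
        "bool": {
          "should": [
            {
              "bool": {
                "must": [
                  { "match": { "address.city": "Berlin" } },
                  { "match": { "address.street": "Goethestrasse" } }
                ]
              }
            },
            {
              "bool": {
                "must": [
                  { "match": { "address.city": "Frankfurt" } },
                  { "match": { "address.street": "Schillerallee" } }
                ]
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}
```

Any document with at least one matching nested address is returned, so with the sample data this also includes 5 and 8, since each of them contains one of the two input addresses.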

Thanks for the suggestion which makes perfect sense. The problem in our case is we don't know how many addresses the user is going to search for.

We looked into building something with more_like_this queries based on artificial documents, but unfortunately the relationship between the address fields (the reason we use nested objects in the first place) gets lost, as you can see in the rewritten Lucene query:

GET nested/_validate/query?rewrite=true
{
  "query": {
    "nested": {
      "path": "address",
      "query": {
        "more_like_this": {
          "min_term_freq": 1,
          "fields": [
            "address.city",
            "address.street"
          ],
          "like": [
            {
              "_type": "address",
              "doc": {
                "address": [
                  {
                    "city": "Berlin",
                    "street": "Goethestrasse"
                  },
                  {
                    "city": "Frankfurt",
                    "street": "Schillerallee"
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

yields

{
  "valid" : true,
  "explanations" : [
    {
      "index" : "nested",
      "valid" : true,
      "explanation" : "ToParentBlockJoinQuery (+((address.city:frankfurt address.city:berlin address.street:goethestrasse address.street:schillerallee)~1) #_type:__address)"
    }
  ]
}

BTW, moving the second address to a new object in the like array yields exactly the same Lucene query.

We found that the query_string query satisfies the logical requirements. It is pretty readable and preserves the relationship between the nested objects' attributes:

GET nested/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "address",
            "query": {
              "query_string": {
                "query": "(address.city:Berlin AND address.street:Goethestrasse) OR (address.city:Frankfurt AND address.street:Schillerallee)"
              }
            }
          }
        }
      ]
    }
  }
}

However, putting this type of query together using mustache turns out to be non-trivial... suggestions on the issue are welcome!

What would the input be? With an array like

[
  { "city" : "foo", "street" : "bar" },
  { "city" : "spam", "street" : "eggs" }
]

you could do something like

GET _render/template
{
  "source": {
    "query": {
      "query_string": {
        "default_operator": "OR", 
        "query": "{{#cities}}address.city:\"{{city}}\" AND address.street:\"{{street}}\"{{/cities}}"
      }
    }
  },
  "params": {
    "cities": [
      {
        "city": "foo",
        "street": "bar"
      },
      {
        "city": "spam",
        "street": "eggs"
      }
    ]
  }
}

Thanks for the help! The input would look something like the array you described, hopefully with all parameters present (or we'll just use defaults):

"address": [
  {
    "city": "Berlin",
    "street": "Goethestrasse"
  },
  {
    "city": "Frankfurt",
    "street": "Schillerallee"
  }
]

I think the query_string could be adjusted slightly:

"query": "{{#address}}(address.city:\"{{city}}\" AND address.street:\"{{street}}\"){{/address}}"

which renders to this:

"query" : """(address.city:"Berlin" AND address.street:"Goethestrasse")(address.city:"Frankfurt" AND address.street:"Schillerallee")"""

The parentheses group each city/street pair, so the AND logic is expressed correctly. The additional quotes around each search term are needed for multi-term search phrases, correct?

Edit: Actually, it appears that even if a value is missing from the input array, the query_string still renders correctly, with the missing value rendered as an empty string. So there is no need to escape missing values, and the whole thing becomes more readable. Thanks for the help @spinscale!
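
Putting the pieces of this thread together, the complete search might look roughly like the sketch below. It assumes the template is sent inline through the `_search/template` endpoint and that the parameter is called `address`, as in the input above; the trailing space inside the section is an addition to keep the rendered groups apart. Since query_string uses OR as its default operator, the groups are combined with OR even though the template never emits the keyword, and the section is repeated once per array element, so the number of addresses does not need to be known in advance.

```console
GET nested/_search/template
{
  "source": {
    "query": {
      "nested": {
        "path": "address",
        "query": {
          "query_string": {
            "default_operator": "OR",
            "query": "{{#address}}(address.city:\"{{city}}\" AND address.street:\"{{street}}\") {{/address}}"
          }
        }
      }
    }
  },
  "params": {
    "address": [
      { "city": "Berlin", "street": "Goethestrasse" },
      { "city": "Frankfurt", "street": "Schillerallee" }
    ]
  }
}
```

Swapping the endpoint for `GET _render/template` with the same body shows the rendered query without running the search.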
