How to filter two fields based on a list of values

Hi there!
I have just started learning Elasticsearch.
I created an index with four fields that contain information on distances between pairs of postcodes.

{
  "distances" : {
    "mappings" : {
      "properties" : {
        "dest" : {
          "type" : "keyword"
        },
        "meters" : {
          "type" : "float"
        },
        "seconds" : {
          "type" : "float"
        },
        "src" : {
          "type" : "keyword"
        }
      }
    }
  }
}

The 'src' and 'dest' fields hold 29 million pairs of postcodes, and 'meters' and 'seconds' hold the distance between each pair. Assume that these pairs cover all the postcodes of a city.
Given a list of, say, 200 postcodes, how can I query the index so that it returns only the pairs where both postcodes are in my list, and no pair involving a postcode outside it? What should the query look like?
I wrote a query like this:

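import json

# Assumes `es` is an already-connected Elasticsearch client and
# `cluster_10` is the list of postcodes.
for c in cluster_10:
    search_body = {
        "size": 120000,
        "query": {
            "multi_match": {
                "query": c,
                "fields": ["src", "dest"]
            }
        }
    }
    # Run the search once per postcode (inside the loop).
    result = es.search(index="distances", body=search_body)
    print(json.dumps(result, indent=1))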

where cluster_10 is the list of postcodes I want to keep. The problem is that the result also contains distances between postcodes in my list and postcodes outside it. I can see why, but I don't know how to limit the results to pairs where both 'src' and 'dest' are in the list.

The multi_match query uses the best_fields type by default, so a document matches if any of the listed fields matches the query.
I think a bool query with two terms queries is a simple solution:

"query":{
  "bool":{
    "must":[
      {"terms":{"src": c}},
      {"terms":{"dest": c}}
    ]
  }
}
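
To make the difference between the two matching rules concrete, here is a small pure-Python sketch that does not talk to Elasticsearch at all. The sample documents and postcodes are invented for illustration, and `match_either` / `match_both` are hypothetical helpers: the first mimics what the multi_match loop effectively collects (either field matches), the second mimics a bool query that requires both terms clauses to match.

```python
# Hypothetical sample documents; the postcodes are invented for illustration.
docs = [
    {"src": "A1", "dest": "A2", "meters": 100.0},
    {"src": "A1", "dest": "B9", "meters": 250.0},
    {"src": "B9", "dest": "A2", "meters": 300.0},
    {"src": "B9", "dest": "C3", "meters": 400.0},
]
wanted = {"A1", "A2"}


def match_either(doc, codes):
    # Like multi_match over ["src", "dest"]: one matching field is enough.
    return doc["src"] in codes or doc["dest"] in codes


def match_both(doc, codes):
    # Like bool with two required terms clauses: both fields must match.
    return doc["src"] in codes and doc["dest"] in codes


either = [d for d in docs if match_either(d, wanted)]
both = [d for d in docs if match_both(d, wanted)]
print(len(either), len(both))
```

Only the A1/A2 pair survives the "both" rule, while the "either" rule also keeps pairs that touch B9.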

Thank you very much, Tomo_M.
It didn't work exactly that way, but it helped me find the correct query. I just needed to use 'filter' instead of 'must' and pass the whole list of values to each terms query (rather than one value at a time through a for loop).

        "query":{
            "bool":{
                "filter":[
                    {"terms":{"src":list_of_values}},
                    {"terms":{"dest":list_of_values}}
                ]
            }
        }
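
For anyone landing here later, a minimal sketch of building that request body in Python might look like the following. `build_pair_filter` is a hypothetical helper name, not part of any client library, and the postcodes are placeholders. Both 'must' and 'filter' require every clause to match; 'filter' simply skips relevance scoring, which is not needed here. Note that a terms query expects a list of values, and that the default `index.max_result_window` is 10000, so a larger result set would need scroll or search_after rather than a bigger "size".

```python
import json


def build_pair_filter(postcodes, size=10000):
    """Build a search body matching docs whose src AND dest are in postcodes."""
    # De-duplicate and sort so the generated body is stable.
    codes = sorted(set(postcodes))
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"src": codes}},
                    {"terms": {"dest": codes}},
                ]
            }
        },
    }


# Placeholder postcodes; in the thread this would be the full list of 200.
body = build_pair_filter(["A1", "A2", "A1"])
print(json.dumps(body, indent=1))
```

The resulting dictionary can be passed to the search call in place of search_body, once, without a for loop.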

Bests,

