We are using percolation queries to tag user docs. We currently have about 80 queries. The most time-consuming part of these queries is the geo distance filter: each query uses up to 10,000 points.
Our doc mapping looks something like this:
{
  "user_profiles": {
    "dynamic": "strict",
    "properties": {
      "code": {
        "type": "string",
        "index": "not_analyzed"
      },
      "freq_1": {
        "type": "geo_point"
      }
    }
  }
}
"code" is a four-letter uppercase alphabetic string. Only 13 distinct values of it appear in the percolation queries, but user docs can have any value.
A sample percolation query looks like this:
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "code": "ABCD"
        }
      },
      "filter": {
        "and": [{
          "or": [{
            "geo_distance": {
              "distance": "1m",
              "freq_1": {
                "lat": 2,
                "lon": 3
              }
            }
          }]
        }]
      },
      "strategy": "query_first"
    }
  }
}
Many of the docs we run through the percolator have a code that will not match any of the queries.
Which is better: storing the country code as a metadata attribute on each percolation query and then, for each doc, filtering down to the queries that should run against it, or keeping the method above, where every query runs against every doc but should finish quickly because query_first checks the code before the geo filters?
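To make the first option concrete, here is a rough sketch of what I have in mind, written as Python that only builds the request bodies. It assumes the Elasticsearch 1.x percolator behaviour where extra fields stored next to "query" in the registered document can be targeted by a "filter" in the percolate request. The helper names and the index name in the comments are made up for illustration, and it also assumes the "code" metadata field on the .percolator type is mapped as not_analyzed (otherwise a term filter on an uppercase value would probably not match).

```python
def with_code_metadata(percolator_body, code):
    """Return a copy of a registered percolator body with `code` added as metadata.

    The result would be indexed with something like
    PUT /my-index/.percolator/<id>  (index name is hypothetical).
    """
    body = dict(percolator_body)
    body["code"] = code  # metadata field stored next to "query"
    return body


def build_percolate_request(doc):
    """Build a percolate request that only considers queries tagged with the doc's code.

    The result would be sent to something like
    GET /my-index/user_profiles/_percolate  (index name is hypothetical).
    """
    return {
        "doc": doc,
        # Assumes the metadata field is not_analyzed so the exact value matches.
        "filter": {"term": {"code": doc["code"]}},
    }


# The sample query from above, unchanged apart from the added metadata field.
sample_query = {
    "query": {
        "filtered": {
            "query": {"match": {"code": "ABCD"}},
            "filter": {
                "and": [{
                    "or": [{
                        "geo_distance": {
                            "distance": "1m",
                            "freq_1": {"lat": 2, "lon": 3},
                        }
                    }]
                }]
            },
            "strategy": "query_first",
        }
    }
}

registered = with_code_metadata(sample_query, "ABCD")
request = build_percolate_request({"code": "ABCD", "freq_1": {"lat": 2, "lon": 3}})
```

With this, a doc whose code is not one of the 13 registered values would be filtered down to zero candidate queries before any geo_distance check is evaluated, which is the effect I am hoping for.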