I have two nearly identical queries: one times out after 24s and the other takes about 100ms. The only difference is that the first (SLOW) is roughly
{"and": [FILTER_A, {"and": [FILTER_B, SCRIPT_FILTER_C]}]}
and the second (FAST) is
{"and": [FILTER_A, {"and": [FILTER_B]}, SCRIPT_FILTER_C]}
(Fuller versions below.) SCRIPT_FILTER_C is expensive, so it should be evaluated last. We're generating these filters to some degree, which is why there's awkward nesting of "and" filters.
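Since the filters are generated, one workaround to try is normalizing them before sending the query. The following is a minimal Python sketch, not our actual generator: the function names (`flatten_and`, `is_script`, `normalize`) are hypothetical, it only handles the list form of the "and" filter shown in this post, and whether the flattened form avoids the slowdown is untested. It merges nested "and" filters into one flat list and moves script filters to the end.

```python
def flatten_and(filt):
    """Return the clauses of a filter, merging nested {"and": [...]} lists."""
    # Assumption: only the plain list form {"and": [...]} is used.
    if isinstance(filt, dict) and set(filt) == {"and"} and isinstance(filt["and"], list):
        clauses = []
        for sub in filt["and"]:
            clauses.extend(flatten_and(sub))
        return clauses
    return [filt]


def is_script(filt):
    """True if the clause is a script filter."""
    return isinstance(filt, dict) and "script" in filt


def normalize(filt):
    """Flatten nested "and" filters and place script filters last."""
    clauses = flatten_and(filt)
    # sorted() is stable, so non-script clauses keep their relative order.
    clauses = sorted(clauses, key=is_script)
    if len(clauses) == 1:
        return clauses[0]
    return {"and": clauses}


# Placeholder clauses standing in for FILTER_A, FILTER_B, SCRIPT_FILTER_C.
FILTER_A = {"terms": {"folder_ids": [2756]}}
FILTER_B = {"query": {"match": {"some_field": "some_value"}}}
SCRIPT_FILTER_C = {"script": {"script_file": "source_regex"}}

slow = {"and": [FILTER_A, {"and": [FILTER_B, SCRIPT_FILTER_C]}]}
fast = {"and": [FILTER_A, {"and": [FILTER_B]}, SCRIPT_FILTER_C]}
```

With this, both `normalize(slow)` and `normalize(fast)` produce the same flat filter with the script clause last, so the two shapes would at least be sent to Elasticsearch identically.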
My questions:
1) Why does this happen? Does the "and" filter with one clause lead to some weird ordering or an inability to cache?
2) Is there any way for me to diagnose this myself?
Thanks!
SLOW:
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "terms": {
              "folder_ids": [2756]
            }
          },
          {
            "and": [
              {
                "query": {
                  "match": {
                    "bases.substring_ngram_match": {
                      "minimum_should_match": "100%",
                      "query": "GAAATTTGTGATGCTATTGCCCTCGTGCGCTCTCCTGTTC",
                      "analyzer": "sequence_substring_ngram_analyzer"
                    }
                  }
                }
              },
              {
                "script": {
                  "lang": "groovy",
                  "params": {
                    ...some_script_params...
                  },
                  "script_file": "source_regex"
                }
              }
            ]
          }
        ]
      }
    }
  }
}
FAST:
{
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "terms": {
              "folder_ids": [2756]
            }
          },
          {
            "and": [
              {
                "query": {
                  "match": {
                    "bases.substring_ngram_match": {
                      "minimum_should_match": "100%",
                      "query": "GAAATTTGTGATGCTATTGCCCTCGTGCGCTCTCCTGTTC",
                      "analyzer": "sequence_substring_ngram_analyzer"
                    }
                  }
                }
              }
            ]
          },
          {
            "script": {
              "lang": "groovy",
              "params": {
                ...some_script_params...
              },
              "script_file": "source_regex"
            }
          }
        ]
      }
    }
  }
}