Should in filtered query


(Ravi Shanker Reddy) #1

As mentioned in the docs, I replaced my filtered query with:

GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
          }
        }
      ],
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "RecordType": "DelAck"
              }
            }
          ]
        }
      }
    }
  }
}

My query in the previous version was:

GET _search
{
  "query": {
    "filtered": {
      "query": {
        "should": [{
          "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
        }]
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "RecordType": "DelAck"
              }
            }
          ]
        }
      }
    }
  }
}

I replaced my filtered query with bool. It works with must, but not with should, in the bool (formerly filtered) query. Is this a bug, or am I misunderstanding something?


(David Pilato) #2

I doubt the old query ever worked as you'd expect.

This is not a query AFAIK:

{
  "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
}

That said, I'd write the query like this (simplifying the inner bool in the filter, which is not actually needed).

GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
          }
        }
      ],
      "filter": {
        "term": {
          "RecordType": "DelAck"
        }
      }
    }
  }
}

But I think your problem here is more with the wildcard part. What is your mapping for the path field?


(Ravi Shanker Reddy) #3

My path mapping is a keyword (non-analysed string).
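
One way to confirm how path is mapped is to fetch the index mapping. A minimal sketch, using the index name from this thread:

```
# Returns the mappings of the index; look for the "path" property
GET smsc_logs-2017.01.09/_mapping
```

If path is reported as "type": "keyword", the wildcard pattern is matched against the whole stored path string rather than against analyzed tokens.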


(David Pilato) #4

Can you reproduce it with a simple script?

Note that in the initial version you were searching in all indices, and now you are only searching in smsc_logs-2017.01.09.

A full simple recreation script would be helpful to have a better understanding of what you are doing exactly.
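
A recreation script along these lines might look like the sketch below. It assumes Elasticsearch 5.x syntax (mapping types are still required there). The index name test, the type name doc, the two sample paths, and the keyword mapping for RecordType are all made up for illustration and are not taken from the real setup.

```
# Hypothetical index and type names; both fields mapped as keyword (assumption)
PUT test
{
  "mappings": {
    "doc": {
      "properties": {
        "path": { "type": "keyword" },
        "RecordType": { "type": "keyword" }
      }
    }
  }
}

# Two sample documents with the same RecordType but different paths
PUT test/doc/1?refresh
{ "path": "/tmp/a_1.log", "RecordType": "DelAck" }

PUT test/doc/2?refresh
{ "path": "/tmp/b_1.log", "RecordType": "DelAck" }

# The query shape under discussion: wildcard in should, term in filter
GET test/_search
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "path": "/tmp/a_*.log" } }
      ],
      "filter": {
        "term": { "RecordType": "DelAck" }
      }
    }
  }
}
```

Comparing the hit count of the last request with the same request using must instead of should is a quick way to see whether the should clause restricts the result set.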


(Ravi Shanker Reddy) #5

It's a test setup and we have only one index, so searching in all indices is the same as searching in smsc_logs-2017.01.09.

[root@localhost Delivery]# cut -d "|" -f 10 SMSCDR_DEL_ATTEMPT_16051810*.log | grep "DelAck" | wc -l
15857
[root@localhost Delivery]# cut -d "|" -f 10 SMSCDR_DEL_ATTEMPT_* | grep "DelAck" | wc -l
349366

When I use must in the bool (formerly filtered) query, it returns the correct result. But when I use should, it does not apply the path condition and returns all DelAck records, as shown below.

I thought both should give the same result. Is that right or wrong?

GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
          }
        }
      ],
      "filter": {
        "term": {
          "RecordType": "DelAck"
        }
      }
    }
  }
}

Gives:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 349366,
    "max_score": 1,
    "hits": [
      {

But

GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
          }
        }
      ],
      "filter": {
        "term": {
          "RecordType": "DelAck"
        }
      }
    }
  }
}

Gives

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 15857,
    "max_score": 1,
    "hits": [

(David Pilato) #6

Please format your code using the </> icon as explained in this guide. It will make your post more readable.

I'm editing your post.

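I think that's the correct behavior: the should clause is not mandatory here because the bool query has another condition, the filter part. When a bool query contains a must or filter clause, its should clauses are optional and only affect scoring.

So if the should clause matches, the document gets a better score than one where it doesn't.

I'd put both in filter in that case, like: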

GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "wildcard": {
            "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
          }
        },{
          "term": {
            "RecordType": "DelAck"
          }
        }
      ]
    }
  }
}
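
If the should form has to be kept, another option is to make at least one should clause mandatory with the bool query's minimum_should_match parameter. The following is an untested sketch of that variant, reusing the index, field names, and path from this thread:

```
# minimum_should_match: 1 requires at least one should clause to match
GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "path": "/home/GEMS/ES/CDR/Delivery/SMSCDR_DEL_ATTEMPT_16051810*.log"
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": {
        "term": {
          "RecordType": "DelAck"
        }
      }
    }
  }
}
```

With a single should clause this should select the same documents as the must version. The difference becomes useful once several alternatives are listed under should.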

(Ravi Shanker Reddy) #7

Sorry about the quotes. From what I read earlier, I understood that filtered and filter perform better than a simple bool query. If I change my query as you suggested, how will heap usage and performance be affected?

If I have multiple path values, I need to do a should (OR) operation on them. How would the query change in that scenario?


(David Pilato) #8

Maybe a constant_score query (https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-dsl-constant-score-query.html) would help to wrap your query.

So I'd write something like this (pseudo code, not tested):

GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "wildcard": {
              "path": "/home/GEMS/ES/CDR/Delivery/PATH1*.log"
            }
          }
        },{
          "constant_score": {
            "wildcard": {
              "path": "/home/GEMS/ES/CDR/Delivery/PATH2*.log"
            }
          }
        }
      ],
      "filter": [
        {
          "term": {
            "RecordType": "DelAck"
          }
        }
      ]
    }
  }
}

Maybe that would work.


(Ravi Shanker Reddy) #9

It's showing this error:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[constant_score] query does not support [wildcard]",
        "line": 7,
        "col": 35
      }
    ],
    "type": "parsing_exception",
    "reason": "[constant_score] query does not support [wildcard]",
    "line": 7,
    "col": 35
  },
  "status": 400
}
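
The parsing error most likely comes from the shape of the constant_score clauses: constant_score expects its inner query under a filter key, not directly beneath it. Below is an untested sketch of the same request with that correction. The PATH1 and PATH2 patterns are the placeholders from the pseudo code above. The minimum_should_match setting is an addition that was not in the original suggestion: because the bool query also has a filter clause, the should clauses would otherwise be optional and the path conditions would only influence scoring.

```
# constant_score wraps its inner query in "filter";
# minimum_should_match: 1 (an addition) makes one of the paths mandatory
GET smsc_logs-2017.01.09/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "wildcard": {
                "path": "/home/GEMS/ES/CDR/Delivery/PATH1*.log"
              }
            }
          }
        },
        {
          "constant_score": {
            "filter": {
              "wildcard": {
                "path": "/home/GEMS/ES/CDR/Delivery/PATH2*.log"
              }
            }
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": [
        {
          "term": {
            "RecordType": "DelAck"
          }
        }
      ]
    }
  }
}
```

This reads as: RecordType is DelAck AND (path matches PATH1 OR path matches PATH2).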
