Kibana search based on Lucene Query Parser Syntax


(alexandre) #1

hi all,

I'm trying to better understand how a Lucene search behaves when run through Kibana.

I'm aware of analyzers, tokenizers, filters, and so on.

An example:

I have a field "test" and I'm looking for the value "toto-tata.test.ok".

  • If I search for test:toto-tata.test.ok, I don't get the expected result.
  • If I search for test:"toto-tata.test.ok", I get the expected result.

I know that in Lucene, "toto-tata.test.ok" is a phrase.

So what does toto-tata.test.ok without quotes represent to Lucene? How does Kibana search the "test" field when the search value is not quoted?

For information, my _all metafield is disabled. I have a default field, but it is not the "test" field.

Thanks in advance.
Alex


(Joe Fleming) #2

Great question. I've seen this happen before as well, but I never really understood what was happening, so now I have a real reason to look it up :smile:.

When you put something like that in the query bar in Kibana, it passes the contents through to Elasticsearch as the query parameter of a query_string query. Here's a quick example from the relevant part of the request sent to Elasticsearch when using clientip:"186.187.11.181" in the query bar:

    "query": {
        "bool": {
            "must": [{
                "query_string": {
                    "query": "clientip:\"186.187.11.181\"",
                    "analyze_wildcard": true
                }
            }, {
                "range": {
                    "@timestamp": {
                        "gte": 1467240398565,
                        "lte": 1467241298565,
                        "format": "epoch_millis"
                    }
                }
            }],
            "must_not": []
        }
    },

Drop the quotes and the query becomes:

    "query": {
        "bool": {
            "must": [{
                "query_string": {
                    "query": "clientip:186.187.11.181",
                    "analyze_wildcard": true
                }
            }, {
                "range": {
                    "@timestamp": {
                        "gte": 1467240403856,
                        "lte": 1467241303857,
                        "format": "epoch_millis"
                    }
                }
            }],
            "must_not": []
        }
    },
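To make the comparison concrete, here is a minimal Python sketch that rebuilds the shape of the two request fragments shown above. It is not Kibana's actual code; the function name build_request_fragment is made up for illustration, and the timestamps are simply copied from the first excerpt so that the time range is held fixed.

```python
import json


def build_request_fragment(query_bar_text, gte_ms, lte_ms):
    """Mimic the shape of the request excerpts above (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [
                    # The query bar text is passed through unchanged.
                    {"query_string": {"query": query_bar_text,
                                      "analyze_wildcard": True}},
                    {"range": {"@timestamp": {"gte": gte_ms,
                                              "lte": lte_ms,
                                              "format": "epoch_millis"}}},
                ],
                "must_not": [],
            }
        }
    }


quoted = build_request_fragment('clientip:"186.187.11.181"',
                                1467240398565, 1467241298565)
unquoted = build_request_fragment('clientip:186.187.11.181',
                                  1467240398565, 1467241298565)

# With the time range held fixed, only the query_string text differs.
print(json.dumps(quoted["query"]["bool"]["must"][0]))
print(json.dumps(unquoted["query"]["bool"]["must"][0]))
```

The backslashes in the first excerpt are only JSON escaping of the quotes; Elasticsearch receives the same text that was typed into the query bar.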

I'm still not entirely sure how the two queries differ, but perhaps it has to do with the "analyze_wildcard": true setting. According to the docs, "by setting analyze_wildcard to true, an attempt will be made to analyze wildcarded words before searching the term list for matching terms." So it's possible that the analyzer is changing the query, and thus affecting the results.

That's not really an answer, I know, but that's as far as I've been able to get with it. In the query string syntax section of the docs, it links back to itself. I'd recommend asking in the Elasticsearch section of the forums; someone over there likely has a better understanding of the query syntax.
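In the meantime, one way to see how each form is actually parsed might be Elasticsearch's validate API, which can return an explanation of the rewritten query when called with explain=true. The sketch below only builds the URL and request body for both variants; the host http://localhost:9200 and the index name my-index are assumptions, and the helper build_explain_request is hypothetical. Sending the request is left to whatever HTTP client you prefer.

```python
import json
from urllib.parse import quote


def build_explain_request(index, query_text, host="http://localhost:9200"):
    """Return (url, body) for a validate-query call with explain enabled.

    The host and index are placeholders, not values from this thread.
    """
    url = f"{host}/{quote(index)}/_validate/query?explain=true"
    body = {
        "query": {
            "query_string": {
                "query": query_text,
                "analyze_wildcard": True,
            }
        }
    }
    return url, json.dumps(body)


# Build one request per variant so the two explanations can be compared.
for text in ('test:toto-tata.test.ok', 'test:"toto-tata.test.ok"'):
    url, body = build_explain_request("my-index", text)
    print(url)
    print(body)
```

Comparing the explanation returned for each body should show whether the unquoted value is treated as one term, several terms, or something else for that field's analyzer.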


(Joe Fleming) #3

You know, I bet it has to do with the - in the query. The dash is a reserved character, so if you don't quote the value or escape that character, it's probably not querying like you think it is. I'm not sure what the - means in the query syntax though...

EDIT: Ah, the - apparently negates a single token, which perhaps means that query is looking for toto*.test.ok, where the * is anything but tata?
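If escaping turns out to be the way to go, a small helper along these lines could do it. This is a sketch, not an official API: the name escape_query_string_value is made up, and the character list follows the reserved characters given in the query_string documentation (the docs note that < and > cannot be escaped at all, so this sketch drops them). Whether escaping changes the results also depends on how the field is analyzed.

```python
import re

# Reserved characters in query_string syntax; "&" and "|" are escaped
# individually, which also covers "&&" and "||".
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_string_value(value):
    """Backslash-escape reserved characters in a value (hypothetical helper)."""
    # "<" and ">" cannot be escaped, so remove them from the value.
    value = value.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", value)


escaped = escape_query_string_value("toto-tata.test.ok")
# Only the value is escaped; the "field:" prefix keeps its plain colon.
print("test:" + escaped)
```

Only the value is passed through the helper; escaping the whole string would also escape the colon that separates the field name from the value.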


(alexandre) #4

Hi,

Thanks a lot for your answer. I think you're right; I should post in the Elasticsearch section.

Regarding the "-" character, I ran some tests, and as far as I can tell it only negates a single token when there is a space before the "-".

Alex

