Lucene query in Kibana Discover not working as intended?

Hello,

I'm trying to search server names in the "host" attribute of my Logstash index.

I'm searching on the Discover tab in Kibana.

When I type sl00pm in the search bar I get "No results found", and when I add " * " and search for sl00pm* I get this:


I don't understand why.

But when I do the same manipulation with another server name, slzq85, I get this:


And this is what I'm expecting.

Here is the definition of my Logstash index:

{
  "logstash-2019.03.20": {
    "aliases": {},
    "mappings": {
      "apache-access": {
        "_all": {
          "enabled": true,
          "norms": false
        },
        "dynamic_templates": [
          {
            "message_field": {
              "match": "message",
              "match_mapping_type": "string",
              "mapping": {
                "index": "analyzed",
                "omit_norms": true,
                "type": "string"
              }
            }
          },
          {
            "string_fields": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": {
                    "ignore_above": 256,
                    "index": "not_analyzed",
                    "type": "string"
                  }
                },
                "index": "analyzed",
                "omit_norms": true,
                "type": "string"
              }
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "keyword"
          },
          "date": {
            "type": "text",
            "norms": false,
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "host": {
            "type": "text",
            "norms": false,
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "provided_name": "logstash-2019.03.20",
        "creation_date": "1553036402235",
        "number_of_replicas": "1",
        "uuid": "mCSFLYGETPm6qbgOwShHog",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}

And the version:

"version": {
  "number": "5.6.3",
  "lucene_version": "6.6.1"
}

Could you please tell me why my results aren't what I expect?

I want to add that I'm using mapping types, and the same attribute appears in different mapping types of my index, but always with the same definition as above.

Regards

Hmm, that is indeed strange. Have you tried doing something like host: sl00pm? If so, do you get the same result?
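As a side note: since the mapping above defines a not_analyzed host.raw subfield, a wildcard query against that field should bypass the analyzer entirely, so something like the following ought to match no matter how host itself is tokenized (this is just a suggestion based on the posted mapping, not something verified on your cluster):

```
host.raw: sl00pm*
```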

Thank you for your answer.
I tried that, but I'm still getting the same results.

Hmm, looking at the standard analyzer, it seems that for some reason sl00pm.sii24 gets analyzed as a single token whereas slzq85.sii24 gets analyzed as two separate tokens. You can use the Elasticsearch analyze API to verify this:

GET _analyze
{
  "analyzer" : "standard",
  "text" : "sl00pm.sii24.po10"
}

// results in

{
  "tokens" : [
    {
      "token" : "sl00pm.sii24",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "po10",
      "start_offset" : 13,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

whereas

GET _analyze
{
  "analyzer" : "standard",
  "text" : "slzq85.sii24.po10"
}

// results in

{
  "tokens" : [
    {
      "token" : "slzq85",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "sii24",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "po10",
      "start_offset" : 13,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

Yes, I get the same results here:

sl00pm.sii24 gets analyzed as a single token whereas slzq85.sii24 gets analyzed as two separate tokens.

Do you have any idea why?

I posted my question on other forums and got a logical answer:

" The reason for that behavior is where the analyzer breaks words. The standard analyzer breaks words according to the rules laid out in UAX #29. Rules WB6 and WB11, in particular, are the ones to take note of here.

Basically, it will not break on a letters with a '.' in the middle (ex: "ab.cd"), or on numbers with a '.' in the middle (ex: "12.34"), but it will break on numbers and letters separated by a '.' (ex: "12.cd").

So in your index, "sl00pm.soo85" is indexed as a single token, but "slzq85.soo85" is separated into two tokens: "slzq85" and "soo85".

The standard analyzer is designed to work best on text: words and sentences. For identifiers like the ones you are looking at, you might try a different analyzer, perhaps the pattern analyzer."
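To illustrate the quoted rules, here is a small Python sketch (my own approximation for the '.' character only, not the actual Lucene implementation) of how rules WB6/WB7 and WB11/WB12 decide whether to break at a dot: the dot is kept inside a token only when both of its neighbours are letters, or both are digits:

```python
def uax29_like_tokens(text):
    """Approximate UAX #29 word breaking for '.' only (WB6/WB7, WB11/WB12):
    keep the dot when both neighbours are letters, or both are digits;
    otherwise break the token there."""
    tokens, current = [], []
    for i, ch in enumerate(text):
        if ch == ".":
            prev_ch = text[i - 1] if i > 0 else ""
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            joined = (prev_ch.isalpha() and next_ch.isalpha()) or \
                     (prev_ch.isdigit() and next_ch.isdigit())
            if joined:
                current.append(ch)   # no break: "ab.cd" or "12.34"
            else:
                if current:          # break: "12.cd" or "ab.34"
                    tokens.append("".join(current))
                    current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

# "m.s" is letter.letter -> no break; "4.p" is digit.letter -> break
print(uax29_like_tokens("sl00pm.sii24.po10"))  # ['sl00pm.sii24', 'po10']
print(uax29_like_tokens("slzq85.sii24.po10"))  # ['slzq85', 'sii24', 'po10']
```

This reproduces the _analyze output shown above. A pattern analyzer configured to split on the dot (e.g. "pattern": "\\.") would instead tokenize both hostnames the same way.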

Thank you very much Lukas for your answers

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.