Lucene query in Kibana Discover not working as intended?

Hello,

I'm trying to search server names in the "host" attribute of my Logstash index.

I'm searching on the Discover tab in Kibana.

When I type sl00pm in the search bar I get "No results found", and when I add " * " and search for sl00pm* I get this:


I don't understand why.

But when I do the same manipulation with another server name, slzq85, I get this:


And this is what I'm expecting.

Here is the definition of my Logstash index:

{
  "logstash-2019.03.20": {
    "aliases": {},
    "mappings": {
      "apache-access": {
        "_all": {
          "enabled": true,
          "norms": false
        },
        "dynamic_templates": [
          {
            "message_field": {
              "match": "message",
              "match_mapping_type": "string",
              "mapping": {
                "index": "analyzed",
                "omit_norms": true,
                "type": "string"
              }
            }
          },
          {
            "string_fields": {
              "match": "*",
              "match_mapping_type": "string",
              "mapping": {
                "fields": {
                  "raw": {
                    "ignore_above": 256,
                    "index": "not_analyzed",
                    "type": "string"
                  }
                },
                "index": "analyzed",
                "omit_norms": true,
                "type": "string"
              }
            }
          }
        ],
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "@version": {
            "type": "keyword"
          },
          "date": {
            "type": "text",
            "norms": false,
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "host": {
            "type": "text",
            "norms": false,
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "provided_name": "logstash-2019.03.20",
        "creation_date": "1553036402235",
        "number_of_replicas": "1",
        "uuid": "mCSFLYGETPm6qbgOwShHog",
        "version": {
          "created": "5060399"
        }
      }
    }
  }
}

And the version:

"version": {
  "number": "5.6.3",
  "lucene_version": "6.6.1"
}

Could you please tell me why my results aren't what I expect?

I want to add that I'm using mapping types, and the same attribute appears in different mapping types of my index, but always with the same definition as above.

Regards

Hmm, that is indeed strange. Have you tried doing something like host: sl00pm? If so, do you get the same result?
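As a side note: since the mapping above defines a not_analyzed host.raw subfield, a wildcard query against that field should bypass the analyzer entirely, so something like the following ought to match no matter how host itself is tokenized (this is just a suggestion based on the posted mapping, not something verified on your cluster):

```
host.raw: sl00pm*
```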

Thank you for your answer.
I tried that, but I'm still getting the same results.

Hmm, looking at the standard analyzer, it seems that for some reason sl00pm.sii24 gets analyzed as a single token whereas slzq85.sii24 gets analyzed as two separate tokens. You can use the Elasticsearch analyze API to verify this:

GET _analyze
{
  "analyzer" : "standard",
  "text" : "sl00pm.sii24.po10"
}

// results in

{
  "tokens" : [
    {
      "token" : "sl00pm.sii24",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "po10",
      "start_offset" : 13,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

whereas

GET _analyze
{
  "analyzer" : "standard",
  "text" : "slzq85.sii24.po10"
}

// results in

{
  "tokens" : [
    {
      "token" : "slzq85",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "sii24",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "po10",
      "start_offset" : 13,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

Yes, I get the same results here:

sl00pm.sii24 gets analyzed as a single token whereas slzq85.sii24 gets analyzed as two separate tokens.

Do you have any idea why?

I posted my question on other forums and got a logical answer:

" The reason for that behavior is where the analyzer breaks words. The standard analyzer breaks words according to the rules laid out in UAX #29. Rules WB6 and WB11, in particular, are the ones to take note of here.

Basically, it will not break on a letters with a '.' in the middle (ex: "ab.cd"), or on numbers with a '.' in the middle (ex: "12.34"), but it will break on numbers and letters separated by a '.' (ex: "12.cd").

So in your index, "sl00pm.soo85" is indexed as a single token, but "slzq85.soo85" is separated into two tokens: "slzq85" and "soo85".

The standard analyzer is designed to work best on text: words and sentences. For identifiers like the ones you are looking at, you might try a different analyzer, perhaps the pattern analyzer."
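To illustrate the quoted rules, here is a small Python sketch (my own approximation for the '.' character only, not the actual Lucene implementation) of how rules WB6/WB7 and WB11/WB12 decide whether to break at a dot: the dot is kept inside a token only when both of its neighbours are letters, or both are digits:

```python
def uax29_like_tokens(text):
    """Approximate UAX #29 word breaking for '.' only (WB6/WB7, WB11/WB12):
    keep the dot when both neighbours are letters, or both are digits;
    otherwise break the token there."""
    tokens, current = [], []
    for i, ch in enumerate(text):
        if ch == ".":
            prev_ch = text[i - 1] if i > 0 else ""
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            joined = (prev_ch.isalpha() and next_ch.isalpha()) or \
                     (prev_ch.isdigit() and next_ch.isdigit())
            if joined:
                current.append(ch)   # no break: "ab.cd" or "12.34"
            else:
                if current:          # break: "12.cd" or "ab.34"
                    tokens.append("".join(current))
                    current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens

# "m.s" is letter.letter -> no break; "4.p" is digit.letter -> break
print(uax29_like_tokens("sl00pm.sii24.po10"))  # ['sl00pm.sii24', 'po10']
print(uax29_like_tokens("slzq85.sii24.po10"))  # ['slzq85', 'sii24', 'po10']
```

This reproduces the _analyze output shown above. A pattern analyzer configured to split on the dot (e.g. "pattern": "\\.") would instead tokenize both hostnames the same way.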

Thank you very much Lukas for your answers

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.