Unable to query .keyword fields from an index created using the default logstash template


(GuillaumeN) #1

Hi all,

Using an ELK v5.3 stack, I'm trying to load and search syslogs from archived files.
I'm using a simple Logstash config file, and my logs all go to a logstash-YYYY.MM.DD index. This index uses the default Elasticsearch template bundled with Logstash.

I need to run exact-match (not_analyzed) queries on these logs once they are loaded, so I'm trying to use the .keyword fields created by the Logstash template. They are created properly for every string field.

My issue is that these fields are always empty, which prevents me from running any search against their content.
I'm guessing my current Logstash config file (adapted from the Logstash documentation) prevents those fields from being populated, but I don't know how to make them work.

Do you have any suggestions about this issue?

My current logstash config file is as follows:

```
input {
  file {
    path => "/data/allinone.log-201701*"
    exclude => "*.bz2"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{GREEDYDATA:syslog_message}" }
  }
  date {
    # syslog pads single-digit days with an extra space, hence "MMM  d"
    match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss", "MMM d HH:mm:ss" ]
  }
}

output {
  elasticsearch {
    hosts => ["10.199.151.182:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
```
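A rough way to see which fields this pipeline adds to each event (and therefore which .keyword sub-fields the template will later create) is to mimic the grok pattern with a plain regular expression. This is only a sketch under stated assumptions: the regex is a simplified stand-in for the real SYSLOGTIMESTAMP, SYSLOGHOST and GREEDYDATA grok definitions, the function name grok_syslog is hypothetical, and the sample log line is made up.

```python
import re

# Simplified stand-in for:
#   %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{GREEDYDATA:syslog_message}
# The real grok patterns are more permissive; this is only an approximation.
SYSLOG_LINE = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_message>.*)"
)


def grok_syslog(line):
    """Return the fields the grok filter would add, or None if the line would not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical log line; note the extra space before the single-digit day.
    print(grok_syslog("Jan  3 10:15:42 host01 kernel: link -5- down"))
```

All three extracted values are strings, which is why each of them ends up with a .keyword sub-field in the index mapping. The padded day ("Jan  3") is also why the date filter needs a pattern that allows for the extra space.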

Thanks a bunch!


(Aaron Mildenstein) #2

Do you know that the template is there? Did you upgrade from an older version of Elasticsearch?

Please paste the output of:

curl -XGET http://10.199.151.182:9200/_template/logstash?pretty

between triple-backticks:

```
PASTE HERE
```

(GuillaumeN) #3

Hi Aaron, and thanks for your answer.

This is a fresh install on RHEL 7.
The template file is located here:
/usr/share/logstash/vendor/bundle/jruby/1.9/gems/logstash-output-elasticsearch-6.2.6-java/lib/logstash/outputs/elasticsearch/elasticsearch-template-es5x.json

Here is the output of the command:

[root@elk-1 ~]# curl -XGET http://10.199.151.182:9200/_template/logstash?pretty
{
  "logstash" : {
    "order" : 0,
    "version" : 50001,
    "template" : "logstash-*",
    "settings" : {
      "index" : {
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [
          {
            "message_field" : {
              "path_match" : "message",
              "mapping" : {
                "norms" : false,
                "type" : "text"
              },
              "match_mapping_type" : "string"
            }
          },
          {
            "string_fields" : {
              "mapping" : {
                "norms" : false,
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword"
                  }
                }
              },
              "match_mapping_type" : "string",
              "match" : "*"
            }
          }
        ],
        "_all" : {
          "norms" : false,
          "enabled" : true
        },
        "properties" : {
          "@timestamp" : {
            "include_in_all" : false,
            "type" : "date"
          },
          "geoip" : {
            "dynamic" : true,
            "properties" : {
              "ip" : {
                "type" : "ip"
              },
              "latitude" : {
                "type" : "half_float"
              },
              "location" : {
                "type" : "geo_point"
              },
              "longitude" : {
                "type" : "half_float"
              }
            }
          },
          "@version" : {
            "include_in_all" : false,
            "type" : "keyword"
          }
        }
      }
    },
    "aliases" : { }
  }
}
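The two dynamic templates in this output are applied in order, and the first one that matches a new field wins. The Python sketch below models that rule for string fields only, to show why every string field except message gets a keyword sub-field. It is an assumption-laden simplification: it ignores date detection, numeric detection and the other matching parameters Elasticsearch supports, and the helper name mapping_for is hypothetical.

```python
from fnmatch import fnmatchcase

# Condensed copy of the two dynamic templates above; order matters.
DYNAMIC_TEMPLATES = [
    {"name": "message_field", "path_match": "message", "match_mapping_type": "string",
     "mapping": {"type": "text", "norms": False}},
    {"name": "string_fields", "match": "*", "match_mapping_type": "string",
     "mapping": {"type": "text", "norms": False,
                 "fields": {"keyword": {"type": "keyword"}}}},
]


def mapping_for(path, value):
    """Return the mapping a new field would get from the first matching template."""
    if not isinstance(value, str):
        return None  # only string-typed templates are modelled in this sketch
    name = path.rsplit(".", 1)[-1]  # "match" applies to the leaf field name
    for template in DYNAMIC_TEMPLATES:
        if template["match_mapping_type"] != "string":
            continue
        if "path_match" in template and not fnmatchcase(path, template["path_match"]):
            continue
        if "match" in template and not fnmatchcase(name, template["match"]):
            continue
        return template["mapping"]  # first match wins
    return None


if __name__ == "__main__":
    for field in ("message", "syslog_message", "host"):
        print(field, mapping_for(field, "some text"))
```

This matches the resulting index mapping, where message is plain text while syslog_message, syslog_hostname, syslog_timestamp, host and path all carry a keyword sub-field.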

Also, here is the structure of the resulting index:

[root@elk-1 ~]# curl -XGET http://10.199.151.182:9200/logstash*?pretty
{
  "logstash-2017.01.03" : {
    "aliases" : { },
    "mappings" : {
      "_default_" : {
        "_all" : {
          "enabled" : true,
          "norms" : false
        },
        "dynamic_templates" : [
          {
            "message_field" : {
              "path_match" : "message",
              "match_mapping_type" : "string",
              "mapping" : {
                "norms" : false,
                "type" : "text"
              }
            }
          },
          {
            "string_fields" : {
              "match" : "*",
              "match_mapping_type" : "string",
              "mapping" : {
                "fields" : {
                  "keyword" : {
                    "type" : "keyword"
                  }
                },
                "norms" : false,
                "type" : "text"
              }
            }
          }
        ],
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "include_in_all" : false
          },
          "@version" : {
            "type" : "keyword",
            "include_in_all" : false
          },
          "geoip" : {
            "dynamic" : "true",
            "properties" : {
              "ip" : {
                "type" : "ip"
              },
              "latitude" : {
                "type" : "half_float"
              },
              "location" : {
                "type" : "geo_point"
              },
              "longitude" : {
                "type" : "half_float"
              }
            }
          }
        }
      },
      "logs" : {
        "_all" : {
          "enabled" : true,
          "norms" : false
        },
        "dynamic_templates" : [
          {
            "message_field" : {
              "path_match" : "message",
              "match_mapping_type" : "string",
              "mapping" : {
                "norms" : false,
                "type" : "text"
              }
            }
          },
          {
            "string_fields" : {
              "match" : "*",
              "match_mapping_type" : "string",
              "mapping" : {
                "fields" : {
                  "keyword" : {
                    "type" : "keyword"
                  }
                },
                "norms" : false,
                "type" : "text"
              }
            }
          }
        ],
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "include_in_all" : false
          },
          "@version" : {
            "type" : "keyword",
            "include_in_all" : false
          },
          "geoip" : {
            "dynamic" : "true",
            "properties" : {
              "ip" : {
                "type" : "ip"
              },
              "latitude" : {
                "type" : "half_float"
              },
              "location" : {
                "type" : "geo_point"
              },
              "longitude" : {
                "type" : "half_float"
              }
            }
          },
          "host" : {
            "type" : "text",
            "norms" : false,
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              }
            }
          },
          "message" : {
            "type" : "text",
            "norms" : false
          },
          "path" : {
            "type" : "text",
            "norms" : false,
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              }
            }
          },
          "syslog_hostname" : {
            "type" : "text",
            "norms" : false,
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              }
            }
          },
          "syslog_message" : {
            "type" : "text",
            "norms" : false,
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              }
            }
          },
          "syslog_timestamp" : {
            "type" : "text",
            "norms" : false,
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "refresh_interval" : "5s",
        "number_of_shards" : "5",
        "provided_name" : "logstash-2017.01.03",
        "creation_date" : "1493676873961",
        "number_of_replicas" : "1",
        "uuid" : "IwH5i6wRRa2tlKJZqyhAYg",
        "version" : {
          "created" : "5030099"
        }
      }
    }
  }
}

(Aaron Mildenstein) #4

Thanks for following up on the mapping as well. Can you show me the output of a query where the .keyword is empty?


(Aaron Mildenstein) #5

I ask because your _source will never contain a .keyword field: .keyword is a multi-field that is only indexed alongside the text field, not added to the stored document.

https://www.elastic.co/guide/en/elasticsearch/reference/5.3/keyword.html#keyword

A field to index structured content such as email addresses, hostnames, status codes, zip codes or tags.

They are typically used for filtering (Find me all blog posts where status is published), for sorting, and for aggregations. Keyword fields are only searchable by their exact value.


(GuillaumeN) #6

Thanks for the link Aaron.

After doing some research in the documentation, here is the adapted version of a sample query I use and its result:

```
[root@elk-1 ~]# curl localhost:9200/logstash*/logs/_search?pretty -d'
{
    "query": {
        "constant_score" : {
            "filter" : {
                "term" : { "syslog_message.keyword" : "-5-" }
            }
        }
    }
}'
{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
```

I've also tried querying syslog_message instead, with the same result.

I'll update the thread title, as this is more of a query issue than an empty-field issue.


(Aaron Mildenstein) #7

So that's the entire value of the field syslog_message? It has to be an exact match. Can you show me an example of one of the full events?
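To make the exact-match point concrete, here is a toy model (plain Python, not Elasticsearch code) of the terms indexed for one syslog_message value and of how a term query, which does not analyze its input, checks them. The tokenizer is a crude approximation of the standard analyzer, the field value is hypothetical, and the helper names are made up for illustration.

```python
import re


def text_terms(value):
    # Crude stand-in for the standard analyzer: keep runs of letters/digits, lowercase them.
    return {token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)}


def keyword_terms(value):
    # A keyword field indexes the whole value, unchanged, as a single term.
    return {value}


def term_query_matches(indexed_terms, term):
    # A term query looks for exactly this term among the indexed terms.
    return term in indexed_terms


if __name__ == "__main__":
    value = "kernel: link -5- down"  # hypothetical field value
    print(term_query_matches(keyword_terms(value), "-5-"))  # False: not the whole value
    print(term_query_matches(keyword_terms(value), value))  # True: exact whole value
    print(term_query_matches(text_terms(value), "-5-"))     # False: hyphens were stripped
    print(sorted(text_terms(value)))                        # ['5', 'down', 'kernel', 'link']
```

Under this model, both queries in the thread return nothing for the same reason: neither index contains the term -5-.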


(GuillaumeN) #8

Oh! Now I get it.
No, it's not the entire field value, it's a substring.
Sorry for misleading you.

So, back to square one, I guess: how can I search for all syslog_message values containing the exact substring "-5-"?


(Aaron Mildenstein) #9

In order to do that, the field syslog_message has to be of type text, so it is analyzed. However, the standard analyzer treats hyphens as word separators and drops them, so -5- is indexed as just 5 and the hyphens are not searchable. You might need a custom analyzer in order to search for -5- in text.
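One possible shape for such a custom analyzer is sketched below, under loud assumptions: it pairs Elasticsearch's built-in whitespace tokenizer with the lowercase token filter, so -5- survives as a token when it is surrounded by spaces. The analyzer name hyphen_preserving is hypothetical, the body follows the 5.x index template format ("template" pattern, "logs" mapping type as in the index above), and the Python code only builds the request body and simulates the tokenization; it does not talk to a cluster.

```python
import json


def syslog_template(analyzer_name="hyphen_preserving"):
    """Build a hypothetical ES 5.x index template body.

    order 1 lets it override the order 0 logstash template for syslog_message.
    """
    return {
        "template": "logstash-*",
        "order": 1,
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "logs": {
                "properties": {
                    "syslog_message": {
                        "type": "text",
                        "analyzer": analyzer_name,
                        "fields": {"keyword": {"type": "keyword"}},
                    }
                }
            }
        },
    }


def whitespace_lowercase_tokens(value):
    # Approximates what the whitespace tokenizer + lowercase filter would emit.
    return [token.lower() for token in value.split()]


if __name__ == "__main__":
    print(json.dumps(syslog_template(), indent=2))
    print(whitespace_lowercase_tokens("kernel: link -5- down"))
    # ['kernel:', 'link', '-5-', 'down']
```

Such a body could be sent with PUT _template/ followed by a template name of your choosing; templates only apply to indices created afterwards, so existing logstash-* indices would need to be reindexed. The trade-off is that punctuation stays attached to words (kernel: keeps its colon), and -5- is only found when it appears as its own whitespace-separated word.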

Since this is now more of a search question, you should probably ask about it specifically in a new thread in the Elasticsearch forum.


(GuillaumeN) #10

Thanks a lot for your help Aaron, I'll head over to the ES forum.

Have a great one!

