We're using the Elastic Stack to store server logs. Currently, the field message is mapped as a text field. Today, I tried to search for "oom-kill", but I couldn't get a search to work that matched exactly that. It only worked when searching for either oom or kill; the - wasn't searchable. I guess that's because the text field is analyzed before indexing. So, I searched for a recommendation on how to map the message field. It seems like there are two options, either "match_only_text" or "wildcard". Which of these two do I need, so I can actually find stuff like "oom-kill", e.g. with a query like oom*kill?
Hi! This article can help explain when to pick which kind of mapping among the ones you're wondering about.
TL;DR: use the wildcard or keyword type and query with wildcard queries. Searching for "oom-kill" will work if that's the full text of the field, and "*oom-kill*" will work if the log message is longer and you want to find a part within it.
With both text and match_only_text fields, a search for those words will always return both "oom kill" and "oom-kill", because the analyzer splits the text into separate tokens at index time and drops the "-". So these types would not fit your case unless you want to look at configuring the analyzer for your needs. See here
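To check how an analyzed field breaks up the text, the _analyze API can be called directly. This is a minimal sketch, assuming the field uses the default standard analyzer (which is the case when no analyzer is set in the mapping):

```
POST _analyze
{
  "analyzer": "standard",
  "text": "oom-kill"
}
```

The response should list two separate tokens, oom and kill, with no trace of the "-". That is why the hyphen is not searchable on text and match_only_text fields.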
To make sure that the "-" is also taken into account for a precise result, you can use the keyword or wildcard types. Then searching for "oom-kill" will not return the message "oom kill", and searching for "oom kill" will not return "oom-kill".
If the "oom-kill" part is found in the middle of a longer log, however, you still have to use the format with *. So for example, if the log is "this log has oom-kill in it", you can search for "*oom-kill*".
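Applied to the original question, this could look roughly like the sketch below. The index name my-logs is hypothetical, and the mapping assumes the message field is created as a wildcard field from the start (changing the type of an existing field needs a new index or a reindex):

```
PUT my-logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "wildcard"
      }
    }
  }
}

GET my-logs/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "*oom-kill*"
      }
    }
  }
}
```

A looser pattern such as "*oom*kill*" should work the same way, but it would also match messages where other characters sit between oom and kill.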
I did some quick experimenting for you that might help visualize the cases:
PUT field-type-test
{
  "mappings": {
    "_meta": {
      "created_by": "Iulia Feroli"
    },
    "properties": {
      "message1": {
        "type": "text"
      },
      "message2": {
        "type": "match_only_text"
      },
      "message3": {
        "type": "wildcard"
      },
      "message4": {
        "type": "keyword"
      }
    }
  }
}

GET field-type-test/_search

POST field-type-test/_doc/1
{
  "message1": "test for text-field search",
  "message2": "test for match-field search",
  "message3": "test for wild-card search",
  "message4": "test for key-word search"
}

POST field-type-test/_doc/3
{
  "message1": "test for text field search",
  "message2": "test for match field search",
  "message3": "test for wild card search",
  "message4": "test for key word search"
}

GET field-type-test/_search
{
  "query": {
    "wildcard": {
      "message4": {
        "value": "*key-word*"
      }
    }
  }
}

GET field-type-test/_search
{
  "query": {
    "wildcard": {
      "message3": {
        "value": "*wild-card*"
      }
    }
  }
}

Those two searches at the end are the only ones that don't also bring up the documents without the "-" in the sequence.
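For contrast, here is a sketch of the same kind of search against the analyzed text field from the test index above. It uses a match_phrase query, which is one of several ways to query message1:

```
GET field-type-test/_search
{
  "query": {
    "match_phrase": {
      "message1": "text-field"
    }
  }
}
```

Because both the query string and the stored text are split into the tokens text and field, this should return both documents, the one with "text-field" and the one with "text field".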