Search/count for all documents for which field X contains a specific substring, with hyphens

Hi all,

I'm trying to count the number of documents which fields contain a specific substring containing hyphens.

The documents were loaded using logstash to logstash-x indices, they are basically archived syslog messages.
The indices were created using the bundled logstash elasticsearch template.
More information on my index structure can be found here: https://discuss.elastic.co/t/unable-to-query-keyword-fields-from-an-index-created-using-the-default-logstash-template/84185/3?u=guillaumen

Here are some example of syslog_message fields values contained within my indices:

823475 bd [local7.notice] 1043983: Jan 3 00:02:01.748 EST: %MAB-5-SUCCESS: Authentication successful for client (1111111) on Interface Fa1/0/39 AuditSessionID 0A85450A00001499671B43F1
1121685 bd [local7.notice] 2083133: Jan 3 00:02:01.009 EST: %AUTHMGR-5-START: Starting 'dot1x' for client (2222222) on Interface Fa1/0/4 AuditSessionID 0AAD13FE0000462E415DA56D
912046 bd [local7.notice] 1236504: Jan 3 00:02:02.122 EST: %AUTHMGR-5-START: Starting 'mab' for client (33333333) on Interface Fa1/0/11 AuditSessionID 0A85350A0000966810D8F206

I would like to be able to search for all the documents which syslog_message field contains the exact substring "-5-", for instance those 3 above should be found.

Any idea on how I could achieve this?

Since your field is mapped with a keyword field, the path of least resistance would be a simple wildcard. Here's an example:

PUT syslog
{
  "mappings": {
    "data": {
      "properties": {
        "syslog_message": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

POST syslog/data
{
  "syslog_message": "823475 bd [local7.notice] 1043983: Jan 3 00:02:01.748 EST: %MAB-5-SUCCESS: Authentication successful for client (1111111) on Interface Fa1/0/39 AuditSessionID 0A85450A00001499671B43F1"
}
POST syslog/data
{
  "syslog_message": "1121685 bd [local7.notice] 2083133: Jan 3 00:02:01.009 EST: %AUTHMGR-5-START: Starting 'dot1x' for client (2222222) on Interface Fa1/0/4"
}
POST syslog/data
{
  "syslog_message": "AuditSessionID 0AAD13FE0000462E415DA56D912046 bd [local7.notice] 1236504: Jan 3 00:02:02.122 EST: %AUTHMGR-5-START: Starting 'mab' for client (33333333) on Interface Fa1/0/11 AuditSessionID 0A85350A0000966810D8F206"
}

GET /syslog/data/_search
{
  "query": {
    "wildcard": {
      "syslog_message.keyword": {
        "value": "*-5-*"
      }
    }
  }
}

It's not the fastest query in the world since that leading wildcard is going to be expensive, but it'll get the job done without having to reindex anything.

2 Likes

Thanks a lot Zachary!
It works perfectly :grinning:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.