How to search for a first occurrence of a term


(Cyril Auburtin) #1

I'd like to search the last 5 minutes for values of the err_msg field that occurred for the first time ever, and repeat this search every 5 minutes, so it should be as efficient as possible.

I wonder how to shape this as one query.

So far I've been doing:

GET /filebeat*/_search?size=0
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "stream": "stderr"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-5m"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "errors": {
      "terms": {
        "field": "err_msg",
        "size": 10
      }
    }
  }
}

followed by one query per err_msg in the response, keeping only the err_msg values with no hits:

GET /filebeat*/_search?size=0
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "stream": "stderr"
          }
        },
        {
          "match": {
            "err_msg": "<err_msg from the previous response>"
          }
        },
        {
          "range": {
            "@timestamp": {
              "lt": "now-1d"
            }
          }
        }
      ]
    }
  }
}
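The loop described above can be sketched roughly as follows. This is a hypothetical client-side helper, not the author's actual code; the `es.count` call assumes a client object with the Python `elasticsearch` library's count API shape, and the function names are made up for illustration.

```python
def prior_hits_query(err_msg, cutoff="now-5m"):
    """Build the follow-up query: same stream filter, the exact err_msg,
    restricted to documents older than the cutoff."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match": {"stream": "stderr"}},
                    {"match": {"err_msg": err_msg}},
                    {"range": {"@timestamp": {"lt": cutoff}}},
                ]
            }
        }
    }


def first_time_errors(es, recent_err_msgs):
    """Keep only the messages that have no hits before the cutoff,
    i.e. the ones seen for the first time ever."""
    new_msgs = []
    for msg in recent_err_msgs:
        resp = es.count(index="filebeat*", body=prior_hits_query(msg))
        if resp["count"] == 0:  # never seen before now-5m
            new_msgs.append(msg)
    return new_msgs
```

Note this issues N+1 requests per run, which is exactly the inefficiency the question is about.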

It feels like this could be done in one query; that's why I'm asking for a bit of help.

I don't think it has to be an aggregation; a plain search could work, but I don't know how. Maybe a scripted search?


(Mark Harwood) #2

In a cluster with time-based indices and lots of potential error types, this will be hard: a "new" index has no visibility into the content of old indices, and vice versa.


(Cyril Auburtin) #3

err_msg is a keyword, and it holds only the first 160 chars of the original error message (.slice(0, 160)). After running a stack for more than a month, I got fewer than 20 distinct err_msg values with this query:

GET /filebeat*/_search?size=0
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "stream": "stderr"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "now-300d"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "errors": {
      "terms": {
        "field": "err_msg"
      }
    }
  }
}

(Mark Harwood) #4

In which case something like this might work. This finds the first uses of tags on StackOverflow (note there are thousands of tags, so in this example I limit them with the include parameter):

GET so/_search
{
  "size": 0,
  "aggs": {
    "tag": {
      "terms": {
        "field": "tag",
        "include": [
          "logstash",
          "java",
          "kibana"
        ],
        "order": {
          "firstSeen": "asc"
        }
      },
      "aggs": {
        "firstSeen": {
          "min": {
            "field": "creationDate"
          }
        }
      }
    }
  }
}

Your client would still have to filter out the buckets whose firstSeen date is more than 5 minutes old, but the bulk of the heavy lifting is done in this one request.


(Cyril Auburtin) #5

Thanks, your suggestion works:

GET /filebeat*/_search?size=0
{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "stream": "stderr"
          }
        }
      ]
    }
  },
  "aggs": {
    "errors": {
      "terms": {
        "field": "err_msg",
        "order": {
          "firstSeen": "asc"
        }
      },
      "aggs": {
        "firstSeen": {
          "min": {
            "field": "@timestamp"
          }
        }
      }
    }
  }
}

I was still wondering if we could instead have a "2-level" query, like what I posted originally, but written as one query: the first level queries the very recent errors from the last 5 minutes, and the second level checks for a possible earlier match for those errors, before now-5m. That way seems more scalable to me, since most of the time there are no errors in the last 5 minutes, and even the second-level search can be efficient.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.