Elastic API query _grokparsefailure tags

Hello,
I'm currently trying to get all logs in an index that have the "_grokparsefailure" tag assigned. The next step would be to delete these logs, but I'm not sure whether my approach will work at all:

curl -X GET "http:-------------:PORT/_mget?pretty" -H 'Content-Type: application/json' -d'
{
  "docs": [
    {
      "_index": "system_syslog-2020.08.03",
      "_type": "_doc",
      "_source.tags": [
      "_grokparsefailure"
	]
    }
  ]
}
'

This doesn't work, but it's more or less what I have been trying.
Is _mget the correct approach? Or is get or msearch better? How can I get access to the "tags"?

How are you ingesting the data?

With Logstash you would do something like this in the filter.

if "_grokparsefailure" in [tags] {
  drop { }
}

I'm looking for a way to query Elasticsearch, not to change my Logstash pipeline. Or am I missing something?

I haven't tested this, but I would try a delete by query.

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

POST /my-index-000001/_delete_by_query
{
  "query": {
    "match": {
      "_source.tags": "_grokparsefailure"
    }
  }
}

Something like that.

I read about _delete_by_query; that was my plan for deleting the logs.
But is there a way to get the logs and look at them before I delete them?
I'm thinking of a scenario in the future where I have to delete some specific logs and I want to look at them first. Ideally this query would be in the same format as the _delete_by_query.

Change _delete_by_query to _search and POST to GET until it produces the results you want to delete.
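
To make the "preview first, delete second" idea concrete, here is a rough Python sketch that reuses one query body for both endpoints, so what you inspect is what you would delete. The host, the index name, and the helper names build_request and send are placeholders I made up, and the field name "tags" is an assumption about your mapping. Nothing is sent until you call send yourself.

```python
import json
import urllib.request

# Assumed values for illustration only; replace with your own host and index.
ES_URL = "http://localhost:9200"
INDEX = "some-index"

# One query body, shared by the preview and the delete.
QUERY = {"query": {"term": {"tags": "_grokparsefailure"}}}


def build_request(endpoint, body, url=ES_URL, index=INDEX):
    """Return a urllib Request for POST <url>/<index>/<endpoint>."""
    return urllib.request.Request(
        f"{url}/{index}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same body, different endpoint.
preview = build_request("_search", QUERY)
delete = build_request("_delete_by_query", QUERY)

# Uncomment once a cluster is reachable and the preview looks right:
# print(send(preview)["hits"]["total"])
# print(send(delete)["deleted"])
```

_search accepts POST as well as GET, which is why both requests can be built the same way.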

I tried something like the following:

curl -X GET "http://some-host:some-port/some-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "_source.tags": "_grokparsefailure"
    }
  }
}
'

But I still get everything in the specified index, not only the documents the query should match. I also tried _delete_by_query on a custom/temporary index, and it deleted every document in the index instead of only the matching ones.
Also, the search output shows only 5 or 6 logs, not all of them. I think I'm missing something essential here.

Would that work?

GET /some-index/_search
{
  "query": {
    "term": {
      "tags": "_grokparsefailure"
    }
  }
}

Or depending on your mapping:

GET /some-index/_search
{
  "query": {
    "term": {
      "tags.keyword": "_grokparsefailure"
    }
  }
}
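
If you are unsure which of the two variants applies, the mapping tells you. Below is a small Python sketch (the helper name term_field and the two sample mappings are hypothetical) that takes the "properties" object from GET /some-index/_mapping and picks the field a term query should target. It assumes the typeless 7.x mapping layout and only looks at top-level fields.

```python
def term_field(properties, name="tags"):
    """Pick the field to use in a term query for `name`.

    `properties` is the "properties" object from GET /<index>/_mapping.
    """
    field = properties.get(name, {})
    if field.get("type") == "keyword":
        return name
    # Dynamically mapped strings are usually text with a keyword sub-field.
    for sub_name, sub in field.get("fields", {}).items():
        if sub.get("type") == "keyword":
            return f"{name}.{sub_name}"
    return None  # no keyword variant found for this field


# Hypothetical sample mappings, not taken from a real index.
dynamic = {
    "tags": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}
explicit = {"tags": {"type": "keyword"}}

print(term_field(dynamic))   # tags.keyword
print(term_field(explicit))  # tags
```

Term queries are not analyzed, so they behave most predictably against keyword fields.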

Solved:
I had trouble with two issues:

  1. The whole temporary index was searched and deleted because I didn't know I had to escape '/', and because the default operator for match is 'OR' instead of 'AND'. Something like this works on paths etc.:
GET /some-index/_search
{
  "query": {
    "match": {
      "log.file.path": {
        "query": "\\/var\\/log\\/secure",
        "operator": "and"
      }
    }
  }
}
  2. My search didn't report the correct numbers because the hit count is capped by default, so I needed to set "track_total_hits": true to get the exact value:
GET /some-index/_search
{
  "track_total_hits": true,
  "query": {
    "term": {
      "tags": "_grokparsefailure"
    }
  }
}
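
For anyone finding this later, both points can be illustrated offline with a short Python sketch. The tokens function is only a rough stand-in for the standard analyzer (an assumption, the real analyzer follows Unicode word-boundary rules), and it assumes the path field is mapped as analyzed text. The helper names match and describe_total are hypothetical. describe_total reads the 7.x response format, where hits.total carries a "value" and a "relation".

```python
import re


def tokens(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def match(doc_value, query, operator="or"):
    """Mimic a match query: 'or' needs any query token, 'and' needs all."""
    wanted, present = tokens(query), tokens(doc_value)
    if operator == "and":
        return wanted <= present
    return bool(wanted & present)


def describe_total(response):
    """Summarise hits.total from a search response (7.x format)."""
    total = response["hits"]["total"]
    prefix = "at least " if total["relation"] == "gte" else ""
    return f"{prefix}{total['value']} hits"


# With 'or', any path sharing "var" or "log" matches, hence "everything".
print(match("/var/log/messages", "/var/log/secure"))         # True
print(match("/var/log/messages", "/var/log/secure", "and"))  # False

# Without track_total_hits the total can be a lower bound ("gte").
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
print(describe_total(capped))  # at least 10000 hits
```

Note that "and" only requires all tokens to be present, not in order. For an exact path match, a term query on a keyword field is the stricter option.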
