Inconsistent results when searching/deleting documents containing quotes

ES question

In my document I have:

"message": "10.42.224.236 - 26/May/2022:06:15:58 +0000 \"GET /index.php\" 200"

and I would like to match from GET to 200.
If I query with

GET /index_prod*/_search
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "multi_match": {
            "type": "phrase",
            "query": "\"GET /index.php\" 200",
            "lenient": true
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  },
  "sort" : [
    { "@timestamp" : "desc" }
  ]
}

I get matches

"hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },

and I am sure there are more than 10k because in Kibana I see many more, but if I try to delete

POST /index_prod*/_delete_by_query
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "multi_match": {
            "type": "phrase",
            "query": "\"GET /index.php\" 200",
            "lenient": true
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

I get

  "total" : 0,
  "deleted" : 0,

I have also tried putting the multi_match part inside the must part, but that makes no difference. What am I doing wrong?

Does the _search query still return over 10000 docs?
Did you perhaps run the _delete_by_query twice, so that the second run returned 0?
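
If you want an exact number rather than the capped "10000 / gte" total, a _count request with the same query might help. This is only a sketch that reuses the index pattern and query from your post:

GET /index_prod*/_count
{
  "query": {
    "multi_match": {
      "type": "phrase",
      "query": "\"GET /index.php\" 200",
      "lenient": true
    }
  }
}

Running it before and after the _delete_by_query should show whether anything was actually removed.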

@Tomo_M No, I did not run the delete twice. The delete just does not work; it deletes nothing.

Hmm. It worked for me.
I have no idea what the difference is...

PUT test_delete

POST test_delete/_doc/
{
  "message": "10.42.224.236 - 26/May/2022:06:15:58 +0000 \"GET /index.php\" 200"
}

GET test_delet*/_search
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "multi_match": {
            "type": "phrase",
            "query": "\"GET /index.php\" 200",
            "lenient": true
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

POST test_delet*/_delete_by_query
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "multi_match": {
            "type": "phrase",
            "query": "\"GET /index.php\" 200",
            "lenient": true
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

GET test_delete/_search
{
  "query":{
    "match_all":{}
  }
}

I don't know what to say other than that it doesn't work here; deletes are not working at all.

My version of ES is

{
  "name": "71a.....cd77",
  "cluster_name": "50994......ses",
  "cluster_uuid": "o2yzk......eRvToA",
  "version": {
    "number": "7.10.2",
    "build_flavor": "oss",
    "build_type": "tar",
    "build_hash": "unknown",
    "build_date": "2022-02-10T09:41:23.620550Z",
    "build_snapshot": false,
    "lucene_version": "8.7.0",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}

It is an instance on AWS. Would this behave differently from yours?

Could you share the complete queries (_search and _delete_by_query) and their responses?

I tried it in an on-premise 7.16 environment.

I think I got somewhere by playing around. If I run the delete queries with the * in the index name, they either do not run or do not find any matches.

I have indexes named like

/vendor_myapp_prod-filebeat-7.14.0-2022.05
/vendor_myapp_prod-filebeat-7.14.0-2022.04
etc

I can share the queries, but now that I have seen it work when the full index name is used, I am wondering whether something in the configuration prevents deleting on an index name with wildcards. _search works with wildcards.
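
In the meantime, listing the indices explicitly (comma-separated) instead of using the wildcard looks like a possible stopgap. This is only a sketch: it assumes the two index names above and the same query as before, and I have not confirmed it against the live indices:

POST /vendor_myapp_prod-filebeat-7.14.0-2022.05,vendor_myapp_prod-filebeat-7.14.0-2022.04/_delete_by_query
{
  "query": {
    "multi_match": {
      "type": "phrase",
      "query": "\"GET /index.php\" 200",
      "lenient": true
    }
  }
}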

Great to hear you found that. According to the documentation, the <target> of _delete_by_query should support wildcards. The behavior is strange and may be beyond me. I suggest you narrow down the situation and report it as a bug.
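
To narrow it down, it may help to check what the wildcard actually resolves to and whether every matching index is open. A sketch, assuming the index naming you listed (the column selection is just a suggestion):

GET /_cat/indices/vendor_myapp_prod-*?v&h=index,status,health,docs.count

If something unexpected shows up in that list, it might point to the difference between your test indices and the live ones.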

I tried this sequence of commands and the delete works in this case, so I am not really sure why it would not work on my actual live indexes.

POST test_delete-filebeat-7.14.0-2022.05/_doc/
{
  "message": "10.42.224.236 - 26/May/2022:06:15:58 +0000 \"GET /index.php\" 200",
  "@timestamp" : "2022-05-30T12:56:07.985Z"
}

POST test_delete-filebeat-7.14.0-2022.04/_doc/
{
  "message": "10.42.224.236 - 26/May/2022:06:15:58 +0000 \"GET /index.php\" 200",
  "@timestamp" : "2022-04-30T12:56:07.985Z"
}


GET /test_delete*/_search
{
  "query": {
    "bool": {
      "must": [{
          "multi_match": {
            "type": "phrase",
            "query": "\"GET /index.php\" 200",
            "lenient": true
          }
        }],
      "filter": [],
      "should": [],
      "must_not": []
    }
  },
  "sort" : [
    { "@timestamp" : "desc" }
  ]
}


POST /test_delete*/_delete_by_query
{
  "query": {
    "bool": {
      "must": [{
          "multi_match": {
            "type": "phrase",
            "query": "\"GET /index.php\" 200",
            "lenient": true
          }
        }],
      "filter": [ ],
      "should": [],
      "must_not": []
    }
  }
}

I actually added the first record twice so the result of the delete is

{
  "took" : 28,
  "timed_out" : false,
  "total" : 3,
  "deleted" : 3,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}