Delete by query with plugin does not work for me

Hello Guys,
I have the latest ELK version, I have installed the delete-by-query plugin, and my forwarder is Logstash.
I have read the 2.3 documentation on the Elastic site.

First:
To check whether my document structure matches the Twitter example, I fetched one document by ID:
curl -XGET 'http://localhost:9200/logstash-2016.02.23/Nxlogs/AVMPTRrp28MHpUptVm7C?pretty'

The response (short version):

{
  "_index" : "logstash-2016.02.23",
  "_type" : "Nxlogs",
  "_id" : "AVMPTRrp28MHpUptVm7C",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "EventReceivedTime" : "2016-02-23 13:03:42",
    "SourceModuleName" : "iis",
    "SourceModuleType" : "im_file",
    "date" : "2016-02-23",
    "time" : "17:30:49",
    "hostname" : "server1.xxx.com"
  }
}

Second:
Then I tried to delete by hostname:
curl -XDELETE 'http://localhost:9200/logstash-2016.02.23/Nxlogs/_query?q=hostname:server1.xxx.com'

Response:
{"found":false,"_index":"logstash-2016.02.23","_type":"Nxlogs","_id":"_query","_version":1,"_shards":{"total":2,"successful":2,"failed":0}}

Found : false

I tried some other ways, with the same result:
curl -XDELETE 'http://localhost:9200/logstash-2016.02.23/Nxlogs/_query' -d '{ "term" : { "hostname" : "server1.xxx.com" }}'

curl -XDELETE 'http://localhost:9200/logstash-2016.02.23/Nxlogs/_query' -d '{ "query": { "term" : { "hostname" : "server1.xxx.com" } }}'

curl -XDELETE 'http://localhost:9200/logstash-2016.02.23/Nxlogs/_query?pretty' -d '{ "query": { "query_string" : { "default_field": "hostname", "query": "server1.xxx.com" }}}'

Does anyone have any ideas?

Thank you.

Hi Juan,

that's interesting. I'd first look at the mapping for this index (for starters: I'd expect "hostname" to be a not_analyzed field).

I don't think this is a problem with the delete by query plugin. I'd guess that you also cannot find the document with a plain term query.
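
If the field is analyzed, the tokens in the index differ from the original string, which is exactly what a term query would trip over. A quick way to see this is the analyze API (a sketch, assuming the default standard analyzer; run it against your own cluster):

```shell
# Sketch: show how the standard analyzer (the default for analyzed string
# fields) tokenizes the hostname value. The term query is matched against
# these tokens, not against the original string.
curl 'http://localhost:9200/_analyze?pretty' -d '
{
  "analyzer": "standard",
  "text": "server1.xxx.com"
}'
```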

Daniel

Hello Daniel
Thank you for your answer.
I looked at the index configuration in Kibana, and the field is analyzed.

Can I check my index with another tool, or is there a plugin to check this?

Thank you.

Hi Juan,

sure, the easiest way is to use the Mapping API of Elasticsearch. This should return the mapping for your type in this specific index:

curl http://localhost:9200/logstash-2016.02.23/_mapping/Nxlogs?pretty

I understand why the term query does not return a result, as it only really works as intended for not_analyzed fields. However, the query string query should match. Can you try issuing just the query?

curl 'http://localhost:9200/logstash-2016.02.23/Nxlogs/_search?pretty' -d \
'{
    "query": {
        "query_string": {
           "default_field": "hostname",
           "query": "server1.xxx.com"
        }
        
    }
}'

This should actually return some results.

Changing the mapping

I also suggest that you change your mapping so that all string fields you want to store exactly as they are become not_analyzed (see the mapping docs).

As you use Logstash, you have to change your mapping template as well as the mapping of all existing indices. To apply the mapping to all existing indices, use the multi-index syntax so you only have to issue a single request instead of one per index.
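
Such a multi-index mapping update could look like this (a sketch; the wildcard pattern, type name, and the "raw" subfield are assumptions taken from your examples — adjust them to your fields):

```shell
# Sketch: the multi-index wildcard lets one request update the mapping of
# every logstash-* index. Adding a new "raw" multi-field to an existing
# string field is allowed; changing the field's own "index" setting is not.
curl -XPUT 'http://localhost:9200/logstash-*/_mapping/Nxlogs?pretty' -d '
{
  "properties": {
    "hostname": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed",
          "ignore_above": 256
        }
      }
    }
  }
}'
```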

After all mappings have been corrected, you need to reindex your existing data (the mapping only affects newly indexed data). As you are on Elasticsearch 2.3, you can use the reindex API for that. As all these operations write to your index, I suggest you try this on a test/playground system first and, if you want to be sure, also create a backup of your data.
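
A reindex request could look like this (a sketch; the destination index name is made up):

```shell
# Sketch: copy all documents into a new index so they pick up the corrected
# mapping. The _reindex API is available from Elasticsearch 2.3 onwards;
# the destination index name here is an assumption.
curl -XPOST 'http://localhost:9200/_reindex?pretty' -d '
{
  "source": { "index": "logstash-2016.02.23" },
  "dest":   { "index": "logstash-2016.02.23-reindexed" }
}'
```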

Finally, you should be able to find the server in question using a term query:

curl 'http://localhost:9200/logstash-2016.02.23/Nxlogs/_search' -d \
'{
    "query": {
        "term": {
           "hostname": {
              "value": "server1.xxx.com"
           }
        }
    }
}'

If you find the server using this query, you should also be able to delete the data with the delete by query plugin.

Daniel

Hello Daniel,
Thank you for answering me again.

The first curl command returned the following (showing only the hostname field):

"hostname" : {
            "type" : "string",
            "norms" : {
              "enabled" : false
            },
            "fielddata" : {
              "format" : "disabled"
            },
            "fields" : {
              "raw" : {
                "type" : "string",
                "index" : "not_analyzed",
                "ignore_above" : 256
              }
            }
          },

The second curl command, the search, returned results for the host, for example:

 "_index" : "logstash-2016.02.23",
      "_type" : "Nxlogs",
      "_id" : "AVNEFBk3zTqQ6j3gBtmd",
      "_score" : 3.0796967,
      "_source" : {
        "EventReceivedTime" : "2016-02-23 19:01:21",
        "SourceModuleName" : "iis",
        "SourceModuleType" : "im_file",
        "date" : "2016-02-23",
        "time" : "00:01:19",
        "EventTime" : "2016-02-23T00:01:19Z",
        "SourceName" : "IIS",
        "hostname" : "server1.xxx.com",
        "type" : "Nxlogs",
"tags" : [ "windows", "logs" ]
      }
    } ]

I'm going to change the field's index setting from analyzed to not_analyzed, but I will have the same problem in each new index. I need to fix it permanently; how can I change this field for good? In Logstash, or in my NXLog client?

Thank you.

Hi Juan,

that mapping is good news. The "hostname" field has a "raw" subfield that is not analyzed. This means you don't have to change anything in your mapping.

You should be able to delete the relevant entries with the following statement:

curl -XDELETE 'http://localhost:9200/logstash-2016.02.23/Nxlogs/_query?pretty' -d \
'{
    "query": {
        "term": {
           "hostname.raw": {
              "value": "server1.xxx.com"
           }
        }
    }
}'

I've created a minimal example that you can try out in Sense and that works for me.

PUT /logs
{
   "mappings": {
      "nxlog": {
         "properties": {
            "hostname": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed",
                     "ignore_above": 256
                  }
               }
            }
         }
      }
   }
}


POST /logs/nxlog/1
{
   "hostname": "server1.xxx.com"
}

POST /logs/nxlog/2
{
   "hostname": "server2.xxx.com"
}

DELETE /logs/nxlog/_query
{
    "query": {
        "term": {
           "hostname.raw": {
              "value": "server1.xxx.com"
           }
        }
    }
}

The last request produces:

{
   "took": 0,
   "timed_out": false,
   "_indices": {
      "_all": {
         "found": 1,
         "deleted": 1,
         "missing": 0,
         "failed": 0
      },
      "logs": {
         "found": 1,
         "deleted": 1,
         "missing": 0,
         "failed": 0
      }
   },
   "failures": []
}

And indeed, the data for server1.xxx.com is gone. If we run a match_all query, we get:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
         {
            "_index": "logs",
            "_type": "nxlog",
            "_id": "2",
            "_score": 1,
            "_source": {
               "hostname": "server2.xxx.com"
            }
         }
      ]
   }
}

Daniel

Hello Daniel,
It didn't work.
I created a new index with your example:

The mapping for the new index:

{
  "logs": {
    "mappings": {
      "nxlog": {
        "properties": {
          "hostname": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

I validated the query for the server:

GET logs/_validate/query
{
    "query": {
        "term": {
           "hostname.raw": {
              "value": "server1.xxx.com"
           }
        }
    }
}

The answer:

    {
      "valid": true,
      "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
      }
    }

I tried to delete the server:

DELETE /logs/nxlog/_query
{
    "query": {
        "term": {
           "hostname.raw": {
              "value": "server1.xxx.com"
           }
        }
    }
}

and the answer:

{
  "found": false,
  "_index": "logs",
  "_type": "nxlog",
  "_id": "_query",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  }
}

So I don't know what the problem is.
Maybe some issue in the config?

Thank you.

Hi Juan,

can you please verify that you have the delete-by-query plugin installed? You can run:

GET /_cluster/stats

in Sense. Right at the bottom of the response you should see the installed plugins. If the delete-by-query plugin is not listed, it is not installed. Then you have to install it first via:

sudo bin/plugin install delete-by-query

Don't forget that you need to install the plugin on every node in the cluster and restart the nodes as the new plugin is only picked up after a restart.
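
Alternatively, the cat API prints a compact per-node list of installed plugins:

```shell
# Lists one line per node and plugin; if delete-by-query is missing here,
# it is not installed on that node.
curl 'http://localhost:9200/_cat/plugins?v'
```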

Daniel

Hello Daniel
Yes, I have the plugin installed; I tried installing it again:

ERROR: plugin directory /usr/share/elasticsearch/plugins/delete-by-query already exists. To update the plugin, uninstall it first using 'remove delete-by-query' command

Then I updated the ELK version and got:

Exception in thread "main" java.lang.IllegalArgumentException: Plugin [delete-by-query] is incompatible with Elasticsearch [2.3.1]. Was designed for version [2.2.0]

So with the new 2.3.1 version installed, I uninstalled the plugin and installed it again, and now everything works.
I don't know exactly what happened before, but the plugin is working now and I can delete by hostname.raw.

Thank you very much for your patience.

Hi Juan,

You're right: The plugin version always has to match your Elasticsearch version.

But I'm glad I could help and great that it finally worked out for you! :slight_smile:

Daniel