Delete_by_query API deleted documents which didn't match the search criteria


(Amruth) #1

Hi,

Here is the query I used,

POST twitter/_delete_by_query
{
  "query": { 
    "match": {
      "source_host": "server02.local"
    }
  }
}

But it also deleted all the documents with source_host:server01.local and
source_host:server03.local.

I don't understand the logic here. I lost valuable information. Can someone please explain why this happened? Also, is there any way to recover deleted documents?

Thanks


(Christian Dahlqvist) #2

A match query is a full-text search query: the string you submit is analysed and the resulting terms are matched against the field. With the standard analyzer, "server02.local" is most likely split into the terms server02 and local (the digit before the dot causes a token break), and since a match query matches documents containing any of the terms, local matched every host. If you instead want to match the string exactly, you need to use a term query on a keyword field. Unless you have a backup to restore from, there is as far as I know no way to recover the deleted documents.
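For example, assuming the default dynamic mapping (which maps strings as text with a .keyword sub-field), an exact-match delete would look something like this:

POST twitter/_delete_by_query
{
  "query": {
    "term": {
      "source_host.keyword": "server02.local"
    }
  }
}

This only deletes documents whose source_host is exactly "server02.local", because the term query skips analysis and the keyword field stores the value as a single unmodified token.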


(Amruth) #3

Thanks Christian. Since I don't have a backup of the data I will need to reindex it. I have seen multiple approaches to reindexing. Can you please suggest the best process?

I have a single cluster with 3 data nodes. I am maintaining indices on a monthly basis, i.e. "index-%{+YYYY.MM}". So do I need to reindex all the indices? Please suggest a process for my scenario.

Thanks


(Christian Dahlqvist) #4

If you no longer have the data in Elasticsearch you will need to reindex from an external source. Where are you looking to reindex from?


(Amruth) #5

I will need to reindex from SQL Server. May I also know the reason I got a field conflict issue? There is a field called "duration" in SQL Server which I am ingesting into a monthly ES index. For the September index it was of type float and for the October index it is of type integer (that's what I see in Kibana: a conflict between the types integer and float for the same field). Why is it this way?


(Christian Dahlqvist) #6

If you are relying on dynamic mappings instead of explicitly defining them through an index template, Elasticsearch will determine the type based on the first value it sees for the field, which is why the type may differ between indices.
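A minimal illustration, using throwaway indices so nothing real is touched:

POST demo-2017.09/doc/1
{ "duration": 1.5 }

GET demo-2017.09/_mapping

POST demo-2017.10/doc/1
{ "duration": 2 }

GET demo-2017.10/_mapping

The first index maps duration as "float" (the first value was a fraction), the second as "long" (the first value was a whole number). Kibana then reports a type conflict for duration across the two indices, which matches what you are seeing.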


(Amruth) #7

I didn't understand this. I am using dynamic index names (index-%{+YYYY.MM}). I don't understand what you mean by dynamic mappings.


(Christian Dahlqvist) #8

Elasticsearch uses dynamic mappings to determine how to index a field when you have not provided explicit mappings, e.g. through an index template. It does this based on the first value it sees. You can tell the Logstash Elasticsearch output plugin to use a specific template through the template parameter.
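A sketch of such a template for your monthly indices (the template name, the mapping type name "doc", and the choice of float are assumptions; this is 5.x syntax, 6.x uses "index_patterns" instead of "template"):

PUT _template/monthly-logs
{
  "template": "index-*",
  "mappings": {
    "doc": {
      "properties": {
        "duration": { "type": "float" }
      }
    }
  }
}

Any new index whose name matches index-* then gets duration mapped as float regardless of the first value it receives. The same JSON can be supplied to the Logstash elasticsearch output via its template and template_name options.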


(Amruth) #9

Hi Christian,

I have gone through the index template topic. But my question is: how do I ensure that all the fields which existed in index-2017.09 have the same type in index-2017.10? If I have 50 fields in my document, do I need to declare the data type for all 50? Can you please explain?

Thanks


(Christian Dahlqvist) #10

You will need to specify the mapping for each field separately. You can however retrieve the current mapping of an index you want to use as a reference and create an index template based on it.


(Amruth) #11

So, I queried GET index-2017.10/_mapping and pasted the result into PUT _template/sample. Now I understand what mapping to use across all indices.

But for reindexing, how do I copy data from all the monthly indices to new indices? Also, is it possible to keep the same index names? This is used at an organisational level, and changing the index names would not look good. Please suggest a way. (Assume all my data is in ES.)
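From the reindex docs, I am guessing something like this per monthly index, after the template is in place: copy into a temporary index, delete the original, then copy back so the original name is reused (the -tmp suffix is just my placeholder). Is this the correct approach?

POST _reindex
{
  "source": { "index": "index-2017.09" },
  "dest":   { "index": "index-2017.09-tmp" }
}

DELETE index-2017.09

POST _reindex
{
  "source": { "index": "index-2017.09-tmp" },
  "dest":   { "index": "index-2017.09" }
}

DELETE index-2017.09-tmp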


(system) #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.