Slowness in Elasticsearch DELETE and UNLOAD operations


(Neuron Ring) #1

Hi,

I have an Elasticsearch cluster with two nodes and about 53 indices. There are two large indices: one with 1.5M documents [size: 93.93MB] and another with 2.3M documents [size: 354.61MB]. Not a really huge amount of data.
I am not seeing any issues with search or _bulk PUT operations.

UNLOAD

But when I take a backup of the index using elasticdump, it is insanely slow: it takes about 6 hours to finish unloading the documents.

Initially I thought it was because of the write operations. The index receives new documents every 2 minutes, so the unload never finishes; it keeps unloading documents as new ones come in.

Then I stopped the scripts that POST the new docs. It is still slow; the unload takes hours and hours to complete.

DELETE

In addition to the unload, I run scheduled maintenance on my indices every day, deleting [using the delete_by_query API] all documents older than 60 days. The smaller indices complete in a few seconds to a few minutes, but the big one with 1.5M docs is dead slow: it has been running for about 4 hours and is still going.

Am I doing something wrong here? Is it recommended to store millions of documents in one index?
Please advise. It is taking a big hit on performance.


(Mark Walkom) #2

If you are using time-based data, use time-based indices; then you just delete any indices older than 60 days.

I can't comment on elasticdump though.
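To illustrate the suggestion (index name is hypothetical): with daily indices, retention becomes a single index deletion, which is a near-instant metadata operation rather than a document-by-document delete:

```json
DELETE /logs-2016.10.15
```

Compare this with delete_by_query, which has to find, mark, and later merge away every matching document individually.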


(Neuron Ring) #3

My index names are not time based. I have one index which holds all the data, so I am purging documents based on "@timestamp", like this:

{
  "query" : {
    "range" : {
      "@timestamp" : {
        "lt" : "now-60d"
      }
    }
  }
}

(Mark Walkom) #4

And that's why I am suggesting you change things.


(Neuron Ring) #5

So is this expected behavior for Elasticsearch? Can't it handle millions of documents in one index?


(Mark Walkom) #6

It can, but delete_by_query (DBQ) is highly inefficient, and there are better ways to logically separate your data to get the same functionality.


(Neuron Ring) #7

Okay, thank you. I like the idea of having indices by date, thus limiting the data in each index. The problem is that if I use the bulk API to create indices dynamically, the "string" fields in the mapping are created as "analyzed" by default, which allocates extra storage space for the analysis.
So, as an initial setup, I create an index and mappings specifying "not_analyzed" and then send the data to it.
I don't want any of the terms in my mapping to be analyzed. How do I create one index per day with all the fields as "not_analyzed"?


(Mark Walkom) #8

So use a template that matches your index pattern: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/indices-templates.html#indices-templates :slight_smile:
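As a sketch of that suggestion (template and index names are hypothetical, and this assumes ES 5.x, where the "keyword" type replaces "string"/"not_analyzed"): a template is applied automatically to every new index matching the pattern, so daily indices created by the bulk API all get non-analyzed string fields without any per-index setup:

```json
PUT /_template/daily_logs
{
  "template": "logs-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": { "type": "keyword" }
          }
        }
      ]
    }
  }
}
```

Any index named like logs-2016.12.01 created afterwards will map incoming string fields as keyword (not analyzed).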


(Neuron Ring) #9

OK, thank you! I would rather use the _rollover API in ES 5.0.
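For reference, a minimal rollover setup (alias and index names are hypothetical, conditions are examples): you write through an alias, and when a condition is met a new index is created and the alias is moved:

```json
PUT /logs-000001
{
  "aliases": { "logs_write": {} }
}

POST /logs_write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1000000
  }
}
```

Retention then becomes deleting old rolled-over indices by name, just as with date-named indices.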


(Mark Walkom) #10

That's a great idea! :smiley:


(system) #11

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.