I have an Elasticsearch cluster with two nodes and about 53 indices. There are two large indices: one with 1.5M documents [size: 93.93MB] and another with 2.3M documents [size: 354.61MB]. Not a huge amount of data.
I'm not seeing any issues with search or with _bulk PUT operations.
UNLOAD
But when I take a backup of an index using elasticdump it is insanely slow: it takes about 6 hours to finish unloading the documents.
Initially I thought it was because of the write load. The index receives new documents every 2 minutes, so the unload never finishes; it keeps exporting documents as new ones come in.
Then I stopped the scripts that POST the new docs. It is still slow; the unload takes hours and hours to complete.
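For context, the unload is a plain elasticdump run roughly like the following (host, index name, and output path are placeholders, not my real setup):

```
# Dump one index to a local JSON file.
# --limit is the per-batch document count (elasticdump's default is 100);
# a larger batch is one of the knobs I could still experiment with.
elasticdump \
  --input=http://localhost:9200/my_big_index \
  --output=/backups/my_big_index_data.json \
  --type=data \
  --limit=5000
```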
DELETE
In addition to the unload, I run scheduled maintenance on my indices every day, deleting all documents older than 60 days [using the delete_by_query API]. The smaller indices finish in a few seconds to a few minutes, but the big one with 1.5M docs is dead slow; it has been running for about 4 hours and is still going.
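The cleanup is essentially a range delete on a date field, something like this (index and field names are placeholders for whatever the documents actually use):

```
# Delete everything older than 60 days from one index.
curl -XPOST "http://localhost:9200/my_big_index/_delete_by_query" \
  -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "@timestamp": {
        "lt": "now-60d"
      }
    }
  }
}'
```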
Am I doing something wrong here? Is it recommended to store millions of documents in one index?
Please advise. It is taking a big toll on performance.
Okay, thank you. I like the idea of having the indices by date, thus limiting the data in each index. But the problem is that if I use the bulk API to create indices dynamically, the "string" fields in the mapping get created as "analyzed" by default, which means extra storage space is used for the analysis.
So, as the initial setup, I create an index and mappings specifying "not_analyzed" and then send the data to it.
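Concretely, today's manual setup looks something like this (index, type, and field names are placeholders; this is the pre-5.x "string"/"not_analyzed" mapping style my cluster uses):

```
# Create the index up front with explicit not_analyzed string fields,
# instead of letting the bulk API create an analyzed mapping dynamically.
curl -XPUT "http://localhost:9200/logs-2016.01.01" \
  -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "event": {
      "properties": {
        "host":    { "type": "string", "index": "not_analyzed" },
        "message": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
```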
I don't want any of the "terms" in my mapping to be analyzed. How do I create one index per day with all of the fields as "not_analyzed"?