7 seconds to index a document once I get close to 2 million documents

(Neil Wattam) #1


I'm very new to Elasticsearch and am running a 3-node cluster using Docker. I have a large amount of data, around 250 million records, which is currently stored across XML files and a MySQL database, and I found that Elasticsearch would be great for my needs.

So I began following some video tutorials online and managed to create/index 1,000,000 documents without problems, and relatively quickly too, using the localhost:9200 endpoint. I played around with a few queries and everything was OK. So I began to index (not sure if that's the right term) a few more million records, with the plan to do some more testing. However, once I had around 1.6 million documents indexed, it seemed to grind to a halt. Indexing a single document was/is taking around 7 seconds, whereas before I could send 100 a second quite easily. Reads/queries, however, seem fine.

I'm using RAID 5 with 3x 8 TB drives, binding the volume data to the RAID drive instead of the default /var/lib/docker folder.

I have done some reading but am not 100% sure what I should be looking for.

(Christian Dahlqvist) #2

What is the average size of your documents? Are you using the bulk API? Are you using custom document IDs?
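For reference, the bulk API (`POST /_bulk`) takes newline-delimited JSON: each document is preceded by an action line such as `{"index": {...}}`, the body must end with a newline, and it is sent with `Content-Type: application/x-ndjson`. A minimal sketch of building such a body in Python, assuming a hypothetical index name `listings`, a hypothetical field name `xml`, and documents supplied as (id, xml_string) pairs:

```python
import json
from typing import Iterable, Tuple


def build_bulk_body(index: str, docs: Iterable[Tuple[str, str]]) -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each document becomes two lines: an "index" action carrying the
    target index and custom _id, followed by the document source.
    """
    lines = []
    for doc_id, xml_text in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Hypothetical field name "xml": the raw XML stored as a string.
        lines.append(json.dumps({"xml": xml_text}))
    # The bulk API requires the last line to be newline-terminated.
    return "\n".join(lines) + "\n"


body = build_bulk_body("listings", [("/customer/listing/182939403", "<listing/>")])
print(body, end="")
```

Because the `_id` travels inside the JSON action line, IDs containing slashes need no URL-encoding in a bulk request.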

(Neil Wattam) #3

The documents range from 5-16 KB in size. They are XML documents and I'm just storing them as a string at the moment, as they are currently files on my hard drive.
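With documents of 5-16 KB, batching by a fixed document count gives uneven request sizes, so one option is to batch by payload size instead. Elastic's indexing-speed guidance is to benchmark bulk sizes and avoid requests beyond a couple of tens of megabytes; the 10 MB budget below is an assumed starting point, not a figure from this thread. As a rough check, 10 MB holds about 10,000,000 / 16,000 ≈ 625 of the largest documents or 10,000,000 / 5,000 = 2,000 of the smallest. A minimal sketch:

```python
from typing import Iterable, Iterator, List


def batch_by_bytes(docs: Iterable[str], max_bytes: int = 10_000_000) -> Iterator[List[str]]:
    """Group document strings into batches whose UTF-8 size stays within max_bytes.

    Only document bytes are counted (bulk action lines add a little overhead).
    A single document larger than max_bytes is yielded on its own.
    """
    batch: List[str] = []
    size = 0
    for doc in docs:
        doc_size = len(doc.encode("utf-8"))
        # Close the current batch if adding this document would exceed the budget.
        if batch and size + doc_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_size
    if batch:
        yield batch
```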

I'm using a custom ID, for example "/customer/listing/182939403"; the ID is represented in the document too.
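One detail worth checking with IDs like `/customer/listing/182939403`: when such an ID goes in the URL of a single-document request (`PUT /<index>/_doc/<id>`), the slashes must be percent-encoded, otherwise they are read as path separators. Separately, Elastic's indexing-speed guidance notes that with explicit IDs Elasticsearch must check whether a document with the same ID already exists in the shard, a check that gets more costly as the index grows; auto-generated IDs skip it. A minimal sketch, assuming a hypothetical index name `listings` and a node at `http://localhost:9200`; it only builds the request and does not send it:

```python
import json
import urllib.parse
import urllib.request


def single_doc_request(base_url: str, index: str, doc_id: str,
                       source: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request that indexes one document with a custom ID."""
    # safe="" makes quote() encode "/" as %2F so the ID stays a single path segment.
    encoded_id = urllib.parse.quote(doc_id, safe="")
    url = f"{base_url}/{index}/_doc/{encoded_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(source).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = single_doc_request("http://localhost:9200", "listings",
                         "/customer/listing/182939403", {"xml": "<listing/>"})
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it.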

I'm not using the bulk API at the moment. I wrote a crude script which looped through my existing data set, and it was writing/indexing 157 documents per second on average. As soon as I noticed it had slowed to a grinding halt, I stopped the script. I then went back to Kibana Dev Tools and indexed a single test document containing just a small amount of text. It was at this point I realised it was the indexing of documents that had slowed down massively. At first I wondered if it was my RAID 5 or something, but when I ran a search query in Kibana it was instant, with no lag at all.
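To put the rates in this thread in perspective, here is a back-of-the-envelope estimate of how long all 250 million records would take at a constant rate, ignoring any slowdown: about 18 days at 157 documents per second, about 29 days at 100 per second, and roughly 55 years at 7 seconds per document. A small helper for the arithmetic:

```python
SECONDS_PER_DAY = 86_400


def eta_days(total_docs: int, docs_per_second: float) -> float:
    """Days needed to index total_docs at a constant throughput."""
    return total_docs / docs_per_second / SECONDS_PER_DAY


for rate in (157, 100, 1 / 7):  # 1/7 doc/s corresponds to 7 s per document
    print(f"{rate:8.3f} docs/s -> {eta_days(250_000_000, rate):10.1f} days")
```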

(Christian Dahlqvist) #4

How large is the index? How many shards do you have?
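As a rough way to frame the shard question for the full data set: 250 million documents at 5-16 KB of source each is about 1.25-4 TB of raw source, before compression or indexing overhead, so the on-disk size could differ noticeably. Elastic's shard-sizing guidance suggests aiming for shards of roughly 10-50 GB; at an assumed 50 GB per shard, that is on the order of 25-80 primary shards. A minimal sketch of the arithmetic, with the target shard size as an assumption:

```python
import math


def estimate_primary_shards(doc_count: int, avg_doc_bytes: int,
                            target_shard_bytes: int = 50_000_000_000) -> int:
    """Primary shards needed to keep each shard near target_shard_bytes.

    Uses raw source size only; the actual on-disk index size will differ.
    """
    total_bytes = doc_count * avg_doc_bytes
    return max(1, math.ceil(total_bytes / target_shard_bytes))


print(estimate_primary_shards(250_000_000, 5_000))   # smallest documents
print(estimate_primary_shards(250_000_000, 16_000))  # largest documents
```

For the running cluster itself, `GET _cat/indices?v` and `GET _cat/shards?v` in Kibana Dev Tools report the current index size and shard layout.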
