I'm very new to Elasticsearch and am running it as a 3-node cluster using Docker. I have a large amount of data, around 250 million records, which is currently stored across XML files and a MySQL database, and I found that Elasticsearch would be great for my needs.
So I began following some video tutorials online and managed to create/index 1,000,000 documents fine. This was relatively quick too, using the localhost:9200 endpoint. I played around with a few queries and everything was OK. So I began to index (not sure if that's the right term) a few million more records, with the plan to do some more testing. However, once I had around 1.6 million documents indexed, it seemed to grind to a halt. Indexing a single document now takes around 7 seconds, whereas before I could send 100 a second quite easily. Reading/queries still seem fine.
I'm using RAID 5 with 3x8TB drives, and I bind the data volume to the RAID array instead of the default /var/lib/docker folder.
I have done some reading, but I'm not 100% sure what I should be looking for.
The documents range from 5 to 16 KB in size. Each one is an XML document, and I'm just storing it as a string at the moment, since they currently exist as files on my hard drive.
I'm using a custom ID, for example "/customer/listing/182939403". The ID is also present in the document itself.
I'm not using the bulk API at the moment. I wrote a crude script which loops through my existing data set, and it was indexing 157 documents per second on average. As soon as I noticed it had slowed to a grinding halt, I stopped the script. I then went back to the Kibana Dev Tools and indexed a single test document which just had a small amount of text. It was at this point I realised that indexing itself had slowed down massively. At first I wondered if it was my RAID 5 or something, but when I ran a search query in Kibana it was instant, with no lag at all.
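For reference, this is roughly what I understand a bulk version of my script would look like. It is only a sketch, not what I am currently running: the index name `listings`, the field name `xml`, and the helper names are placeholders I made up, and I am assuming the cluster is reachable without authentication at localhost:9200. It builds the newline-delimited JSON body that the `_bulk` endpoint expects (one action line, then one document line, with a trailing newline) and sends it in batches.

```python
import json
import urllib.request
from itertools import islice

ES_URL = "http://localhost:9200"  # endpoint from my setup
INDEX = "listings"                # hypothetical index name


def build_bulk_body(docs, index=INDEX):
    """Build an NDJSON body for POST /_bulk from (doc_id, xml_string) pairs."""
    lines = []
    for doc_id, xml in docs:
        # Action line: index this document under my custom ID.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the XML stored as a plain string ("xml" is a made-up field name).
        lines.append(json.dumps({"xml": xml}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def batches(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def send_bulk(body, url=ES_URL):
    """POST one bulk body and return the parsed JSON response."""
    req = urllib.request.Request(
        url + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be something like `for batch in batches(my_docs, 500): result = send_bulk(build_bulk_body(batch))`, then checking `result["errors"]` for failed items. The batch size of 500 is a guess on my part, not a tuned value.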