I am indexing a CSV file on a Dell system (Ubuntu 16) with an HDD, and it takes 3 hours to index. However, I also indexed the same CSV file on a Mac Pro with an SSD, and it took 15 minutes. Is this the expected performance difference between an SSD and an HDD, or do I need to tune the HDD machine further?
That probably depends on how you are indexing and what type of HDD it is. Are you letting Elasticsearch automatically assign document IDs? Do the machines have the same amount of CPU and RAM? Is there any other load, e.g. queries, while indexing is going on? How much data are you indexing and how many shards are you indexing into? Is the indexing process running on the same machine?
I assign the document IDs myself as I loop over the CSV file (incrementing the ID by 1 on each iteration). Both machines have 8 GB of RAM and a 4-core CPU. The only difference I see between the two machines is the hard disk (SSD vs HDD).
If you assign the document ID, each insert is effectively an update, as Elasticsearch first has to check whether the document already exists. I would expect an SSD to handle this better than an HDD, especially for larger indices/shards. You should probably see less of a difference if you try a run that lets Elasticsearch assign the IDs.
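To make the difference concrete, here is a minimal sketch of how the two approaches look in a `_bulk` request body. It assumes the CSV has a header row and that each row becomes one document; the function name `build_bulk_body` and the index name `my-csv-index` are hypothetical. With `assign_ids=True` each `index` action carries an explicit, incrementing `_id` (as in the loop described above); with `assign_ids=False` the `_id` is omitted and Elasticsearch generates one.

```python
import csv
import io
import json


def build_bulk_body(csv_text, index_name, assign_ids):
    """Build an NDJSON body for the Elasticsearch _bulk API from CSV text.

    index_name is a hypothetical placeholder. With assign_ids=True every
    action gets an explicit incrementing _id, so Elasticsearch has to look
    up whether that ID already exists. With assign_ids=False the _id is
    left out and Elasticsearch auto-generates it.
    """
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))  # first row = field names
    for doc_id, row in enumerate(reader, start=1):
        action = {"_index": index_name}
        if assign_ids:
            action["_id"] = str(doc_id)
        lines.append(json.dumps({"index": action}))  # action line
        lines.append(json.dumps(row))                # document source line
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample = "name,city\nalice,Paris\nbob,Oslo\n"
print(build_bulk_body(sample, "my-csv-index", assign_ids=True))
print(build_bulk_body(sample, "my-csv-index", assign_ids=False))
```

Actually sending the body (for example as `POST /_bulk` with `Content-Type: application/x-ndjson`) is left out; the point is only that the auto-ID variant never mentions `_id`.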
I removed the id parameter, and I can see that Elasticsearch now assigns the IDs itself (something like "_id": "3zMYHGQBFtlWsnaQ4ymB"). However, I don't see any performance improvement.
Have you followed the advice given here? What type of HDD are you using?
I am looking into how to apply these tuning settings in Elasticsearch.
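As a starting point, two settings that Elasticsearch's indexing-speed tuning guidance discusses for a one-off bulk load are `index.refresh_interval` (set to `-1` to disable periodic refreshes) and `index.number_of_replicas` (set to `0` during the load). The sketch below only builds the `PUT /<index>/_settings` requests with the standard library and does not send them. The cluster URL, the index name `my-csv-index`, and the restore values (`1s` is the default refresh interval; one replica is an assumption about the original setup) are all hypothetical.

```python
import json
import urllib.request

# Hypothetical cluster address and index name; adjust to your setup.
ES_URL = "http://localhost:9200"
INDEX = "my-csv-index"

# During the load: no periodic refreshes and no replicas.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}
# After the load: restore values (assumed; use whatever the index had before).
RESTORE_SETTINGS = {"index": {"refresh_interval": "1s", "number_of_replicas": 1}}


def settings_request(settings):
    """Build (but do not send) a PUT /<index>/_settings request."""
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


before = settings_request(BULK_LOAD_SETTINGS)
after = settings_request(RESTORE_SETTINGS)
# To apply: urllib.request.urlopen(before), run the bulk load,
# then urllib.request.urlopen(after).
print(before.get_method(), before.full_url)
```

Keeping the "before" and "after" settings side by side makes it harder to forget to re-enable refreshes once the load finishes, since searches will not see new documents while `refresh_interval` is `-1`.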
Hard disk: I am using a Dell Latitude 5480 laptop, which has a 1 TB hybrid hard drive with OPAL SED options.
The numbers you are seeing make sense. That hybrid hard drive only speeds up write bursts. Sustained writes cause the drive to bypass its SSD portion, so you are basically left with a spinning disk, and worse, a 5400 RPM one.
With a 5400 RPM disk, you are looking at 50 to 75 write IOPS at most. Even the cheapest consumer SSDs will sustain over 10,000 write IOPS, and higher-quality drives such as the one in the MacBook Pro can sustain over 20,000 write IOPS for longer periods.
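A quick back-of-the-envelope calculation shows how large that gap is. The sketch assumes a purely hypothetical workload of one million random writes, each of which the disk must serve as a separate I/O at its write IOPS limit; how many random writes Elasticsearch actually issues per document is not known from this thread, so only the relative numbers matter.

```python
def seconds_for_random_writes(num_writes, write_iops):
    """Lower bound on wall-clock seconds if each write is a separate random
    I/O and the disk can serve at most write_iops of them per second."""
    return num_writes / write_iops


# Hypothetical workload size, chosen only for illustration.
WRITES = 1_000_000
for label, iops in [("5400 RPM HDD (75 IOPS)", 75),
                    ("budget SSD (10,000 IOPS)", 10_000),
                    ("higher-end SSD (20,000 IOPS)", 20_000)]:
    minutes = seconds_for_random_writes(WRITES, iops) / 60
    print(f"{label}: {minutes:.1f} min")
```

The raw IOPS ratio overstates the end-to-end gap, since not every indexing operation is a random write, but it shows why a sustained-write workload on a 5400 RPM disk can take many times longer than on an SSD.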
So yes, the storage can have that large an impact.