Bulk insertion is too slow using PHP

Hi,

I am creating the index using bulk insertion, and the mapping is created before inserting the data. I have more than 20 lakhs of records within one index and one type. After inserting 2 or 3 lakhs of data, bulk insertion becomes slow. For the first 1 lakh of records insertion is much faster, and after that the process slows down.

Are there any tips to speed up the insertion?

I am currently using Elasticsearch 5.6.3.

I have already done the following to speed up the process (a rough sketch of this setup is shown after the list):

  1. The document _id is not included in the bulk request, so Elasticsearch auto-generates it.
  2. Set refresh_interval = -1.
  3. Tested bulk insertion with 2500 rows per request.
  4. All other settings left at their defaults.
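For reference, roughly what this setup looks like with the official elasticsearch-php client (5.x); the index and type names, batch size and the $rows array are placeholders for your own data, not the exact code from the thread:

```php
<?php
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

// Placeholder: the rows merged from the different database tables.
$rows = [/* ... your merged rows ... */];

// (2) Disable refresh while bulk loading.
$client->indices()->putSettings([
    'index' => 'my_index',
    'body'  => ['index' => ['refresh_interval' => '-1']],
]);

// (1) + (3) Bulk insert in batches of 2500 documents, letting
// Elasticsearch auto-generate the _id (no _id in the action metadata).
$params = ['body' => []];
foreach ($rows as $i => $row) {
    $params['body'][] = ['index' => ['_index' => 'my_index', '_type' => 'my_type']];
    $params['body'][] = $row;

    if (($i + 1) % 2500 === 0) {
        $client->bulk($params);
        $params = ['body' => []];   // reset the buffer between batches
    }
}
if (!empty($params['body'])) {
    $client->bulk($params);          // flush the final partial batch
}

// Re-enable refresh once the load is finished.
$client->indices()->putSettings([
    'index' => 'my_index',
    'body'  => ['index' => ['refresh_interval' => '1s']],
]);
```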

What is the specification of your Elasticsearch cluster?

What does CPU and disk I/O look like while indexing?

How many parallel processes do you have indexing into Elasticsearch?

I am using the PHP SDK and running only one process to insert all the data. PHP does not support threading, which is why I am using a single process.

I am able to insert all the data, but it takes too much time. I am also setting the PHP memory limit to 512 MB in my PHP code.

It is generally recommended to use several parallel workers, so it might be worth partitioning the data and running multiple PHP processes, as sketched below. You can also use Logstash, which supports parallelism out of the box.
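A rough sketch of one way to do this from PHP, using pcntl_fork to run several CLI worker processes (requires the pcntl extension; the worker count and the indexPartition() helper are illustrative placeholders, not part of the original thread):

```php
<?php
require 'vendor/autoload.php';

// Placeholder helper: wraps the bulk loop from the earlier sketch so that
// each worker only indexes its own slice, e.g. rows where id % $total == $worker.
function indexPartition($client, $worker, $total) {
    // SELECT ... WHERE MOD(id, $total) = $worker, then bulk index as above.
}

$workers = 4;

for ($w = 0; $w < $workers; $w++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("Could not fork worker $w\n");
    }
    if ($pid === 0) {
        // Child process: create its own client and index only its partition.
        $client = Elasticsearch\ClientBuilder::create()->build();
        indexPartition($client, $w, $workers);
        exit(0);
    }
}

// Parent process: wait for all workers to finish.
while (pcntl_waitpid(0, $status) !== -1) {
    // keep waiting until no children remain
}
```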

OK. But I have to fetch data from different tables and merge all those values; only after that can I insert into Elasticsearch. That's why I am not able to use Logstash.

Right now it is taking around 2 hours to insert 20 lakhs of data. Do you think this is very high for Elasticsearch? From the application perspective it is too high.

In order to get better parallelism, I would recommend you try using Logstash with a JDBC input plugin to read the data. This will read data as quickly as possible and use multiple threads to process and write the data based on a single input.
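For example, a hedged sketch of what such a Logstash pipeline could look like; the connection string, credentials, driver, SQL statement (which merges the tables in the query itself rather than in application code) and index name are all placeholders:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT a.*, b.extra_field FROM table_a a JOIN table_b b ON b.a_id = a.id"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    document_type => "my_type"
  }
}
```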

The throughput you are quoting is very low, but I suspect it may primarily be due to the way you are retrieving and ingesting data and not necessarily a reflection of what Elasticsearch can handle.

Also, please try to use units that are globally recognised. I believe lakhs are very particular to Asia.

OK. Thank you for the information. I will do a feasibility study comparing Logstash with my application.

But the system is getting slower after inserting 500 thousand documents. I am curious to know whether the data size affects the insertion speed or not.
