Bulk insertion is too slow using PHP

Hi,

I am creating the index using bulk insertion, and the mapping is created before inserting the data. I have more than 20 lakhs of records within one index and one type. After inserting 2 or 3 lakhs of data, bulk insertion becomes slow. For the first 1 lakh of records insertion is much faster, and after that the process slows down.

Are there any tips to speed up the insertion?

I am currently using Elasticsearch 5.6.3.

I have already done the following to speed up the process (a rough sketch of this setup is shown after the list):

  1. The document _id is not included in the bulk request, so Elasticsearch auto-generates it.
  2. Set refresh_interval = -1.
  3. Tested bulk insertion with 2500 rows per request.
  4. All other settings left at their defaults.
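For reference, roughly what this setup looks like with the official elasticsearch-php client (5.x); the index and type names, batch size and the $rows array are placeholders for your own data, not the exact code from the thread:

```php
<?php
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

// Placeholder: the rows merged from the different database tables.
$rows = [/* ... your merged rows ... */];

// (2) Disable refresh while bulk loading.
$client->indices()->putSettings([
    'index' => 'my_index',
    'body'  => ['index' => ['refresh_interval' => '-1']],
]);

// (1) + (3) Bulk insert in batches of 2500 documents, letting
// Elasticsearch auto-generate the _id (no _id in the action metadata).
$params = ['body' => []];
foreach ($rows as $i => $row) {
    $params['body'][] = ['index' => ['_index' => 'my_index', '_type' => 'my_type']];
    $params['body'][] = $row;

    if (($i + 1) % 2500 === 0) {
        $client->bulk($params);
        $params = ['body' => []];   // reset the buffer between batches
    }
}
if (!empty($params['body'])) {
    $client->bulk($params);          // flush the final partial batch
}

// Re-enable refresh once the load is finished.
$client->indices()->putSettings([
    'index' => 'my_index',
    'body'  => ['index' => ['refresh_interval' => '1s']],
]);
```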

What is the specification of your Elasticsearch cluster?

What does CPU and disk I/O look like while indexing?

How many parallel processes do you have indexing into Elasticsearch?

I am using the PHP SDK and running only one process to insert all the data. PHP does not support threading, which is why I am using a single process.

I am able to insert all the data, but it takes too much time. I am also setting the PHP memory limit to 512 MB in my PHP code.

It is generally recommended to use several parallel workers, so it might be worth partitioning the data and running multiple PHP processes, as sketched below. You can also use Logstash, which supports parallelism out of the box.
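A rough sketch of one way to do this from PHP, using pcntl_fork to run several CLI worker processes (requires the pcntl extension; the worker count and the indexPartition() helper are illustrative placeholders, not part of the original thread):

```php
<?php
require 'vendor/autoload.php';

// Placeholder helper: wraps the bulk loop from the earlier sketch so that
// each worker only indexes its own slice, e.g. rows where id % $total == $worker.
function indexPartition($client, $worker, $total) {
    // SELECT ... WHERE MOD(id, $total) = $worker, then bulk index as above.
}

$workers = 4;

for ($w = 0; $w < $workers; $w++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("Could not fork worker $w\n");
    }
    if ($pid === 0) {
        // Child process: create its own client and index only its partition.
        $client = Elasticsearch\ClientBuilder::create()->build();
        indexPartition($client, $w, $workers);
        exit(0);
    }
}

// Parent process: wait for all workers to finish.
while (pcntl_waitpid(0, $status) !== -1) {
    // keep waiting until no children remain
}
```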

OK. But I have to fetch data from different tables and merge all those values; only after that can I insert into Elasticsearch. That's why I am not able to use Logstash.

Right now it is taking around 2 hours to insert 20 lakhs of data. Do you think this is very high for Elasticsearch? From the application perspective it is too high.

In order to get better parallelism, I would recommend you try using Logstash with a JDBC input plugin to read the data. This will read data as quickly as possible and use multiple threads to process and write the data based on a single input.
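For example, a hedged sketch of what such a Logstash pipeline could look like; the connection string, credentials, driver, SQL statement (which merges the tables in the query itself rather than in application code) and index name are all placeholders:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "user"
    jdbc_password => "password"
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    statement => "SELECT a.*, b.extra_field FROM table_a a JOIN table_b b ON b.a_id = a.id"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index"
    document_type => "my_type"
  }
}
```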

The throughput you are quoting is very low, but I suspect it may primarily be due to the way you are retrieving and ingesting data and not necessarily a reflection of what Elasticsearch can handle.

Also, please try to use units that are globally recognised. I believe lakhs are very particular to Asia.

OK. Thank you for the information. I will do a feasibility study comparing Logstash with my application.

But the system is getting slower after inserting 500 thousand documents. I am curious to know whether the data size affects the insertion speed or not.
