Bulk update of data in Elasticsearch via direct import


(Dipanjan) #1

Hi there!

Our application back end is built on Symfony 2.5.x. We use MySQL 5.x and Elasticsearch 1.7.x to store our data.

In our application, we store each company's data in its own database, i.e. a multi-tenant approach. Similarly, we store each company's data under its own alias in Elasticsearch.

We had to extend the default fos:elastica:populate command, which ships with FOSElasticaBundle, with our own custom command so that it goes through every company database and syncs it with the corresponding Elasticsearch alias.

All of this is working fine, with no problems so far. We now have almost 2000K records in Elasticsearch.

Now, due to a new requirement, we need to update the mapping and store one more property in Elasticsearch, i.e. the value of another column.

The last time we added a new column for all existing aliases and ran a bulk update, about 340K records were updated per hour on average. So this time it might take around 6 hours to complete the process.

Is there any way to speed up the process so that it finishes within, say, 2 hours?
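To make the numbers concrete, here is a small Python sketch of the arithmetic, using the approximate figures from this post (about 2000K records, about 340K records per hour, a 2-hour target). The function names are only for illustration.

```python
def hours_needed(total_records, records_per_hour):
    """Estimated run time at a constant indexing rate."""
    return total_records / records_per_hour


def required_speedup(total_records, records_per_hour, target_hours):
    """Factor by which the current rate must grow to meet the target."""
    return hours_needed(total_records, records_per_hour) / target_hours


total = 2_000_000   # approximate record count from this post
rate = 340_000      # average records updated per hour last time
target = 2          # desired duration in hours

print(round(hours_needed(total, rate), 1))              # 5.9
print(round(required_speedup(total, rate, target), 1))  # 2.9
print(total // target)                                  # 1000000 records per hour needed
```

In other words, we would need roughly a threefold increase in throughput.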

Our MySQL database is on AWS RDS, the application code runs on Heroku, and Elasticsearch is hosted with the Found Elasticsearch provider.

I guess that, since FOSElasticaBundle first fetches data from MySQL and then pushes it to Elasticsearch, factors like the latency between MySQL and the application, and between the application and Elasticsearch, will always be there and cannot be reduced much. Similarly, changing how fast the Elasticsearch engine processes the input JSON is out of scope for us.

Is there any way, for example, to create a CSV file with flat records, upload it directly to the Elasticsearch server, and have the server do the processing by reading from that CSV?
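To illustrate the kind of thing I have in mind, here is a rough Python sketch that turns flat CSV rows into a newline-delimited body of partial updates in the shape the Elasticsearch bulk API accepts (an `update` action line followed by a `doc` line). The column names (`id`, `new_column`), the alias name `company_a`, and the type name `record` are hypothetical, and this only builds the request body; it would still have to be sent to the `_bulk` endpoint over HTTP, so I am not sure it avoids the latency issue.

```python
import csv
import io
import json


def csv_to_bulk_body(csv_text, index, doc_type, id_field="id"):
    """Convert flat CSV rows into a bulk request body of partial updates."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row = dict(row)
        doc_id = row.pop(id_field)
        # Action line: identifies the existing document to update.
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Payload line: only the new fields, merged into the stored document.
        lines.append(json.dumps({"doc": row}))
    # Every line of a bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)


# Hypothetical sample data: one new column for two existing documents.
sample = "id,new_column\n1,foo\n2,bar\n"
body = csv_to_bulk_body(sample, index="company_a", doc_type="record")
print(body, end="")
```

Since each update carries only the new column, the payload per record stays small compared with re-sending whole documents.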

Any help or ideas would be much appreciated.

Thanks in advance!

