What is the best and fastest way to insert JSON data into Elasticsearch?

Hi,

I have many JSON files (about 200 per day, each containing more than 300,000 lines) that I need to insert into an Elasticsearch index.

I'm looking for a way to have Elasticsearch pick up each JSON file in a folder and index each line with the correct field mapping. It also has to be fast (ideally under 30 minutes).

Is it possible to "industrialize" this process with Filebeat, Logstash or another tool?

I tried a simple bulk insert using the Elasticsearch .NET and NEST APIs, but it takes too long (about 3 to 4 minutes per file).
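For reference, one thing worth trying on the NEST side is the BulkAll helper, which streams the documents and sends several smaller bulk requests in parallel instead of one large request per file. A rough sketch below, not a drop-in solution: the document class, field names, index name and tuning values are assumptions based on this thread and would need adapting.

```csharp
// Sketch of a parallel bulk load with NEST's BulkAll helper (NEST 6.x-style API).
// Streams the file line by line so the 300,000-line files are never fully in memory.
using System;
using System.Collections.Generic;
using System.IO;
using Nest;
using Newtonsoft.Json;

public class Geo
{
    public double Lat { get; set; }
    public double Lon { get; set; }
}

public class Measure   // assumed document shape, based on the fields mentioned in this thread
{
    public string Name { get; set; }
    public string Unit { get; set; }
    public double Value { get; set; }
    public Geo Geo { get; set; }
    public DateTime Time { get; set; }
}

public static class BulkLoader
{
    // Read one JSON document per line, streaming the file.
    static IEnumerable<Measure> ReadDocuments(string path)
    {
        foreach (var line in File.ReadLines(path))
            yield return JsonConvert.DeserializeObject<Measure>(line);
    }

    public static void Load(IElasticClient client, string path)
    {
        var bulkAll = client.BulkAll(ReadDocuments(path), b => b
            .Index("measures")               // target index (assumed name)
            .Size(5000)                      // documents per bulk request
            .MaxDegreeOfParallelism(4)       // concurrent bulk requests in flight
            .BackOffRetries(2)               // retry rejected bulk requests
            .BackOffTime("30s")
            .RefreshOnCompleted());          // refresh the index once at the end

        // Blocks until all batches are indexed (or the timeout is hit),
        // reporting progress after each bulk response.
        bulkAll.Wait(TimeSpan.FromMinutes(30),
            response => Console.WriteLine($"Indexed batch {response.Page}"));
    }
}
```

Relaxing the index's refresh interval and replica count during the load and restoring them afterwards is another common way to speed up this kind of bulk indexing.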

I did some tests with Filebeat, but all I achieved was inserting the JSON data into the message field of a log event. That is not what I'm looking for.
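Getting the JSON decoded into top-level fields instead of the message field is a matter of Filebeat input configuration. A minimal filebeat.yml sketch, assuming Filebeat 6.x (on older 6.x releases the section is called filebeat.prospectors) and an assumed folder path:

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /data/json/*.json          # folder containing the daily JSON files (assumed path)
    json.keys_under_root: true     # put the decoded fields at the top level, not under "message"
    json.add_error_key: true       # flag lines that are not valid JSON
    json.overwrite_keys: true      # decoded fields win over Filebeat's own keys on conflict

output.elasticsearch:
  hosts: ["localhost:9200"]
```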

What is the size and specification of your cluster? Have you identified what the bottleneck is?

My setup is 1 shard / 1 replica, and the size is about 600 MB to 1 GB when I index 20 files.

I increased the number of shards and replicas to 4/4, and the time to index 20 JSON files with the .NET bulk insert was cut in half.

I have made some progress in my tests:

I used JSON decoding in Filebeat and it works. Now I can see my data (Name, Unit, Value, geo and Time), but I also see all the other log fields that I don't want in my mapping.
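The extra fields Filebeat adds (beat metadata, source, offset and so on) can be removed with a drop_fields processor. A sketch; the exact field names depend on the Filebeat version, so check one indexed event to see what is actually there (@timestamp and type cannot be dropped):

```yaml
processors:
  - drop_fields:
      fields: ["beat", "host", "source", "offset", "prospector", "input"]
```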

Also, the geo data is not recognized as a geo_point type. Is there something to add in the config?
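geo_point cannot be inferred by dynamic mapping, so it does have to come from an index template (or from an explicit mapping created before indexing). A sketch of such a template for Elasticsearch 6.x, using the field names mentioned above and the default filebeat-* index pattern; the types other than geo_point are guesses:

```json
{
  "index_patterns": ["filebeat-*"],
  "mappings": {
    "doc": {
      "properties": {
        "Name":  { "type": "keyword" },
        "Unit":  { "type": "keyword" },
        "Value": { "type": "double" },
        "geo":   { "type": "geo_point" },
        "Time":  { "type": "date" }
      }
    }
  }
}
```

Note that a template is only applied when an index is created, so an existing index has to be deleted or reindexed before the new mapping takes effect.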

I tried to create my own template, but it seems that Filebeat doesn't pick it up.
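By default Filebeat loads its own template, so a custom one has to be pointed at explicitly and allowed to overwrite the existing one. A sketch of the relevant filebeat.yml settings for Filebeat 6.x; the template file name is a placeholder:

```yaml
setup.template.enabled: true
setup.template.name: "filebeat"
setup.template.pattern: "filebeat-*"
setup.template.overwrite: true          # replace the template Filebeat already loaded
setup.template.json.enabled: true
setup.template.json.path: "filebeat-template.json"   # the custom template file (placeholder name)
setup.template.json.name: "filebeat"
```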

By the way, about the bulk performance with Filebeat: 20 JSON files with around 5 million lines in total take 19 minutes! Is that normal? To me, it seems very slow.

The same files with the C# bulk insert take about 3 to 4 minutes.
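On the throughput question: with default settings Filebeat sends fairly small bulk requests over a single connection, so millions of small documents can easily take that long. Raising the bulk size, the number of workers and the internal queue is the usual first step. A sketch with Filebeat 6.x settings; the numbers are starting points to tune, not recommendations:

```yaml
output.elasticsearch:
  hosts: ["localhost:9200"]
  worker: 4                  # parallel bulk requests to Elasticsearch
  bulk_max_size: 4096        # events per bulk request (the default is much smaller)

queue.mem:
  events: 65536              # buffer enough events to keep the bulk requests full
  flush.min_events: 4096
  flush.timeout: 1s
```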

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.