I have many JSON files (about 200 per day, each containing more than 300,000 lines) that I need to insert into an Elasticsearch index.
I'm looking for a way to tell Elasticsearch to pick up each JSON file in a folder and put each row into an index, mapping the correct fields. And it must be fast (ideally less than 30 minutes).
Is it possible to "industrialize" this process with Filebeat/Logstash or another tool?
I tried a simple bulk insert using the Elasticsearch .NET and NEST APIs, but it takes too long (about 3-4 minutes per file).
I did some tests with Filebeat, but all I achieved was inserting the JSON data into the message field of a log event. In my case, that is not what I'm looking for.
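For reference, Filebeat can decode each line as a JSON document instead of leaving it in the message field. A minimal filebeat.yml sketch, assuming Filebeat 5.x/6.0 (the folder path and index name are assumptions):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /data/json/*.json          # hypothetical folder with the daily files
    json.keys_under_root: true     # lift the decoded fields to the event root
    json.add_error_key: true       # add an error field when a line fails to decode

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "measurements-%{+yyyy.MM.dd}"   # hypothetical index name
```

With `json.keys_under_root: true` the document's own fields land at the top level of the event rather than nested under a single text field.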
What are the size and specifications of your cluster? Have you identified the bottleneck?
My cluster has 1 shard / 1 replica, and the index size is about 600 MB to 1 GB when I index 20 files.
I tried increasing the number of shards and replicas to 4/4, and the time to index 20 JSON files with the .NET bulk insert was cut in half.
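As a side note, for a pure bulk load it is common to disable refresh and drop replicas while indexing, then restore them afterwards; replicas only multiply the write work during the load. A sketch of the per-index settings request (the index name is an assumption):

```json
PUT measurements/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
```

When the load finishes, set `refresh_interval` back to `"1s"` (the default) and raise `number_of_replicas` again; the replicas are then built once instead of on every bulk request.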
I made progress in my tests:
I used JSON decoding in Filebeat and it works. Now I can see my data (Name, Unit, Value, geo and Time), but I also see all the other log fields that I don't want in my mapping.
Also, the geo data is not recognized as the geo_point type. Is there something to add to the config?
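A field only becomes a geo_point if a mapping or index template says so; dynamic mapping will never infer it from the data. A minimal index template sketch, assuming Elasticsearch 6.x (the template name, index pattern, mapping type and field name are assumptions):

```json
PUT _template/measurements
{
  "template": "measurements-*",
  "mappings": {
    "doc": {
      "properties": {
        "geo": { "type": "geo_point" }
      }
    }
  }
}
```

The template must be in place before the index is created; an index that already mapped geo as something else has to be reindexed.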
I tried to create my own template, but it seems that Filebeat doesn't pick it up. By the way, about bulk indexing with Filebeat:
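If you manage the template yourself, Filebeat has to be told to load it instead of its default one. A sketch of the relevant filebeat.yml options, assuming Filebeat 6.x (the file path and template name are assumptions):

```yaml
setup.template.overwrite: true
setup.template.json.enabled: true
setup.template.json.path: "/etc/filebeat/measurements-template.json"  # hypothetical path
setup.template.json.name: "measurements"                              # hypothetical name
```

Without these, Filebeat keeps installing its own template, which can silently shadow a hand-written one with the same index pattern.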
20 JSON files totaling around 5 million lines take 19 minutes! Is that normal? To me, it seems very, very slow.
The same files with a bulk insert in C# take about 3-4 minutes.
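For comparison outside of NEST and Filebeat, the _bulk API just takes newline-delimited action/source pairs, so the per-file payload can be prepared independently of any client and sent with plain HTTP. A small Python sketch (index name, mapping type, field names and chunk size are all assumptions) that turns JSON lines into _bulk request bodies:

```python
import json

def bulk_bodies(lines, index="measurements", chunk_size=5000):
    """Yield NDJSON _bulk request bodies holding at most chunk_size documents each."""
    action = json.dumps({"index": {"_index": index, "_type": "doc"}})
    buf = []
    for line in lines:
        doc = json.loads(line)      # one JSON document per input line
        buf.append(action)          # action line
        buf.append(json.dumps(doc)) # source line
        if len(buf) >= 2 * chunk_size:
            yield "\n".join(buf) + "\n"  # a _bulk body must end with a newline
            buf = []
    if buf:
        yield "\n".join(buf) + "\n"

# Example: two documents produce one body of four NDJSON lines.
lines = ['{"Name": "temp", "Value": 21.5}', '{"Name": "hum", "Value": 40}']
body = next(bulk_bodies(lines))
```

Sending several of these bodies concurrently (a few thousand documents each) is usually what closes the gap between a naive loader and the throughput the cluster can actually sustain.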
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.