I have many JSON files (about 200 per day, each file containing more than 300,000 lines) that I need to insert into an Elasticsearch index.
I'm looking for a way to tell Elasticsearch to pick up every JSON file in a folder and index each row, mapping the correct fields. And it must be fast (ideally less than 30 minutes).
Is it possible to "industrialize" this process with Filebeat, Logstash or another tool?
I tried a simple bulk insert using the Elasticsearch .NET and NEST APIs, but it takes too much time (about 3-4 minutes per file).
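For reference, this is roughly the kind of bulk call I mean (a simplified sketch, not my real code: the Measure/Geo classes, the "measures" index name and the file path are placeholders):

```csharp
using System;
using System.IO;
using System.Linq;
using Nest;
using Newtonsoft.Json;

public class Geo
{
    public double Lat { get; set; }
    public double Lon { get; set; }
}

public class Measure
{
    public string Name { get; set; }
    public string Unit { get; set; }
    public double Value { get; set; }
    public Geo Geo { get; set; }
    public DateTime Time { get; set; }
}

class Program
{
    static void Main()
    {
        // Placeholder connection and index name.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("measures");
        var client = new ElasticClient(settings);

        // One JSON document per line in the file.
        var docs = File.ReadLines(@"C:\data\file1.json")
            .Select(line => JsonConvert.DeserializeObject<Measure>(line))
            .ToList();

        // Single bulk request containing every document of the file.
        var response = client.Bulk(b => b.Index("measures").IndexMany(docs));
        if (response.Errors)
            Console.WriteLine("Some documents failed to index.");
    }
}
```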
I did some tests with Filebeat, but all I achieved was to insert the JSON data into the message field of a log event. In my case, this is not what I'm looking for.
My cluster is 1 shard / 1 replica, and the index size is about 600 MB to 1 GB when I index 20 files.
I tried to increase the number of shards and replicas to 4/4, and the time to index 20 JSON files with the .NET bulk insert was cut in half.
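The settings change was just a recreation of the index with more shards and replicas, something like this sketch (NEST 6.x-style client.CreateIndex shown here; in NEST 7.x the equivalent is client.Indices.Create, and the "measures" index name is a placeholder):

```csharp
// Recreate the index with 4 primary shards and 4 replicas
// before running the bulk inserts.
var createResponse = client.CreateIndex("measures", c => c
    .Settings(s => s
        .NumberOfShards(4)
        .NumberOfReplicas(4)));
```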
I have made some progress with my tests:
I used JSON decoding in Filebeat and it works. Now I can see my data (Name, Unit, Value, geo and Time), but I also see all the other log fields that I don't want in my mapping.
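This is roughly the Filebeat configuration I'm testing (a simplified sketch; the input path, the output and the exact list of fields to drop are placeholders on my side and depend on the Filebeat version):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /data/json/*.json
    # Decode each line as JSON and put the decoded keys at the root
    # of the event instead of inside the "message" field.
    json.keys_under_root: true
    json.add_error_key: true

# Drop the Filebeat metadata fields I don't want in my mapping
# (field names vary between Filebeat versions).
processors:
  - drop_fields:
      fields: ["beat", "source", "offset", "prospector", "input", "log", "host"]

output.elasticsearch:
  hosts: ["localhost:9200"]
```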