Index a large dataset into Elasticsearch


I've been playing around with Elasticsearch and Graph for a while, and a lot of it looks very promising!

I'm stuck on preparing JSON datasets containing millions of rows for the bulk API. Each document needs an action header line added before it, which is doable by hand when the dataset is small, but with a large dataset I don't know what the options are. I can't find it in the O'Reilly book, and despite a lot of googling I haven't found a definitive answer on how experienced Elasticsearch users index large datasets with the bulk API when each line needs a header. Do you use programming languages or other tools?
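For reference, the bulk API expects newline-delimited JSON where each document is preceded by an action line. A short script can interleave those headers; this is a minimal sketch (the index name `myindex` and the sample documents are placeholders, not from the original post):

```python
import json

def to_bulk(lines, index="myindex"):
    """Interleave a bulk action header before each JSON document line.

    `lines` is an iterable of JSON strings, one document per line;
    `index` is the target index name (a placeholder here).
    Returns a newline-delimited string suitable for POST /_bulk.
    """
    action = json.dumps({"index": {"_index": index}})
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the source file
        out.append(action)
        out.append(line)
    # The bulk body must end with a trailing newline.
    return "\n".join(out) + "\n"

# Example: two documents become four lines of bulk body.
docs = ['{"name": "a"}', '{"name": "b"}']
body = to_bulk(docs)
```

For millions of rows you would stream the input file line by line rather than build the body in memory; official Elasticsearch clients (for example the Python client's bulk helper) can also generate these headers for you.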

Any help would be appreciated. I can read some code but I'm not a programmer by the way.

You can use Logstash for this. It has both a codec and a filter for parsing JSON data, especially if it is one JSON object per line, and it should be relatively easy to set up.
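A pipeline for the one-object-per-line case could look roughly like this (the file path and index name are placeholders you would adapt):

```
input {
  file {
    path => "/path/to/data.json"      # hypothetical path to the dataset
    start_position => "beginning"     # read the whole file, not just new lines
    sincedb_path => "/dev/null"       # don't remember position between runs
    codec => "json_lines"             # one JSON object per line
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
  }
}
```

The elasticsearch output batches documents and adds the bulk headers itself, so no manual preprocessing is needed.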

Sounds good, thanks for your quick reply! I will give it a go.