Queries on Elastic Search Configuration and Bulk Import

ASA · August 5, 2014, 2:25pm

I am trying to set up Elasticsearch with around 5 million records. Each document has 150 key-value pairs. I am using ES 1.2.1 on Ubuntu 12.04 with 4 GB of RAM and 40 GB of disk space. I have used the default ES configuration throughout, for creating the index, inserting documents, and so on.

I have run into a few problems:

I was able to insert a maximum of 30K records from a single JSON file using the bulk API. I also observed that it works smoothly only for file sizes of around 15-20 MB. Can anyone explain the reason, the upper bound, and the optimal size for a bulk import?
A JSON file used for the bulk API contains thousands of records, and before each record's data I have to write a specification line. For example:
{"index":{"_index":"indexName","_type":"testName","_id":"someValue"}}
{"field1": "value1","field2":"value2".....}
{"index":{"_index":"indexName","_type":"testName","_id":"someValue"}}
{"field1": "value1","field2":"value2".....}
{"index":{"_index":"indexName","_type":"testName","_id":"someValue"}}
{"field1": "value1","field2":"value2".....}......
Isn't this cumbersome? If I have to insert 100 records, do I have to add 100 specification lines to the file as well?
I successfully inserted some 330,000 records by repeatedly inserting 30K records at a time. But then I tried doing this concurrently, running 5 threads at once, and ES crashed with out-of-memory exceptions. I restarted ES and found that only 207,000 records are now present. Only 2 of the 5 shards were successful, which means data vanished! This is a serious issue and can break the application.
Can anyone advise on ideal sharding and memory requirements for this much data? Also, how can we specify these settings at index creation time and modify them afterwards?
Now, after this crash, when I search for a particular record with id 'x', ES returns the data, but when I try to retrieve the same document with Get, it fails! What might have gone wrong?
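
On the second point, the specification lines do not have to be written by hand. Below is a minimal Python sketch of generating them from plain records and splitting the output into size-bounded bulk bodies. The function name `bulk_chunks`, the `id_field` parameter, and the 10 MB default are illustrative assumptions, not values recommended by Elasticsearch; the index and type names are the placeholders from the example above.

```python
import json

def bulk_chunks(records, index, doc_type, id_field, max_bytes=10 * 1024 * 1024):
    """Yield newline-delimited bulk bodies of roughly max_bytes each.

    max_bytes is an arbitrary placeholder (assumption); tune it for your node.
    A single record larger than max_bytes is still emitted, in its own chunk.
    """
    lines, size = [], 0
    for rec in records:
        # Action ("specification") line, generated instead of hand-written.
        action = json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": rec[id_field]}}
        )
        # Each action line is followed by the document source on its own line.
        pair = action + "\n" + json.dumps(rec) + "\n"
        pair_size = len(pair.encode("utf-8"))
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        # Every chunk ends with a newline, as the bulk format requires.
        yield "".join(lines)

if __name__ == "__main__":
    # Hypothetical sample data; real records would come from the source file.
    sample = [{"id": str(i), "field1": "value1", "field2": "value2"} for i in range(3)]
    for n, body in enumerate(bulk_chunks(sample, "indexName", "testName", "id")):
        with open("bulk_%04d.json" % n, "w") as fh:
            fh.write(body)
```

Each file written this way holds one bulk request body, so the chunks can be sent one at a time instead of from several threads at once.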

Help is much appreciated. Thanks in advance.
