Hi all,
I have a 4-node Elasticsearch cluster running:
Elasticsearch : 1.5.2
OS : RHEL 6.x
Java : 1.7
CPU : 16 cores
2 machines : 60 GB RAM, 10 TB disk
2 machines : 120 GB RAM, 5 TB disk
I also have a 500-node Hadoop cluster and am trying to index data from
Hadoop that is in Avro format.
Daily size : 1.2 TB
Hourly size : 40-60 GB
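For scale, that works out to roughly:
1.2 TB/day ≈ 1,200,000 MB / 86,400 s ≈ 14 MB/s sustained cluster-wide,
or about 3.5 MB/s per node; the hourly 40-60 GB peaks are ~11-17 MB/s.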
elasticsearch.yml config
cluster.name: zebra
index.mapping.ignore_malformed: true
index.merge.scheduler.max_thread_count: 1
index.store.throttle.type: none
index.refresh_interval: -1
index.translog.flush_threshold_size: 1024000000
discovery.zen.ping.unicast.hosts: ["node1","node2","node3","node4"]
path.data:
/hadoop01/es,/hadoop02/es,/hadoop03/es,/hadoop04/es,/hadoop05/es,/hadoop06/es,/hadoop07/es,/hadoop08/es,/hadoop09/es,/hadoop10/es,/hadoop11/es,/hadoop12/es
bootstrap.mlockall: true
indices.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 50000
index.store.type: mmapfs
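For reference, most of those index.* settings are dynamic, so their
equivalents can also be applied per index once it exists (a sketch against
the fpti index the Pig job writes to):
$ curl -XPUT 'http://localhost:9200/fpti/_settings' -d '{
  "index" : {
    "refresh_interval" : "-1",
    "translog.flush_threshold_ops" : 50000
  }
}'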
Cluster health:
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "zebra",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 4,
"active_primary_shards" : 21,
"active_shards" : 22,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"number_of_pending_tasks" : 0
}
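For what it's worth, active_shards (22) is barely above
active_primary_shards (21), so these indices run with essentially no
replicas; the per-shard layout can be checked with:
$ curl -XGET 'http://localhost:9200/_cat/shards?v'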
Pig Script:
-- jar paths are illustrative; the es-hadoop and UDF jars must be registered
REGISTER /path/to/elasticsearch-hadoop.jar;
REGISTER /path/to/our-udfs.jar;
avro_data = LOAD '$INPUT_PATH' USING AvroStorage();
temp_projection = FOREACH avro_data GENERATE
    our.own.udf.ToJsonString(headers, data) AS data;
STORE temp_projection INTO 'fpti/raw_data' USING
    org.elasticsearch.hadoop.pig.EsStorage(
        'es.resource=fpti/raw_data', 'es.input.json=true',
        'es.nodes=node1,node2,node3,node4',
        'mapreduce.map.speculative=false', 'mapreduce.reduce.speculative=false',
        'es.batch.size.bytes=512mb', 'es.batch.size.entries=1');
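I launch it along these lines (script name and input path are illustrative):
$ pig -p INPUT_PATH=/path/to/hourly/avro index_raw_data.pig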
When I run the above, around 300 mappers start, but none of them complete,
and the job fails every time with the error below. Some documents do get
indexed, though.
Error:
2015-05-20 15:40:20,618 [main] ERROR
org.apache.pig.tools.grunt.GruntParser - ERROR 2999: Unexpected internal
error. Could not write all entries [1/8448] (maybe ES was overloaded?).
Bailing out...
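Since the error hints at overload, bulk thread-pool rejections can be
checked during a run with (the h= columns are standard _cat headers):
$ curl -XGET 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'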
The job does finish, however, when the input is only a few thousand
documents.
Please let me know what else I can do to increase my indexing throughput.
Regards,
#sudhir