When I used Elasticsearch 1.7.1, the Java job finished in about one hour; after I upgraded ES to 2.3.2, the same job takes more than three hours.
Can anyone help me? Thanks very much. (I used the bulk API.)
Can you provide more details about the bulk requests? How many documents are you sending per bulk request? What is the approximate size of each bulk request (in MB)? Are there any other relevant details that you can share?
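For example, a quick way to check the document count and payload size per bulk is something like this (a sketch assuming Python with urllib3 against the REST _bulk endpoint; the index/type names dm/sketch and the field contents are placeholders, not your real data):

import json
import urllib3

http = urllib3.PoolManager()

def build_bulk(docs, index_name='dm', doc_type='sketch'):
    # The bulk body is newline-delimited JSON: an action line followed by the document source.
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({'index': {'_index': index_name, '_type': doc_type, '_id': doc_id}}))
        lines.append(json.dumps(doc))
    return '\n'.join(lines) + '\n'

# Dummy documents; substitute the real ones produced by the job.
docs = [(i, {'name': 'customer %d' % i, 'phone': '555-0100'}) for i in range(1, 201)]
payload = build_bulk(docs)
print('documents per bulk: %d, payload size: %.2f MB' % (len(docs), len(payload) / (1024.0 * 1024.0)))

resp = http.request('POST', 'http://localhost:9200/_bulk', body=payload)
print(resp.status)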
Would you mind if I add to this discussion? I am facing much the same issue.
@jasontedor
An indexing job that used to finish in about 4 hours on ES 1.4.5 is taking about 9 hours on ES 2.3.3.
So I tried profiling each indexing request on ES 1.4.5 and ES 2.3.3.
I have 2 EC2 instances with exactly the same config, one for ES 1.4.5 and one for ES 2.3.3.
I indexed 100 very simple documents using Python code running on each server and hitting the localhost ES server.
With refresh_interval=-1, number_of_shards=1, and number_of_replicas=0, the times for ES 1.4.5 are about 1 ms per indexing request, while they are about 3.5 ms for ES 2.3.3.
Am I doing something wrong, or is this expected?
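In case it helps, the test index was created roughly like this (a sketch with urllib3; 'test_index' stands in for the real index name):

import json
import urllib3

http = urllib3.PoolManager()

# Create the test index with one shard, no replicas, and automatic refresh disabled.
settings = {
    'settings': {
        'number_of_shards': 1,
        'number_of_replicas': 0,
        'refresh_interval': '-1',
    }
}
resp = http.request('PUT', 'http://localhost:9200/test_index', body=json.dumps(settings))
print(resp.status, resp.data)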
Can you share the contents of a single document, made anonymous if necessary (I just want to see the rough shape and content)? Can you share the mapping for sketch? What settings do you have set on the nodes host1 and host2? How much RAM do they have and how much is allocated to the Elasticsearch heap? Finally, what is your storage media, spinning or solid-state (and is it local to the hosts)?
I cannot see the documents themselves; I just know they contain things like customer name, phone number, address, and so on.
Here is the Elasticsearch config:
index.cache.field.type: "soft"
index.cache.field.max_size: 50000
index.cache.field.expire: 10m
index.number_of_shards: 4
indices.memory.index_buffer_size: 30%
bootstrap.mlockall: true
Here is the Java config:
es.batch.size=200
es.action.timeout=50000
and the ES environment settings:
ES_MAX_MEM=10g
ES_MIN_MEM=10g
ES_HEAP_SIZE=20g
Java code (configuration property keys and their defaults):
public static final String ES_INDEX = "es.index";
public static final String ES_TYPE = "es.type";
public static final String ES_HDP_SHARE = "es.cluster.share";
public static final String ES_BATCH_SIZE = "es.batch.size";
public static final String ES_ACTION_TIMEOUT = "es.action.timeout";
public static final String TEMP_FILE_PATH = "temp_file_path";
public static final String TAG_SCHEMA = "tag.schema";
public static final String TAG_TABLES = "tag.tables";
public static final String DEFAULT_ES_INDEX = "dm";
public static final String DEFAULT_ES_TYPE = "sketch";
public static final String DEFAULT_ES_HDP_SHARE = "false";
public static final String DEFAULT_ES_BATCH_SIZE = "8192";
public static final String DEFAULT_ES_ACTION_TIMEOUT = "5000";
public static final String DEFAULT_TEMP_FILE_PATH = "/tmp/sample";
I generated the documents using this simple piece of code (getDocument, index_name, doc_type, http, and logfile are defined earlier in the script):
for i in range(1, 101):
    body = getDocument(i, 'd' * i, 'name', 'a' * i)
    t_s = datetime.datetime.now()
    http.request('PUT', 'localhost:9200/' + index_name + '/' + doc_type + '/' + str(i), body=json.dumps(body))
    t_e = datetime.datetime.now()
    logfile.write('indexed document id : %d in %s\n' % (i, str(t_e - t_s)))
For both ES installations I used the default configs. The only changes made were to the number of shards and the number of replicas.
Both machines are EC2 m4.4xlarge instances with 64 GB RAM, EBS backed.
Since the data is so small, and there are no refreshes, I am thinking the storage media should not matter much. In any case, the media is exactly the same for both instances.
I do not see an fsync happening after every request. I indexed 100 docs with refresh=-1, and after indexing I did not trigger any refresh from code.
At this point the translog had entries, but no .cfs or .cfe files had been created.
I then manually hit the refresh API and could see a .cfs file being created, but the segment had the attributes search=true and committed=false.
Then I did a flush and could see the segment attribute change to committed=true.
AFAIK this is the expected behavior, and I could see the same happening in ES 1.4.5 and ES 2.3.3, so I doubt that an fsync is being done at each indexing request.
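For completeness, this is roughly how I checked it (a sketch against the _segments and _stats/translog REST endpoints; 'test_index' stands in for the real index name):

import json
import urllib3

http = urllib3.PoolManager()

# Segments API: each segment reports whether it is searchable ("search") and committed ("committed").
segments = json.loads(http.request('GET', 'http://localhost:9200/test_index/_segments').data.decode('utf-8'))
print(json.dumps(segments, indent=2))

# Translog stats: "operations" counts translog entries that a flush has not yet cleared.
stats = json.loads(http.request('GET', 'http://localhost:9200/test_index/_stats/translog').data.decode('utf-8'))
print(json.dumps(stats['_all']['primaries']['translog'], indent=2))

# A refresh makes new segments searchable; a flush commits them to disk and clears the translog.
http.request('POST', 'http://localhost:9200/test_index/_refresh')
http.request('POST', 'http://localhost:9200/test_index/_flush')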
So you mean that the translog was not getting fsynced per request in ES 1.4.5 but is in ES 2.3.3, and because of this there is a penalty per indexing request. Makes sense.
But if the translog is not being fsynced per request in ES 1.4.5, how do we ensure that nothing is lost (and is recoverable) if the server crashes?
I came across the index.gateway.local.sync setting for ES 1.4.5 and the index.translog.durability setting for ES 2.3.3; they explain what you mentioned.
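For anyone else hitting this, the 2.3.3 setting can be changed dynamically through the settings API (a sketch; 'test_index' stands in for the real index name, and async durability trades a few seconds of potential translog loss for fewer fsyncs):

import json
import urllib3

http = urllib3.PoolManager()

# index.translog.durability=async: operations are still written to the translog,
# but it is fsynced on an interval (index.translog.sync_interval, 5s by default)
# instead of on every index/bulk request.
body = json.dumps({'index.translog.durability': 'async'})
resp = http.request('PUT', 'http://localhost:9200/test_index/_settings', body=body)
print(resp.status, resp.data)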
Thanks @jasontedor.
I tried updating index.translog.durability to async. The GET /_settings API shows the updated setting, but I am still incurring 100 translog operations (as shown by the index status API) for 100 indexing requests.
What I actually wanted to mention is that my per-request indexing times are not coming down much (to be comparable to those of ES 1.4.5), but I guess that is the most I can get.
Right, so the reason I was asking so many questions earlier is that my first thought on the initial post was that this was due to the fsync change, but I wasn't convinced it accounts for a change of this magnitude (one hour to three hours, i.e. 3x).
Can you grab hot threads on the nodes while you're doing the bulk ingestion?
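Something along these lines, run while the job is going, would do (a quick sketch using urllib3 against the hot threads endpoint; curl against /_nodes/hot_threads works just as well):

import urllib3

http = urllib3.PoolManager()

# Hot threads API: a plain-text dump of the busiest threads on each node.
resp = http.request('GET', 'http://localhost:9200/_nodes/hot_threads')
print(resp.data.decode('utf-8'))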