Elasticsearch indexing very slow after upgrading from 1.7.1 to 2.3.2

BTW I still think that increasing the number of docs per bulk request could help a lot.

For example, I'm indexing with a bulk size of 10,000 docs and I can index around 12k docs per second locally.

Remember that fsync happens per request, not per doc.
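To make the per-request point concrete, here is a minimal sketch of splitting documents into `_bulk` request bodies. It assumes the standard newline-delimited format: an action line followed by a source line per document, with a trailing newline. The index name, the type name and the `BulkBodies` helper are hypothetical. With `index.translog.durability = request`, the translog is fsynced roughly once per request on each shard it touches, not once per document, so fewer and larger requests mean fewer fsyncs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups JSON documents into newline-delimited _bulk bodies.
final class BulkBodies {
    private BulkBodies() {}

    // Builds one _bulk body per batch of at most batchSize documents.
    static List<String> build(List<String> docsJson, String index, String type, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        String action = "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\"}}\n";
        List<String> bodies = new ArrayList<>();
        StringBuilder body = new StringBuilder();
        int inBatch = 0;
        for (String doc : docsJson) {
            // Each document needs an action line followed by its source line.
            body.append(action).append(doc).append('\n');
            if (++inBatch == batchSize) {
                bodies.add(body.toString());
                body.setLength(0);
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            bodies.add(body.toString()); // final, partially filled batch
        }
        return bodies;
    }
}
```

For example, 25 documents with `batchSize = 10` produce three bodies (10, 10 and 5 documents), and each body would be sent as one request to `/_bulk`.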

There is a similar issue reported here

@dadoonet: Sure, bulk requests would help, but we actually receive a high volume of updates, so incremental indexing matters. For nightly indexing, bulk indexing is the right option, as you pointed out. Thanks for mentioning 12k docs per second; now I have something to compare with.

@jasontedor: About the translog fsync:
As I mentioned earlier, on ES 1.4.5 with the default configuration I got ~1ms per doc.
On ES 2.3.3 with the default configuration (index.translog.durability = request) I got ~3ms per doc.
On ES 2.3.3 with the default configuration except index.translog.durability = async I got ~1.6ms per doc.
So with async the times came down, but not to the level of ES 1.4.5.
I will also post the hot threads for my bulk ingestion as soon as I can.
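Converting those per-document latencies into throughput makes them easier to compare with the 12k docs/s figure mentioned earlier. This is a minimal sketch that assumes a single client indexing one document at a time, so throughput is simply the inverse of latency: ~1 ms/doc is about 1,000 docs/s, ~3 ms/doc about 333 docs/s, and ~1.6 ms/doc about 625 docs/s. The `IndexingMath` class is hypothetical.

```java
// Back-of-the-envelope conversion between per-document latency and throughput.
// Assumes ONE client indexing documents sequentially (no concurrency, no bulk).
final class IndexingMath {
    private IndexingMath() {}

    // Documents per second when each document takes msPerDoc milliseconds.
    static double docsPerSecond(double msPerDoc) {
        return 1000.0 / msPerDoc;
    }

    // Wall-clock hours needed to index docCount documents at msPerDoc each.
    static double hoursFor(long docCount, double msPerDoc) {
        return docCount * msPerDoc / 1000.0 / 3600.0;
    }

    // How many times slower "after" is compared with "before".
    static double slowdown(double beforeMsPerDoc, double afterMsPerDoc) {
        return afterMsPerDoc / beforeMsPerDoc;
    }
}
```

With concurrent clients or bulk requests these single-client figures no longer hold, which is one reason raw docs/s numbers from different setups are hard to compare.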

Yes, it probably would help, but then it's not an apples-to-apples comparison. I really want to understand why the performance drop is so steep here; 3x seems too high to be explained by the per-request fsyncs alone. It could be that the disks are really slow, I don't know. But I want to make sure nothing else is going on, so we should keep everything else constant.

You have nothing to compare with because you don't know the size of the documents, how many nodes there are, how many client threads are writing, what the underlying hardware is, whether or not there are dynamic mapping updates occurring, and many other extremely relevant variables.

Right, and we would expect that to be the case if there's more to the performance drop here than just the per-request fsyncs.

I also changed this config, but it still takes 3h :sob:

I also want to understand why indexing takes 3x as long when everything is the same except the ES version. I want to fix this; otherwise I'll have to stay on the old ES 1.7.1 instead of 2.3.2 :confounded:

@tingking23 I wonder if this is due to some difference in how we index things. Could you go back to your 1.x version and create a new index with all the mappings, but without indexing any documents into it? Then upgrade to 2.x and run your indexing test against that index. I really want to know whether we changed something in the mappings that makes this go nuts; we do a lot of things differently depending on the version the index was created with.

@tingking23 Do you have hot threads output that you can share?

Are you setting refresh_interval to -1 while indexing?
Also, confirm that index.translog.durability = async was really applied and that nothing went wrong while setting it. Beyond that, I can't think of why you aren't seeing the gain.
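Both checks can be scripted. This is a minimal sketch that assumes a node reachable over HTTP on port 9200 and that both settings can be updated on the live index (if not, set them at index creation instead). `BulkLoadSettings`, its methods and the index name are hypothetical. It sends the update body to `PUT /{index}/_settings` and reads the values back with `GET /{index}/_settings?flat_settings=true`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: apply bulk-load settings, then read them back to
// confirm Elasticsearch actually accepted them.
final class BulkLoadSettings {
    private BulkLoadSettings() {}

    // Body for PUT /{index}/_settings: no periodic refresh, async translog fsync.
    static String updateBody() {
        return "{\"index\":{\"refresh_interval\":\"-1\",\"translog\":{\"durability\":\"async\"}}}";
    }

    // True if a flat_settings=true response reports both expected values.
    static boolean looksApplied(String flatSettingsJson) {
        String compact = flatSettingsJson.replaceAll("\\s", "");
        return compact.contains("\"index.refresh_interval\":\"-1\"")
            && compact.contains("\"index.translog.durability\":\"async\"");
    }

    // Sends one HTTP request and returns the response body; throws IOException on HTTP errors.
    static String send(String method, String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod(method);
        if (body != null) {
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `send("PUT", "http://localhost:9200/my_index/_settings", updateBody())` followed by `looksApplied(send("GET", "http://localhost:9200/my_index/_settings?flat_settings=true", null))`. The string match is deliberately crude; parsing the response with a JSON library would be more robust.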

Capturing hot threads during bulk indexing of real data won't be possible for now, as our workload is currently dominated by incremental indexing and a 5-6 hr nightly indexing job is acceptable for now.
However, I can write bulk indexing code for dummy data and post the hot threads from that. I believe that should be fine.
Hot threads only capture a single moment, so how many hot_threads samples would be useful to you?
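Because a single hot_threads response only covers one sampling interval, one option is to take several samples spaced a few seconds apart while the dummy bulk load runs. This is a minimal sketch that assumes the nodes hot threads endpoint (`/_nodes/hot_threads`) with its `threads` and `interval` query parameters. `HotThreadsSampler` is hypothetical, and the fetch step is passed in so it can be wired to any HTTP client (or faked in a test).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sampler: calls a hot_threads fetcher several times with a pause
// in between, since one response only describes a single moment.
final class HotThreadsSampler {
    private HotThreadsSampler() {}

    // URL for the nodes hot threads API on one host.
    static String url(String host, int threads, String interval) {
        return "http://" + host + ":9200/_nodes/hot_threads?threads=" + threads
            + "&interval=" + interval;
    }

    // Collects up to `samples` responses, sleeping `pauseMs` between consecutive calls.
    static List<String> collect(int samples, long pauseMs, Supplier<String> fetch) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(pauseMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the flag, return what we have
                    break;
                }
            }
            out.add(fetch.get());
        }
        return out;
    }
}
```

For example, `collect(10, 5000, fetcher)` would gather ten samples five seconds apart; the sample count is a guess, not a number anyone in the thread asked for.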

@tingking23 Can you also try what @s1monw suggested?

@neeraj Are you using a geo-shape field?

If so, I think this is the same as https://github.com/elastic/elasticsearch/issues/17907

Try changing your geo-shape mapping to specify:

"distance_error_pct": 0.025

See https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-shape.html for the docs
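For reference, the suggested setting goes on the geo_shape field itself in the mapping. This is a minimal sketch in the same string-building style used elsewhere in this thread. The type name `my_type`, the field name `location` and the `GeoShapeMapping` helper are hypothetical.

```java
// Hypothetical helper: mapping fragment for one geo_shape field with an
// explicit distance_error_pct.
final class GeoShapeMapping {
    private GeoShapeMapping() {}

    static String forField(String type, String field, double distanceErrorPct) {
        return "{\"" + type + "\":{\"properties\":{\"" + field + "\":{"
            + "\"type\":\"geo_shape\","
            + "\"distance_error_pct\":" + distanceErrorPct
            + "}}}}"; // closes field, properties, type and the outer object
    }
}
```

`forField("my_type", "location", 0.025)` yields `{"my_type":{"properties":{"location":{"type":"geo_shape","distance_error_pct":0.025}}}}`, which would sit under `mappings` when creating the index.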

This is my code to create the index:
// Refresh disabled and no replicas during the load; the type name is read from the config file.
json = "{\"settings\":{\"refresh_interval\":\"-1\",\"number_of_replicas\":\"0\"},\"mappings\":{\""
    + pro.getProperty(Conf.ES_TYPE)
    + "\":{\"date_detection\":false,\"_all\":{\"enabled\":false},\"properties\":{"
    + "\"t02_cust_relation\":{\"properties\":{\"cust_relation\":{\"type\":\"nested\",\"properties\":{"
    + "\"r_id_nbr\":{\"type\":\"string\"},\"r_phone_nbr\":{\"type\":\"string\"},\"r_source\":{\"type\":\"string\"},"
    + "\"r_time\":{\"type\":\"string\"},\"r_type\":{\"type\":\"string\"}}}}},"
    + "\"t01_cust_base\":{\"properties\":{\"cust_id_nbr\":{\"type\":\"string\",\"index\":\"not_analyzed\"}}}}}}}";
Transformation.doPut(url, json);

This is the code that updates the cluster settings:

url = "http://" + url_node + ":9200/_cluster/settings";
json = "{\"transient\":{\"indices.store.throttle.type\":\"none\"}}";
Transformation.doPut(url, json);

I'm not using a geo-shape field.

I compared the mappings on both versions; they are identical on ES 1.7.1 and ES 2.3.2.

I tried that too, but it didn't help.
I switched from the bulk API to BulkProcessor and the time came down to 1.5h,
but that is still slower than ES 1.7.1, which took about 1h.
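The Java client's BulkProcessor flushes a bulk request once a configured number of actions or a byte size is reached (its builder exposes setBulkActions, setBulkSize, setFlushInterval and setConcurrentRequests). As a rough illustration of the count/size trigger only, here is a stdlib-only sketch. `MiniBulkBuffer` and its thresholds are hypothetical, and this is not the client's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, single-threaded model of count/size-triggered flushing.
// NOT the Elasticsearch BulkProcessor: no timer, no concurrency, no retries.
final class MiniBulkBuffer {
    private final int maxActions;
    private final long maxBytes;
    private final Consumer<List<String>> flusher; // e.g. sends one _bulk request
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    MiniBulkBuffer(int maxActions, long maxBytes, Consumer<List<String>> flusher) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.flusher = flusher;
    }

    // Queues one document source; flushes when either threshold is reached.
    void add(String docJson) {
        pending.add(docJson);
        pendingBytes += docJson.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    // Hands a copy of the buffered documents to the flusher and resets the buffer.
    // Call once more at the end of the load to send the last partial batch.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(pending));
        pending.clear();
        pendingBytes = 0;
    }
}
```

With `maxActions = 4`, adding ten documents and then calling `flush()` hands the flusher batches of 4, 4 and 2 documents. Whichever threshold trips first wins, which is why both a count and a size limit are usually set.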