Indexing Speed Elastic Search

I am indexing 300 documents into each of the following indices:

Index A: has a synonym_filter with 25,000 synonyms defined in the index itself

Index B: the synonym_filter reads the synonyms from a file
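
For readers comparing the two setups, here is a minimal sketch of what the two settings bodies might look like. It assumes the standard `synonym` token filter, which accepts either an inline `synonyms` list or a `synonyms_path`. The analyzer name, the file path, and the sample synonym rules are made up for illustration and are not taken from the actual indices.

```python
import json

# Hypothetical sample rules; the real list holds ~25,000 WordNet entries.
SAMPLE_SYNONYMS = ["walk, stroll, amble", "run, sprint, dash"]


def index_settings(synonym_filter):
    """Wrap a synonym filter definition in an index settings body."""
    return {
        "settings": {
            "analysis": {
                "filter": {"synonym_filter": synonym_filter},
                "analyzer": {
                    # Assumed analyzer name, for illustration only.
                    "synonym_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "synonym_filter"],
                    }
                },
            }
        }
    }


# Index A: the synonym rules are stored inline in the index settings.
index_a = index_settings({"type": "synonym", "synonyms": SAMPLE_SYNONYMS})

# Index B: the rules live in a file (assumed path, relative to the config dir).
index_b = index_settings(
    {"type": "synonym", "synonyms_path": "analysis/synonyms.txt"}
)

print(json.dumps(index_a, indent=2))
print(json.dumps(index_b, indent=2))
```

The only difference between the two bodies is the `synonyms` versus `synonyms_path` key in the filter definition.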

I observed a drastic difference in indexing speed between the two indices.
When the synonyms were read from a file, indexing was much faster.

Index B (synonyms from a file):
real 0m0.799s
user 0m0.007s
sys 0m0.023s

Index A (synonyms in the index):
real 0m10.077s
user 0m0.008s
sys 0m0.023s

When the synonyms in "Index A" are spread across multiple synonym filters with ~5,000 synonyms each, indexing takes ~1 min for a single document!

Configuration: Elasticsearch is running on localhost, so network latency is not a factor.

Very interesting. I'm sure @jpountz and @danielmitterdorfer will be happy to reproduce and find the reason.
If confirmed, maybe we should update our documentation to advise people to keep their synonyms in a file on disk.

I guess your cluster is stable so you are not updating the cluster state frequently, right?

The synonym file is not updated, so it's probably served from the file system cache. Bonus: no need to parse the cluster state and extract the synonym list for every request, or to keep that data structure on the heap. (I didn't look at the code, so these are just assumptions.)

Yes, the cluster (which is actually a single node) is stable, and I am indexing sequentially into both indices.

Just for info: the synonyms I am using are wordnet_verb_synonyms, and I am working on ES 1.7.5.

Ha! Can you run similar tests on 5.1?

I checked the same on ES 5:

When the synonyms are read from the file:
real 0m0.200s
user 0m0.005s
sys 0m0.006s

When the synonyms are defined in the filter:
real 0m0.545s
user 0m0.005s
sys 0m0.005s

So even in ES 5, indexing with the synonyms in the filter is slower, by roughly 2.7x (0.545s vs 0.200s), though nowhere near as slow as observed on ES 1.7.5.
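
As a quick sanity check, the slowdown factors can be worked out from the `real` times reported in this thread. This is plain arithmetic on those numbers; the helper name `slowdown` is just for illustration.

```python
def slowdown(inline_seconds, file_seconds):
    """Return how many times slower the inline-synonym run was."""
    return inline_seconds / file_seconds


# Wall-clock ("real") times copied from the measurements in this thread.
es_1_7_5 = slowdown(10.077, 0.799)
es_5 = slowdown(0.545, 0.200)

print(f"ES 1.7.5: {es_1_7_5:.1f}x slower")  # ES 1.7.5: 12.6x slower
print(f"ES 5:     {es_5:.1f}x slower")      # ES 5:     2.7x slower
```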

I suspect your benchmark includes the time for index creation and hits the following issue which will be fixed in 5.2: https://github.com/elastic/elasticsearch/pull/22249.
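
One way to check whether index creation dominates the measurement is to time the two phases separately. A rough sketch, where `create_index` and `index_docs` are hypothetical callables standing in for whatever client calls the benchmark actually makes:

```python
import time


def time_phases(create_index, index_docs):
    """Return (creation_seconds, indexing_seconds), measured separately."""
    start = time.perf_counter()
    create_index()   # e.g. create the index with its synonym filter
    created = time.perf_counter()
    index_docs()     # e.g. index the 300 documents
    done = time.perf_counter()
    return created - start, done - created


# Stand-in callables that only sleep; replace them with real client calls.
creation, indexing = time_phases(
    lambda: time.sleep(0.05),
    lambda: time.sleep(0.01),
)
print(f"index creation: {creation:.3f}s, indexing: {indexing:.3f}s")
```

If most of the time lands in the first number, the difference comes from creating the index rather than from indexing the documents.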
