Elasticsearch indexing speed (DB->ES) decreases dramatically when the source is not co-located on the same host


(M1k3ga) #1

We are using Elasticsearch (ES) for GUI search in our products.
Years ago we integrated Elasticsearch-1.7.
Now we utilize ES 6.4.
Now we have to re-index our data.
The source data (DB) is located in a relational database (Oracle).
So far so good.

I do performance measurements within a Docker (Compose) environment.
The first scenario is that ES, the DB and the indexer are co-located on the same host machine.
ES-6.4 is four times as fast as ES-1.7 for the same amount of data to index.

The second scenario is that the database is located on a different host machine than ES.
The ratio of indexing times flips from 1:4 (ES-6.4 : ES-1.7) to 2:1,
meaning ES-6.4 is only half as fast as ES-1.7 for the same amount of data.

Does anyone have similar experience?
Any ideas what's happening there?

I find it quite confusing that ES-1.7 is twice as fast as ES-6.4 when the database is not on the same host as the Elasticsearch instance.


(David Turner) #2

Yes, this is surprising to me too. Are you doing everything else the same between the two versions? If, for instance, the 6.4 import involved more round-trips (e.g. smaller batches) then one might expect this. Can you describe how the import process works in more detail? Can you give absolute performance numbers rather than ratios?
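
To make the round-trip argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes batches are sent strictly one after another and that each batch pays one fixed network round-trip on top of a per-document cost. The function name and every number in it are hypothetical, not measurements from this thread.

```python
import math

def estimated_import_seconds(total_docs, batch_size, round_trip_s, per_doc_s):
    """Rough model: one sequential round-trip per batch plus a fixed cost per document.

    All parameters are illustrative assumptions, not measured values.
    """
    batches = math.ceil(total_docs / batch_size)
    return batches * round_trip_s + total_docs * per_doc_s

# Hypothetical comparison: same data and batch size, only the round-trip latency differs.
same_host = estimated_import_seconds(750_000, 750, round_trip_s=0.0005, per_doc_s=0.0001)
other_host = estimated_import_seconds(750_000, 750, round_trip_s=0.02, per_doc_s=0.0001)
print(f"same host: {same_host:.1f}s, other host: {other_host:.1f}s")
```

In this model the latency term grows with the number of batches, so an import that makes more or smaller round-trips is hurt more when the hosts are separated.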


(M1k3ga) #3

The batch size is the same in both scenarios (750 docs per chunk).
Nothing was changed in the implementation itself between the scenarios.
The only change is the location of the database (holds for both ES versions).
Does ES communicate in a different way (sockets?) when located on the same machine?
We use the High Level REST client for ES-6.4.


(David Turner) #4

No, it's TCP all the way.

And the transport client for 1.7? I wonder if that explains things. The REST client might not be parallelising as much, and the extra latency in the machine-to-machine situation is slowing things down. Can you try larger batches and/or making more requests in parallel?
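
As a minimal sketch of what "more requests in parallel" could look like, the Python snippet below splits the documents into batches and sends several batches concurrently from a thread pool. `send_bulk` is a hypothetical stand-in that only sleeps to mimic network latency; it is not a real Elasticsearch client call, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def chunked(docs, size):
    """Yield consecutive slices of `docs` with at most `size` items each."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

def send_bulk(batch, latency_s=0.01):
    """Hypothetical stand-in for one bulk request; sleeps to mimic network latency."""
    time.sleep(latency_s)
    return len(batch)

def index_all(docs, batch_size=750, workers=4):
    """Send batches concurrently and return the total number of documents sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_bulk, chunked(docs, batch_size)))
```

With several requests in flight, the per-request latency overlaps instead of adding up, which is the effect to look for when comparing the two clients.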


(M1k3ga) #5

I thought of the different communication channels as well,
but I haven't tried that yet. Increasing the batch size could be a good idea; I will try it.

