To increase search performance, I tried to add a replica to my cluster.
Initially, I measured a response time of around 700ms for a specific request with a single node with a primary shard.
After adding a node and a replica shard to the cluster, the same request takes 2000ms on average (so it almost tripled).
I'm using these configs for the replica:
cluster.name: findmyfpstore
node.name: fmfs_r1
node.master: false
network.host: ...
http.port: ...
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ['...']
index.number_of_shards: 1
index.number_of_replicas: 1
Am I doing something wrong? Feel free to tell me if you need more information, I'm a beginner with Elasticsearch.
Replicas do not help with latency, they can only help with throughput by replicating the data on more nodes.
If you have lots of data, then the slowdown of your queries is expected, since each shard has less filesystem cache to work with (there are more shards overall).
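To make the latency versus throughput distinction concrete, here is a toy model in Python. It is only a sketch under a loud assumption: each shard copy serves one request at a time, and network, caching and coordination costs are ignored. The function name `estimate` and its parameters are made up for illustration and are not part of Elasticsearch.

```python
def estimate(per_request_ms, copies, concurrent_requests):
    """Toy model: every shard copy serves one request at a time (assumption)."""
    # A single request is answered by exactly one copy of the shard,
    # so adding copies does not make that one request any faster.
    latency_ms = per_request_ms

    # Concurrent requests are spread over the copies; count how many
    # sequential "rounds" are needed (ceiling division).
    rounds = -(-concurrent_requests // copies)
    total_ms = rounds * per_request_ms

    # Requests completed per second across the whole cluster.
    throughput_rps = concurrent_requests / (total_ms / 1000.0)
    return latency_ms, throughput_rps


if __name__ == "__main__":
    for copies in (1, 2):
        latency, rps = estimate(per_request_ms=700, copies=copies,
                                concurrent_requests=10)
        print(f"copies={copies} latency={latency}ms throughput={rps:.2f} req/s")
```

In this model the per-request latency stays the same when a replica is added, while the number of requests served per second doubles. It does not explain why latency would get worse, only why it should not be expected to get better.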
By having lots of data, do you mean having lots of documents? If so, I only have 829 documents.
I tried with a server that is in the same datacenter, but it still doesn't improve the response time of the queries (it is even a bit longer).
Excuse me, but I don't understand your answer very well: I only have one shard per node (the primary on one node, the replica on the other one).
With such a small dataset, it is very hard to reason about what the hot spot might be. I am not sure we can help much here. Even with replicas and a 2000ms response time, that is still only 2ms per search request, which should be fast enough for most use cases?
I am reluctant to try to optimize this case since for such a small dataset it would probably be easier to hold everything in RAM and perform a brute-force scan to find matches all the time.
Does this mean setting "index.store.type" to "memory"?
Can you provide more details about "performing a brute-force scan to find matches all the time", please? Does it require changing anything?
Is there anything else to know about this? Thanks!
I just mean that with such a small dataset, everything should be fast and using Elasticsearch is a bit overkill. Does the slowdown incurred by the addition of new shards matter to you?
I'm mainly using Elasticsearch to try it out and for its geo queries.
Yes, it would matter a bit (it is quite important to push the performance as much as I can), but if I understand correctly what you said previously, I only need one primary shard in my case.
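For reference, the "brute-force scan in RAM" idea mentioned above could look roughly like the following for a geo distance search. This is a minimal Python sketch, not something Elasticsearch does for you: the document layout (`lat` and `lon` keys), the function names and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, good enough for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(docs, lat, lon, radius_km):
    """Scan every document held in memory and keep those within radius_km,
    sorted from nearest to farthest."""
    hits = []
    for doc in docs:
        d = haversine_km(lat, lon, doc["lat"], doc["lon"])
        if d <= radius_km:
            hits.append((d, doc))
    hits.sort(key=lambda pair: pair[0])  # sort on distance only
    return [doc for _, doc in hits]


if __name__ == "__main__":
    # Hypothetical sample data; a real list would hold the 829 documents.
    docs = [
        {"name": "london", "lat": 51.5074, "lon": -0.1278},
        {"name": "versailles", "lat": 48.8049, "lon": 2.1204},
        {"name": "paris", "lat": 48.8566, "lon": 2.3522},
    ]
    for doc in within_radius(docs, 48.8566, 2.3522, radius_km=50):
        print(doc["name"])
```

With a few hundred documents, a linear scan like this touches every entry on each query, which is usually cheap at that size. It trades away everything else Elasticsearch provides (full-text search, persistence, clustering), so whether it fits depends on what else the application needs.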