Performance issue with Elastic


#1

We are using Elasticsearch for fast search, but one simple search seems to take ~800 ms. This is when we hit the Elasticsearch API directly.

We have our own cluster with 3 nodes: one is the master and the other two are data nodes. We have only one index with one type, as all items are the same, and around 8 million records in it. We have 5 primary and 2 replica shards.

My questions are:

Could having 8 million records in one index/type cause this? Should I consider splitting it?
We always hit the master node for writes and reads. Should I consider calling the data nodes for reads?
Is there anything else I should do differently to get better performance?
Thanks very much in advance.


(Mark Walkom) #2

That's not good, see https://www.elastic.co/guide/en/elasticsearch/guide/2.x/important-configuration-changes.html#_minimum_master_nodes
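As a rough illustration of the quorum rule in the linked guide (written for the 2.x series, where the setting is `discovery.zen.minimum_master_nodes`), here is a minimal sketch. The function name is made up for this example, and it assumes the guide's formula of (master-eligible nodes / 2) + 1 applies to your version.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum of master-eligible nodes: (N // 2) + 1.

    Hypothetical helper for illustration; it only does the arithmetic
    and does not talk to a cluster.
    """
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# With a single master-eligible node the quorum is 1, so there is no failover.
# With three master-eligible nodes the quorum is 2, so one node can be lost.
for n in (1, 2, 3):
    print(n, "master-eligible ->", minimum_master_nodes(n))
```

With only one master-eligible node, as described in the first post, losing that node leaves the cluster without a master.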

Unlikely.

Yes, always use all nodes.

What version, what OS, what JVM, what does the mapping and query look like?


(Sambit Kabi) #3

Performance depends on a lot of things. What is your setup?
Are your nodes running on one machine or on different machines?

Search performance also depends to some extent on the number of shards you use.
Are you performing a wildcard search with wildcard at the beginning?
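To make the wildcard question concrete, here is a small sketch of the two query shapes involved: a `prefix` query for "starts with" and a `wildcard` query with a leading `*` for "ends with". The field name `title` and the helper names are assumptions for illustration only; the real mapping was not posted. Leading wildcards are generally the expensive case, since many terms have to be scanned.

```python
import json


def prefix_query(field: str, value: str) -> dict:
    """Build a 'starts with' search body using the prefix query."""
    return {"query": {"prefix": {field: value}}}


def wildcard_query(field: str, pattern: str) -> dict:
    """Build a search body using the wildcard query (* and ? are supported)."""
    return {"query": {"wildcard": {field: pattern}}}


def has_leading_wildcard(pattern: str) -> bool:
    """Flag patterns that begin with a wildcard, which tend to be slow."""
    return pattern.startswith(("*", "?"))


# "title" is a hypothetical field name, not taken from the poster's mapping.
starts_with = prefix_query("title", "elast")
ends_with = wildcard_query("title", "*search")

print(json.dumps(starts_with))
print(json.dumps(ends_with))
print(has_leading_wildcard("*search"))
```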


#4

We don't use VMs; each node is a complete Unix machine. I have three nodes, all in different geographic locations.
I have 5 primary shards and one replica for my index. Another question: when I search, will Elasticsearch look in all nodes? The data is stored in shards, and the shards are distributed across all nodes. I suspect this could be the issue: the hosts are not in the same region, so if a request goes to another node there may be added latency.

There is not much wildcard use. We mostly use term queries, where we search with "starts with" and "ends with".


#5

It's a Unix box with Java 8 and 250 RAM. In the mapping, most of the fields are plain strings.

The query is really large, with a lot of OR conditions. Based on the input, we decide whether to search specific fields or run a more generic search.


(Sambit Kabi) #6

Well, ES will hit all the shards in a round-robin fashion, but there are ways to control this behaviour using routing.
You can configure ES to search local shards using shard allocation awareness. Go through the links below; they might help with your setup. I would suggest running ES on only one machine and measuring the query time there.

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-allocation.html


#7

Even on a single node, it's taking around 500 ms for a simple request.


(Sambit Kabi) #8

I think you should see a lower time for the same search the second time because of caching.
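One way to check this is to compare the `took` value in the search response (the time in milliseconds Elasticsearch reports for executing the search) with the wall-clock time measured by the client, once for a first run and once for a repeat. The sketch below only does the arithmetic; the helper name and the sample numbers are made up for illustration and are not measurements from this thread.

```python
def split_latency(wall_clock_ms: float, response: dict) -> dict:
    """Split client-side latency into server time and everything else.

    `response` is the parsed JSON body of a search response; its `took`
    field is the server-reported execution time in milliseconds.
    """
    took_ms = response["took"]
    return {
        "took_ms": took_ms,
        # Network, serialization and client overhead, roughly.
        "overhead_ms": wall_clock_ms - took_ms,
    }


# Hypothetical sample values, for illustration only.
first_run = split_latency(800.0, {"took": 120, "timed_out": False})
second_run = split_latency(700.0, {"took": 20, "timed_out": False})

print("first run:", first_run)
print("second run:", second_run)
```

If `took` drops on the repeat but the overhead stays large, the time is probably being spent outside query execution, for example on the network between distant hosts.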


(Christian Dahlqvist) #9

For each query, Elasticsearch will hit one copy of each required shard, irrespective of which node it resides on. You can make queries try local copies first by specifying the _local preference. Having clusters distributed geographically can result in performance and stability issues, which is why it is not recommended.
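As a minimal sketch of what specifying the preference looks like, the search request takes a `preference` query-string parameter. The helper name, host, and index name `items` below are assumptions for illustration; the snippet only builds the URL and does not contact a cluster.

```python
from urllib.parse import urlencode


def search_url(base_url: str, index: str, preference: str = "_local") -> str:
    """Build a _search URL that asks for local shard copies first."""
    query = urlencode({"preference": preference})
    return f"{base_url.rstrip('/')}/{index}/_search?{query}"


# Hypothetical host and index name.
print(search_url("http://localhost:9200", "items"))
# prints: http://localhost:9200/items/_search?preference=_local
```

The request should go to the node whose local copies you want used, since "local" is relative to the node that receives the search.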


#10

Thank you very much. This explains a lot; it's what I had been suspecting.
I have one more question: if I run multiple Elasticsearch processes on the same host, will that give any advantage?


(Christian Dahlqvist) #11

Not unless you have very large hosts.


(system) #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.