Elasticsearch performance is not increasing by adding new nodes

hosseinjdk · May 15, 2016, 11:09am

We have installed an Elasticsearch cluster on VMware virtual machine nodes. On the other side we have a Java application that runs queries on Elasticsearch using the Transport Client.
One of the indexes contains 500,000,000 documents and is distributed over 15 shards on three nodes.
When we run a query through the Java API in different configurations (adding new nodes to the cluster, removing nodes, increasing the number of shards per index, etc.), the took time stays the same. A query on the index distributed over 3 nodes takes the same time as it does on 1 node.
Furthermore, when the query runs, it seems to run on all the nodes, because CPU usage rises immediately on all of them, but the took time does not change. My question is: why does increasing the number of nodes have no effect on search query performance? How can I optimize it? Please help.
=> Number of CPU cores per node: 80
=> Memory per node: 64 GB
=> Heap size: 30 GB

ddorian43 · May 15, 2016, 1:42pm

You also have to add:

the full mapping used
the full query used
how many documents it returns
do you use routing
query profile
how much time the query takes
index size

anhlqn · May 17, 2016, 12:43am

According to one of the ElasticON videos, search queries get slower with more shards per index. Since you have only 3 nodes, why would you need 15 shards? What is the total size of your data in ES?

hosseinjdk · May 17, 2016, 4:47am

Index size is 113 GB.
It doesn't matter what kind of query we run; every type of query on this cluster gives the same result: the time a query takes on one node is approximately the same as on 3 nodes. That's the main problem. The following query is an example:

    QueryBuilder query = boolQuery()
            .filter(termsQuery("DEBTOR", ibns))
            .filter(rangeQuery("CREATIONDATETIME")
                    .from(dateFrom)
                    .to(dateTo)
                    .includeLower(true)
                    .includeUpper(true));

    SortBuilder sort = SortBuilders.fieldSort("CREATIONDATETIME").order(SortOrder.DESC);

    SearchResponse searchResponse = ElasticOperation.transportClient
            .prepareSearch("index_name")
            .setSearchType(SearchType.QUERY_THEN_FETCH)
            .setSize(1000)
            .setQuery(query)
            .addSort(sort)
            .execute()
            .actionGet();

Christian_Dahlqvist · May 17, 2016, 5:35am

Each query is broken up and executed in parallel across all shards involved in the query, across the nodes. For each shard, however, the query is executed in a single thread. Query latency is therefore affected by the number of shards as well as their size. Benchmarking different sizes and numbers of shards will allow you to find the optimum for your hardware and cluster configuration.
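
To compare shard layouts in a repeatable way, something like the following could be used. It is only a minimal sketch in plain Java: `LatencyBench`, `medianMillis` and `timed` are hypothetical names, not part of any Elasticsearch API, and the "query" in `main` is a sleep standing in for a real search call. The idea is to run the same query several times per layout, discard a few warm-up runs, and compare the median rather than a single measurement.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical helper for comparing query latency between cluster layouts.
// Nothing here talks to Elasticsearch; plug a real search call into it.
class LatencyBench {

    // Calls the supplier `warmups` times without recording, then `runs` times,
    // and returns the median recorded value (the upper median for even counts).
    static long medianMillis(LongSupplier query, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            query.getAsLong();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = query.getAsLong();
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    // Wraps an action so that each call returns its wall-clock time in ms.
    static LongSupplier timed(Runnable action) {
        return () -> {
            long start = System.nanoTime();
            action.run();
            return (System.nanoTime() - start) / 1_000_000;
        };
    }

    public static void main(String[] args) {
        // Stand-in for a real search (assumption: replace with your own query).
        Runnable fakeSearch = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long median = medianMillis(timed(fakeSearch), 3, 11);
        System.out.println("median latency: " + median + " ms");
    }
}
```

Instead of wall-clock time, the supplier could just as well return the took value reported by the search response, which is the number discussed in this thread.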

If you commonly query with a specific parameter as a filter, e.g. 'DEBTOR', you may want to consider optimising for this and use routing at index and query time to allow only a single shard to be searched for each query. Although this allows you to optimise for a specific type of query, you can still execute queries that do not filter on the routing parameter by querying all shards of the index.
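
To illustrate why routing limits a query to one shard, here is a simplified sketch of shard selection in plain Java. It rests on some assumptions: `RoutingSketch` and `shardFor` are made-up names, the debtor values are invented, and `String.hashCode()` stands in for the Murmur3 hash that Elasticsearch uses internally, so the shard numbers will not match a real cluster. What it does show is the principle: the shard is derived from the routing value modulo the number of primary shards, so all documents with the same routing value end up on the same shard.

```java
// Simplified illustration of routing-based shard selection.
// Not the real Elasticsearch implementation: the hash function is a stand-in.
class RoutingSketch {

    // Maps a routing value to a shard number in [0, primaryShards).
    static int shardFor(String routing, int primaryShards) {
        if (primaryShards <= 0) {
            throw new IllegalArgumentException("primaryShards must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routing.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        int primaryShards = 15; // shard count mentioned in this thread
        String[] debtors = {"DEBTOR-001", "DEBTOR-002", "DEBTOR-003"}; // invented
        for (String debtor : debtors) {
            // Indexing and searching with the same routing value hit this shard.
            System.out.println(debtor + " -> shard " + shardFor(debtor, primaryShards));
        }
    }
}
```

With the Transport Client, the routing value would be passed via `setRouting` on both the index request and the search request, using the same value (here the DEBTOR) in both places.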
