Elasticsearch performance does not increase when adding new nodes

We have installed an Elasticsearch cluster on VMware virtual machine nodes. On the other side, we have a Java application that runs queries against Elasticsearch using the Transport Client.
One of the indices contains 500,000,000 documents and is distributed over 15 shards on three nodes.
When we run a query through the Java API under different conditions (adding new nodes to the cluster, removing nodes, increasing the number of shards per index, etc.), the query's took time stays the same. A query on an index that is distributed over 3 nodes has the same took time as when it runs on 1 node.
Furthermore, the query does seem to run on all the nodes, because CPU usage rises immediately on all of them, yet the took time does not change. My question is: why does increasing the number of nodes have no effect on search query performance? How can I optimize it? Please help.
=>number of CPU cores per node : 80 cores
=>Memory per node : 64 GB
=>heap size : 30 GB

You should also add:

  1. the full mapping used
  2. the full query used
  3. how many documents it returns
  4. do you use routing
  5. query profile
  6. how much time the query takes
  7. index size

According to one of the ElasticON videos, search queries get slower with more shards per index. Since you have only 3 nodes, why would you need 15 shards? What is the total size of your data in ES?

The index size is 113 GB.
It doesn't matter what kind of query we run: every type of query on this cluster gives the same result. The time a query takes on one node is approximately the same as on 3 nodes, and that's the main problem. The following query is an example:

    QueryBuilder query = boolQuery()

    SortBuilder sort = SortBuilders.fieldSort("CREATIONDATETIME").order(SortOrder.DESC);

    SearchResponse searchResponse = ElasticOperation.transportClient

Each query is broken up and executed in parallel across all shards involved in the query, across the nodes. Within each shard, however, the query is executed in a single thread. Query latency is therefore affected by the number of shards as well as their size. Benchmarking different shard sizes and shard counts will allow you to find the optimum for your hardware and cluster configuration.
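To make the point above concrete, here is a rough back-of-the-envelope model in Java. It is not Elasticsearch code and only a sketch under stated assumptions: one search thread per shard, shards spread evenly over the nodes, and a made-up single-thread scan rate (`docsPerMsPerThread` is purely hypothetical). It ignores per-shard overhead and the merge step on the coordinating node, which is exactly why real benchmarks are still needed.

```java
// Toy latency model: each shard is searched by one thread, and the shards
// placed on a node compete for that node's cores. Not Elasticsearch code.
final class ShardLatencyModel {

    /**
     * Estimates query latency in milliseconds.
     *
     * @param totalDocs          documents in the index
     * @param shards             primary shards searched by the query
     * @param nodes              data nodes holding the shards (assumed evenly spread)
     * @param coresPerNode       threads a node can run at the same time
     * @param docsPerMsPerThread assumed single-thread scan rate (hypothetical number)
     */
    static long estimateMillis(long totalDocs, int shards, int nodes,
                               int coresPerNode, long docsPerMsPerThread) {
        long docsPerShard = ceilDiv(totalDocs, shards);
        long perShardMs = ceilDiv(docsPerShard, docsPerMsPerThread);
        // The busiest node determines when the slowest shard finishes.
        long shardsOnBusiestNode = ceilDiv(shards, nodes);
        // If a node has fewer cores than shards, shards run in several "waves".
        long waves = ceilDiv(shardsOnBusiestNode, coresPerNode);
        return waves * perShardMs;
    }

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long docs = 500_000_000L;  // from the question
        long rate = 100_000L;      // hypothetical docs per ms per thread
        System.out.println("15 shards, 1 node : " + estimateMillis(docs, 15, 1, 80, rate) + " ms");
        System.out.println("15 shards, 3 nodes: " + estimateMillis(docs, 15, 3, 80, rate) + " ms");
        System.out.println(" 1 shard,  3 nodes: " + estimateMillis(docs, 1, 3, 80, rate) + " ms");
    }
}
```

Under these assumptions a single 80-core node can already search all 15 shards at once, so spreading the same 15 shards over 3 nodes leaves the estimate unchanged. In this model only the per-shard work (shard size) moves the number.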

If you commonly query with a specific parameter as a filter, e.g. 'DEBTOR', you may want to consider optimising for this and use routing at index and query time to allow only a single shard to be searched for each query. Although this allows you to optimise for a specific type of query, you can still execute queries that do not filter on the routing parameter by querying all shards of the index.
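As a rough illustration of why routing narrows a search to one shard: Elasticsearch picks the shard from a hash of the routing value modulo the number of primary shards. The Java sketch below mimics that idea only. It uses `String.hashCode()` as a stand-in for the real hash function (Elasticsearch uses Murmur3), so the shard numbers it prints will not match a real cluster, and the routing values shown are made up.

```java
// Illustration of routing: the same routing value always maps to the same shard,
// so a query carrying that value only needs to visit one shard.
final class RoutingSketch {

    /**
     * Maps a routing value to a shard number in [0, numPrimaryShards).
     * String.hashCode() is a stand-in; Elasticsearch itself uses Murmur3.
     */
    static int shardFor(String routingValue, int numPrimaryShards) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), numPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 15; // shard count from the question
        // Hypothetical DEBTOR values used as routing keys.
        String[] debtors = {"debtor-1001", "debtor-1002", "debtor-1001"};
        for (String debtor : debtors) {
            System.out.println(debtor + " -> shard " + shardFor(debtor, shards));
        }
    }
}
```

With the Transport Client, the routing value would be passed via `setRouting(...)` on both the index request builder and the search request builder. A document has to be indexed with the same routing value that is later used to search for it, otherwise the routed query looks in the wrong shard.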