Hello everyone, I have 15.1 million documents and a performance problem with Elasticsearch and multithreading. I'm running 46 aggregations (count, cardinality, date_histogram) in a single search, and it takes about 20-45 seconds. That's far too long for us; my expectation is about 2-5 seconds. Any help is much appreciated.
Cluster: 4Core, 16GB RAM Server.
I use the low-level Python client library, elasticsearch-py.
Query and Code Execution Time:
full es query
part of the code, view full code
start = time.time()
client.search(index=['nginx*'], doc_type=None, body=main_query)
print('Standard Search Query Execution Time: ', time.time() - start)
Output: Standard Search Query Execution Time: 19.81995415687561s
start = time.time()
client.msearch(body=msearch_query)
print('Standard Multi Search Query Execution Time: ', time.time() - start)
Output: Standard Multi Search Query Execution Time: 19.14345407485962s
start = time.time()
with ThreadPoolExecutor(50) as ex:
    ex.map(lambda q: client.search(*q), iterable)
print('Search Query in Multi Threading Execution Time: ', time.time() - start)
Output: Search Query in Multi Threading Execution Time: 20.80817937850952s
start = time.time()
jobs = [Thread(target=client.search, args=arg) for arg in iterable]
# start threads
for job in jobs:
    job.start()
# join threads
for job in jobs:
    job.join()
print('Search Query in Standard Multi Threading Execution Time: ', time.time() - start)
Output: Search Query in Standard Multi Threading Execution Time: 21.506370544433594s
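The snippets above use `msearch_query` and `iterable` without showing how they are built. As a rough sketch only, this is one way they could be derived from the combined query, assuming `main_query` keeps all aggregations under a top-level `aggs` key. The helper name `split_aggregations` and the sample field names are hypothetical, not taken from the real code.

```python
import copy


def split_aggregations(main_query, index="nginx*"):
    """Split one combined query into one request per aggregation.

    Returns (msearch_body, per_agg_queries). The msearch body is a flat
    list of alternating header and body dicts, the shape accepted by
    client.msearch(body=...).
    """
    # Everything except the aggregations is shared by all sub-requests.
    base = {k: v for k, v in main_query.items()
            if k not in ("aggs", "aggregations")}
    aggs = main_query.get("aggs") or main_query.get("aggregations") or {}

    msearch_body = []
    per_agg_queries = []
    for name, agg in aggs.items():
        query = copy.deepcopy(base)
        query["size"] = 0  # only aggregation results are needed, no hits
        query["aggs"] = {name: copy.deepcopy(agg)}
        msearch_body.append({"index": index})  # header line
        msearch_body.append(query)             # body line
        per_agg_queries.append(query)
    return msearch_body, per_agg_queries


# Hypothetical sample query, much smaller than the real one.
sample_query = {
    "query": {"match_all": {}},
    "aggs": {
        "unique_ips": {"cardinality": {"field": "remote_addr"}},
        "status_count": {"value_count": {"field": "status"}},
    },
}
msearch_query, queries = split_aggregations(sample_query)
```

With a real client, `msearch_query` would be passed as `client.msearch(body=msearch_query)`, and each entry of `queries` would go to a separate `client.search(index='nginx*', body=q)` call in the threaded variants.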
elasticsearch configuration -> elasticsearch.yml
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
node.master: true
node.data: true
http.host: 0.0.0.0
network.host: 0.0.0.0
script.painless.regex.enabled: true
http.max_initial_line_length: 10K
cloud:
  gce:
    project_id: myproj-es
    zone: europe-west1-b
discovery:
  zen.hosts_provider: gce
  zen.minimum_master_nodes: 2
I tried to use multithreading to improve the search performance, but it had no effect, and I think I know why.
For example, when I send the requests from multiple threads, I see that every new request waits until the previous one has finished. Why does that happen, and how can I solve it? How can I configure Elasticsearch so that a new request doesn't wait for the previous one to finish?
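One way to check whether the requests really wait on each other is to record a start and end timestamp for every call and count how many calls were in flight at the same time. The following is a minimal sketch under that assumption; `run_and_time` and `max_overlap` are hypothetical helper names, and `time.sleep` stands in for `client.search` so the example runs without a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(fn, *args):
    """Call fn(*args) and return its (start, end) timestamps."""
    start = time.perf_counter()
    fn(*args)
    return start, time.perf_counter()


def run_and_time(fn, arg_list, max_workers=50):
    """Run fn once per argument tuple in a thread pool, timing each call."""
    with ThreadPoolExecutor(max_workers) as ex:
        return list(ex.map(lambda args: timed_call(fn, *args), arg_list))


def max_overlap(intervals):
    """Return the largest number of intervals that were open at once."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a call begins
        events.append((end, -1))    # a call ends
    # At equal timestamps, process ends (-1) before starts (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Stand-in workload: three short sleeps instead of real searches.
intervals = run_and_time(time.sleep, [(0.01,), (0.01,), (0.01,)])
print('Max requests in flight:', max_overlap(intervals))
```

A peak of 1 would mean the calls are serialized on the client side. A peak close to the number of threads, with a total time that is still about 20 seconds, would suggest the waiting happens on the Elasticsearch side instead.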
Python Multi Threading Examples
start = time.time()
jobs = [Thread(target=client.search, args=arg) for arg in iterable]
# start threads
for job in jobs:
    job.start()
# join threads
for job in jobs:
    job.join()
print('Search Query By Standard Multi Threading Execution Time: ', time.time() - start)
Output: Search Query By Standard Multi Threading Execution Time: 21.506370544433594s
start = time.time()
with ThreadPoolExecutor(50) as ex:
    ex.map(lambda q: client.search(*q), iterable)
print('Search Query By Multi Threading Execution Time: ', time.time() - start)
Output: Search Query By Multi Threading Execution Time: 20.80817937850952s
What is the solution, or what would be the optimal approach in this situation?
Is it possible to run many aggregations simultaneously, as in the example above?
Thanks in advance.