Elasticsearch Python async client: multiple requests

Hi,

I'm trying to use the Elasticsearch Python client to make async queries to Elasticsearch. I'm running a terms aggregation that is split into partitions, and I want to fetch the results for each partition. Here is my query (`i` is the partition number):

{
	'query': {
		'bool': {
			'filter': [{
				'range': {
					'@timestamp': {
						'gte': 'now-1d',
						'lte': 'now'
					}
				}
			}, {
				'bool': {
					'must_not': [{
						'terms': {
							'src.ip': ['10.x.x.x']
						}
					}]
				}
			}],
			'must': [{
				'terms': {
					'event.id.keyword': ['4625']
				}
			}]
		}
	},
	'aggs': {
		'src__ip': {
			'terms': {
				'field': 'src.ip',
				'size': 100,
				'include': {
					'partition':  i,
					'num_partitions': 20
				},
				'min_doc_count': 1
			},
			'aggs': {
				'src__user': {
					'terms': {
						'field': 'src.user.keyword'
					}
				},
				'selector': {
					'bucket_selector': {
						'buckets_path': {
							'val': 'src__user._bucket_count'
						},
						'script': 'params.val>5'
					}
				}
			}
		}
	},
	'from': 0,
	'size': 0
}
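
A minimal sketch of how `i` could be filled in per partition, assuming the query is wrapped in a function. The name `build_query` and its `num_partitions` parameter are my own (hypothetical), not part of the Elasticsearch client:

```python
def build_query(partition, num_partitions=20):
    """Return the search body above for a single terms-aggregation partition."""
    return {
        'query': {
            'bool': {
                'filter': [
                    {'range': {'@timestamp': {'gte': 'now-1d', 'lte': 'now'}}},
                    {'bool': {'must_not': [{'terms': {'src.ip': ['10.x.x.x']}}]}},
                ],
                'must': [{'terms': {'event.id.keyword': ['4625']}}],
            }
        },
        'aggs': {
            'src__ip': {
                'terms': {
                    'field': 'src.ip',
                    'size': 100,
                    # Only this part changes between the requests.
                    'include': {
                        'partition': partition,
                        'num_partitions': num_partitions,
                    },
                    'min_doc_count': 1,
                },
                'aggs': {
                    'src__user': {'terms': {'field': 'src.user.keyword'}},
                    'selector': {
                        'bucket_selector': {
                            'buckets_path': {'val': 'src__user._bucket_count'},
                            'script': 'params.val>5',
                        }
                    },
                },
            }
        },
        'from': 0,
        'size': 0,
    }
```

Each call returns a fresh dict, so the bodies for different partitions do not share state.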

Since the query is divided into 20 partitions, I want to fetch the results of all 20 partitions in parallel. I plan to use the Elasticsearch async client to do this. Here is my code:


import asyncio
from datetime import datetime

from elasticsearch import AsyncElasticsearch

NUM_PARTITIONS = 20

def asesclient():
    client = AsyncElasticsearch('https://elk.com:9200', http_auth=('elastic', 'elastic'), verify_certs=False, request_timeout=60, timeout=60)
    return client

async def es(aes, i):
    # One search per partition. build_query is a hypothetical helper that
    # returns the query posted above with 'partition': i.
    q = build_query(i)
    res = await aes.search(index="index-*", body=q)
    print(res)
    return res

async def main():
    aes = asesclient()  # one client shared by all requests
    try:
        # Schedule every partition at once so the requests overlap
        return await asyncio.gather(*(es(aes, i) for i in range(NUM_PARTITIONS)))
    finally:
        await aes.close()  # release the client's connections

a = datetime.now()
asyncio.run(main())
print((datetime.now() - a).seconds, "seconds taken for execution")

I'm still new to asyncio and I'm not sure whether what I'm trying to do makes sense. Here are my two questions:

  1. Will querying multiple partitions at the same time work in Elasticsearch?
  2. I tried the loop implementation shown in the docs, but because of the await it waited for the response from Elasticsearch before sending the next request. Is this due to the GIL in Python? Is there any way to improve the speed?
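
A minimal sketch, without Elasticsearch, of why awaiting inside a loop runs the requests one after another: each `await` suspends the coroutine until that request completes, which is how coroutines behave rather than an effect of the GIL. The names `fake_search`, `run_sequential`, `run_concurrent`, and `timed` are hypothetical, and `asyncio.sleep` only stands in for the network wait of a real search:

```python
import asyncio
import time

async def fake_search(partition, delay=0.05):
    # Stand-in for `await aes.search(...)`; the sleep simulates network wait.
    await asyncio.sleep(delay)
    return partition

async def run_sequential(n):
    # Each await completes before the next request is even started.
    return [await fake_search(i) for i in range(n)]

async def run_concurrent(n):
    # gather schedules every coroutine first, then waits for all of them.
    return await asyncio.gather(*(fake_search(i) for i in range(n)))

def timed(coro_fn, n):
    # Run one of the coroutines above and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro_fn(n))
    return result, time.perf_counter() - start

if __name__ == "__main__":
    for fn in (run_sequential, run_concurrent):
        result, elapsed = timed(fn, 5)
        print(fn.__name__, result, f"{elapsed:.2f}s")
```

With simulated waits like these, the sequential version should take roughly the sum of the delays and the concurrent version roughly the longest single delay. Real searches also depend on how much work the cluster can do at once.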

I haven't been able to test this in detail, because once I run a query, running the same query again returns the output almost immediately. This seems to be due to some sort of result caching in Elasticsearch. Any help would be appreciated!

Thanks :beers: