Hi Team,
ES Version: 6.5.4
Python Version : 3
I am writing a dash app to generate reports that uses elasticsearch as a backend/data source, to connect my app to elasticsearch I use elasticsearch-dsl
python module. When I tried to query data for single day which consists of 13322
docs, it takes long time to display the results even though using scroll
api.
The python script queries the data and convert from json to pandas data frame.
The elastic search query is structured using Kibana's dev-tool
.
Query:-
realm_query = {
'aggs': {
'2': {
'date_histogram': {
'field': 'EventTime',
'interval': '1h',
'time_zone': 'Europe/London',
'min_doc_count': 1
},
'aggs': {
'3': {
'significant_terms': {
'field': 'AuthStatus.keyword',
'size': 2
}
}
}
}
},
'size': 0,
'_source': {
'excludes': []
},
'stored_fields': [
'*'
],
'script_fields': {},
'docvalue_fields': [
{
'field': '@timestamp',
'format': 'date_time'
},
{
'field': 'EventTime',
'format': 'date_time'
}
],
'query': {
'bool': {
'must': [
{ 'terms': { 'server ': ['server_name'] }}
],
'filter': [
{
'range': {
'EventTime': {
"gte": 1601506800000,
"lte": 1601593199999,
"format": "epoch_millis"
}
}
}
]
}
}
}
python script:-
response = es_client.search(
index = "server-*",
scroll = "10s",
size = 500,
body = realm_query
)
#response['hits']
#print("Filtered Query : \n", parsed_query)
# Counter values
counter = 0
sid = response['_scroll_id']
scroll_size = response['hits']['total']
print("Scroll size : ", scroll_size)
"""
STORE THE ELASTICSEARCH INDEX'S FIELDS IN A DICT
"""
## create an empty dictionary for Elasticsearch fields
fields = {}
while (scroll_size > 0):
#print("Scrolling...")
page = es_client.scroll(scroll_id = sid, scroll = '10s')
#print("Hits: ", len(page['hits']['hits']))
sid = page['_scroll_id']
#Get the number of results that we returned in the last scroll
scroll_size = len(page['hits']['hits'])
elastic_docs = response["hits"]["hits"]
# Iterate the date into fields.
for num, doc in enumerate(elastic_docs):
# Data contains in _source field
source_data = doc["_source"]
#radius_type = doc["_source"]["RadiusType"]
# _source field is a dictonary, so iterate through the dict
for key, val in source_data.items():
try:
fields[key] = np.append(fields[key], val)
except KeyError:
fields[key] = np.array([val])
print("Scroll Size {} ".format(scroll_size))
counter = counter + 1
"""
Transform dictionary to pandas dataframe
"""
data_es_df = pd.DataFrame(fields)
print("Total Pages : {}".format(counter))
Output Time:-
May I know how Kibana queries the data from elasticsearch and whether it is do able using python script.
Any reference and document would be helpful.
Best,
Yash