I need to fetch specific fields from Elasticsearch for a number of hosts.
This is my current approach, where I prepare the list of "host.hostname" values and set "size" based on the length of that list.
I added an aggregation to the query, and the aggregation response now always contains all the information. But this single query is 8-10 times slower than the 4 previous queries combined.
fields_list = ["monitor.ip", "tags", "host.bo", "host.hostname", "monitor.name"]

body = """{
  "_source": [XXXFIELDSLISTXXX],
  "query": {
    "terms": {
      "monitor.name": [XXXNAMESXXX]
    }
  },
  "sort": { "@timestamp": "desc" },
  "size": 1,
  "aggs": {
    "my-agg-name": {
      "multi_terms": {
        "terms": [XXXFIELDSDICTXXX],
        "size": XXXSIZEXXX
      }
    }
  }
}
"""

# Substitute the placeholders with properly quoted JSON values
body = body.replace("XXXNAMESXXX", ",".join(f'"{name}"' for name in name_set))
body = body.replace("XXXSIZEXXX", str(len(name_set)))
body = body.replace("XXXFIELDSLISTXXX", ",".join(f'"{field_name}"' for field_name in fields_list))
body = body.replace("XXXFIELDSDICTXXX", ",".join('{"field": "' + field_name + '"}' for field_name in fields_list))
print(body)
rv = self.es_con.search(index=self.HEARTBEAT_INDEX_NAME, body=body, request_timeout=20)
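As a side note, building the body as a Python dict and serializing it with json.dumps avoids the manual placeholder replacement and handles quoting and escaping automatically. A minimal sketch, assuming hypothetical stand-in values for name_set (the real values come from elsewhere in my code):

```python
import json

# Hypothetical example values standing in for the real name_set
name_set = {"host-a", "host-b"}
fields_list = ["monitor.ip", "tags", "host.bo", "host.hostname", "monitor.name"]

# Build the request body as a plain dict; json.dumps takes care of all
# quoting, so no XXX...XXX placeholders or str.replace calls are needed.
body = json.dumps({
    "_source": fields_list,
    "query": {"terms": {"monitor.name": sorted(name_set)}},
    "sort": {"@timestamp": "desc"},
    "size": 1,
    "aggs": {
        "my-agg-name": {
            "multi_terms": {
                # One {"field": ...} entry per field, built programmatically
                "terms": [{"field": f} for f in fields_list],
                "size": len(name_set),
            }
        }
    },
})
```

The resulting string can be passed to es_con.search exactly like the hand-templated version.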
Yes, there are multiple documents per "host.hostname" (I now use "monitor.name"), and it is always unique per monitor.name value. And yes, I need the latest document, because I use the information contained in the "tags" field.
To be honest, I am not very good with Elastic queries (yet), so I was using whatever worked.
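For "latest document per unique monitor.name", one common pattern is a terms aggregation with a top_hits sub-aggregation, so each bucket carries only its newest document. A hedged sketch of that alternative, not my current code (the aggregation names "per_name" and "latest" and the example host names are mine):

```python
import json

fields_list = ["monitor.ip", "tags", "host.bo", "host.hostname", "monitor.name"]
names = ["host-a", "host-b"]  # hypothetical monitor.name values

# One bucket per monitor.name, each holding only its most recent document.
body = json.dumps({
    "size": 0,  # skip regular hits; everything comes back in the aggregation
    "query": {"terms": {"monitor.name": names}},
    "aggs": {
        "per_name": {
            "terms": {"field": "monitor.name", "size": len(names)},
            "aggs": {
                "latest": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"@timestamp": {"order": "desc"}}],
                        "_source": fields_list,
                    }
                }
            },
        }
    },
})
```

Whether this is faster than multi_terms here would need measuring against the real index, but it returns the full latest document (including "tags") for every name in one request.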