Best way to retrieve multiple fields for a set of field values

Hello,

I need to retrieve specific fields from Elasticsearch for a number of hosts.
This is my current approach: I prepare the list of "host.hostname" values and set "size" to the length of that list.

"""
        {
        "_source": ["host.hostname", "host.ip", "host.os.name", "host.bo"],
        "query": {
            "terms": {
               "host.hostname": [
                  XXXHOSTHOSTNAMESXXX
               ]
            }
        },   
        "sort": { "@timestamp" : "desc" },
        "size": XXXSIZEXXX
        }
        """

This was working and I thought it was OK, but sometimes it does not return information for all hosts, while other hosts appear two or three times in the results.

Could you please advise how I can ensure that the required fields are retrieved for all hosts in one search?

I am on 7.17-1

This is the full function:

    def set_other_host_information(self, notifications_object):
        hostname_set = set()
        for obj in notifications_object:
            host_hostname = obj.host_hostname
            hostname_set.add(host_hostname)
        body = """
        {
        "_source": ["host.hostname", "host.ip", "host.os.name", "host.bo"],
        "query": {
            "terms": {
               "host.hostname": [
                  XXXHOSTHOSTNAMESXXX
               ]
            }
        },   
        "size": XXXSIZEXXX
        }
        """

        body = body.replace("XXXHOSTHOSTNAMESXXX", ",".join([f'"{hostname}"' for hostname in list(hostname_set)]))
        body = body.replace("XXXSIZEXXX", str(len(hostname_set)))
        print(body)
        rv = self.es_con.search(index=self.METRICBEAT_INDEX_NAME, body=body, request_timeout=20)
        print(rv)
        for data in rv["hits"]["hits"]:
            eho = ElasticDocumentObject(data)
            print(data)
            # Avoid shadowing the built-in name `object`.
            for notification_object in notifications_object:
                if notification_object.host_hostname == eho.host_hostname:
                    notification_object.eho = eho

Basically, what I want to achieve is to retrieve these fields:

"_source": ["host.hostname", "host.ip", "host.os.name", "host.bo"],

for each "host.hostname" taken from the notifications.
I need to ensure that I get information for all hosts.

I could not think of or find anything better, so for now I am just doing one request for every host.hostname:

    def set_other_host_information(self, notifications_object):
        hostname_set = set()
        rv_data_dict_list = []
        for obj in notifications_object:
            host_hostname = obj.host_hostname
            hostname_set.add(host_hostname)
        for hostname in hostname_set:
            body = """
            {
            "_source": ["host.hostname", "host.ip", "host.os.name", "host.bo"],
            "query": {
                "term": {
                   "host.hostname": "XXXHOSTHOSTNAMESXXX"
                }
            },
            "sort": { "@timestamp" : "desc" },
            "size": 1
            }
            """

            # body = body.replace("XXXHOSTHOSTNAMESXXX", ",".join([f'"{hostname}"' for hostname in list(hostname_set)]))
            body = body.replace("XXXHOSTHOSTNAMESXXX", hostname)
            # body = body.replace("XXXSIZEXXX", str(len(hostname_set)))
            rv = self.es_con.search(index=self.METRICBEAT_INDEX_NAME, body=body, request_timeout=4)
            rv_data_dict_list.extend(rv["hits"]["hits"])
        for data in rv_data_dict_list:
            print("--->",data)
            eho = ElasticDocumentObject(data)
            for notification_object in notifications_object:
                if notification_object.host_hostname == eho.host_hostname:
                    notification_object.eho = eho

I added an aggregation to the query, and now the aggregation part of the response always contains all the information. But this one query is 8-10 times slower than the 4 previous queries.

        fields_list = ["monitor.ip", "tags", "host.bo", "host.hostname", "monitor.name"]
        body = """{
            "_source": [XXXFIELDSLISTXXX],
            "query": {
                "terms": {
                   "monitor.name": [XXXNAMESXXX]
                   }
                   
            },   
            "sort": { "@timestamp" : "desc" },
            "size": 1,
            "aggs": {
            "my-agg-name": {
              "multi_terms": {
            "terms": [XXXFIELDSDICTXXX], 
                "size": XXXSIZEXXX
              }
            }
          }
          }
        """

        body = body.replace("XXXNAMESXXX", ",".join([f'"{name}"' for name in list(name_set)]))
        body = body.replace("XXXSIZEXXX", str(len(name_set)))
        body = body.replace("XXXFIELDSLISTXXX", ",".join([f'"{field_name}"' for field_name in list(fields_list)]))
        body = body.replace("XXXFIELDSDICTXXX", ",".join(['{"field":'+f'"{field_name}"'+"}" for field_name in list(fields_list)]))
        print(body)
        rv = self.es_con.search(index=self.HEARTBEAT_INDEX_NAME, body=body, request_timeout=20)

I tried aggregations, but they were 20 times slower. I am sticking with making one request per host.

body = """{
            "_source": [XXXFIELDSLISTXXX],
            "query": {
                "terms": {
                   "monitor.name": [XXXNAMESXXX]
                   }
                   
            },   
            "sort": { "@timestamp" : "desc" },
            "size": 1,
          "aggs": {
            "my-agg-name": {
              "multi_terms": {
            "terms": [XXXFIELDSDICTXXX], 
                "size": XXXSIZEXXX
              }
            }
          }
          }
        """

Are there multiple documents for each 'host.hostname'? Do you need the latest document for each 'host.hostname'?

Why do you need the multi_terms aggregation? That is a different situation from the one above.

If you need the latest document per 'host.hostname', you can use a top_hits aggregation nested under a terms aggregation.
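
As a rough illustration of that suggestion, here is a minimal sketch in Python that builds such a request body as a dict and reads the newest hit out of each bucket. It assumes that "host.hostname" is mapped as a keyword field and that documents carry an "@timestamp" field. The function names and the aggregation names "by_host" and "latest" are made up for this example. The original terms query with "size" equal to the number of hosts returns the N newest matching documents overall, not one per host, which would explain the missing and duplicated hosts.

```python
def build_latest_per_host_body(hostnames, source_fields):
    """Build a search body that returns the newest document per hostname.

    Hypothetical helper: "by_host" and "latest" are illustrative names.
    """
    hostnames = sorted(set(hostnames))
    return {
        # Top-level hits are not needed; results come from the aggregation.
        "size": 0,
        "query": {"terms": {"host.hostname": hostnames}},
        "aggs": {
            "by_host": {
                # One bucket per hostname (terms size must be at least 1).
                "terms": {
                    "field": "host.hostname",
                    "size": max(len(hostnames), 1),
                },
                "aggs": {
                    "latest": {
                        # Keep only the most recent document in each bucket.
                        "top_hits": {
                            "size": 1,
                            "sort": [{"@timestamp": {"order": "desc"}}],
                            "_source": {"includes": source_fields},
                        }
                    }
                },
            }
        },
    }


def latest_hits_by_host(response):
    """Map each bucket key (hostname) to its single newest hit."""
    buckets = response["aggregations"]["by_host"]["buckets"]
    return {b["key"]: b["latest"]["hits"]["hits"][0] for b in buckets}
```

The resulting dict could be passed as `body=` to the same `self.es_con.search(...)` call used in the question. Any hostname missing from the returned mapping had no matching document in the index.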


Hello Tomo,

thank you for your reply.

Yes, there are multiple documents for each "host.hostname" (I now use "monitor.name" instead, and its value is always unique per monitor). Yes, I need the latest document, because I am using the information contained in the "tags" field.

To be honest, I am not really good with Elasticsearch queries (yet), so I was using whatever worked.

I will try out your suggestion:

I just found another option by chance.
Collapse search results should be another choice.
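
A minimal sketch of the collapse variant, under the same assumptions as before ("host.hostname" is a single-valued keyword field, and "_source" is nested as `{"host": {"hostname": ...}}`). The helper names are hypothetical. With "collapse" on the hostname and a descending sort on "@timestamp", each returned hit should be the newest document for a distinct host, so "size" set to the number of hosts is enough to cover all of them.

```python
def build_collapse_body(hostnames, source_fields):
    """Build a search body that collapses hits to one per hostname.

    Hypothetical helper; the sort order decides which document
    represents each hostname (here: the newest one).
    """
    hostnames = sorted(set(hostnames))
    return {
        "_source": source_fields,
        "query": {"terms": {"host.hostname": hostnames}},
        "collapse": {"field": "host.hostname"},
        "sort": [{"@timestamp": {"order": "desc"}}],
        # At most one hit per hostname is returned after collapsing.
        "size": len(hostnames),
    }


def index_hits_by_host(hits):
    """Index hits by hostname; assumes host.hostname is in _source."""
    return {hit["_source"]["host"]["hostname"]: hit for hit in hits}
```

Indexing the hits by hostname also replaces the nested loop over notifications with a dictionary lookup per notification.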
