Keeping track of changes to the host configuration object in Metricbeat

Greetings!

I'm trying to track changes to the "host" field produced by Metricbeat. I'd like to run a process every day to see whether this "host" field changed for some specific system. A typical Metricbeat record containing the desired host field looks something like this:

{
  "_index": "metricbeat-7.3.1-2019.09.10-000001",
  "_type": "_doc",
  ...
  "_source": {
    ...
    "host": {
      "name": "(servername)",
      "containerized": false,
      "hostname": "(hostname)",
      "architecture": "x86_64",
      "os": {
        "kernel": "3.10.0-954.10.1.dl7.x86_64",
        "codename": "Core",
        "platform": "centos",
        "version": "7 (Core)",
        "family": "redhat",
        "name": "CentOS Linux"
      },
      "id": "211d31bcdb1fdfdereffer5664fsdfhjjs"
    },
    ...
  }
}

So far I have tried to:

  1. Grab all of the records for a day and compare the host fields as dictionaries. The problem is that Metricbeat generates several thousand records, each of which has to be fetched and compared. That's really inefficient when we have very large indexes, and the code depends on processing the data in timestamp order. See: Get oldest / newest document in *beat

  2. I tried to write an aggregation to grab the entire host object and then bin the data into "servername" bins (a sketch of the shape of that query appears after this list). I couldn't get this to work either. See: https://stackoverflow.com/questions/59461511/aggregation-of-host-json-object-out-of-metricbeat-on-elasticsearch

  3. I've installed ElastAlert and have been trying to get it to alert on host changes, so far without success. The examples provided with the ElastAlert package work for really simple cases, but that's all I can get it to do.
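
For concreteness, here is a minimal sketch of the shape of the aggregation described in (2), in Python. The client setup is a placeholder, and note that a top_hits sub-aggregation only returns a sample host object per server bin; by itself it does not reveal whether the object changed during the day:

from elasticsearch import Elasticsearch

esr = Elasticsearch(hosts=['https://localhost:9200'])  # placeholder connection

# Bin documents by server name; host.name is a keyword field in the standard
# Metricbeat mapping, so it can be used directly in a terms aggregation.
res = esr.search(
    index="metricbeat-*",
    size=0,  # no raw hits, only aggregation buckets
    body={
        "aggs": {
            "by_server": {
                "terms": {"field": "host.name", "size": 100},
                "aggs": {
                    "sample_host": {
                        "top_hits": {"size": 1, "_source": ["host"]}
                    }
                }
            }
        }
    }
)

for bucket in res["aggregations"]["by_server"]["buckets"]:
    print(bucket["key"], bucket["sample_host"]["hits"]["hits"][0]["_source"]["host"])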

When I run the query, I'd like to see something like this as output:

"stat_date": 2020-01-16 09:58,
"host_configurations":
  {
    server_id: "server1",
    "host": {
          "os.name": "Linux xxx",
          "kernel.version":  u893,
          ....
  },
  {
    server_id: "server1",
    "host": {
          "os.name": "Linux xxx",
          "kernel.version":  u895,
          ....
  },
  ...

This would tell me that sometime that day the kernel.version changed from u893 to u895. Or I could see if someone added RAM or more disk space. That's it. From there the rest of the code is easy. I don't care what time it happened or how many times. I just need to know that sometime that day there was a host configuration change.
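
The comparison itself really is the easy part. A minimal sketch of diffing two flattened host dicts, using the field names from the desired output above:

def diff_hosts(old, new):
    """Return the keys whose values differ between two host dicts."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

before = {"os.name": "Linux xxx", "kernel.version": "u893"}
after = {"os.name": "Linux xxx", "kernel.version": "u895"}
print(diff_hosts(before, after))  # {'kernel.version': ('u893', 'u895')}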

This seems like it should be a 'relatively' simple thing to do but I can't get anything to work. I'd appreciate any suggestions anyone has. Thank you!

Hi @EricJohnson :slight_smile:

Have you looked at the scroll option, retrieving only the fields you need? https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-body.html#request-body-search-scroll I'm not fully sure, but that would be my initial attempt: iterate over each document using the scroll API, comparing them in memory (in my process) with the "expected" result.

Maybe I'm missing something :sweat_smile:

Thanks for the reply, Mario. Yes, I tried that too, and found that it takes several minutes.

For instance, to run this:

import certifi
from elasticsearch import Elasticsearch, helpers

# conf, documentSize, thisId, and lastDate are defined elsewhere in the script.
esr = Elasticsearch(
    hosts=[conf.get('source').get('host')],
    http_auth=(conf.get('source').get('username'), conf.get('source').get('password')),
    scheme='https',
    port=conf.get('source').get('port'),
    use_ssl=True,
    ca_certs=certifi.where(),
    timeout=60
)

res = helpers.scan(
    client=esr,
    size=documentSize,
    scroll='60s',
    clear_scroll=True,
    query={
        'query': {
            'bool': {
                'must': [
                    {'term': {'serverId': thisId}},
                    {'exists': {'field': 'host'}},
                    {'term': {'@timestamp': lastDate}}
                ]
            }
        }
    },
    index='metricbeat-*',
    _source=['host', 'serverId', '@timestamp'])

I did a loop with something like this to keep track:

counter = uniqueCounts = duplicateCounts = 0
hostConfigs = []

for i in res:
    counter += 1
    if i['_source']['host'] not in hostConfigs:
        hostConfigs.append(i['_source']['host'])
        uniqueCounts += 1
    else:
        duplicateCounts += 1

We get this out (code skipped for brevity):

time required: 0:06:35.563270
counter (total number of documents processed): 1692946
unique (number of unique host configurations found): 1
duplicates (number of duplicate host configurations found): 1692945

An array containing all of the unique host configurations looks like this (on the search date only one unique value was found, so the array has a single entry):

[{'architecture': 'x86_64',
  'containerized': False,
  'hostname': 'hostname-xxx',
  'id': 'id-xxx',
  'name': 'server-name-xxx',
  'os': {'codename': 'Santiago',
         'family': 'redhat',
         'kernel': '2.6.32-754.24.3.el6.x86_64',
         'name': 'Red',
         'platform': 'redhat',
         'version': '6.10 (Santiago)'}}]

So it works, but even in this first case it has to fetch all 1,692,946 documents and compare each host configuration, which takes six and a half minutes. There has to be a better way to confirm that every one of those 1,692,946 host configurations was exactly the same.

Thanks again for the suggestion. Any other ideas?

You have a lot of documents but, according to what you wrote on Stack Overflow, only about 9,000 of them per day. You should narrow your query to retrieve just those 9,000 instead of all of them, and then iterate over them with the scroll API.
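
A minimal sketch of that narrowing, reusing the names from your earlier snippet (the connection details, server id, and dates here are placeholders): a range query on @timestamp in place of the exact-match term restricts the scan to a single day.

from elasticsearch import Elasticsearch, helpers

esr = Elasticsearch(hosts=['https://localhost:9200'])  # placeholder connection
thisId = 'server1'                                     # placeholder server id

# Restrict the scan to one day with a range query on @timestamp.
res = helpers.scan(
    client=esr,
    scroll='60s',
    query={
        'query': {
            'bool': {
                'must': [
                    {'term': {'serverId': thisId}},
                    {'exists': {'field': 'host'}},
                    {'range': {'@timestamp': {
                        'gte': '2020-01-16T00:00:00',
                        'lt': '2020-01-17T00:00:00'
                    }}}
                ]
            }
        }
    },
    index='metricbeat-*',
    _source=['host', 'serverId', '@timestamp'])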

Another idea, going a bit outside the box, is to set up a processing pipeline using an ingest node: https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html You can pre-process every event in real time and set a "changed": true field on the ones that have changed. This is very convenient because you can also have more than one pipeline, update them, and so on.
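
Since an ingest processor only sees the document currently being indexed, one variation on that idea is to stamp each event with a fingerprint of its host object at ingest time and let a daily job compare fingerprints instead of whole documents. A minimal sketch in Python (the pipeline id and field name are assumptions):

from elasticsearch import Elasticsearch

esr = Elasticsearch(hosts=['https://localhost:9200'])  # placeholder connection

# Register an ingest pipeline that adds a hash of the host object to every
# event that has one. The pipeline id and target field name are assumptions.
esr.ingest.put_pipeline(
    id='host-config-fingerprint',
    body={
        'description': 'Stamp each event with a fingerprint of its host object',
        'processors': [
            {
                'script': {
                    'if': 'ctx.host != null',
                    # Painless: hash the string form of the host map. A real
                    # implementation should serialize it deterministically
                    # (e.g. sorted keys) before hashing.
                    'source': 'ctx.host_fingerprint = ctx.host.toString().hashCode()'
                }
            }
        ]
    }
)

With that field in place, a single cardinality aggregation on host_fingerprint per server per day would answer the original question: a cardinality above 1 means the host configuration changed that day, without retrieving any documents.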

Thanks again for the reply. Yes, we used to have only 9,000 per day, but we turned on more monitoring to watch more system properties, and that exhibits precisely what the issue is: I can't aggregate over the host field if I don't know exactly what's in it, and I don't want to retrieve every document when I can never know how many documents there will be. The tools I have right now aren't scaling well.

Doing something with a pipeline like that sounds like a good way to do this and perhaps deal with some other issues. I'll look into that. Thank you.
