Hi ES team, newbie ES user here. We have a use case where we want to store signals from various hosts inside ES (as documents, of course) and query them later based on a specific field value. Each event might have different fields. For example:
event1:
ID: <hostname> field1: <test> field2: "abc" field3: "def"
event2:
ID: <hostname> field4: <test> field5: <test> field6: <test>
We want to consolidate all signals into one doc like:
ID: <hostname> field1: <test> field2: "abc" field3: "def" field4: <test> field5: <test> field6: <test>
We consolidate the docs so that we can filter on a specific field, e.g. GET all docs where field2 = "abc". I assume this is normal?
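To make the consolidation concrete, here is a minimal Python sketch of the merge we have in mind. It only simulates the behaviour locally and does not talk to ES; the host name `host1`, the field values, and the helper names `merge_event` and `filter_docs` are made up for illustration. It assumes flat events, so a shallow merge is enough.

```python
def merge_event(status_doc, event):
    """Shallow-merge an event's fields into the consolidated doc (newer values win)."""
    merged = dict(status_doc)
    merged.update(event)
    return merged


def filter_docs(docs, field, value):
    """Return the docs whose `field` equals `value` (stand-in for a field filter)."""
    return [d for d in docs if d.get(field) == value]


# Hypothetical events for one host; the values are placeholders.
event1 = {"ID": "host1", "field1": "x", "field2": "abc", "field3": "def"}
event2 = {"ID": "host1", "field4": "p", "field5": "q", "field6": "r"}

status = {}
for ev in (event1, event2):
    status = merge_event(status, ev)

matches = filter_docs([status], "field2", "abc")
```

After both events, `status` holds field1 through field6 for `host1`, and filtering on field2 = "abc" returns that single consolidated doc.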
Now another event comes in:
event3: (note that the field2 value is the same; is this also a partial update to the doc?)
ID: <hostname> field1: <test> field2: "abc" field3: <test>
Here are my questions:
- Is this a good thing to do inside ES? I mean consolidating fields like this with the Update API, given that we do not define a schema upfront.
- I read in the Update API docs that when we send a partial update to ES with a specific ID (the hostname in this case), ES internally retrieves the doc > changes the doc's fields > persists the doc > re-indexes the new doc > removes the old doc. Is this understanding correct? Also, is there a version change, and how quickly is the old doc deleted? How resource-intensive is this?
- Importantly, when we send something like event3, where there is no change to the doc's fields, what is the internal process? Is the document still re-indexed or not? I read on Stack Overflow that "result": "noop" is returned and no re-indexing is done. Is this understanding correct? Does this also work for the bulk API?
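Here is a small Python sketch of how we currently understand the noop case. It is a local simulation, not ES code: `partial_update` is a hypothetical helper, the values are made up, and the version handling (unchanged on noop, incremented on a real change) reflects our reading of the docs rather than something we have verified. The `update_body` at the end is the request body we would expect to send, using the `doc`, `doc_as_upsert`, and `detect_noop` options of the Update API.

```python
def partial_update(stored, version, partial):
    """Simulate a partial update with noop detection on a flat document."""
    merged = {**stored, **partial}
    if merged == stored:
        # Nothing would change, so nothing is rewritten and the version stays.
        return stored, version, "noop"
    return merged, version + 1, "updated"


# Hypothetical consolidated doc and its version.
status = {"ID": "host1", "field1": "x", "field2": "abc", "field3": "def"}
version = 1

# Same values as already stored: we expect a noop.
status, version, result_same = partial_update(
    status, version, {"ID": "host1", "field2": "abc"}
)

# A changed value: we expect the doc to be rewritten and the version bumped.
status, version, result_changed = partial_update(
    status, version, {"ID": "host1", "field3": "xyz"}
)

# Request body we would send to the Update API for such an event.
update_body = {
    "doc": {"ID": "host1", "field2": "abc"},
    "doc_as_upsert": True,   # create the doc if the host has none yet
    "detect_noop": True,     # written out for clarity; the docs list true as the default
}
```

If this model is right, the first call should come back as "noop" and the second as "updated" with the version going from 1 to 2.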
Sorry for so many questions. Appreciate your help!
EDIT1:
We have a group of services that execute changes on all our hosts (app deploys, reboots, etc.). We want to track these events/changes. The idea is to emit an event from this set of services whenever a change/event happens.
event1:
timestamp: <time> ID: <hostname> appVersion: <test>
event2:
timestamp: <time> ID: <hostname> health: <test>
event3:
timestamp: <time> ID: <hostname> osInstalled: <test>
As you can see, each of these events has a time and an ID associated with it, but they emit different fields (because they come from separate agents; long story, but that's how it is set up right now).
What we want is 2 things:
1. We want to fetch all hostnames that have a specific appVersion, but we also want to display their health and osInstalled fields in the UI. In another scenario, we want to fetch all hosts based on the health field (healthy or unhealthy hosts), but also display their other fields. There can be many filters like this.
2. We also want to get an event log for each host. So basically, "fetch all events of host1" will return all raw/unmerged events.
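The two lookups above could be sketched as search request bodies like the ones below, built here as plain Python dicts. The helper names and the example values are hypothetical, and the sketch assumes that the filtered fields (appVersion, health, ID) end up mapped as exact-match keyword fields, which we have not confirmed for our dynamically mapped data.

```python
import json


def hosts_by_field(field, value):
    """Search body: all consolidated host docs where `field` equals `value`."""
    return {"query": {"term": {field: value}}}


def history_for_host(hostname):
    """Search body: all raw events of one host, oldest first."""
    return {
        "query": {"term": {"ID": hostname}},
        "sort": [{"timestamp": {"order": "asc"}}],
    }


# Hypothetical example values.
by_version = hosts_by_field("appVersion", "1.2.3")
by_health = hosts_by_field("health", "unhealthy")
host1_log = history_for_host("host1")

print(json.dumps(by_version))
print(json.dumps(host1_log))
```

Since each consolidated doc carries all of a host's fields, the hits from the first kind of query would already include health and osInstalled for display.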
We were thinking of having two indexes:
- A "Status" index, where we keep updating each host's doc with the fields from these events, so that there is a single place where all fields are consolidated and can be filtered and queried as described in "1." above
- A "History" index, where we just log these events as they arrive, so it is write-only
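As a rough sketch of how one incoming event could feed both indexes in a single bulk request, the Python below builds the newline-delimited body by hand. The index names `status` and `history`, the helper names, and the sample event are assumptions for illustration; the status write uses an `update` action with `doc_as_upsert`, and the history write uses a plain `index` action with an auto-generated ID.

```python
import json


def bulk_lines(event, status_index="status", history_index="history"):
    """Return the four NDJSON lines that write one event to both indexes."""
    host = event["ID"]
    return [
        # Merge the event's fields into the host's consolidated doc.
        json.dumps({"update": {"_index": status_index, "_id": host}}),
        json.dumps({"doc": event, "doc_as_upsert": True}),
        # Append the raw event to the history index.
        json.dumps({"index": {"_index": history_index}}),
        json.dumps(event),
    ]


def bulk_body(events):
    """Join all lines; a bulk body must end with a newline."""
    lines = [line for ev in events for line in bulk_lines(ev)]
    return "\n".join(lines) + "\n"


# Hypothetical event from one of the agents.
event = {"timestamp": "2024-01-01T00:00:00Z", "ID": "host1", "appVersion": "1.2.3"}
body = bulk_body([event])
print(body)
```

One thing this sketch makes visible is that the status doc would also carry the timestamp of the most recent event, since it is merged in like any other field.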