Update doc only if doc changes? noop?

Hi ES team, I'm a new ES user. We have a use case where we want to store signals from various hosts in ES (as documents, of course) and query them later based on a specific field value. Each event might have different fields. For example:

event1:
ID: <hostname> field1: <test> field2: "abc" field3: "def"

event2:
ID: <hostname> field4: <test> field5: <test> field6: <test>

We want to consolidate all signals into one doc like:
ID: <hostname> field1: <test> field2: "abc" field3: "def" field4: <test> field5: <test> field6: <test>
We consolidate the docs so that we can filter on a specific field, e.g. GET all docs where field2="abc". I assume this is normal?
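To make the consolidation concrete, here is a minimal sketch of how each event could be turned into a partial-update request. It only builds the request path and body and does not talk to a cluster. The index name `signals`, the helper `build_update_request`, and the sample field values are made up for illustration; the `POST /<index>/_update/<id>` path and the `doc` / `doc_as_upsert` body keys are the Update API's own.

```python
import json
from urllib.parse import quote

def build_update_request(index, event):
    """Turn one event into the path and body of a partial-update request."""
    # The hostname is the document ID; the remaining fields form the partial doc.
    fields = {k: v for k, v in event.items() if k != "ID"}
    path = f"/{index}/_update/{quote(event['ID'], safe='')}"
    # doc_as_upsert asks ES to create the document if the ID does not exist yet.
    body = {"doc": fields, "doc_as_upsert": True}
    return path, body

# Hypothetical sample events mirroring event1 and event2 above.
event1 = {"ID": "host1", "field1": "x", "field2": "abc", "field3": "def"}
event2 = {"ID": "host1", "field4": "y", "field5": "z", "field6": "w"}

for event in (event1, event2):
    path, body = build_update_request("signals", event)
    print("POST", path)
    print(json.dumps(body))
```

Sending both requests for the same ID should leave one document carrying field1..6, since a partial `doc` is merged into the existing source rather than replacing it.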

Now another event comes in

event3: (note the field2 value is the same; also, is this a partial update to the doc?)
ID: <hostname> field1: <test> field2: "abc" field3: <test>

Here are my questions:

  1. Is this a good thing to do inside ES? I'm talking about consolidating fields like this with the Update API, since we do not define a schema upfront.

  2. I read in the Update API docs that when we send a partial update to ES with a specific ID (the hostname in this case), ES internally retrieves the doc > changes the doc's fields > indexes the result as a new doc > marks the old doc as deleted. Is this understanding correct? Also, does the version change, and how quickly is the old doc deleted? How resource-intensive is this?

  3. Importantly, when we send something like event3, where there is no change to the doc's fields, what is the internal process? Is the document still re-indexed or not? I read on Stack Overflow that "result": "noop" is returned and no re-indexing is done. Is this understanding correct? Does this also work for the bulk API?
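The flow asked about in questions 2 and 3 can be pictured with a toy model. This is only a sketch under assumptions: `TinyStore` is a hypothetical in-memory stand-in, the merge is shallow, and none of it reflects how Elasticsearch is implemented internally. What it does mirror is the documented behaviour of the Update API's `detect_noop` option (enabled by default): if the merged document equals the stored one, the response carries "result": "noop" and nothing is rewritten.

```python
class TinyStore:
    """Toy in-memory model of partial updates; NOT Elasticsearch internals."""

    def __init__(self):
        self.docs = {}  # doc_id -> {"version": int, "source": dict}

    def update(self, doc_id, partial):
        current = self.docs.get(doc_id)
        if current is None:
            # First event for this ID: create the document (upsert behaviour).
            self.docs[doc_id] = {"version": 1, "source": dict(partial)}
            return "created"
        # Read the stored source and merge the partial doc into it.
        merged = {**current["source"], **partial}
        if merged == current["source"]:
            # Nothing changed: skip the rewrite and keep the version as-is.
            return "noop"
        # Something changed: store the merged doc as a new version.
        self.docs[doc_id] = {"version": current["version"] + 1, "source": merged}
        return "updated"

store = TinyStore()
results = [
    store.update("host1", {"field1": "x", "field2": "abc", "field3": "def"}),
    store.update("host1", {"field4": "y"}),
    store.update("host1", {"field1": "x", "field2": "abc"}),  # same values again
]
print(results)                         # ['created', 'updated', 'noop']
print(store.docs["host1"]["version"])  # 2
```

In this model the third call changes nothing, so the version stays at 2. One consequence worth keeping in mind: any field that differs on every event (a timestamp, for instance) would make every update a real change rather than a noop.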

Sorry for so many questions. Appreciate your help!

EDIT1:

We have a group of services that execute changes on all our hosts (app deploys, reboots, etc.). We want to track these events/changes. The idea is to emit an event from this set of services whenever a change/event happens.

event1:
timestamp: <time> ID: <hostname> appVersion: <test>

event2:
timestamp: <time> ID: <hostname> health: <test>

event3:
timestamp: <time> ID: <hostname> osInstalled: <test>

As you can see, each of these events has a timestamp and an ID associated with it, but they emit different fields (because they come from separate agents; long story, but that's how it is set up right now).

What we want is 2 things:

  1. We want to fetch all hostnames that have a specific appVersion, but also display their health and osInstalled fields in the UI. In another scenario, we want to fetch all hosts based on the health field (healthy or unhealthy hosts), again displaying their other fields. There can be many filters like this.

  2. We also want an event log for each host. So basically, "fetch all events of host1" will return all raw/unmerged events.
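The two access patterns above could look roughly like the search bodies below. This is a hedged sketch, not a tested mapping: the helper names are hypothetical, and the `.keyword` suffix assumes the default dynamic mapping, where string fields are indexed as text with a `keyword` sub-field. It also assumes `timestamp` is mapped as a date so it can be sorted on.

```python
def hosts_by_field(field, value):
    """Search body: all consolidated host docs where `field` equals `value` exactly."""
    # Assumption: default dynamic mapping, so exact matches go through <field>.keyword.
    return {"query": {"term": {f"{field}.keyword": value}}}

def events_for_host(hostname):
    """Search body: raw events for one host, newest first."""
    return {
        "query": {"term": {"ID.keyword": hostname}},
        # Assumption: timestamp is mapped as a date field.
        "sort": [{"timestamp": {"order": "desc"}}],
    }

print(hosts_by_field("health", "unhealthy"))
print(events_for_host("host1"))
```

The first body returns whole documents, so the UI gets appVersion, health, and osInstalled together only if those fields live in the same document, which is what the consolidation is for.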

We were thinking of having 2 indexes:

  1. A "Status" index where we keep updating each doc with these events' fields, so there is one single place where all fields are consolidated and can be filtered and queried as mentioned in "1." above.
  2. A "History" index where we just log these events, so it is write-only.
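As a rough sketch of the two-index idea, one incoming event could be fanned out into a single bulk request body: an `index` action that appends the raw event to the history index, and an `update` action that merges its fields into the host's status doc. The index names `host-history` and `host-status` and the helper names are hypothetical. The action/source line pairs and the trailing newline follow the bulk API's newline-delimited JSON format. Leaving `timestamp` out of the status doc is a design assumption, made so that a repeated event with unchanged fields can still be treated as a noop.

```python
import json

STATUS_INDEX = "host-status"    # hypothetical index name
HISTORY_INDEX = "host-history"  # hypothetical index name

def bulk_lines_for_event(event):
    """Build the bulk action/source lines that route one event to both indexes."""
    hostname = event["ID"]
    # Keep only the state fields for the status doc; ID is the doc ID and
    # timestamp changes on every event, which would defeat noop detection.
    status_fields = {k: v for k, v in event.items() if k not in ("ID", "timestamp")}
    return [
        {"index": {"_index": HISTORY_INDEX}},                   # append raw event
        event,
        {"update": {"_index": STATUS_INDEX, "_id": hostname}},  # merge into status
        {"doc": status_fields, "doc_as_upsert": True},
    ]

def to_ndjson(lines):
    """Serialize to newline-delimited JSON; the bulk body must end with a newline."""
    return "".join(json.dumps(line) + "\n" for line in lines)

event = {"timestamp": "2020-01-01T00:00:00Z", "ID": "host1", "health": "healthy"}
payload = to_ndjson(bulk_lines_for_event(event))
print(payload)
```

The payload would be sent as the body of a `POST /_bulk` request; nothing here contacts a cluster.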

Why do you want that? It makes ingest an order of magnitude harder.

None of these events knows all the fields, since they are emitted from different sources. For example, publisher1 only knows field1..3, and publisher2 only knows field4..6. The only field they have in common is the ID (hostname).

We have a UI that filters all hosts based on a specific field value and then shows all the fields (field1..6) associated with each host, e.g. show me all hosts with a specific role (role here is part of the event emitted by publisher1).

Are you saying it is harder because updates require read > create new > reindex > delete old? What other alternatives do we have, then?

What sort of signals are they?
Do they have timestamps?
Why do they need to be collapsed if they are from different publishers?

If we can understand what you have and what you are trying to do with a bit more clarity, we may be able to suggest solutions.

@warkolm I updated my post. See "EDIT1".
