Ideas? Evaluating enums and associating array values

Hey all. I'm new to the whole Elastic Stack and have been replacing some spaghetti-code in-house logging solutions recently. Something that was very easy when scripting a custom website was fetching and displaying different categories of logs in exactly the format I wanted, and while Elasticsearch has deftly handled a large amount of my organization's needs, some of our data is reported in a way that is completely incomprehensible on a document-by-document basis.

I tagged this post with "runtime" and "transforms" as runtime fields and dynamic transforms seem to be the key to what I need to accomplish, but I'm in waaay over my head and I'm really just looking for someone to point me in the right direction on how I should tackle this problem.

My goal is to group documents together based on a field whose value they all share, display them in time-order, and evaluate their values into readable information. Here is a mockup of what my ingested data looks like:

doc_0: {
  internal_id: 200,
  linked_ids: [1001, 1003],
  linked_names: ["Object 1", "Object 3"],
  enum_category: [8, 9],
  enum_entry: [1, 2]
}
doc_1: {
  internal_id: 200,
  linked_ids: [1001, 1003, 1002],
  linked_names: ["Object 1", "Object 3", "Object 2"],
  enum_category: [3, 6, 9],
  enum_entry: [1, 2, 3]
}
doc_2: {
  internal_id: 200,
  linked_ids: [1003, 1002],
  linked_names: ["Object 3", "Object 2"],
  enum_category: [-1, 6],
  enum_entry: [-1, 8]
}

Here is how, ideally, the data would be formatted for the end user after querying for "internal_id : 200":

formatted_doc_0: {
  Object 1 (1001): Category EIGHT entry ONE,  # Enums are evaluated to human-readable labels and associated with objects at the same index in other array fields
  Object 3 (1003): Category NINE entry TWO
},
formatted_doc_1: {
  Object 1 (1001): Category THREE entry ONE,
  Object 3 (1003): Category SIX entry TWO,
  Object 2 (1002): Category NINE entry THREE  # Objects maintain their order in the report despite nonsequential values
},
formatted_doc_2: {
  Object 3 (1003): Invalid,  # Validation of values must occur, however the value space, including invalid values, is defined and finite
  Object 2 (1002): Category SIX entry EIGHT
}

I imagine this is a tall order, but given the vast number of configurable options in the Elastic Stack, there must be some way to make this happen. I don't need working Painless code or anything; I just need a good shove towards the aspects of the stack I should be researching. Even if shuffling the linked values around isn't feasible, it's absolutely necessary that I can evaluate these enums into useful values in search results, and the last time I tried to figure out how runtime fields work, I accidentally destroyed my testing data.

Thank you all for any assistance!

I don't think this is a transform problem, because your docs already contain all information necessary to answer the query.

TL;DR

In a nutshell there are 2 types of operations: map and reduce.

A map operation takes 1 document and produces 1 document out of it.
A reduce operation takes n documents and produces 1 document out of it, e.g. it groups or aggregates.
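
To make the distinction concrete, here is a minimal sketch in plain Python (not Elasticsearch code). The function names `map_op` and `reduce_op` and the output fields `n_linked`, `doc_count`, and `max_linked` are made up for illustration; only `internal_id` and `linked_ids` come from the example docs above.

```python
# Trimmed copies of the example docs from the question.
docs = [
    {"internal_id": 200, "linked_ids": [1001, 1003]},
    {"internal_id": 200, "linked_ids": [1001, 1003, 1002]},
    {"internal_id": 200, "linked_ids": [1003, 1002]},
]


def map_op(doc):
    """Map: one document in, one document out."""
    return {"internal_id": doc["internal_id"], "n_linked": len(doc["linked_ids"])}


def reduce_op(group):
    """Reduce: n documents in, one document out (a grouping/aggregation)."""
    return {
        "internal_id": group[0]["internal_id"],
        "doc_count": len(group),
        "max_linked": max(len(d["linked_ids"]) for d in group),
    }


mapped = [map_op(d) for d in docs]  # still 3 documents
reduced = reduce_op(docs)           # collapsed to 1 document
```

Roughly speaking, a per-document script covers the first case, while a transform is aimed at the second.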

Your 3 example docs already contain the condensed info. The reduction already happened upstream. Who put together Object 1 and Object 3, the category and entry enums? A transform would be useful to form such documents, e.g. if you received single documents/events and wanted to create doc_x from them.

But you have 3 input docs and expect 3 output docs. The formatted_doc_x documents are simply a different form of their raw counterparts.

Or do you want 1 document which contains something like:

{
  "docs": [
    formatted_doc_0: {...},
    formatted_doc_1: {...},
    formatted_doc_2: {...}
  ]
}

In this case you could use a transform. However, it does not seem to be necessary: the same grouping can be represented in the result set of a search that filters on internal_id and sorts the hits by time.

I suggest having a look at ingest pipelines, so you can format the docs as you ingest them into Elasticsearch. Given the complicated requirements, I think you need a script processor.
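
As a rough sketch of the per-document logic such a script would need, here it is in plain Python for readability; inside a script processor it would be written in Painless instead. The label tables `CATEGORY_LABELS` and `ENTRY_LABELS` and the function name `format_doc` are assumptions, since the real enum definitions were not posted. The sketch also assumes the four arrays in a document always have the same length.

```python
# Hypothetical label tables: the real enum definitions are not given in the
# thread, so number words stand in for the human-readable labels.
CATEGORY_LABELS = {
    1: "ONE", 2: "TWO", 3: "THREE", 4: "FOUR", 5: "FIVE",
    6: "SIX", 7: "SEVEN", 8: "EIGHT", 9: "NINE",
}
ENTRY_LABELS = dict(CATEGORY_LABELS)


def format_doc(doc):
    """Pair up the parallel arrays by index and translate the enums.

    Returns one readable line per linked object, in the original array order.
    Any value missing from the label tables (e.g. -1) is reported as Invalid.
    """
    lines = []
    # zip() stops at the shortest array, so unequal lengths would be
    # truncated silently; a real script should probably check for that.
    for obj_id, name, category, entry in zip(
        doc["linked_ids"],
        doc["linked_names"],
        doc["enum_category"],
        doc["enum_entry"],
    ):
        category_label = CATEGORY_LABELS.get(category)
        entry_label = ENTRY_LABELS.get(entry)
        if category_label is None or entry_label is None:
            status = "Invalid"
        else:
            status = f"Category {category_label} entry {entry_label}"
        lines.append(f"{name} ({obj_id}): {status}")
    return lines


doc_2 = {
    "internal_id": 200,
    "linked_ids": [1003, 1002],
    "linked_names": ["Object 3", "Object 2"],
    "enum_category": [-1, 6],
    "enum_entry": [-1, 8],
}
formatted_doc_2 = format_doc(doc_2)
```

For doc_2 this yields "Object 3 (1003): Invalid" followed by "Object 2 (1002): Category SIX entry EIGHT". Doing the translation at ingest time stores the readable form in a new field alongside the raw arrays, so the raw data stays available and nothing has to be computed at search time.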
