Metric aggregations on custom fields that use flattened field type

Hello,

I am looking for a way to index our documents in such a way that gives me the ability to run metric aggregations on custom number fields. Here's an example data structure we are indexing:

{
  name: 'My Element',
  fields: {
    UBI1h9vH1836n5JqNaGb: 'a string value',
    0L4ctTlt3agO1BLyI1WK: 'another string value',
    0wzYL8hIKUNEu5uIPt3V: 42,
    165N8Urzd8QOMgki898o: 10.25,
  },
}

We concatenate all custom text fields into an _allText property, which is a text mapping and allows for full text search across those custom text fields. I mention this because this was a great solution for working around the limitations of flattened while still providing full text search capabilities on those values. We have fields as a flattened field type to prevent mappings explosion, since across our index we have 1000s of custom user fields.

{
  // 'text'
  name: 'My Element',

  // 'flattened'
  fields: {
    UBI1h9vH1836n5JqNaGb: 'a string value!',
    0L4ctTlt3agO1BLyI1WK: 'another string value',
    0wzYL8hIKUNEu5uIPt3V: 42,
    165N8Urzd8QOMgki898o: 1021
  },

  // 'text' (concatenated values from all custom string fields)
  _allText: 'a string value! another string value',
}

What I would like to do next is index our documents in such a way that allows for aggregations on those custom number fields. The flattened field type does not allow metric aggregations. In this specific example I would like to do metric aggregations on those numeric 42 and 1021 values. For example, a sum operation that totals up every fields.0wzYL8hIKUNEu5uIPt3V in a query containing 100 documents.

Any ideas on how I should approach this?

There is an open issue for supporting numerics in flattened field type. See Support for a fully numeric flattened field · Issue #61550 · elastic/elasticsearch · GitHub

I don't have any concrete idea to fix this than to maybe try out nested and structure the numbered values like this:

field: [
  { key: 0wzYL8hIKUNEu5uIPt3V, value:  42 },
  { key: 165N8Urzd8QOMgki898o, value:  1021 }
]

I have not fully thought this through, but maybe you are able to retrieve the same numeric infos you need using nested filters in the aggs. Might be worth a try.

I hadn't thought to try nested. I'll do some testing to see if that could work. Will keep an eye on the GH issue as well. Thanks @spinscale this is very helpful.

Hey @spinscale have another thought on this. Would a painless script work here too? I've never used those before but the examples make a lot of references to metric aggregations: Painless examples for transforms | Elasticsearch Guide [8.1] | Elastic. If I need to support dynamic IDs, seems like a script might work, but I am not aware of painless' limitations.

Side note - I put together a very basic example of nested aggregation and its working well so far.

Hey,

indeed, a scripted metric aggregation should work, as it allows you to create arbitrary data from each document and do all the calculations by doing the proper filtering via painless.

Another relatively new feature are runtime fields - but I suppose that is not dynamic enough in your case with arbitrary keys/values.

One thing to keep in mind: When using aggregations the data is extracted from the index data structure, not from the _source, so you would probably still end up with some mapping?

--Alex

Not ready to dive into painless yet but good to know it may be an option.

That's a good thing to keep in mind with the additional mappings. Adding a new nested mapping containing the key/value pairs is definitely acceptable for us.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.