Metric aggregations on custom fields that use flattened field type

amkoehler · March 25, 2022, 7:59pm

Hello,

I am looking for a way to index our documents in such a way that gives me the ability to run metric aggregations on custom number fields. Here's an example data structure we are indexing:

{
  name: 'My Element',
  fields: {
    UBI1h9vH1836n5JqNaGb: 'a string value',
    0L4ctTlt3agO1BLyI1WK: 'another string value',
    0wzYL8hIKUNEu5uIPt3V: 42,
    165N8Urzd8QOMgki898o: 10.25,
  },
}

We concatenate all custom text fields into an _allText property, which is a text mapping and allows for full text search across those custom text fields. I mention this because this was a great solution for working around the limitations of flattened while still providing full text search capabilities on those values. We have fields as a flattened field type to prevent mappings explosion, since across our index we have 1000s of custom user fields.

{
  // 'text'
  name: 'My Element',

  // 'flattened'
  fields: {
    UBI1h9vH1836n5JqNaGb: 'a string value!',
    0L4ctTlt3agO1BLyI1WK: 'another string value',
    0wzYL8hIKUNEu5uIPt3V: 42,
    165N8Urzd8QOMgki898o: 1021
  },

  // 'text' (concatenated values from all custom string fields)
  _allText: 'a string value! another string value',
}

What I would like to do next is index our documents in such a way that allows for aggregations on those custom number fields. The flattened field type does not allow metric aggregations. In this specific example I would like to do metric aggregations on those numeric 42 and 1021 values. For example, a sum operation that totals up every fields.0wzYL8hIKUNEu5uIPt3V in a query containing 100 documents.

Any ideas on how I should approach this?

spinscale · March 28, 2022, 8:37am

There is an open issue for supporting numerics in flattened field type. See Support for a fully numeric flattened field · Issue #61550 · elastic/elasticsearch · GitHub

I don't have any concrete idea to fix this than to maybe try out nested and structure the numbered values like this:

field: [
  { key: 0wzYL8hIKUNEu5uIPt3V, value:  42 },
  { key: 165N8Urzd8QOMgki898o, value:  1021 }
]

I have not fully thought this through, but maybe you are able to retrieve the same numeric infos you need using nested filters in the aggs. Might be worth a try.

amkoehler · March 28, 2022, 2:35pm

I hadn't thought to try nested. I'll do some testing to see if that could work. Will keep an eye on the GH issue as well. Thanks @spinscale this is very helpful.

amkoehler · March 29, 2022, 9:34pm

Hey @spinscale have another thought on this. Would a painless script work here too? I've never used those before but the examples make a lot of references to metric aggregations: Painless examples for transforms | Elasticsearch Guide [8.1] | Elastic. If I need to support dynamic IDs, seems like a script might work, but I am not aware of painless' limitations.

Side note - I put together a very basic example of nested aggregation and its working well so far.

spinscale · March 30, 2022, 7:49am

Hey,

indeed, a scripted metric aggregation should work, as it allows you to create arbitrary data from each document and do all the calculations by doing the proper filtering via painless.

Another relatively new feature are runtime fields - but I suppose that is not dynamic enough in your case with arbitrary keys/values.

One thing to keep in mind: When using aggregations the data is extracted from the index data structure, not from the _source, so you would probably still end up with some mapping?

--Alex

amkoehler · March 30, 2022, 2:52pm

Not ready to dive into painless yet but good to know it may be an option.

That's a good thing to keep in mind with the additional mappings. Adding a new nested mapping containing the key/value pairs is definitely acceptable for us.

system · April 27, 2022, 2:52pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Doing aggregations using flattened_fields is slow Elasticsearch	2	515	April 7, 2021
Aggregation with nested properties Elasticsearch	2	404	November 20, 2019
Aggregations do not add up float values Elasticsearch	1	1311	July 5, 2017
Custom Aggregations Elasticsearch	16	4870	November 4, 2022
Dynamic custom fields with numeric and date range queries Elasticsearch	1	371	July 29, 2021

Metric aggregations on custom fields that use flattened field type

Related topics