Elastic - flattening multiple fields

Hi everybody,

In one of my Elastic indices, I have a problem with some documents.
I receive JSON from Filebeat and parse it with the json filter plugin.
But some documents contain a lot of fields.

Example: response.0.field1->field28, response.1.field1->field28, response.2.field1->field28, ...

This goes up to more than response.400.*

I increased the index.mapping.total_fields.limit setting to 5000, then to 8000, and now to 12000.

But I doubt anybody is actually using these fields.

So, to get rid of this and get back to a more normal situation, I am thinking of flattening all the response.0-999 fields.

Can I use a wildcard or regex as a field name to mark fields as flattened in a template?
Something like

"response.[0-9]*" : {
        "type": "flattened"
      }

Or must I be explicit and include all possible names in it?
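
Or would a dynamic template be a way to do it? Something like this is what I have in mind (an untested sketch; the template name response_entries is just a placeholder, and I am not sure path_match combines with the flattened type like this):

"dynamic_templates": [
  {
    "response_entries": {
      "path_match": "response.*",
      "match_mapping_type": "object",
      "mapping": {
        "type": "flattened"
      }
    }
  }
]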

Another option is to prune the response.NNN fields with NNN greater than 10, something I can do in Logstash after parsing the JSON, by blacklisting them.
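
In Logstash that would be something like this prune filter, I imagine (an untested sketch):

prune {
    blacklist_names => [ "response\.[1-9][0-9]+" ]
}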

Thank you for your attention and responses.

Hi everybody,

No answer after a week most probably means it is not possible to use a regex to flatten fields in a template.

I looked at the other solution, using the Logstash prune filter to blacklist/drop fields with names like response.[1-9][0-9]*, but according to this bug, prune fails to work on nested fields:

Any idea/suggestion for removing nested fields based on their names plus a regexp?

Can I use a wildcard or regex as a field name to mark fields as flattened in a template?

Could you define this mapping:

"response" : {
    "type": "flattened"
}

and use an ingest processor or client code to convert docs
from "response.22.field1": "foo"
to "response" : { "22.field1" : "foo" }
or "response" : { "22": {"field1" : "foo" }}

Hi Mark,

I have problems with some of the response's fields, not all of them.

I receive text and parse it using the json filter, creating a tmp_json object.
My JSON looks like this:

"tmp_json" : {
  "entity" : {
    "response" : {
      "status" : "value",
      "error" : "value2",
      "0" : { "field-0" : "value", ... }
      "1" : { "field-0" : "value", ... }
      "2" : { "field-0" : "value", ... }
      "3" : { "field-0" : "value", ... }
      ...
      "999" : { "field-0" : "value", ... }
    }
  }
}

This ends up in a mapping explosion.
I do not want to flatten the whole response field or re-parse it.
I would like to remove the nested fields with names like response.xx, where xx is greater than 9.

I tried this piece of Ruby; it runs through the JSON but does not remove the fields:

ruby {
    code => "
        begin
            keys = event.get('[tmp_json][entity][response]').to_hash.keys
            keys.each{|key|
                if ( key =~ /[1-9][0-9]/ )
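                    # note: 'key' alone names a top-level event field,
                    # not [tmp_json][entity][response][key]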
                    event.remove(key)
                end
            }
        end
        "
}

Okay.

I replaced this:

event.remove(key)

with this:

event.remove('[tmp_json][entity][response][' + key + ']')

and it works.

I'll rework the regexp to be more specific.
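
For the record, the reworked filter should look something like this (a sketch, assuming the keys I want to drop are purely numeric):

ruby {
    code => "
        begin
            keys = event.get('[tmp_json][entity][response]').to_hash.keys
            keys.each{|key|
                # whole-key match: digits only, and only indices above 9
                if ( key =~ /\A[0-9]+\z/ && key.to_i > 9 )
                    event.remove('[tmp_json][entity][response][' + key + ']')
                end
            }
        end
        "
}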

Ah ok. I thought from your first example that you had field names with dots in them.
You have proper objects, but they hold a mix of fields you want mapped and ones you don't at the same level.
Pretty sure that will require some custom code in your client or an ingest processor to tidy up.

Not a Ruby expert, I'm afraid. Probably a question for another forum.
