REINDEX: identify fields to remove with a regular expression?

TheEvilDonut · January 24, 2022, 7:34pm

Hello.

Here's the situation. A source shipping logs to our cluster has malfunctioned and sent us a large number of documents containing unwanted fields. The problem is that the field names are almost random, so there is no way to list them all. They all share a common pattern, but each one also differs in some way.

I am looking for a way to reindex the documents so I can remove those fields. However, every approach I have found so far requires knowing the names of the fields to remove. Is there a way to create a pipeline that isolates the fields with a regular expression, or some other way I am not seeing?

Any suggestion you might have will be very welcome.

Tomo_M · January 25, 2022, 1:00am

Have you tried an ingest pipeline with a script processor? It could be a solution. Regular expressions are supported in Painless.
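As a rough sketch of this idea (not tested against a cluster), the Python snippet below builds the request bodies for an ingest pipeline whose script processor drops every top-level field whose name matches a regular expression, and for a reindex that sends documents through that pipeline via dest.pipeline. The pattern ^tmp_[0-9a-f]+$, the pipeline name drop-junk-fields and the index names are hypothetical placeholders. Only top-level fields are handled, and depending on the Elasticsearch version, regex literals in Painless may be restricted by the script.painless.regex.enabled setting. A local Python mirror of the same filter is included so the pattern can be checked against sample documents first.

```python
import json
import re

# Hypothetical pattern shared by the unwanted field names; replace with the real one.
FIELD_PATTERN = r"^tmp_[0-9a-f]+$"
PIPELINE_NAME = "drop-junk-fields"  # hypothetical pipeline name

# Painless source for the script processor. Matching keys are collected first and
# removed afterwards, so ctx is not modified while it is being iterated.
# "=~" in Painless is a find()-style match, like re.search in Python.
PAINLESS_SOURCE = (
    "List toRemove = new ArrayList();"
    " for (def k : ctx.keySet()) {"
    " if (k =~ /" + FIELD_PATTERN + "/) { toRemove.add(k); }"
    " }"
    " for (def k : toRemove) { ctx.remove(k); }"
)

# Body for: PUT _ingest/pipeline/drop-junk-fields
pipeline_body = {
    "description": "Remove top-level fields whose names match FIELD_PATTERN",
    "processors": [{"script": {"lang": "painless", "source": PAINLESS_SOURCE}}],
}

# Body for: POST _reindex (index names are hypothetical)
reindex_body = {
    "source": {"index": "logs-old"},
    "dest": {"index": "logs-clean", "pipeline": PIPELINE_NAME},
}


def remove_matching_fields(doc: dict, pattern: str = FIELD_PATTERN) -> dict:
    """Local mirror of the Painless logic, for checking the pattern on samples."""
    regex = re.compile(pattern)
    return {k: v for k, v in doc.items() if not regex.search(k)}


if __name__ == "__main__":
    print(json.dumps(pipeline_body, indent=2))
    print(json.dumps(reindex_body, indent=2))
    sample = {"message": "hello", "tmp_3fa9": 1, "tmp_b2": "x", "host": "web-1"}
    print(remove_matching_fields(sample))  # {'message': 'hello', 'host': 'web-1'}
```

Collecting the keys before removing them avoids mutating the map during iteration, which is the safer choice compared with deleting inside the loop.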

warkolm · January 25, 2022, 1:17am

I've raised "Remove all fields other than a defined list" (elastic/elasticsearch issue #83010 on GitHub) to see if we can add an option to the remove ingest processor for defining a list of fields to keep and then removing all the others.

It doesn't help you now, sorry!

Tomo_M · January 25, 2022, 1:43am

Alternatively, if you can list all the fields you need and set mappings for them, setting dynamic: false on the new index could be an option.
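A minimal sketch of that suggestion, assuming the fields worth keeping can be listed: the create-index body below maps only those fields and sets dynamic to false. Documents are still accepted and the unwanted fields remain in _source, but they are not added to the mapping, so they are not indexed or searchable. The field names, types and the index name logs-clean are hypothetical, and the helper only reports which top-level fields of a document fall outside the mapping.

```python
import json

# Hypothetical fields to keep; any other field stays in _source but is not mapped.
KEEP_PROPERTIES = {
    "@timestamp": {"type": "date"},
    "message": {"type": "text"},
    "host": {"type": "keyword"},
}

# Body for: PUT logs-clean (hypothetical index name), created before reindexing.
create_index_body = {
    "mappings": {
        "dynamic": False,  # serialized as JSON false: new fields are not mapped or indexed
        "properties": KEEP_PROPERTIES,
    }
}


def unmapped_fields(doc: dict) -> list:
    """Top-level fields of doc that this dynamic: false mapping would not index."""
    return sorted(k for k in doc if k not in KEEP_PROPERTIES)


if __name__ == "__main__":
    print(json.dumps(create_index_body, indent=2))
    print(unmapped_fields({"message": "hi", "tmp_3fa9": 1, "host": "web-1"}))  # ['tmp_3fa9']
```

Unlike dynamic: "strict", which rejects documents containing unknown fields, dynamic: false lets the reindex succeed while leaving the unwanted fields out of the mapping.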

TheEvilDonut · January 25, 2022, 11:55am

That is precisely what I am trying to do. What I don't see is how to isolate the fields to remove without naming each one explicitly. For example, I can't find how to loop over them with a regular expression, since their names all differ except for one common pattern that repeats in each of them.

TheEvilDonut · January 25, 2022, 11:56am

Thank you! Much appreciated!

TheEvilDonut · January 25, 2022, 11:57am

I cannot list them, which is actually the exact problem with my situation.

TheEvilDonut · January 25, 2022, 12:14pm

Oh wait. I see what you mean here. Sorry, my currently uncaffeinated state caused me to misread your post.

Will explore this option for sure.

TheEvilDonut · February 1, 2022, 1:03pm

Well, that worked. Hopefully a more flexible solution will exist in the not-too-distant future, but your suggestion works for now. Thank you, good sir!

system · March 1, 2022, 1:04pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
