I'm setting up a latest transform, and I would like to know whether it is possible to keep only some fields (from the original index) in the new transform destination index. I've tried copying the needed fields and then removing the remaining fields and _source using an ingest pipeline, but they are required for the transform to run.
Hi! Would you be up for sharing the transform config?
If the transform is failing right now, could you check the messages tab on the Stack Management page for errors and share them? That would give us a bit more insight into what might be going wrong.
For latest transforms, the destination index is created with dynamic mappings, so it's important to ensure the mappings for your destination index match the source index before you start your transform. You could use index templates (there's an example in this blog) or the create index API.
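As a sketch of the index-template approach, something like the following could define explicit mappings for the destination index before the transform first writes to it (the template name, index pattern, and field names here are illustrative, not from your config):

```json
PUT _index_template/my-transform-dest-template
{
  "index_patterns": ["my-transform-dest*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "agent": {
          "properties": {
            "hostname": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```

Because the template matches the destination index name, the mappings are applied automatically when the transform creates the index.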
Once that's done, it should be possible to create an ingest pipeline that removes the unnecessary fields, perhaps using the remove processor.
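A minimal example of such a pipeline might look like this (the pipeline name and field names are placeholders; substitute the fields you actually want to drop):

```json
PUT _ingest/pipeline/drop-unneeded-fields
{
  "description": "Remove fields not needed in the transform destination index",
  "processors": [
    {
      "remove": {
        "field": ["host", "log", "input", "ecs"],
        "ignore_missing": true
      }
    }
  ]
}
```

Setting `ignore_missing: true` keeps the pipeline from failing on documents that don't contain one of the listed fields.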
The transforms UI provides a dropdown selector for Destination ingest pipeline which will contain the pipeline you create for removing the fields.
Actually the transform is not failing; it's working fine and returning a full copy of the latest document based on the unique field. What I would like is to return just a small subset of fields from the original document, not the whole document. As you mentioned, I've tried to create an ingest pipeline to remove the majority of fields, but I got some errors while trying to remove the entire _source and all of the fields after copying the necessary ones to a new_custom_field.
Thanks @przemekwitek , that feature should be exactly what I am looking for.
About the ingest pipeline: how could I create one that removes almost all fields except some selected ones? I imagine it would be possible to loop over all fields with an 'if' condition to skip the needed ones, but I'm concerned this approach would add too much extra processing work.
What I intend to do is keep track of all hostnames that are sending logs via the Beats agent, so I can tell when a host has gone a long time without sending any data. So I do not need the entire document from Beats, only agent.hostname and a few other fields.
Do not remove _source, as this will lead to many issues. For example, without _source Kibana does not work correctly for your index and you cannot even see its data, so you need to keep _source.
Unfortunately there is no prune processor where you can specify just the fields to keep. In this similar post someone implemented the prune filter using Painless, which could be used in an ingest pipeline, but as mentioned this can be resource intensive.
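For reference, a prune-like script processor could be sketched along these lines (this is an assumption-laden example, not the exact script from the linked post; the pipeline name and the fields in the `keep` list are placeholders, and the ingest metadata keys such as `_index` and `_id` must stay in the list or indexing will break):

```json
PUT _ingest/pipeline/keep-only-selected-fields
{
  "description": "Remove every top-level field except those listed in params.keep",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.keySet().removeIf(k -> params.keep.contains(k) == false)",
        "params": {
          "keep": ["_index", "_id", "_version", "_version_type", "@timestamp", "agent"]
        }
      }
    }
  ]
}
```

Note that this operates on top-level keys only, so keeping `agent` keeps the whole `agent` object, including `agent.hostname`. This is the resource-intensive approach mentioned above, so test it against your ingest volume first.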
The easiest solution is to explicitly list all the fields you want to remove.
I mentioned this because Discover needs _source to work. I had a similar issue last year when I set _source to false, and no data was shown anymore in Discover; you can check the Elastic explanation here.
You would need to test whether this impacts performance. I don't think the remove processor will impact anything. You can also test the script from the previously linked post; it may not be a problem in your case, since Elasticsearch has improved a lot in the past few years.