I'm setting up a latest transform, and I would like to know whether it is possible to keep only some fields (from the original index) in the new transform destination index. I've tried copying the needed fields and then removing the remaining fields and _source using an ingest pipeline, but they are required for the transform to run.
Hi! Would you be up for sharing the transform config?
If the transform is failing right now, could you check the messages tab on the Stack Management page for errors and share them? That would give us a bit more insight into what might be going wrong.
For latest transforms, the destination index is created with dynamic mappings, so it's important to ensure the mappings for your destination index match the source index before you start your transform. You could use index templates (there's an example in this blog) or the create index API.
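As a sketch of the index-template approach, something like the following could define explicit mappings for the destination index before the transform first writes to it (the template name, index pattern, and field names here are illustrative, not from your config):

```json
PUT _index_template/my-transform-dest-template
{
  "index_patterns": ["my-transform-dest*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "agent": {
          "properties": {
            "hostname": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```

Because the template matches the destination index name, the mappings are applied automatically when the transform creates the index.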
Once that's done, it should be possible to create an ingest pipeline that removes the unnecessary fields, perhaps using the remove processor.
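A minimal example of such a pipeline might look like this (the pipeline name and field names are placeholders; substitute the fields you actually want to drop):

```json
PUT _ingest/pipeline/drop-unneeded-fields
{
  "description": "Remove fields not needed in the transform destination index",
  "processors": [
    {
      "remove": {
        "field": ["host", "log", "input", "ecs"],
        "ignore_missing": true
      }
    }
  ]
}
```

Setting `ignore_missing: true` keeps the pipeline from failing on documents that don't contain one of the listed fields.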
The transforms UI provides a dropdown selector for Destination ingest pipeline which will contain the pipeline you create for removing the fields.
Actually the transform is not failing; it's working fine and returning a full copy of the latest document based on the unique field. What I would like is to return just a small subset of fields from the original document, not the whole document. As you mentioned, I've tried to create an ingest pipeline to remove the majority of fields, but I got some errors while trying to remove the entire _source and all of the fields after copying the necessary ones to a new_custom_field.
Thanks @przemekwitek , that feature should be exactly what I am looking for.
About the ingest pipeline: how could I create one that removes almost all fields except some selected ones? I imagine it would be possible to loop over all fields with an 'if' condition to skip the needed ones, but I'm concerned this approach would add too much extra processing work.
What I intend to do is keep track of all hostnames that are sending logs via the Beats agent, so I can tell when a host has gone a long time without sending any data. So I do not need the entire document from Beats, only agent.hostname and a few other fields.
Do not remove _source, as this will lead to many issues. For example, without _source Kibana does not work correctly for your index and you cannot even see its data, so you need to keep _source.
Unfortunately there is no prune processor where you can specify just the fields to keep. In this similar post someone implemented the prune filter using Painless, which could be used in an ingest pipeline, but as mentioned this can be resource intensive.
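For reference, a prune-like script processor could be sketched along these lines (this is an assumption-laden example, not the exact script from the linked post; the pipeline name and the fields in the `keep` list are placeholders, and the ingest metadata keys such as `_index` and `_id` must stay in the list or indexing will break):

```json
PUT _ingest/pipeline/keep-only-selected-fields
{
  "description": "Remove every top-level field except those listed in params.keep",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.keySet().removeIf(k -> params.keep.contains(k) == false)",
        "params": {
          "keep": ["_index", "_id", "_version", "_version_type", "@timestamp", "agent"]
        }
      }
    }
  ]
}
```

Note that this operates on top-level keys only, so keeping `agent` keeps the whole `agent` object, including `agent.hostname`. This is the resource-intensive approach mentioned above, so test it against your ingest volume first.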
The easiest solution is to explicitly list all the fields you want to remove.
I mentioned this because Discover needs _source to work. I had a similar issue last year when I set _source to false, and no data was shown anymore in Discover; you can check the Elastic explanation here.
You would need to test whether this impacts performance. I don't think the remove processor will impact anything. You can also test the script from the previously linked post; it may not be a problem in your case, since Elasticsearch has improved a lot in the past few years.