ES scripting help

Ramon_Zaro · August 16, 2017, 2:48am

Hi all,
I am developing a file search solution using fscrawler to push data into ES 5.5.0.

The issue I am facing is a field explosion: the index passes 1000 fields in no time because fscrawler creates new fields dynamically, and 95% of them are "meta.*" fields.
Details of the issue are here if anyone is interested: Filesearch solution using ES 5.5.0

The solution I can see is to use the remove processor in an ingest node to get rid of the meta.* fields.

I have tried using remove directly on the meta.* fields, but that throws a java.lang exception.

The only way around this seems to be using a script processor to extract the meta.* fields and then using remove to get rid of them.
The thing is, I have no experience with this kind of thing. How do I access the fields in an ingest node in the first place?
Any pointers would be much appreciated.
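
A minimal sketch of the field-stripping step described above, written in Python rather than Painless so it can be run outside Elasticsearch. It assumes the crawler sends `meta` either as one top-level object or as flattened `meta.xxx` keys; the function name `strip_meta` and the sample document are hypothetical. Inside an ingest script processor the document is exposed as `ctx`, so the equivalent step would probably be a call along the lines of `ctx.remove('meta')`, but that is an assumption to check against 5.5.0.

```python
def strip_meta(doc, prefix="meta"):
    """Return a copy of doc without the `meta` object or any `meta.*` keys."""
    dotted = prefix + "."
    return {
        key: value
        for key, value in doc.items()
        if key != prefix and not key.startswith(dotted)
    }


# Hypothetical document, loosely shaped like what a file crawler might send.
sample = {
    "content": "quarterly report text",
    "file": {"filename": "report.docx", "extension": "docx"},
    "meta": {"author": "someone", "raw": {"X-Parsed-By": "tika"}},
    "meta.title": "flattened variant",
}

cleaned = strip_meta(sample)
print(sorted(cleaned))
```

The function builds a new dictionary instead of mutating the input, which keeps the original document available for comparison while testing.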

spinscale · August 16, 2017, 1:16pm

If you shared what you did, along with a sample document, the pipeline, and your setup (mapping), it would be tremendously helpful.

You can also configure the properties as part of the pipeline configuration, see https://www.elastic.co/guide/en/elasticsearch/plugins/5.5/using-ingest-attachment.html
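
As a rough illustration of that suggestion, the sketch below builds a pipeline body whose attachment processor lists the properties to keep. It only applies when extraction is done by the ingest attachment plugin. The helper name `attachment_pipeline`, the source field name `data`, and the chosen property list are assumptions for the example, not taken from this thread.

```python
import json


def attachment_pipeline(properties):
    """Build an ingest pipeline body that stores only the listed attachment properties."""
    return {
        "description": "Extract only selected attachment properties",
        "processors": [
            {
                "attachment": {
                    "field": "data",  # assumed name of the base64-encoded source field
                    "properties": list(properties),
                }
            }
        ],
    }


body = attachment_pipeline(["content", "title", "content_type"])
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a pipeline creation request.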

Ramon_Zaro · August 28, 2017, 3:23am

Sample document: could be anything, mostly .doc, .docx, .pdf, .xls, .txt and some image/audio/video files.

Because of this large variation in file type, I want just file.properties and content.

I am using an ingest node pipeline from fscrawler to ES, as explained here: https://github.com/dadoonet/fscrawler#using-ingest-node-pipeline

fscrawler creates mappings automatically, as it encounters new fields.

I want to write a script that picks up all the "meta.*" fields it creates and gets rid of them with the 'remove' processor before the data is indexed in ES.

Is that enough to go on, or am I missing something?
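
Since the goal stated above is to keep just the file properties and the content, an allow-list may be simpler than hunting down every "meta.*" field. The sketch below shows that idea in plain Python; `keep_only`, the allowed field names, and the sample document are hypothetical. In a real ingest script the document map is likely to carry metadata such as `_index` and `_id` as well, which an allow-list would need to preserve.

```python
def keep_only(doc, allowed=("file", "content")):
    """Return a copy of doc containing only the allowed top-level fields."""
    return {key: doc[key] for key in allowed if key in doc}


# Hypothetical crawled document with extra fields that should be dropped.
crawled = {
    "content": "meeting notes",
    "file": {"filename": "notes.pdf", "filesize": 20480},
    "meta": {"author": "someone", "keywords": "notes"},
    "path": {"real": "/tmp/notes.pdf"},
}

trimmed = keep_only(crawled)
print(sorted(trimmed))
```

With an allow-list, any new field type the crawler starts emitting is dropped by default, so the mapping cannot grow unexpectedly.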

system · September 25, 2017, 3:23am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
