ES scripting help


(Ramon) #1

Hi all,
I am developing a file search solution that uses fscrawler to push data into ES 5.5.0.

The issue I am facing is a field explosion (the index passes 1000 fields in no time), because fscrawler creates new fields dynamically and 95% of them are "meta.*" fields.
Details of the issue are here, if anyone is interested: Filesearch solution using ES 5.5.0

The solution I can see is to use the remove processor in an ingest node to get rid of the meta.* fields.

I have tried using remove directly on the meta.* fields, but that throws a java.lang exception.

The only way around this seems to be a script processor that picks out the meta.* fields, followed by remove to get rid of them.
The thing is, I have no experience with this kind of thing. How do I access the fields in an ingest node in the first place?
Any pointers would be much appreciated.
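
A minimal sketch of the script-processor idea described above, written as a Python dict that could be sent as the body of PUT _ingest/pipeline/<id>. It assumes the meta.* fields arrive as one nested "meta" object, so removing the top-level key drops every sub-field. The pipeline id, the sample document and the drop_meta helper are hypothetical and exist only for illustration. In ES 5.5 the script text goes under "inline".

```python
import json

# Hypothetical pipeline id, not taken from the thread.
PIPELINE_ID = "drop-meta"

# Body for PUT _ingest/pipeline/drop-meta.
pipeline = {
    "description": "Drop the meta object before indexing",
    "processors": [
        {
            "script": {
                "lang": "painless",
                # Assumption: meta.* is a single nested "meta" object,
                # so removing the top-level key removes all sub-fields.
                "inline": "ctx.remove('meta')",
            }
        }
    ],
}


def drop_meta(doc):
    """Local stand-in for the Painless script, for illustration only."""
    cleaned = dict(doc)       # shallow copy, leaves the input untouched
    cleaned.pop("meta", None)  # no error if the document has no meta
    return cleaned


# Made-up sample document shaped roughly like an fscrawler document.
sample = {
    "content": "hello",
    "file": {"filename": "a.pdf"},
    "meta": {"author": "x", "raw": {"k": "v"}},
}

print(json.dumps(pipeline, indent=2))
print(drop_meta(sample))
```

Inside an ingest script, ctx is the document itself, so top-level fields are reached as ctx.fieldname or ctx['fieldname']. Removing the whole object also avoids having to list each meta.* field by name.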


(Alexander Reelsen) #2

If you shared what you did, along with a sample document, the pipeline and your setup (mapping), it would be tremendously helpful.

You can also configure the properties as part of the pipeline configuration, see https://www.elastic.co/guide/en/elasticsearch/plugins/5.5/using-ingest-attachment.html
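
A rough sketch of what that could look like, assuming the ingest attachment processor does the extraction (this may not apply if fscrawler extracts the text itself before sending the document). The body is built as a Python dict; the source field name "data" and the chosen property list are assumptions for illustration.

```python
import json

# Body for PUT _ingest/pipeline/<id>; only the listed properties are stored.
pipeline = {
    "description": "Keep only selected attachment properties",
    "processors": [
        {
            "attachment": {
                # Hypothetical field holding the base64-encoded file.
                "field": "data",
                # Subset of the properties the processor can extract.
                "properties": ["content", "content_type", "content_length"],
            }
        }
    ],
}

print(json.dumps(pipeline, indent=2))
```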


(Ramon) #3

Sample document: could be anything, mostly .doc, .docx, .pdf, .xls, .txt and some image/audio/video files.

Because of this large variation in file types, I want to keep just file.properties and content.

I am using an ingest node between fscrawler and ES, as explained here: https://github.com/dadoonet/fscrawler#using-ingest-node-pipeline

fscrawler creates mappings automatically as it encounters new fields.

I want to write a script that picks out all the "meta.*" fields it creates and gets rid of them with the 'remove' processor before the data is indexed in ES.

Is that enough to go on, or am I missing something?

