I wish to index a number of research papers so that I may analyse such features as: word proximity, sentence length, word count, spelling variations etc.
I wish to index each document as a single unit of text together with its filename.
The information below appears applicable, but I would appreciate advice on the content of '_settings.json' and '_settings_folder.json'.
Define explicit mapping/settings per job
Let’s say you created a job named job_name and you are sending documents against an Elasticsearch cluster running version 6.x.
If you create the following files, they will be picked up at job start time instead of the default ones:
~/.fscrawler/{job_name}/_mappings/6/_settings.json
~/.fscrawler/{job_name}/_mappings/6/_settings_folder.json
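To make the question concrete, here is a rough sketch of what I imagine placing in '_settings.json': one text field for the document body and one keyword field for the filename. Everything inside the mapping is my assumption rather than something taken from the documentation: the field names `content` and `file.filename`, the `_doc` type name, and the shard setting are all guesses, and the helper `write_job_settings` is hypothetical. It may well be safer to copy the default file that ships with the tool and edit that instead.

```python
import json
from pathlib import Path

# ASSUMPTION: field names, the "_doc" type name and the shard count are
# illustrative guesses, not the tool's documented defaults.
SETTINGS = {
    "settings": {"number_of_shards": 1},
    "mappings": {
        "_doc": {
            "properties": {
                # Whole document body indexed as one analysed text field.
                "content": {"type": "text"},
                # Filename stored verbatim so it can be matched exactly.
                "file": {"properties": {"filename": {"type": "keyword"}}},
            }
        }
    },
}


def write_job_settings(job_name, base_dir=None, es_major="6"):
    """Write SETTINGS to {base}/{job_name}/_mappings/{es_major}/_settings.json.

    base_dir defaults to ~/.fscrawler; the function returns the written path.
    """
    base = Path(base_dir) if base_dir is not None else Path.home() / ".fscrawler"
    target_dir = base / job_name / "_mappings" / es_major
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "_settings.json"
    target.write_text(json.dumps(SETTINGS, indent=2), encoding="utf-8")
    return target
```

Calling `write_job_settings("job_name")` before starting the job would put the file where the passage above says it is picked up. Whether a mapping this small is enough for proximity and spelling-variation queries is exactly what I am unsure about.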