I have tried this:
- I used Tika with the Python code I shared. It indexes the data as '.keyword', but it doesn't show the count of individual words in a PDF file.
- I used FSCrawler. It indexes the data as content and not in '.keyword' format, so the field doesn't even show up in the visualization tab.
- I am still working on the ingest plugin. I haven't found a way to index a PDF file with it yet, and I am running into a lot of issues. I will keep working on that.
You asked me to provide a script, but the approaches I went through don't require one. All I need to do is give the name of the directory in which the files are stored, and it does the work for me!
I have been working on this for days now, and I have lost my belief that Elasticsearch will be able to count the individual words in a PDF file.
Can you please give me some references where someone has actually done this, because I don't want to waste any more time on it!
You are my only hope. Please help!
Regards,
Manas
