Hi,
I was able to upload various kinds of files to ES using FSCrawler. However, when I upload a BAM file (a large binary file generated in WGS pipelines), only the name of the file gets stored in the content field.
I would like to know if there is a way ES can read the contents of the file and index them so that we can search based on the content.
Are there any limitations (or specific file types for which content is not extracted)? Any pointers or help with this would be much appreciated.
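For reference, this is how I am checking what was actually stored; the index name `my_docs` and the file name `sample.bam` below are placeholders for my actual job, and the field names follow FSCrawler's default mapping:

```
curl -s "localhost:9200/my_docs/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "query": { "match": { "file.filename": "sample.bam" } },
  "_source": ["file.filename", "content"]
}'
```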
I could not find the .bam format listed specifically, but since it is a binary file I assumed it would work. I have two questions:
1. There is no error generated or warning shown that the file format is not supported.
2. I am able to see the metadata of the file, so why is the file name stored in the content field of the document?
I would appreciate it if you could clarify these two points. I am working with my team to get more details and see if there are alternative options we can consider (such as a different file format or a different file altogether).
FSCrawler tries its best to extract as much information as possible. Some metadata is generated by FSCrawler itself.
The rest is extracted by Apache Tika.
If Tika cannot extract the content, it is simply ignored.
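If you want to check what Tika does with a BAM file on its own, independently of FSCrawler, you can run the tika-app jar directly; the jar version and file path below are just examples:

```
# Ask Tika what MIME type it detects for the file
java -jar tika-app-2.9.1.jar --detect /path/to/sample.bam
# Attempt plain-text extraction; an empty result typically means Tika
# has no parser that can pull text out of this format
java -jar tika-app-2.9.1.jar --text /path/to/sample.bam
```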