File uploaded using FSCrawler - only the file name is stored in the content field

Hi,
I was able to upload various kinds of files to ES using FSCrawler. However, when I upload BAM files (very large files generated by WGS pipelines), only the file name is stored in the content field.
I would like to know whether there is a way for ES to read the contents of these files and index them, so that we can search based on the content.
Are there any limitations (or specific file types for which the content is not extracted)? Any pointers or help would be much appreciated.

Thanks and Regards

Sarath Mnaikonda

Is the BAM type one of those supported formats?

I could not see .bam listed specifically. But it is a binary file, so I assumed it would work. I have two questions:

  1. No error or warning is shown saying that the file format is not supported, or anything similar. Why is that?
  2. I can see the file's metadata, so why is the file name stored in the content field of the document?

I would appreciate it if you could clarify these two points. I am working with my team to gather more details and see whether there are alternative options we can consider (such as a different file format or a different file altogether).
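Being a binary file is not enough on its own: Tika extracts text through format-specific parsers. According to the SAM/BAM format specification, a BAM file is BGZF-compressed (a gzip variant), and its decompressed stream starts with the magic bytes `BAM\1`, followed by binary alignment records rather than plain text. The following is a minimal sketch, assuming Python 3 and only the standard library, that checks for that signature. The function name `looks_like_bam` is hypothetical. The check only confirms the file's layout and says nothing about whether Tika has a parser for it.

```python
import gzip
import zlib
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"  # every gzip/BGZF member starts with these bytes
BAM_MAGIC = b"BAM\x01"    # first bytes of the decompressed BAM stream


def looks_like_bam(stream: BinaryIO) -> bool:
    """Return True if a seekable binary stream starts like a BAM file."""
    if stream.read(2) != GZIP_MAGIC:
        return False
    stream.seek(0)
    try:
        # BGZF is valid multi-member gzip, so the stdlib reader can open it.
        # Only the first 4 decompressed bytes are read, not the whole file.
        with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
            return gz.read(4) == BAM_MAGIC
    except (OSError, EOFError, zlib.error):
        # Corrupt or truncated compressed data: not a readable BAM header.
        return False


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_bam.py sample.bam
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, looks_like_bam(f))
```

Reading only the first few bytes keeps the check cheap even for multi-gigabyte BAM files.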

FSCrawler tries its best to extract as much information as possible. Some metadata is generated by FSCrawler itself.
The rest is extracted by Apache Tika.
If Tika cannot extract the content, it is simply ignored.
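Because failed extraction is silently ignored, it can help to list which indexed files ended up with no usable text. The following is a minimal sketch, assuming the documents are the `_source` objects returned by an Elasticsearch search. It also assumes they follow FSCrawler's default layout, with a top-level `content` field and a `file.filename` field, so check your own mapping. The helper name `find_empty_content` is hypothetical. The test is a heuristic only: it flags content that is missing, blank, or identical to the file name.

```python
from typing import Any, Dict, Iterable, List


def find_empty_content(docs: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the file names of documents with no usable extracted text.

    Assumes FSCrawler-style documents: a top-level "content" field and a
    nested "file" object with a "filename" key.
    """
    flagged = []
    for doc in docs:
        filename = (doc.get("file") or {}).get("filename", "")
        # Treat a missing or None content field the same as an empty string.
        content = (doc.get("content") or "").strip()
        # Heuristic: nothing extracted, or only the file name was stored.
        if not content or content == filename:
            flagged.append(filename)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample documents shaped like FSCrawler output.
    docs = [
        {"file": {"filename": "sample.bam"}, "content": "\nsample.bam\n"},
        {"file": {"filename": "report.pdf"}, "content": "Quarterly results"},
    ]
    print(find_empty_content(docs))  # ['sample.bam']
```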
