Hi,
I found the following list of formats supported by Tika:
http://tika.apache.org/1.14/formats.html#Full_list_of_Supported_Formats
Which file formats are supported by the Ingest Attachment plugin? Does it support all the formats that Tika supports?
Thanks
No. Only a subset is supported.
Mainly OpenOffice documents, Microsoft Office documents (but not Visio), and PDF documents.
I'd add that the FSCrawler project supports all formats, since it runs outside an Elasticsearch node.
@dadoonet
Thanks for the prompt response.
Is this the complete set of supported types?
MS Office docs: .doc, .docx, .xls, .xlsx, .ppt, .pptx
TXT docs: .rtf, .txt, .csv
oOo docs: .odt, .sxw, .ods, .sxc, .odp, .sxi
PDF docs: .pdf
I'm just gathering the list of supported types. Please add/remove the supported types accordingly.
The only way to get a precise answer is to test it.
But here is the list of all the files we test: https://github.com/elastic/elasticsearch/tree/master/plugins/ingest-attachment/src/test/resources/org/elasticsearch/ingest/attachment/test/tika-files
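For anyone who wants to try the "test it" route, here is a rough sketch of how one might check a single file with the ingest simulate API. It assumes a node with the ingest-attachment plugin installed at http://localhost:9200 with no authentication; the helper names `build_simulate_body` and `simulate` and the field name `data` are just placeholders I picked for illustration.

```python
import base64
import json
import urllib.request


def build_simulate_body(file_bytes, field="data"):
    """Build a request body for POST _ingest/pipeline/_simulate.

    The attachment processor expects the file as a base64 string
    in the source field named by `field` (hypothetical name here).
    """
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "pipeline": {
            "processors": [{"attachment": {"field": field}}]
        },
        "docs": [{"_source": {field: encoded}}],
    }


def simulate(file_bytes, es_url="http://localhost:9200"):
    """Send the file to the simulate endpoint and return the parsed JSON.

    Assumes an unsecured local node; adjust es_url for your setup.
    """
    body = json.dumps(build_simulate_body(file_bytes)).encode("utf-8")
    request = urllib.request.Request(
        es_url + "/_ingest/pipeline/_simulate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (needs a running node, so it is left commented out):
# with open("sample.docx", "rb") as f:
#     print(json.dumps(simulate(f.read()), indent=2))
```

If the format is handled, the simulated document should come back with an `attachment` object containing fields such as `content` and `content_type`. If it is not, expect an error or empty content for that document instead.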