Full list of document formats supported by ES

Hi,

I found the following list of formats supported by Tika:
http://tika.apache.org/1.14/formats.html#Full_list_of_Supported_Formats

Which file formats are supported by the Ingest Attachment plugin? Does it support all the formats that Tika supports?

Thanks

No. Only a subset is supported.

Mainly OpenOffice documents, Office documents (except Visio), and PDF documents.

I'll add that the FSCrawler project supports all formats, as it runs outside an Elasticsearch node.

@dadoonet
thanks for the prompt response.
Is this the complete set of supported types?

MS Office docs: .doc, .docx, .xls, .xlsx, .ppt, .pptx
TXT docs: .rtf, .txt, .csv
OpenOffice docs: .odt, .sxw, .ods, .sxc, .odp, .sxi
PDF docs: .pdf

I'm just gathering the list of supported types. Please add/remove the supported types accordingly.

The only way to get a precise answer is to test it.

But here is the list of everything we test: https://github.com/elastic/elasticsearch/tree/master/plugins/ingest-attachment/src/test/resources/org/elasticsearch/ingest/attachment/test/tika-files
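
For anyone who wants to test a given format themselves, here is a rough Python sketch that runs a sample file through the attachment processor using the pipeline simulate API (`POST /_ingest/pipeline/_simulate`). The function names, the `data` field name and the `http://localhost:9200` URL are assumptions for illustration, not something the plugin requires. It also assumes a node with the ingest-attachment plugin installed and no authentication.

```python
import base64
import json
import urllib.request


def build_simulate_body(raw: bytes, field: str = "data") -> dict:
    """Build a body for POST /_ingest/pipeline/_simulate that runs the
    attachment processor on one base64-encoded document.
    The field name "data" is an arbitrary choice for this sketch."""
    encoded = base64.b64encode(raw).decode("ascii")
    return {
        "pipeline": {
            "description": "Format check for the attachment processor",
            "processors": [{"attachment": {"field": field}}],
        },
        "docs": [{"_source": {field: encoded}}],
    }


def simulate(raw: bytes, es_url: str = "http://localhost:9200") -> dict:
    """Send the simulate request and return the parsed JSON response.
    es_url is an assumed local node; adjust it for your cluster."""
    body = json.dumps(build_simulate_body(raw)).encode("utf-8")
    req = urllib.request.Request(
        f"{es_url}/_ingest/pipeline/_simulate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

To try it, read a sample file in binary mode (`open(path, "rb").read()`), pass the bytes to `simulate`, and look at the returned `docs` entry: if extraction worked, the document source should contain an `attachment` object with the detected content type and extracted text; otherwise the entry should describe the error.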

@dadoonet: Thank you, this information should be enough for me :slight_smile:
