I'm using ES 6.5.4 with the ingest attachment plugin. It works fine with PDF, TXT, and other file types, but breaks for .docx and .doc files.
The index gets created, but the file content is not parsed and cannot be searched.
Below is the output:
"attachment":{"content_type":"application/x-tika-ooxml","content_length":0}}
Thanks for the reply. My docx file doesn't contain any images or special formatting; it's a simple, plain-text document saved as docx. I'd like to add one more piece of information: I have two different versions of ES on different servers. There are no issues with docx on ES 6.5.0, but I'm facing this issue on ES 6.5.4.
{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "attachment" : {
            "date" : "2019-03-04T12:03:00Z",
            "language" : "et",
            "content_type" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "author" : "Ananthmurthy Rao",
            "content" : """
1. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac faucibus odio.
Vestibulum neque massa, scelerisque sit amet ligula eu, congue molestie mi. Praesent ut varius sem. Nullam at porttitor arcu, nec lacinia nisi. Ut ac dolor vitae odio interdum condimentum. Vivamus dapibus sodales ex, vitae malesuada ipsum cursus convallis. Maecenas sed egestas nulla, ac condimentum orci. Mauris diam felis, vulputate ac suscipit et, iaculis non est. Curabitur semper arcu ac ligula semper, nec luctus nisl blandit. Integer lacinia ante ac libero lobortis imperdiet. Nullam mollis convallis ipsum, ac accumsan nunc vehicula vitae. Nulla eget justo in felis tristique fringilla. Morbi sit amet tortor quis risus auctor condimentum. Morbi in ullamcorper elit. Nulla iaculis tellus sit amet mauris tempus fringilla.
""",
            "content_length" : 816
          }
        },
        "_ingest" : {
          "timestamp" : "2019-03-04T14:46:00.089544Z"
        }
      }
    }
  ]
}
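The response above has the shape returned by the simulate pipeline API. For anyone who wants to run the same check against both servers, here is a minimal Python sketch that builds such a request body for the attachment processor and flags a response where nothing was extracted. The helper names (`build_simulate_body`, `extraction_looks_empty`) and the source field name `data` are my own choices for illustration, not something taken from the thread, and the sketch only builds and inspects JSON; it does not talk to a cluster.

```python
import base64
import json


def build_simulate_body(file_bytes: bytes, field: str = "data") -> dict:
    """Build a body for POST _ingest/pipeline/_simulate.

    The attachment processor expects the file as a base64 string in `field`.
    The field name "data" is an assumption; use whatever your pipeline reads.
    """
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "pipeline": {
            "description": "Extract attachment information",
            "processors": [{"attachment": {"field": field}}],
        },
        "docs": [{"_source": {field: encoded}}],
    }


def extraction_looks_empty(simulate_response: dict) -> bool:
    """Return True if any simulated doc has no content or content_length 0."""
    for entry in simulate_response.get("docs", []):
        attachment = entry.get("doc", {}).get("_source", {}).get("attachment", {})
        if not attachment.get("content") or attachment.get("content_length", 0) == 0:
            return True
    return False


if __name__ == "__main__":
    # Placeholder bytes; in practice read the .docx with open(path, "rb").read()
    body = build_simulate_body(b"hello")
    print(json.dumps(body, indent=2))
```

Sending the printed JSON to `_ingest/pipeline/_simulate` on both the 6.5.0 and 6.5.4 servers with the same .docx file should show whether the difference is in the plugin or in how the file is being submitted.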