Ingest plugin .docx issue

Hi,

I'm using ES 6.5.4 and the ingest-attachment plugin. It works fine with PDF, TXT, and other file types, but fails for .docx and .doc files.
The index gets created, but the file content is not parsed, so it can't be searched.
Below is the output:
"attachment":{"content_type":"application/x-tika-ooxml","content_length":0}}

Can you please help me resolve this issue?

Could you share your .docx document? I'd like to try it.

Thanks for the reply. My .docx file doesn't contain any images or special formatting; it's a simple, plain-text document saved as .docx. One more piece of information: I have two different versions of ES on different servers. There are no issues with .docx in ES 6.5.0, but I'm facing this issue with ES 6.5.4.

But could you share it then?

Hi, can you tell me how I can share the file?

Maybe there? https://filebin.ca/

https://filebin.ca/4Z2J1tObkmTt/test123.docx

Please use the file above.

I tried your document on version 6.6.1 with:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "test",
    "processors": [
      {
        "attachment": {
          "field": "data"
        }
      },
      {
        "remove": {
          "field": "data"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_type": "_doc",
      "_id": "id",
      "_source": {
        "data": "***BASE 64 CONTENT***"
      }
    }
  ]
}
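The request above uses the placeholder "***BASE 64 CONTENT***": the attachment processor expects the "data" field to hold the file's bytes as a base64 string. A minimal sketch in Python for producing that body from a local file, assuming the file is readable from where the script runs (the function names and the script name are hypothetical, chosen for illustration):

```python
import base64
import json
import sys
from pathlib import Path


def encode_attachment(raw: bytes) -> str:
    """Base64-encode raw file bytes, as the attachment processor expects."""
    return base64.b64encode(raw).decode("ascii")


def build_simulate_body(raw: bytes) -> dict:
    """Build the same _simulate body as above, with the file inlined in "data"."""
    return {
        "pipeline": {
            "description": "test",
            "processors": [
                {"attachment": {"field": "data"}},
                # Drop the (large) base64 field once it has been parsed.
                {"remove": {"field": "data"}},
            ],
        },
        "docs": [
            {
                "_index": "index",
                "_type": "_doc",
                "_id": "id",
                "_source": {"data": encode_attachment(raw)},
            }
        ],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python build_simulate.py test123.docx > body.json
    raw = Path(sys.argv[1]).read_bytes()
    print(json.dumps(build_simulate_body(raw), indent=2))
```

The resulting body.json can then be posted to the _simulate endpoint, for example with curl against a local node (host and port are assumptions): curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_ingest/pipeline/_simulate?pretty' -d @body.json. Running the same body against both the 6.5.0 and 6.5.4 servers would show whether the difference is in the plugin version rather than the file.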

This gave:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "attachment" : {
            "date" : "2019-03-04T12:03:00Z",
            "language" : "et",
            "content_type" : "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "author" : "Ananthmurthy Rao",
            "content" : """
1. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc ac faucibus odio. 

Vestibulum neque massa, scelerisque sit amet ligula eu, congue molestie mi. Praesent ut varius sem. Nullam at porttitor arcu, nec lacinia nisi. Ut ac dolor vitae odio interdum condimentum. Vivamus dapibus sodales ex, vitae malesuada ipsum cursus convallis. Maecenas sed egestas nulla, ac condimentum orci. Mauris diam felis, vulputate ac suscipit et, iaculis non est. Curabitur semper arcu ac ligula semper, nec luctus nisl blandit. Integer lacinia ante ac libero lobortis imperdiet. Nullam mollis convallis ipsum, ac accumsan nunc vehicula vitae. Nulla eget justo in felis tristique fringilla. Morbi sit amet tortor quis risus auctor condimentum. Morbi in ullamcorper elit. Nulla iaculis tellus sit amet mauris tempus fringilla.
""",
            "content_length" : 816
          }
        },
        "_ingest" : {
          "timestamp" : "2019-03-04T14:46:00.089544Z"
        }
      }
    }
  ]
}
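Comparing this response with the original report, the working case returns the specific Word content type and a non-zero content_length, while the failing case returned "application/x-tika-ooxml" with content_length 0. A small helper can flag the empty case when testing several files or servers. This is a sketch assuming the _simulate response shape shown above; the function name is hypothetical:

```python
def find_empty_attachments(response: dict) -> list:
    """Return (_id, content_type) for simulated docs whose extracted text is empty."""
    empty = []
    for entry in response.get("docs", []):
        doc = entry.get("doc", {})
        attachment = doc.get("_source", {}).get("attachment", {})
        content = attachment.get("content", "")
        # Fall back to the text length if content_length is missing.
        length = attachment.get("content_length", len(content))
        if not content.strip() or length == 0:
            empty.append((doc.get("_id"), attachment.get("content_type")))
    return empty


# Example shaped like the failing 6.5.4 output from the first post:
failing = {"docs": [{"doc": {"_id": "id", "_source": {"attachment": {
    "content_type": "application/x-tika-ooxml", "content_length": 0}}}}]}
print(find_empty_attachments(failing))  # [('id', 'application/x-tika-ooxml')]
```

The response text returned by Elasticsearch can be turned into a dict with json.loads before calling the helper.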
