Elasticsearch-mapper-attachments errors logs

gregv · September 30, 2015, 9:24am

Hi,

I am testing attachments with elasticsearch-mapper-attachments, using a simple job that imports documents with curl.

I encountered some errors and warnings:

[2015-09-29 23:49:53,013][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-09-29 23:50:15,416][WARN ][org.apache.fontbox.util.FontManager] Font not found: Tahoma
[2015-09-29 23:53:47,390][WARN ][org.apache.pdfbox.pdfparser.BaseParser] Specified stream length 127 is wrong. Fall back to reading stream until 'endstream'.

How can I get the filename associated with each error into elasticsearch.log?

Thanks

dadoonet · September 30, 2015, 9:41am

The mapper attachments plugin only receives BASE64-encoded binary content here. It has no idea whether that content comes from a file (and so has a filename), from a blob in your database, from a URL, or whatever.

So I'm afraid there is no way to do that.

You should consider doing that on the client. When you send a file, you know its filename, so you can read the response from Elasticsearch and know that something went wrong with file X.
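A minimal sketch of that client-side approach, assuming Python instead of curl. The field name `file`, the helper names, and the URL in the usage note are hypothetical; only the `repo` index and `attachment` type come from this thread. It can only surface failures that Elasticsearch actually reports in the HTTP response.

```python
import base64
import json
import urllib.error
import urllib.request


def build_body(path):
    """Read a file and wrap its BASE64 content in a JSON document."""
    with open(path, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    # "file" is a hypothetical field name, assumed to be mapped as an attachment.
    return json.dumps({"file": encoded}).encode("utf-8")


def post_json(url, body):
    """POST the body and return (HTTP status, response text)."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx answers raise HTTPError; keep the status and body for the log.
        return err.code, err.read().decode("utf-8")


def index_files(paths, url, send=post_json):
    """Index each file and return (filename, status, response) for rejected ones."""
    failures = []
    for path in paths:
        status, text = send(url, build_body(path))
        if status >= 300:
            failures.append((path, status, text))
    return failures
```

Calling `index_files(["a.pdf", "b.pdf"], "http://localhost:9200/repo/attachment")` (host and port assumed) would return one tuple per rejected file, so the filename sits next to the error that Elasticsearch sent back.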

gregv · September 30, 2015, 9:59am

Hi Dadoonet,

Thanks for your reply. Of course, I used BASE64 encoding, done with an ETL:
88,262 documents (44 GB) out of 88,933 files were successfully parsed and indexed.

When I check my client logs, the curl response is still "created":true:
{"_index":"repo","_type":"attachment","_id":"AVAZBfl3Iw8zpJxWbnIQ","_version":1,"created":true}

Thanks.

dadoonet · September 30, 2015, 10:19am

Indeed. It's because we ignore errors by default.
You can change that with the index.mapping.attachment.ignore_errors setting.

gregv · September 30, 2015, 10:39am

My bad! I missed that part. Thanks. Reimporting the docs...

gregv · October 1, 2015, 11:21am

Hi,

I created the index with index.mapping.attachment.ignore_errors set to false.

{
  "test2" : {
    "settings" : {
      "index" : {
        "index" : {
          "mapping" : {
            "attachment" : {
              "ignore_errors" : "false",
              "indexed_chars" : "-1"
            }
          }
        },
        "creation_date" : "1443609269356",
        "number_of_shards" : "1",
        "number_of_replicas" : "0",
        "version" : {
          "created" : "1070199"
        },
        "uuid" : "8GM9Ud9yQg6MJJbdSZuTAQ"
      }
    }
  }
}

I still encountered some errors, but every response from Elasticsearch is "created":true

[2015-09-30 23:31:00,763][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-09-30 23:45:34,565][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
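
For reference, the nested settings output above can be flattened to dotted keys to see exactly which setting name was stored. This is a minimal sketch in Python; the `flatten` helper is hypothetical and the dictionary is copied from the output above (creation date, version and uuid omitted). Whether the doubled `index` prefix it reveals explains the behaviour is an assumption, not something confirmed in this thread.

```python
def flatten(settings, prefix=""):
    """Turn nested settings into a flat {dotted.key: value} dictionary."""
    flat = {}
    for key, value in settings.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# The "settings" object of index test2, as shown in the output above.
settings = {
    "index": {
        "index": {
            "mapping": {
                "attachment": {"ignore_errors": "false", "indexed_chars": "-1"}
            }
        },
        "number_of_shards": "1",
        "number_of_replicas": "0",
    }
}

flat = flatten(settings)
for name in sorted(flat):
    print(name, "=", flat[name])
```

The stored key comes out as `index.index.mapping.attachment.ignore_errors`, which is not the same string as `index.mapping.attachment.ignore_errors`.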

dadoonet · October 1, 2015, 11:40am

Can you open an issue, ideally with a link to a file we could reuse in a test?
