I am testing attachment indexing with elasticsearch-mapper-attachments, using a simple job that imports documents with curl.
I encountered some errors and warnings:
[2015-09-29 23:49:53,013][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-09-29 23:50:15,416][WARN ][org.apache.fontbox.util.FontManager] Font not found: Tahoma
[2015-09-29 23:53:47,390][WARN ][org.apache.pdfbox.pdfparser.BaseParser] Specified stream length 127 is wrong. Fall back to reading stream until 'endstream'.
How can I get the name of the file that caused the error into elasticsearch.log?
The mapper attachments plugin only receives BASE64-encoded binary content here. It has no idea whether that content came from a file (and therefore has a filename), from a blob in your database, or from a URL.
So I'm afraid there is no way to do that.
You should consider doing that on the client. When you send a file, you know its filename, so you can read the response from Elasticsearch and find out that something went wrong with file X.
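As a rough illustration of the client-side approach, here is a minimal Python sketch that keeps the filename next to the response Elasticsearch returns. It is only a sketch under assumptions: the URL, the attachment field name `file`, the extra `filename` field, and the helper names `build_body`, `post_json` and `index_file` are all hypothetical and would need to match your own mapping and setup.

```python
import base64
import json
import logging
import os
import urllib.request

log = logging.getLogger("attachment-import")


def build_body(path, field="file"):
    """Return the JSON body for one file.

    `field` is the (assumed) name of the attachment field in the mapping.
    The plain `filename` field is an extra, hypothetical field so the
    indexed document itself remembers where it came from.
    """
    with open(path, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    return json.dumps({field: encoded, "filename": os.path.basename(path)})


def post_json(url, body):
    """POST a JSON body and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def index_file(path, url, send=post_json):
    """Index one file and log its name together with the reply."""
    reply = send(url, build_body(path))
    record = {
        "file": path,
        "_id": reply.get("_id"),
        "created": reply.get("created"),
    }
    log.info("indexed %(file)s as %(_id)s (created=%(created)s)", record)
    return record
```

A call could look like `index_file("/data/report.pdf", "http://localhost:9200/repo/attachment")`, where both the path and the URL are placeholders. The `send` parameter exists only so the HTTP call can be swapped out, for example in a dry run.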
Thanks for your reply. Of course, I send the files base64-encoded, from an ETL job.
88,262 documents (44 GB) were successfully parsed and indexed, out of 88,933 files.
When I check my client logs, the curl response still says "created":true, even though elasticsearch.log reports errors:
{"_index":"repo","_type":"attachment","_id":"AVAZBfl3Iw8zpJxWbnIQ","_version":1,"created":true}
[2015-09-30 23:31:00,763][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
[2015-09-30 23:45:34,565][ERROR][org.apache.pdfbox.filter.FlateFilter] FlateFilter: stop reading corrupt stream due to a DataFormatException
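Since the response does not reveal the parsing problem, one possible workaround is to match the timestamps in elasticsearch.log against the times at which the client sent each file. The Python sketch below is only a heuristic, and it assumes that the client records a start and end time per request, that client and server clocks are in sync, and that few requests run at the same moment. The names `parse_log_line` and `candidates` are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# [2015-09-30 23:31:00,763][ERROR][org.apache.pdfbox.filter.FlateFilter] message
LOG_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]"
    r"\[(\w+)\s*\]\[([^\]]+)\]\s*(.*)$"
)


def parse_log_line(line):
    """Return (timestamp, level, logger, message), or None if no match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    stamp = stamp.replace(microsecond=int(m.group(2)) * 1000)
    return stamp, m.group(3), m.group(4), m.group(5)


def candidates(stamp, requests):
    """List the files whose request window contains the log timestamp.

    `requests` is a list of (filename, start, end) tuples recorded by
    the client; more than one file may match if requests overlap.
    """
    return [name for name, start, end in requests if start <= stamp <= end]
```

With overlapping requests this only narrows the problem down to a few candidate files, which can then be re-sent one at a time to confirm.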