Could you open an issue in the mapper-attachments project and add all the details?
Do you see any dump file in the Elasticsearch directory?
--
David 
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 9 Jul 2014 at 07:36, aurelien bax picolo74@gmail.com wrote:
Hi,
Here is today's full log: Log
This log also contains other errors I made, such as typos in curl commands; those are not relevant to the indexing problem.
Most files are smaller than 2 MB. I had a problem with an 80 MB .rtf file, but that file was corrupted.
I'm not able to attach documents, because:
- they are highly confidential;
- Elasticsearch (the parsers) does not produce useful logs: no filename, no document reference, nothing that tells me which file crashed the server. Also, the process does not run on a local server;
- I did not handle errors correctly, so I can't determine it now, and I can't reindex all the files at this point.
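One way to recover the missing filename is to keep a small journal on the client side: write the name of each file and flush it to disk *before* sending it, then mark it done afterwards. If the server goes down mid-batch, the last file that was started but never finished is the prime suspect. This is a minimal Python sketch of that idea, not part of Elasticsearch or elasticsearch-php; `index_one` is a hypothetical callback standing in for whatever client call actually sends the document.

```python
import os
from typing import Callable, Iterable, Optional


def index_with_journal(paths: Iterable[str],
                       index_one: Callable[[str], None],
                       journal_path: str) -> None:
    """Record each filename in a journal before and after indexing it.

    `index_one` is a hypothetical callback wrapping the real client call.
    """
    with open(journal_path, "a", encoding="utf-8") as journal:
        for path in paths:
            journal.write(f"START {path}\n")
            journal.flush()
            os.fsync(journal.fileno())  # make sure the line reaches the disk
            index_one(path)             # may raise or hang if the server dies
            journal.write(f"DONE {path}\n")
            journal.flush()


def last_unfinished(journal_path: str) -> Optional[str]:
    """Return the last file that was started but never marked DONE."""
    started: Optional[str] = None
    with open(journal_path, encoding="utf-8") as journal:
        for line in journal:
            status, _, path = line.rstrip("\n").partition(" ")
            if status == "START":
                started = path
            elif status == "DONE" and path == started:
                started = None
    return started
```

After a crash, calling `last_unfinished("journal.log")` gives the file to set aside (or to test on its own) before rerunning the batch.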

If I can find one, I'll post it.
But, as I said, I understand that a parser can fail because a file is too big or corrupted; shouldn't Elasticsearch stay up instead of crashing as well?
Thank you.
On Wednesday, 9 July 2014 15:45:04 UTC+11, David Pilato wrote:
Could you gist the full logs?
Do you have some "big" attachments?
Could you copy some failing attachments to bintray or any other service and paste the link here?
On 9 Jul 2014 at 05:42, aurelien bax pico...@gmail.com wrote:
Hi,
I'm trying to index 11,000 documents (PDF, Word...).
My configuration:
Elasticsearch 1.2.1, elasticsearch-river-jdbc-1.2.1.1-plugin.zip, elasticsearch-mapper-attachments/2.0.0 on a Debian server.
I'm using elasticsearch-php. I don't think posting my code would be useful.
I have to use small batches (50 to 200 documents) because the parsers raise exceptions and Elasticsearch stops...
Then I need to restart the server and rerun the previous batch.
I'm not reindexing all the docs: before starting a batch, the script checks whether each doc is already indexed and skips it if so.
When I rerun the previous batch, it nearly always works without crashing Elasticsearch.
So I understand that the parsers cannot handle every file (for many reasons), but why does that crash Elasticsearch?
Why aren't the exceptions handled instead of crashing everything?
Is there a way to handle exceptions before Elasticsearch crashes?
Sample log errors:
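On the client side, the batch-and-skip workflow described above can at least be made to continue past individual failures and to report which documents failed, instead of stopping the whole run. This is a minimal Python sketch under stated assumptions: `is_indexed` and `index_one` are hypothetical callbacks standing in for the real existence check and indexing request, and the batch size of 100 is just a value within the 50 to 200 range mentioned here.

```python
import logging
from typing import Callable, Dict, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def index_in_batches(doc_ids: Sequence[str],
                     is_indexed: Callable[[str], bool],
                     index_one: Callable[[str], None],
                     batch_size: int = 100) -> Dict[str, List[str]]:
    """Index documents batch by batch, skipping ones already present.

    `is_indexed` and `index_one` are hypothetical callbacks wrapping the
    real client calls. A failure is logged with its document ID and the
    run continues with the next document.
    """
    report: Dict[str, List[str]] = {"indexed": [], "skipped": [], "failed": []}
    for batch in batches(doc_ids, batch_size):
        for doc_id in batch:
            if is_indexed(doc_id):
                report["skipped"].append(doc_id)
                continue
            try:
                index_one(doc_id)
            except Exception as exc:  # keep going; remember which doc failed
                logging.warning("indexing %s failed: %s", doc_id, exc)
                report["failed"].append(doc_id)
            else:
                report["indexed"].append(doc_id)
    return report
```

This only catches errors that are reported back to the client; it cannot keep the Elasticsearch node itself from going down, which is why the server-side question still matters.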
[WARN ][org.apache.tika.parser.microsoft.AbstractPOIFSExtractor] Ignoring unexpected exception while parsing summary entry ^ESummaryInformation
java.io.UnsupportedEncodingException: Codepage number may not be 0
[WARN ][org.apache.pdfbox.pdfparser.XrefTrailerResolver] Did not found XRef object at specified startxref position 730864
[2014-07-09 10:51:40,078][WARN ][org.apache.pdfbox.pdfparser.BaseParser] Specified stream length 587 is wrong. Fall back to reading stream until 'endstream'.
[org.apache.pdfbox.pdfparser.BaseParser] Specified stream length 952 is wrong. Fall back to reading stream until 'endstream'.
[2014-07-09 11:52:08,044][ERROR][org.apache.pdfbox.pdmodel.font.PDSimpleFont] Can't determine the width of the space character using 250 as default
Thank you
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5e682100-4bce-48c7-b43d-78ccd85b5750%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.