Hi,
Here is today's full log:
https://gist.github.com/anonymous/ef0cbf956714cf9b138f
The log also contains other kinds of errors I made, such as typos in curl
commands, which are not relevant to the indexing problem.
Most files are smaller than 2 MB. I had a problem with an 80 MB .rtf file,
but that file was corrupted.
I'm not able to attach documents:
- They are highly confidential.
- Elasticsearch (the parsers) does not produce useful logs: no filename,
document reference, or any other useful information, so I can't find which
file crashed the server. And the process does not run on a local server.
- I did not handle errors correctly, so I can't determine it now, and I
can't reindex all the files at this point.
If I can find one, I'll post it.
But, as I said, I understand that a parser can crash because a file is too
big or corrupted, but Elasticsearch should not crash as well, should it?
Thank you.
On Wednesday, 9 July 2014 15:45:04 UTC+11, David Pilato wrote:
Could you gist the full logs?
Do you have some "big" attachments?
Could you copy some failing attachments to bintray or any other service
and paste the link here?
--
David
Twitter: @dadoonet / @elasticsearchfr / @scrutmydocs

On 9 Jul 2014 at 05:42, aurelien bax <pico...@gmail.com> wrote:

Hi,
I'm trying to index 11,000 documents (PDF, Word...).
My configuration:
Elasticsearch 1.2.1, elasticsearch-river-jdbc-1.2.1.1-plugin.zip,
elasticsearch-mapper-attachments/2.0.0 on a Debian server.
I'm using elasticsearch-php. I don't think posting my code would be useful.
I have to work in small batches (50 to 200 documents) because the parsers
raise exceptions and Elasticsearch stops.
I then need to restart the server and rerun the previous batch.
I'm not reindexing all the documents: before starting a batch, the script
checks whether each document is already indexed and, if so, skips it.
When I rerun the previous batch, it nearly always works without crashing
Elasticsearch.
So, I understand that the parsers cannot handle every file (for many
reasons), but why does that crash Elasticsearch?
Why are the exceptions not handled instead of crashing everything?
Is there a way to handle exceptions before Elasticsearch crashes?
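
The batch procedure described above (skip what is already indexed, then send the rest) could also write each file name to a journal before sending it, so that a server crash leaves a trace of the document that was in flight. Below is a minimal sketch in Python, used here for illustration only. It assumes a hypothetical `index_one` callable that wraps whatever client call sends one file to Elasticsearch. The names `index_with_journal` and `unfinished_paths` and the journal format are assumptions for this example; they are not part of Elasticsearch or elasticsearch-php.

```python
import json
from typing import Callable, Iterable, List, Set


def index_with_journal(
    paths: Iterable[str],
    already_indexed: Set[str],
    index_one: Callable[[str], None],  # hypothetical wrapper around the client call
    journal_path: str,
) -> List[str]:
    """Index files one by one, journaling each path before it is sent.

    Returns the paths for which index_one raised a client-side exception.
    """
    failed = []
    with open(journal_path, "a", encoding="utf-8") as journal:
        for path in paths:
            if path in already_indexed:
                continue  # same skip logic as in the batch script
            # Record the file name *before* sending, and flush so the line
            # reaches the journal file even if the run is interrupted.
            journal.write(json.dumps({"event": "start", "path": path}) + "\n")
            journal.flush()
            try:
                index_one(path)
            except Exception as exc:
                failed.append(path)
                record = {"event": "error", "path": path, "error": repr(exc)}
            else:
                record = {"event": "done", "path": path}
            journal.write(json.dumps(record) + "\n")
            journal.flush()
    return failed


def unfinished_paths(journal_path: str) -> List[str]:
    """Return paths that were started but never marked done or error."""
    open_paths = []
    with open(journal_path, encoding="utf-8") as journal:
        for line in journal:
            record = json.loads(line)
            if record["event"] == "start":
                open_paths.append(record["path"])
            elif record["path"] in open_paths:
                open_paths.remove(record["path"])
    return open_paths
```

After a crash, `unfinished_paths` on the journal would list the file or files that were being sent when the run was cut short, and the `error` records name the files the client itself rejected. This only narrows down the suspect document; it does not explain why the server stops.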
Sample log errors:
[WARN ][org.apache.tika.parser.microsoft.AbstractPOIFSExtractor] Ignoring unexpected exception while parsing summary entry ^ESummaryInformation
java.io.UnsupportedEncodingException: Codepage number may not be 0
[WARN ][org.apache.pdfbox.pdfparser.XrefTrailerResolver] Did not found XRef object at specified startxref position 730864
[2014-07-09 10:51:40,078][WARN ][org.apache.pdfbox.pdfparser.BaseParser] Specified stream length 587 is wrong. Fall back to reading stream until 'endstream'.
[org.apache.pdfbox.pdfparser.BaseParser] Specified stream length 952 is wrong. Fall back to reading stream until 'endstream'.
[2014-07-09 11:52:08,044][ERROR][org.apache.pdfbox.pdmodel.font.PDSimpleFont] Can't determine the width of the space character using 250 as default

Thank you
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/5e682100-4bce-48c7-b43d-78ccd85b5750%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.