Mapper-attachment plugin - java.io.UnsupportedEncodingException: Codepage number may not be


(SK) #1

Hi All,

We are running ES 2.3.2 with the mapper-attachment plugin. For some documents we see a java.io.UnsupportedEncodingException, as well as errors like the following:
[ERROR][org.apache.pdfbox.pdmodel.font.PDSimpleFont] Can't determine the width of the space character using 250 as default
java.security.AccessControlException: access denied ("java.io.FilePermission" "/opt/weblogic/.fonts" "read")

Could you please provide some insight into how to resolve these issues?

Log1:
java.security.AccessControlException: access denied ("java.io.FilePermission" "/opt/weblogic/.fonts" "read")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.security.AccessController.checkPermission(AccessController.java:884)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
at java.io.File.exists(File.java:814)
at org.apache.fontbox.util.autodetect.NativeFontDirFinder.find(NativeFontDirFinder.java:44)
at org.apache.fontbox.util.autodetect.FontFileFinder.find(FontFileFinder.java:74)
at org.apache.fontbox.util.FontManager.loadFonts(FontManager.java:65)
at org.apache.fontbox.util.FontManager.findTTFontname(FontManager.java:290)
at org.apache.fontbox.util.FontManager.findTTFont(FontManager.java:326)
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getTTFFont(PDTrueTypeFont.java:638)
at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.getFontWidth(PDTrueTypeFont.java:673)
at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getFontWidth(PDSimpleFont.java:231)
at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getSpaceWidth(PDSimpleFont.java:533)
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:458)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:383)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:342)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:148)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:148)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.Tika.parseToString(Tika.java:537)
at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:94)
at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:91)
at java.security.AccessController.doPrivileged(Native Method)
at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:91)
at org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:481)

Log2:

java.io.UnsupportedEncodingException: Codepage number may not be 0
at org.apache.poi.util.CodePageUtil.codepageToEncoding(CodePageUtil.java:277)
at org.apache.poi.util.CodePageUtil.codepageToEncoding(CodePageUtil.java:255)
at org.apache.poi.util.CodePageUtil.getStringFromCodePage(CodePageUtil.java:233)
at org.apache.poi.util.CodePageUtil.getStringFromCodePage(CodePageUtil.java:221)
at org.apache.poi.hpsf.CodePageString.getJavaValue(CodePageString.java:70)
at org.apache.poi.hpsf.VariantSupport.read(VariantSupport.java:210)
at org.apache.poi.hpsf.Property.<init>(Property.java:163)
at org.apache.poi.hpsf.Section.<init>(Section.java:277)
at org.apache.poi.hpsf.PropertySet.init(PropertySet.java:451)
at org.apache.poi.hpsf.PropertySet.<init>(PropertySet.java:246)
at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:83)
at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:73)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:126)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.Tika.parseToString(Tika.java:537)
at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:94)
at org.elasticsearch.mapper.attachments.TikaImpl$1.run(TikaImpl.java:91)
at java.security.AccessController.doPrivileged(Native Method)
at org.elasticsearch.mapper.attachments.TikaImpl.parse(TikaImpl.java:91)
at org.elasticsearch.mapper.attachments.AttachmentMapper.parse(AttachmentMapper.java:481)

