Attachment indexing breaks shards or hangs during indexing


(Travis Groth) #1

Hi all,

I'm looking at incorporating ES into our environment so we can search
some large databases where standard SQL queries simply don't do well at
finding things. First, I just want to say I am very impressed with ES
so far. Great stuff.

One of the big requirements I need to test is indexing not just plain
text but attachments: usually standard business docs like docx, html,
pdf, xls, etc. After installing the plugin I am seeing two issues, both
of which I can reproduce from a fresh index/JVM.

  1. Image files (GIFs in particular in my testing) seem to cause issues
    in replication between nodes. The document indexes on the node it was
    posted to (or perhaps the primary node for the shard? My mental picture
    of the clustering side of things isn't entirely formed yet) and shows up
    fine if I pull up that id, but if I try to retrieve it from the other
    node I get a 404 error. On the node I posted the data to I get this in
    the log:

[2011-01-29 17:20:26,063][WARN ][action.index ] [Rafferty] Failed to perform indices/index/shard/index on replica Index Shard [tickets][2]
org.elasticsearch.transport.RemoteTransportException: [Nathaniel Essex][inet[/10.140.20.168:9300]][indices/index/shard/index/replica]
Caused by: java.lang.NoClassDefFoundError: Could not initialize class sun.java2d.Disposer
at javax.imageio.stream.FileCacheImageInputStream.<init>(FileCacheImageInputStream.java:94)
at com.sun.imageio.spi.InputStreamImageInputStreamSpi.createInputStreamInstance(InputStreamImageInputStreamSpi.java:51)
at javax.imageio.ImageIO.createImageInputStream(ImageIO.java:331)
at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:72)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at org.apache.tika.Tika.parseToString(Tika.java:290)
at org.elasticsearch.index.mapper.xcontent.AttachmentMapper.parse(AttachmentMapper.java:254)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeValue(ObjectMapper.java:377)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:295)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(ObjectMapper.java:316)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeArray(ObjectMapper.java:360)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:289)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(ObjectMapper.java:316)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeArray(ObjectMapper.java:360)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:289)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:430)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:368)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:230)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:187)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:180)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:173)
at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:195)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2011-01-29 17:20:26,063][WARN ][cluster.action.shard ] [Rafferty] sending failed shard for [tickets][2], node[I6QZH35TSTiYm0Ud5EIQ3A], [R], s[STARTED], reason [Failed to perform [indices/index/shard/index] on replica, message [RemoteTransportException[[Nathaniel Essex][inet[/10.140.20.168:9300]][indices/index/shard/index/replica]]; nested: NoClassDefFoundError[Could not initialize class sun.java2d.Disposer]; ]]

On the other node I'll see this:

[2011-01-29 17:20:27,605][WARN ][cluster.action.shard ] [Nathaniel Essex] received shard failed for [tickets][2], node[I6QZH35TSTiYm0Ud5EIQ3A], [R], s[STARTED], reason [Failed to perform [indices/index/shard/index] on replica, message [RemoteTransportException[[Nathaniel Essex][inet[/10.140.20.168:9300]][indices/index/shard/index/replica]]; nested: NoClassDefFoundError[Could not initialize class sun.java2d.Disposer]; ]]

It just looks like a missing class, I guess, but it's obviously an
issue. It doesn't happen with all attachments. If it happens enough, I
seem to get full shard failures where the shards go offline. I haven't
been able to reproduce that part of the problem, though, so it may have
been unrelated.

  2. When I hit issue #1, I figured I could just filter out image files,
    since we really don't care about them anyway. Once I did this, my index
    build went along at a nice clip until I ran into a .doc file. The POST
    basically times out waiting for a server response. No errors come back
    and nothing is logged on either node (I am logging at DEBUG); I just
    don't get a response. The document is small, so transport time isn't
    what causes the timeout. I don't see much of anything to troubleshoot
    with or to provide more information from. I can run the Tika jar
    directly and it extracts text from the exact same document without
    issue in a very reasonable amount of time: maybe 5 seconds including
    JVM startup. Oddly, it doesn't happen with all .doc files. FWIW, here
    is what the file command reports for the document in question:

CDF V2 Document, Little Endian, Os: Windows, Version 6.0

If any more info would help, let me know and I will be happy to provide
it. I am going to do another pass, log the files that cause the timeout,
and see if I can find more of a pattern.
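The pass described above (logging which files time out) can also be run outside Elasticsearch, one file at a time with a time limit. A minimal sketch, assuming the text extractor is passed in as a function; in practice it might wrap Tika's Tika.parseToString(File), the facade method that appears in the traces in this thread, with its checked IOException and TikaException caught inside the function. ExtractionTimer and findSlowFiles are hypothetical names, not part of the plugin.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class ExtractionTimer {
    // Runs the extractor on each file with a per-file time limit and returns
    // the files that did not finish in time.
    static List<Path> findSlowFiles(List<Path> files, Function<Path, String> extractor, long timeoutMillis) {
        List<Path> slow = new ArrayList<>();
        for (Path file : files) {
            // One daemon worker per file: a parse that never returns cannot
            // block the next file or keep the JVM alive at exit.
            ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "extract-" + file.getFileName());
                t.setDaemon(true);
                return t;
            });
            Future<String> text = worker.submit(() -> extractor.apply(file));
            try {
                text.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                slow.add(file);
                text.cancel(true); // only an interrupt; a parser is free to ignore it
            } catch (ExecutionException e) {
                // An exception is a different symptom from a hang; report it and move on.
                System.err.println(file + " failed: " + e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } finally {
                worker.shutdownNow();
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Path.of(arg));
        }
        // Placeholder extractor (hypothetical); replace with a call into Tika.
        for (Path p : findSlowFiles(files, f -> "", 30_000)) {
            System.out.println("timed out: " + p);
        }
    }
}
```

Cancelling only interrupts the worker thread; a parser stuck in a loop may keep running in the background, which is why the workers are daemon threads. If a file parses quickly here but hangs when posted to a node, that may point at the node's environment rather than the document.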

Thanks!

Travis


(Travis Groth) #2

Does anyone have any insight into this?

I did a larger pass, logging what I tried to index, and it looks like
PDFs have a similar problem with being unable to load one of the classes
that Tika uses:

[2011-02-01 03:55:05,325][WARN ][action.index ] [Logan] Failed to perform indices/index/shard/index on replica Index Shard [tickets][2]
org.elasticsearch.transport.RemoteTransportException: [Strange, Stephen][inet[/10.140.20.168:9300]][indices/index/shard/index/replica]
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.pdfbox.pdmodel.PDPage
at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:201)
at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:207)
at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:207)
at org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:175)
at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:212)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:321)
at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:241)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:53)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:90)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
at org.apache.tika.Tika.parseToString(Tika.java:290)
at org.elasticsearch.index.mapper.xcontent.AttachmentMapper.parse(AttachmentMapper.java:254)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeValue(ObjectMapper.java:377)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:295)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(ObjectMapper.java:316)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeArray(ObjectMapper.java:360)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:289)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(ObjectMapper.java:316)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeArray(ObjectMapper.java:360)
at org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.java:289)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:430)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:368)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:230)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica(TransportIndexAction.java:187)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:180)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:173)
at org.elasticsearch.transport.netty.MessageChannelHandler$3.run(MessageChannelHandler.java:195)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Once again, the document is indexed on the node I posted the data to via
REST and missing on the other node. After letting my indexing script
throw data at it all day, I found several failures like this, plus more
sun.java2d.Disposer errors even though I am skipping all images. Perhaps
they're embedded in other document types? At some point the node I was
using to index simply stopped acknowledging posts, even for completely
plain text data (this is a new symptom), and I had to recreate the index
from scratch before it would accept data again. I am, however, now
storing _source to aid in debugging, so perhaps that is related.
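Skipping images by file extension misses files whose names don't match their content, and no check on the outer file can see images embedded inside a PDF or Office document, which would fit the Disposer errors that kept appearing. A minimal sketch of a content-based check on the first bytes of a file, covering only the GIF, PNG and JPEG signatures; ImageSniffer is a hypothetical helper, not part of the plugin.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class ImageSniffer {
    // Leading-byte signatures for the formats checked here: GIF87a, GIF89a, PNG, JPEG.
    private static final byte[][] SIGNATURES = {
        {'G', 'I', 'F', '8', '7', 'a'},
        {'G', 'I', 'F', '8', '9', 'a'},
        {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'},
        {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF},
    };

    // True if `head` starts with one of the signatures above.
    static boolean looksLikeImage(byte[] head) {
        for (byte[] sig : SIGNATURES) {
            if (head.length >= sig.length && Arrays.equals(Arrays.copyOf(head, sig.length), sig)) {
                return true;
            }
        }
        return false;
    }

    // Reads only the first 8 bytes of the file, which covers the longest signature.
    static boolean looksLikeImage(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[8];
            int n = in.readNBytes(head, 0, head.length);
            return looksLikeImage(Arrays.copyOf(head, n));
        }
    }
}
```

A check like this only decides whether the outer file is an image; a PDF that contains images still goes through Tika's image handling, so it would not have prevented the errors reported here.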

Am I missing something in the plugin setup, or is there something I
have to drop into the lib directory? I simply ran

./plugin install mapper-attachments

in the bin directory, as others had mentioned on the list. I tried
putting the tika-app jar into lib, but that broke logging at least, so
it doesn't appear to be the solution. Does anyone know what's going on
or what I am doing wrong?
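For reference, with mapper-attachments the document is ordinary JSON in which a field mapped with type attachment carries the file's bytes base64-encoded; the node then runs Tika on it, and the traces above show that the same parse runs again on each replica (shardOperationOnReplica leads into AttachmentMapper.parse), so every node holding a copy needs a working Tika/AWT setup. A minimal sketch of building such a body; the class AttachmentBody and the field name file are hypothetical, and the field must match your mapping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class AttachmentBody {
    // Builds {"<field>":"<base64 of the file bytes>"}. The field name is assumed to be
    // a plain identifier (no quotes or backslashes), so it is not JSON-escaped here.
    static String build(String field, byte[] fileBytes) {
        // Standard Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', none of which need JSON escaping.
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"" + field + "\":\"" + encoded + "\"}";
    }

    public static void main(String[] args) {
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(build("file", doc)); // {"file":"aGVsbG8="}
    }
}
```

A real client would read the file with Files.readAllBytes and POST the result to the index/type/id URL; that step is left out here.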

Thanks!

Travis

(In reply to #1, posted Jan 30, 3:13 pm.)


(Travis Groth) #3

After upgrading to 0.14.3 and turning on additional debug logging for
the index and gateway, I found this showing up close to the other
errors:

[2011-02-01 22:26:31,626][WARN ][cluster.action.shard ] [Scarlet Witch] sending failed shard for [tickets][0], node[MbTXMpNLRn-FKfUDILzFww], [R], s[STARTED], reason [Failed to perform [indices/index/shard/index] on replica, message [RemoteTransportException[[Fasaud][inet[/10.140.20.168:9300]][indices/index/shard/index/replica]]; nested: UnsatisfiedLinkError[/usr/java/jdk1.6.0_23/jre/lib/amd64/xawt/libmawt.so: libXtst.so.6: cannot open shared object file: No such file or directory]; ]]

On a hunch, I installed libXtst, and it seems all the issues I was
seeing have cleared up.
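The mix of errors in this thread, one UnsatisfiedLinkError close to many "NoClassDefFoundError: Could not initialize class ..." messages, is consistent with how the JVM treats a failed static initializer: the first use of the class throws the original error, and every later use in the same JVM throws NoClassDefFoundError. That message means the class was found but its initialization had already failed, which is why the earlier traces looked like a missing class. A self-contained sketch that simulates this; NativeBacked is a hypothetical stand-in for a class like sun.java2d.Disposer, and the native-library failure is faked by throwing UnsatisfiedLinkError directly instead of calling System.loadLibrary.

```java
final class StaticInitDemo {
    // Hypothetical stand-in for a class whose static initializer loads a native library.
    static final class NativeBacked {
        static {
            loadNativeLibrary();
        }

        private static void loadNativeLibrary() {
            // Simulated failure; a real class would call System.loadLibrary(...) here.
            throw new UnsatisfiedLinkError("libXtst.so.6: cannot open shared object file (simulated)");
        }

        static void use() {
        }
    }

    // Touches NativeBacked and returns the error type and message, or "ok".
    static String touch() {
        try {
            NativeBacked.use();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName() + ": " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: the original UnsatisfiedLinkError
        System.out.println(touch()); // later uses: NoClassDefFoundError, "Could not initialize class ..."
    }
}
```

Because an Error thrown from a static initializer is rethrown as-is rather than wrapped, the useful message appears only once per JVM; when hunting for the root cause, the first failure after a node starts is the one to look for.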

Travis

(In reply to #2, posted Jan 31, 11:15 pm.)


(Shay Banon) #4

It seems like the Apache Tika module that is used to index attachments requires AWT, which in turn requires more native libs than were installed, I guess. Not too happy with the attachments module, to be honest; I need to work on it a bit and make it better.
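One way to check a node for this before indexing is to exercise, outside Elasticsearch, the JDK call at the top of the Disposer trace in #1: ImageIO.createImageInputStream. A minimal sketch, assuming it is compiled and run with the same java binary Elasticsearch uses; AwtProbe is a hypothetical name. It also prints GraphicsEnvironment.isHeadless(), since the library that failed to load in #3 sits under the xawt (X11) directory. Whether this path reaches sun.java2d.Disposer can differ between JDK versions, so a passing probe is a hint, not proof.

```java
import java.awt.GraphicsEnvironment;
import java.io.ByteArrayInputStream;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;

final class AwtProbe {
    // Calls ImageIO.createImageInputStream on a tiny in-memory stream. When ImageIO's
    // disk cache is enabled (the default), this may build a FileCacheImageInputStream,
    // the constructor at the top of the Disposer trace, and so pull in AWT's native code.
    static String probe() {
        try (ImageInputStream in = ImageIO.createImageInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}))) {
            return "ok: " + in.getClass().getSimpleName();
        } catch (Throwable t) {
            // Catch Errors too: UnsatisfiedLinkError and NoClassDefFoundError are what we are probing for.
            return "FAILED: " + t;
        }
    }

    public static void main(String[] args) {
        System.out.println("headless=" + GraphicsEnvironment.isHeadless());
        System.out.println(probe());
    }
}
```

Run it on every node, since replicas parse attachments too; a FAILED line naming a missing .so file points at the same kind of missing system library that libXtst turned out to be here.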
On Wednesday, February 2, 2011 at 6:04 PM, Travis wrote:

After upgrading to 0.14.3 and turning on additional debug logging on
the index and gateway I found this started showing up in close
proximity to the other errors:

[2011-02-01 22:26:31,626][WARN ][cluster.action.shard ] [Scarlet
Witch] sending failed shard for [tickets][0], node[MbTXMpNLRn-
FKfUDILzFww], [R], s[STARTED], reason [Failed to perform [indices/
index/shard/index] on replica, message
[RemoteTransportException[[Fasaud][inet[/10.140.20.168:9300]][indices/
index/shard/index/replica]]; nested: UnsatisfiedLinkError[/usr/java/
jdk1.6.0_23/jre/lib/
amd64/xawt/libmawt.so: libXtst.so.6: cannot open shared object file:
No such file or directory]; ]]

On a hunch, I installed libXtst and it would seem all the issues I was
seeing have cleared up.

Travis

On Jan 31, 11:15 pm, Travis tgr...@gmail.com wrote:

Does anyone have any insight to this?

I did a larger pass with some logging of what I tried to index and it
looks like PDFs also have a similar problem with being unable to load
one of the classes that tika uses:

[2011-02-01 03:55:05,325][WARN ][action.index ] [Logan]
Failed to perform indices/index/shard/index on replica Index Shard
[tickets][2]
org.elasticsearch.transport.RemoteTransportException: [Strange,
Stephen][inet[/10.140.20.168:9300]][indices/index/shard/index/replica]
Caused by: java.lang.NoClassDefFoundError: Could not initialize class
org.apache.pdfbox.pdmodel.PDPage
at
org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:201)
at
org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:207)
at
org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:207)
at
org.apache.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:175)
at
org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.j ava:
212)
at
org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:
321)
at
org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:
241)
at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:
53)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:
90)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:
137)
at org.apache.tika.Tika.parseToString(Tika.java:290)
at
org.elasticsearch.index.mapper.xcontent.AttachmentMapper.parse(AttachmentMa pper.java:
254)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeValue(ObjectM apper.java:
377)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.jav a:
295)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(Object Mapper.java:
316)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeArray(ObjectM apper.java:
360)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.jav a:
289)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeObject(Object Mapper.java:
316)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.serializeArray(ObjectM apper.java:
360)
at
org.elasticsearch.index.mapper.xcontent.ObjectMapper.parse(ObjectMapper.jav a:
289)
at
org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XConte ntDocumentMapper.java:
430)
at
org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XConte ntDocumentMapper.java:
368)
at
org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(Inter nalIndexShard.java:
230)
at
org.elasticsearch.action.index.TransportIndexAction.shardOperationOnReplica (TransportIndexAction.java:
187)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOpera tionAction
$ReplicaOperationTransportHandler.messageReceived(TransportShardReplication Operat
ionAction.java:180)
at
org.elasticsearch.action.support.replication.TransportShardReplicationOpera tionAction
$ReplicaOperationTransportHandler.messageReceived(TransportShardReplication Operat
ionAction.java:173)
at org.elasticsearch.transport.netty.MessageChannelHandler
$3.run(MessageChannelHandler.java:195)
at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

Once again, the document is indexed on the node that I posted the data
to via REST, and missing on the other node. After letting my indexing
script throw data at it all day, I found several failures like this,
plus more sun.java2d.Disposer errors even though I am skipping all
images. Perhaps they're embedded in other document types? At some
point, the node I was using to index simply stopped acknowledging
posts, even for completely plain-text data (this is a new symptom). I
had to completely recreate the index for it to accept data again. I
am, however, storing _source now to aid in debugging, so perhaps that
is related.

Is there something I might be missing in the plugin setup, or something
I have to drop into the lib directory? I simply ran

./plugin install mapper-attachments

in the bin directory, as others had mentioned on the list. I tried
putting the tika-app jar into lib, but that broke logging at least, so
that doesn't appear to be the solution. Does anyone know what's going
on or what I am doing wrong?

Thanks!

Travis

(Travis Groth) #5
To be fair, I don't think we get away without that library (libXtst) on
any of the application servers here anyway. I'm surprised I didn't trip
over it earlier; this happened to be a fresh barebones VM.

The attachments mapper may not be ideal, but my brief testing turned
up OK results. I've seen some exceptions indexing ODT and a few
PDFs, but the stack traces are definitely straight from Tika's
parser. Most of the time it seems to be working just fine, which is
good enough for my purposes. Probably 98% of our use cases are
plain text anyway.

Once again, great stuff. Looking forward to 0.15 proper.

On 02/02/2011 11:21 AM, Shay Banon wrote:

It seems like the Apache Tika module that is used to index attachments
requires AWT, which requires, I guess, more libs than what was indexed.
Not too happy with the attachments module, to be honest; need to work
on it a bit and make it better.
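One way to check ahead of time whether a node's JVM can load the AWT native pieces that Tika ends up needing is a small standalone smoke test, run with the same JVM (and the same user) as the ES node. This is only a sketch under stated assumptions: the class name `AwtSmokeTest` and the probe bytes are hypothetical, and it does not go through Tika or the mapper-attachments plugin at all. It just calls `javax.imageio.ImageIO.createImageInputStream`, the same JDK call that appears in the Tika `ImageParser` trace in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.ImageInputStream;

public class AwtSmokeTest {

    /**
     * Opens an ImageInputStream over a few arbitrary bytes. With ImageIO's
     * disk cache enabled (the default when the temp directory is writable),
     * this typically goes through FileCacheImageInputStream, the constructor
     * at the top of the Tika ImageParser trace. Returns null on success,
     * otherwise the IOException or LinkageError that was raised.
     */
    static Throwable probe() {
        byte[] bytes = {'G', 'I', 'F'}; // content is irrelevant for this check
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(bytes))) {
            if (in == null) {
                return new IllegalStateException("no ImageInputStreamSpi for InputStream");
            }
            in.read(); // actually read from the stream, not just construct it
            return null;
        } catch (IOException | LinkageError e) {
            // LinkageError covers both UnsatisfiedLinkError and NoClassDefFoundError
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable failure = probe();
        if (failure == null) {
            System.out.println("OK: ImageIO.createImageInputStream succeeded");
        } else {
            System.out.println("FAILED: " + failure);
            System.exit(1);
        }
    }
}
```

If it prints FAILED with an `UnsatisfiedLinkError` or `NoClassDefFoundError`, the node is probably missing a native dependency such as libXtst. Catching `LinkageError` rather than `Throwable` is a deliberate choice, so that unrelated failures still surface as ordinary crashes.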

On Wednesday, February 2, 2011 at 6:04 PM, Travis wrote:

After upgrading to 0.14.3 and turning on additional debug logging on
the index and gateway, I found this started showing up in close
proximity to the other errors:

[2011-02-01 22:26:31,626][WARN ][cluster.action.shard ] [Scarlet Witch]
sending failed shard for [tickets][0], node[MbTXMpNLRn-FKfUDILzFww], [R],
s[STARTED], reason [Failed to perform [indices/index/shard/index] on
replica, message
[RemoteTransportException[[Fasaud][inet[/10.140.20.168:9300]][indices/index/shard/index/replica]];
nested: UnsatisfiedLinkError[/usr/java/jdk1.6.0_23/jre/lib/amd64/xawt/libmawt.so:
libXtst.so.6: cannot open shared object file: No such file or directory]; ]]

On a hunch, I installed libXtst, and it would seem all the issues I was
seeing have cleared up.

Travis
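The earlier logs most likely showed only `NoClassDefFoundError: Could not initialize class sun.java2d.Disposer`, while the debug log named the missing `libXtst.so.6`, because of standard JVM class-initialization rules. If a class's static initializer throws an `Error` (such as an `UnsatisfiedLinkError` from loading a native library), that error is thrown once, the class is marked as failed, and every later use of the class throws `NoClassDefFoundError` instead. The self-contained sketch below reproduces the pattern; `NativeBacked` is a hypothetical stand-in for a class like `sun.java2d.Disposer`, and the library name is deliberately bogus.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureDemo {

    // Hypothetical stand-in for a JDK class whose static initializer loads a
    // native library. The library name is bogus on purpose, so
    // System.loadLibrary throws UnsatisfiedLinkError during initialization.
    static class NativeBacked {
        static {
            System.loadLibrary("no_such_native_lib_for_demo");
        }

        static void touch() {
            // Calling any static method triggers class initialization first.
        }
    }

    /** Uses NativeBacked twice and collects the error thrown by each attempt. */
    static List<Throwable> touchTwice() {
        List<Throwable> errors = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                NativeBacked.touch();
            } catch (LinkageError e) {
                errors.add(e);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Throwable> errors = touchTwice();
        for (int i = 0; i < errors.size(); i++) {
            System.out.println("attempt " + (i + 1) + ": " + errors.get(i));
        }
    }
}
```

The first attempt reports the `UnsatisfiedLinkError` and the second a `NoClassDefFoundError` saying the class could not be initialized, which is why the first failure on a node is the one worth hunting for: only it names the missing library.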




(system) #6