ElasticSearch _bulk calls resulting in "socket hang up" despite small size

I'm using elastical to connect to ElasticSearch via node.js.

While profiling my app with Nodetime to improve performance, I noticed something odd. My ElasticSearch "PUT" requests to the _bulk index endpoint frequently end in a "socket hang up". Furthermore, these calls are taking huge amounts of CPU time.

I'm capping each _bulk index request at 10 items, and as you can see, the content length of the requests does not even reach 50 KB, so it is hard to imagine that size is the issue. Yet the response time is over 60 seconds and the CPU time is over 10 seconds! Yikes!!

Screenshot (Nodetime profile of the _bulk requests): http://elasticsearch-users.115913.n3.nabble.com/file/n4030677/EoIWe.png
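For reference, a _bulk request body is newline-delimited JSON: one action line followed by one source line per document, ending with a trailing newline. The sketch below shows what a batch of at most 10 documents looks like on the wire. It is a minimal illustration only: the `Doc` shape, the index name "cache" and the type name "entry" are hypothetical, and `_type` is included only because Elasticsearch versions of this era required a document type (newer versions drop it).

```typescript
// Hypothetical document shape: an id plus any JSON-serializable body.
interface Doc {
  id: string;
  body: Record<string, unknown>;
}

// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Serialize one batch into a _bulk body: an "index" action line and a
// source line per document, joined by newlines, with a trailing newline.
function toBulkBody(index: string, type: string, docs: Doc[]): string {
  const lines: string[] = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: index, _type: type, _id: doc.id } }));
    lines.push(JSON.stringify(doc.body));
  }
  return lines.join("\n") + "\n";
}

// Usage: 25 documents become three request bodies (10 + 10 + 5 documents).
const allDocs: Doc[] = [];
for (let i = 0; i < 25; i++) allDocs.push({ id: String(i), body: { n: i } });
const bodies = chunk(allDocs, 10).map((batch) => toBulkBody("cache", "entry", batch));
```

Keeping batches this small means the payload itself is tiny, which supports the point above that request size alone is unlikely to explain a 60-second response.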

In an attempt to debug this, I started running ElasticSearch in the foreground and noticed this strange error:

[2013-02-27 11:42:39,188][WARN ][index.gateway.s3 ] [Lady Mandarin] [network][1] failed to read commit point [commit-f34]
java.io.IOException: Failed to get [commit-f34]
at org.elasticsearch.common.blobstore.support.AbstractBlobContainer.readBlobFully(AbstractBlobContainer.java:83)
at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.buildCommitPoints(BlobStoreIndexShardGateway.java:847)
at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.doSnapshot(BlobStoreIndexShardGateway.java:188)
at org.elasticsearch.index.gateway.blobstore.BlobStoreIndexShardGateway.snapshot(BlobStoreIndexShardGateway.java:160)
at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:271)
at org.elasticsearch.index.gateway.IndexShardGatewayService$2.snapshot(IndexShardGatewayService.java:265)
at org.elasticsearch.index.engine.robin.RobinEngine.snapshot(RobinEngine.java:1090)
at org.elasticsearch.index.shard.service.InternalIndexShard.snapshot(InternalIndexShard.java:496)
at org.elasticsearch.index.gateway.IndexShardGatewayService.snapshot(IndexShardGatewayService.java:265)
at org.elasticsearch.index.gateway.IndexShardGatewayService$SnapshotRunnable.run(IndexShardGatewayService.java:366)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Caused by: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: ..., AWS Error Code: NoSuchKey, AWS Error Message: The specified key does not exist., S3 Extended Request ID: ....
	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:548)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:288)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:170)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:2632)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:811)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:717)
	at org.elasticsearch.cloud.aws.blobstore.AbstractS3BlobContainer$1.run(AbstractS3BlobContainer.java:73)

I'm aware that I'm using a deprecated gateway (the S3 bucket gateway). However, given that I have multiple servers running on the Amazon Cloud which need to share data (I use ElasticSearch for caching), I don't see any alternative until the ElasticSearch team releases a replacement for the S3 Bucket Gateway...

Other than this problem with the _bulk calls, I'm not seeing any problems. Searches etc. all return quickly and effectively.

Which error are you seeing? You don't need to use the S3 gateway to set up a cluster of nodes on AWS. The S3 gateway simply copies the data to S3 continuously, while the local gateway knows how to restore its state from each instance's local storage (EBS or ephemeral).

On Feb 27, 2013, at 9:27 PM, inZania zaneclaes@gmail.com wrote:
--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/ElasticSearch-bulk-calls-resulting-in-socket-hang-up-despite-small-size-tp4030677.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

It looks like you are running into internal shard bulk response timeouts,
which default to 60 seconds. There should be something about this in the
logs. Check whether you can successfully index documents on all the
cluster nodes.
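On the client side, a related check is to inspect every item in the _bulk response instead of relying only on the HTTP status, because a bulk request can return overall while individual documents failed. The sketch below assumes the response has already been parsed from JSON and that a failed item carries an `error` field under its action key (for example `index`); the `BulkResponse` and `BulkFailure` types are simplified assumptions, since the exact response fields differ between Elasticsearch versions.

```typescript
// Assumed, simplified shape of one action result inside a _bulk response.
interface BulkItemResult {
  _index?: string;
  _id?: string;
  status?: number;
  error?: unknown;
}

// Each item maps one action name (e.g. "index") to its result.
type BulkItem = Record<string, BulkItemResult>;

interface BulkResponse {
  took?: number;
  items: BulkItem[];
}

interface BulkFailure {
  position: number; // position of the item within the request
  action: string; // action name, e.g. "index"
  id?: string;
  error: unknown;
}

// Collect every item that reported an error, in request order.
function findBulkFailures(response: BulkResponse): BulkFailure[] {
  const failures: BulkFailure[] = [];
  response.items.forEach((item, position) => {
    for (const action of Object.keys(item)) {
      const result = item[action];
      if (result.error != null) {
        failures.push({ position, action, id: result._id, error: result.error });
      }
    }
  });
  return failures;
}
```

If failures cluster on documents routed to particular shards or nodes, that would be consistent with the node-level problem suggested above; the error text itself (and the server log) is what would confirm it.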

Best regards,

Jörg

On 27.02.13 21:27, inZania wrote:

Yet, the response time is > 60
seconds and the CPU time is >10+ seconds!

kimchy,

Data persistence across the instances is very important to me. For example, one server might index a new document, and 30 seconds later I'll need to search for that document from another server... and then one of the servers might go down (it is an auto-scaling array on the cloud, so servers go up and down all the time), and I still need the data to be available. I thought this wasn't possible without S3 as a shared data store. Am I wrong? If so, where can I find a tutorial on how to set this up?