Hi All,
I'm brand new to ElasticSearch (it's great so far!) so forgive me if I'm
missing something obvious.
I'm trying to integrate ElasticSearch into a production environment. I have
a very volatile dataset that I need to fully reindex frequently, so I need
a lot more computing power for indexing than for day-to-day serving. So I
would like to have:
- A fairly large (~10 node) cluster of strong machines that I boot up and
use only for indexing, and then shut down (to save money).
- A small (~1-3 node) cluster that I serve off of.
The hope is that I'll be able to have the large cluster store the indices
to S3, shut it down, and then start the small cluster and have it pull the
indices down from the same S3 bucket.
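
To make the hand-off concrete, the sequence I have in mind looks roughly like
this (hostnames are placeholders, and I'm assuming I've read the 0.19 docs
right about the gateway snapshot API):

import requests

BIG = "http://big-cluster-node:9200"   # indexing cluster (placeholder host)

# ... bulk indexing runs against BIG here ...

# Ask the indexing cluster to flush everything out to the shared S3 gateway
# before it gets terminated (the gateway snapshot API, as I understand it).
print(requests.post(BIG + "/_gateway/snapshot").text)

# After that the big cluster is shut down, and the small serving cluster is
# started with the same gateway.s3.bucket so it recovers from S3.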
I managed to get the large cluster started, the indices built, and the
data (apparently) saved to S3. But when I start the small cluster with the
same gateway settings, it gets stuck during startup with the logs showing:
[2012-11-27 21:44:25,928][TRACE][index.gateway.s3 ] [Catiana]
[labels][3] recovering_files [129] with total_size [341.9mb], reusing_files
[0] with reused_size [0b]
[2012-11-27 21:45:53,851][WARN ][com.amazonaws.http.AmazonHttpClient]
Unable to execute HTTP request: Timeout waiting for connection
[2012-11-27 21:45:54,073][WARN ][com.amazonaws.http.AmazonHttpClient]
Unable to execute HTTP request: Timeout waiting for connection
[2012-11-27 21:45:54,074][WARN ][com.amazonaws.http.AmazonHttpClient]
Unable to execute HTTP request: Timeout waiting for connection
Over and over, until eventually it starts showing Apache connection-timeout
errors and just loops endlessly through those. I've spent a couple of hours
trying to diagnose it, to no avail. It seems like the issue is either with S3
itself or somewhere fairly deep in the ES transport code. Setting all the
loggers to TRACE doesn't produce any additional information. I googled but
couldn't find anything useful
(maybe https://forums.aws.amazon.com/message.jspa?messageID=296676 is
related?).
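
In case it helps rule out plain connectivity or credential problems, a quick
boto check along these lines (the bucket name is a placeholder) is the only
independent test I can think of for whether a node can reach the bucket
outside of ES:

import boto

# The same credentials the cloud-aws plugin is configured with (placeholders).
conn = boto.connect_s3("MY_ACCESS_KEY", "MY_SECRET_KEY")
bucket = conn.get_bucket("my-es-gateway-bucket")

# List what the indexing cluster actually wrote to the gateway bucket.
for key in bucket.list():
    print("%s\t%d" % (key.name, key.size))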
Since the data does seem to get uploaded to S3, I tried downloading it
manually to the ES data folder with s3cmd and setting the gateway to
local, but then the startup process just hangs endlessly after these lines:
[2012-11-27 21:55:39,904][INFO ][node ] [Banner, Robert
Bruce] {0.19.11}[2365]: initializing ...
[2012-11-27 21:55:39,949][INFO ][plugins ] [Banner, Robert
Bruce] loaded [cloud-aws], sites []
[2012-11-27 21:55:44,027][DEBUG][discovery.zen.ping.multicast] [Banner,
Robert Bruce] using group [224.2.2.4], with port [54328], ttl [3], and
address [null]
[2012-11-27 21:55:44,031][DEBUG][discovery.zen.ping.unicast] [Banner,
Robert Bruce] using initial hosts [], with concurrent_connects [10]
[2012-11-27 21:55:44,033][DEBUG][discovery.ec2 ] [Banner, Robert
Bruce] using ping.timeout [3s], master_election.filter_client [true],
master_election.filter_data [false]
[2012-11-27 21:55:44,039][DEBUG][discovery.zen.elect ] [Banner, Robert
Bruce] using minimum_master_nodes [-1]
[2012-11-27 21:55:44,040][DEBUG][discovery.zen.fd ] [Banner, Robert
Bruce] [master] uses ping_interval [1s], ping_timeout [30s], ping_retries
[3]
[2012-11-27 21:55:44,069][DEBUG][discovery.zen.fd ] [Banner, Robert
Bruce] [node ] uses ping_interval [1s], ping_timeout [30s], ping_retries
[3]
[2012-11-27 21:55:45,290][DEBUG][discovery.ec2 ] [Banner, Robert
Bruce] using host_type [PRIVATE_IP], tags [{}], groups
[[production#elasticsearch]] with any_group [true], availability_zones [[]]
[2012-11-27 21:55:46,962][DEBUG][gateway.local ] [Banner, Robert
Bruce] using initial_shards [quorum], list_timeout [30s]
Again, setting everything to TRACE reveals no additional info.
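
While it sits there, about the only other signal I can think to watch is the
cluster health API; a tiny poll like this (assuming the requests library is
handy; the hostname is a placeholder) at least shows whether the shard counts
ever move:

import json
import time

import requests

SMALL = "http://small-cluster-node:9200"   # serving cluster (placeholder host)

# Poll cluster health while the node sits in recovery; if these numbers never
# change, it is genuinely stuck rather than just slow.
while True:
    health = json.loads(requests.get(SMALL + "/_cluster/health").text)
    print("%s  initializing=%s  unassigned=%s" % (
        health.get("status"),
        health.get("initializing_shards"),
        health.get("unassigned_shards"),
    ))
    time.sleep(10)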
Here's my full ES.yml:
gateway:
  # if I try to use S3:
  type: s3
  s3:
    bucket: [redacted]
  # if I try to use local:
  # type: local
# gateway.local.auto_import_dangled: yes  # local case only

index:
  store:
    type: memory

discovery:
  type: ec2
discovery.ec2.groups: [redacted]

cloud:
  aws:
    access_key: [redacted]
    secret_key: [redacted]

cluster:
  name: production-elasticsearch

path.data: /mnt/elasticsearch/
"""
Any ideas on how I can fix these issues, or on how to achieve my goal (big
intake cluster, small production cluster) some other way? If not, I'm
probably going to have to give up on ES.
Thanks!
-George