Okay, now it's occurring on a different machine. Whatever it is, it's not
isolated. Auto_expand_replicas is not in our config file, and we haven't
changed it at runtime. I assume that means it's off.
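To double-check, I figure the live settings for one of the indexes can be pulled straight from the API, something like this (host and index name are just examples from this machine):

  curl 'http://192.168.1.91:9200/events-2013-08-06/_settings?pretty'

I assume that if index.auto_expand_replicas were set on the index, it would show up there.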
The log level is set lower on this machine, so there isn't much there. For
example, here is the whole log from 2013-08-06. This machine isn't very busy:
2013-08-06 16:24:05,892 INFO |elasticsearch[Mephisto][clusterService#updateTask][T#1]| c.metadata [Mephisto] [events-2013-08-06] creating index, cause [auto(bulk api)], shards [5]/[1], mappings
2013-08-06 16:24:07,808 INFO |elasticsearch[Mephisto][clusterService#updateTask][T#1]| c.metadata [Mephisto] [events-2013-08-06] update_mapping [log] (dynamic)
2013-08-06 16:24:12,873 INFO |elasticsearch[Mephisto][clusterService#updateTask][T#1]| c.metadata [Mephisto] [events-2013-08-07] creating index, cause [auto(bulk api)], shards [5]/[1], mappings
2013-08-06 16:24:14,884 INFO |elasticsearch[Mephisto][clusterService#updateTask][T#1]| c.metadata [Mephisto] [events-2013-08-07] update_mapping [log] (dynamic)
2013-08-06 16:46:49,844 INFO |elasticsearch[Mephisto][clusterService#updateTask][T#1]| c.metadata [Mephisto] [events-2013-08-06] update_mapping [log] (dynamic)
2013-08-06 22:48:58,019 INFO |elasticsearch[Mephisto][clusterService#updateTask][T#1]| c.metadata [Mephisto] [events-2013-08-10] creating index, cause [api], shards [5]/[1], mappings
2013-08-06 22:50:08,738 INFO |elasticsearch[Mephisto][clusterService#updateTask][T#1]| c.metadata [Mephisto] [events-2013-08-11] creating index, cause [api], shards [5]/[1], mappings
Here is _cluster/health:
{
  "cluster_name" : "mycluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 85,
  "active_shards" : 85,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 85
}
And _cluster/nodes:
{
  "ok" : true,
  "cluster_name" : "mycluster",
  "nodes" : {
    "Sz9SdW45QlGz0UvHQxqkCw" : {
      "name" : "Dreamqueen",
      "transport_address" : "inet[/192.168.1.90:9300]",
      "hostname" : "writer",
      "version" : "0.90.1",
      "http_address" : "inet[/192.168.1.90:9200]",
      "attributes" : {
        "client" : "true",
        "data" : "false"
      }
    },
    "WImBPVMdQEOloVyO_zLCGg" : {
      "name" : "Mephisto",
      "transport_address" : "inet[/192.168.1.91:9300]",
      "hostname" : "es1",
      "version" : "0.90.1",
      "http_address" : "inet[/192.168.1.91:9200]"
    }
  }
}
Interesting bits from _cluster/state:
... cluster info, including nodes...
"metadata" : {
"templates" : { },
"indices" : {
"events-2013-07-20" : {
"state" : "open",
"settings" : {
"index.number_of_shards" : "5",
"index.number_of_replicas" : "1",
"index.version.created" : "900199"
},
"mappings" : {
"log" : {
"properties" : {
"id" : {
"type" : "string"
},
"eventType" : {
"type" : "string"
},
"eventTime" : {
"type" : "string"
}
}
}
},
"aliases" :
},
... lots of index info, all basically looks the same...
"routing_table" : {
"indices" : {
"events-2013-07-20" : {
"shards" : {
"0" : [ {
"state" : "STARTED",
"primary" : true,
"node" : "WImBPVMdQEOloVyO_zLCGg",
"relocating_node" : null,
"shard" : 0,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "events-2013-07-20"
} ],
"1" : [ {
"state" : "STARTED",
"primary" : true,
"node" : "WImBPVMdQEOloVyO_zLCGg",
"relocating_node" : null,
"shard" : 1,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "events-2013-07-20"
} ],
"2" : [ {
"state" : "STARTED",
"primary" : true,
"node" : "WImBPVMdQEOloVyO_zLCGg",
"relocating_node" : null,
"shard" : 2,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "events-2013-07-20"
} ],
"3" : [ {
"state" : "STARTED",
"primary" : true,
"node" : "WImBPVMdQEOloVyO_zLCGg",
"relocating_node" : null,
"shard" : 3,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "events-2013-07-20"
} ],
"4" : [ {
"state" : "STARTED",
"primary" : true,
"node" : "WImBPVMdQEOloVyO_zLCGg",
"relocating_node" : null,
"shard" : 4,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "events-2013-07-20"
} ]
}
},
... more of the same for each index...
"routing_nodes" : {
"unassigned" : [ {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "events-2013-07-20"
}, {
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 4,
"index" : "events-2013-07-20"
},
Anything seem unusual there? Should number_of_replicas be 0, since this is
the only machine in the cluster? (The second machine is a client only.)
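If the answer is that replicas should be 0, my understanding is that something along these lines would drop them for an index (index name taken from above; I haven't actually run this yet):

  curl -XPUT 'http://192.168.1.91:9200/events-2013-07-20/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'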
-- Jeremy
On Wednesday, 7 August 2013 09:14:10 UTC-7, Ivan Brusic wrote:
Missed that number_of_data_nodes is 1. Your indices obviously think that
they have replicas. Scan the cluster state
(http://localhost:9200/_cluster/state).
For each index, is index.number_of_replicas set to anything other than 0?
If the setting is absent, the default of 1 applies (I might be wrong on
that number).
Do you perhaps have index.auto_expand_replicas enabled?
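A quick way to eyeball both settings across every index, assuming curl and grep are available on that machine (swap localhost for whichever host is running elasticsearch):

  curl -s 'http://localhost:9200/_cluster/state?pretty' | grep -E 'number_of_replicas|auto_expand_replicas'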
--
Ivan
On Wed, Aug 7, 2013 at 9:04 AM, Ivan Brusic <iv...@brusic.com> wrote:
Your situation is extremely odd. You state that your cluster has only one
node, but the cluster API is showing that number_of_nodes is 5. You can use
the cluster API to get all the nodes in the cluster:
http://localhost:9200/_cluster/nodes
Change localhost to the node that is running elasticsearch. Either there are
other elasticsearch clusters in your network, or you started several
processes on a single server. If elasticsearch cannot bind to the default
ports (9200/9300), it will choose the next available port.
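If you suspect several processes on one box, something like this on the server itself should make it obvious (assuming a Linux machine with netstat available):

  ps aux | grep -i elasticsearch
  netstat -tlnp | grep -E ':9[23]0[0-9]'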
You can also use a management plugin such as head or bigdesk to quickly
visualize your cluster.
--
Ivan
On Tue, Aug 6, 2013 at 11:23 AM, Jeremy Karlson <jeremy...@gmail.com> wrote:
Back again!
At some point over the weekend, it appears Elasticsearch decided to stop
refreshing again. This time, I was unable to force refreshes to occur by
manually triggering them. Without any better ideas, I've turned on TRACE
logging and enabled the debugger, in case it happens again.
It is currently in the process of starting up, and it's taking quite a
while. If I query the cluster health, I see this:
{
  "cluster_name" : "MyCluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 27,
  "active_shards" : 27,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 148
}
And then if I query again a minute later, I see this:
{
  "cluster_name" : "MyCluster",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 5,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 31,
  "active_shards" : 31,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 144
}
So it's certainly taking its time in assigning shards. To me (with my limited ES knowledge) that does seem a little odd, since this is a cluster of just one node. I don't know what's normal or not in the log at this point, but I see a lot of talk about recovering shards with a reason of "post recovery." ES was cleanly shut down and restarted, so I'm unclear on what "recovery" might mean in this situation.
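For what it's worth, I've just been re-running the health query by hand; I assume the wait_for_status/timeout parameters would let it block until things settle instead, something like this (host is just a placeholder):

  curl 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=120s&pretty'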
This is certainly starting to become very frustrating and concerning. Anyone else have any ideas?
-- Jeremy
On Wednesday, 31 July 2013 12:34:12 UTC-7, Jeremy Karlson wrote:
Hm. I suppose "huge" is a subjective term, but no, we didn't have a
large number of updates. I would say this machine is under a
low-to-moderate load generally, with indexes and documents being deleted
and recreated fairly frequently. It's the target of a number of automated
tests, so things tend to get torn down and rebuilt with some regularity.
It has perhaps a dozen indexes, with 35,000 docs in the most populated one I
see. Most indexes are much less populated, with a few hundred docs. Our
documents are probably 1-2 KB each.
I have since restarted the machine and things have returned to "normal"
(up for debate - we're looking at some different weird behaviour now, but
that problem appears to have gone away), but as I recall, forcing the manual
refresh was a quick operation that took almost no time when run from the
curl command line.
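(For the record, the manual refresh was just the standard refresh endpoint, something like this, with the host being an example:

  curl -XPOST 'http://localhost:9200/_refresh'
)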
-- Jeremy
On Wednesday, 31 July 2013 01:54:04 UTC-7, Alexander Reelsen wrote:
Hey,
did you have a huge amount of updates shortly before this strange behaviour
happened? One reason your refresh might not be executing is that a refresh
which started a second earlier has not yet finished. But the refresh you are
executing manually should not be any different.
If you force the refresh, how long does it take until the documents
are searchable? Immediately?
--Alex