I've got an Elasticsearch 5.3 cluster that I essentially want to store logs in for archival and search purposes, made up of two nodes, each in a different data centre. I would like to be able to search both nodes from one Kibana instance (hence the cluster), but not ship the logs between the data centres.
So far, I've been able to disable replicas, but haven't been able to figure out the right settings to stop my shards from being distributed across the cluster. I've been looking at the cluster.routing.allocation.require.* directives, but haven't had much luck.
Hmm. Sites are <10ms apart, according to ping, but I acknowledge that might be an issue.
Can you suggest an alternative architecture? Kibana doesn't seem to be able to query more than one Elasticsearch cluster (which, to be honest, I wasn't expecting it to), and I'd rather not have to shovel raw log files between sites if I can help it.
I've looked at allocation awareness (this: https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html ) and it seems to be about keeping replica shards outside of the 'awareness zone' that the primary shards reside in, rather than allocating all the primary shards for an index on the same cluster node where the data is ingested.
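For comparison, pinning an index to one node is normally done with index-level shard allocation filtering rather than awareness. Here is a rough sketch of the settings body, assuming each node has been started with a custom attribute in elasticsearch.yml (for example `node.attr.datacenter: dc1`). The attribute name `datacenter`, the value `dc1` and the helper function are made up for illustration; only the `index.routing.allocation.require.<attribute>` setting itself is Elasticsearch's.

```python
import json


def pinned_index_settings(datacenter: str) -> dict:
    """Build index settings that keep all shards on nodes tagged with `datacenter`.

    Assumes nodes carry a custom attribute, e.g. `node.attr.datacenter: dc1`
    (the attribute name is hypothetical, chosen for this example).
    """
    return {
        "settings": {
            # No replicas, so nothing is copied to the other site.
            "index.number_of_replicas": 0,
            # Shards may only be allocated to nodes whose attribute matches.
            "index.routing.allocation.require.datacenter": datacenter,
        }
    }


if __name__ == "__main__":
    # Body to send when creating a per-site index (index name is up to you).
    print(json.dumps(pinned_index_settings("dc1"), indent=2))
```

Each site would then write to its own indices created with its own attribute value. This only controls where shards live; the two nodes still form one cluster across the link.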
Given that's likely to still leave me with a cluster latency issue, I think the best way to achieve my goal is to stop trying to fight Elasticsearch, run two separate single-node clusters for my data, and use Kibana and a tribe node to knit them together until Kibana supports cross-cluster search.
You could set 1 shard and 0 replicas per index (so you effectively have only 1 primary shard).
However, you lose parallel processing, and you can only store roughly 2 billion documents per index in this configuration, because that is the per-shard limit.
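A minimal sketch of an index template body that applies the 1 shard / 0 replicas setup to every new log index. The pattern `logs-*` and the helper name are placeholders. The `template` key is the one 5.x uses for the index pattern (later versions renamed it `index_patterns`), so check it against the version you actually run.

```python
import json


def single_shard_template(pattern: str) -> dict:
    """Build an index template body with one primary shard and no replicas."""
    return {
        # Index name pattern the template applies to (5.x-style key).
        "template": pattern,
        "settings": {
            "number_of_shards": 1,    # one primary shard per index
            "number_of_replicas": 0,  # no copies on the other node
        },
    }


if __name__ == "__main__":
    # Body for a PUT to the _template endpoint; "logs-*" is a placeholder.
    print(json.dumps(single_shard_template("logs-*"), indent=2))
```

With daily or weekly indices the per-index document cap is less of a worry, since each new index starts a fresh shard.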