I am new to Elasticsearch (1.7) and am currently running Logstash (1.5.4) and Elasticsearch on the same host. I am trying to scale Elasticsearch to multiple instances. I have been able to get multiple Elasticsearch instances working, with data being correctly inserted into their indices. How can I retrieve the data from all three instances using one query? (I am already able to retrieve it from individual instances.)
Are these instances part of the same cluster? If yes, then you can query any node for any index that's part of the cluster. If no, your only option would be setting up a tribe node.
We are using the same cluster to group the nodes, so basically I would have to specifically assign each index (time-based) to a specific node through Logstash. I think I was using round-robin, so the indices were appearing in each node. I was able to retrieve the results from a specific node, but was not able to figure out how to aggregate data from multiple nodes in the same cluster. I currently deploy through Ansible, so this would probably make deployment a little easier if this is the case, but make it more difficult to retrieve the data through our Django application.
We are using the same cluster to group the nodes, so basically I would have to specifically assign each index (time-based) to a specific node through Logstash.
No... why? Logstash can't control where indexes and shards are allocated.
Again, just connect to any cluster node and let ES deal with everything.
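As a minimal sketch of what "connect to any cluster node" could look like from a Python application: the search is sent to one node's HTTP port with an index pattern, and that node coordinates the request across whichever nodes hold the shards. The node URL `http://localhost:9200`, the pattern `estimate-*`, and the helper names `build_search_request` and `search` are assumptions for illustration, not something taken from the poster's setup.

```python
import json
import urllib.request


def build_search_request(node_url, index_pattern, query):
    """Build a POST to <node>/<index pattern>/_search on a single node.

    node_url is any one node in the cluster (hypothetical value below);
    the receiving node fans the search out to the shards itself.
    """
    url = "{}/{}/_search".format(node_url.rstrip("/"), index_pattern)
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(node_url, index_pattern, query):
    """Send the search and return the decoded JSON response."""
    request = build_search_request(node_url, index_pattern, query)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires a running node; the URL is an assumption):
# hits = search("http://localhost:9200", "estimate-*",
#               {"query": {"match_all": {}}})
```

The same call pointed at a different node of the same cluster should return the same hits, which is why no per-node aggregation is needed on the application side.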
I am currently trying to scale the system vertically. I am still a little unclear on how the data is supposed to go from Logstash to Elasticsearch without being duplicated (the same index exists in each node). Currently each index is showing up in each node I have deployed. My configuration settings are below:
ElasticSearch settings (each Node is configured to have its own config, data/work directory, log file)
NODE 1
Cluster Name: Supply
Node Name: es1
Node Master: True
Node Data: True
Minimum Master Nodes: 2
Multicast: False
Unicast Hosts: [localhost:9300]
NODE 2
Cluster Name: Supply
Node Name: es2
Node Master: True
Node Data: True
Minimum Master Nodes: 2
Multicast: False
Unicast Hosts: [localhost:9300]
NODE 3
Cluster Name: Supply
Node Name: es3
Node Master: True
Node Data: True
Minimum Master Nodes: 2
Multicast: False
Unicast Hosts: [localhost:9300]
Logstash Parser output:
elasticsearch {
  host => "localhost"
  protocol => "http"
  port => "9200"
  index => "estimate-%{+YYYY-MM-dd}"
  cluster => "sv"
  template => "/opt/logsearch/templates/logsearch_apple_template.json"
  template_name => "apple_template"
  template_overwrite => true
}
Thanks for the help.
I am currently trying to scale the system vertically.
I think you mean horizontally (i.e. increasing the total capacity by adding machines).
Currently each index is showing up in each node I have deployed.
What, exactly, do you mean by this and what makes you reach that conclusion?
Question 1:
We plan on scaling up to a bigger server, not scaling out to more servers. We will only have one server in deployment. We were trying to speed up queries that use aggregations, because the indices are too large and query performance is slow. We thought that by running more instances of Elasticsearch we could spread the load across them, which would improve the performance of the aggregations.
Question 2:
Basically, I am seeing the same number of indices, with the same document counts and the same results, on each of the nodes when I query it. I am not sure whether this is correct. If it is, then I am fine with it, but I am thinking that it is duplicating the same data on all 3 nodes. Is there a way to verify this conclusion?
We plan on scaling up to a bigger server, not scaling out to more servers. We will only have one server in deployment. We were trying to speed up queries that use aggregations, because the indices are too large and query performance is slow. We thought that by running more instances of Elasticsearch we could spread the load across them, which would improve the performance of the aggregations.
That's correct. Elasticsearch generally scales better horizontally than vertically.
Basically, I am seeing the same number of indices, with the same document counts and the same results, on each of the nodes when I query it. I am not sure whether this is correct. If it is, then I am fine with it, but I am thinking that it is duplicating the same data on all 3 nodes. Is there a way to verify this conclusion?
Cluster state is global. Any node can be queried regardless of where the data is actually stored. The behavior you see is normal. Don't worry about it.
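One way to check where the data physically lives is the `_cat/shards` API, which lists every shard copy together with the node holding it. The sketch below groups that output by node. It assumes the default column order of Elasticsearch 1.x (`index shard prirep state docs store ip node`) and a cluster with no shards currently relocating; the helper name `shards_by_node` and the sample rows in the comments are hypothetical, not output from the poster's cluster.

```python
from collections import defaultdict


def shards_by_node(cat_shards_text):
    """Group `GET /_cat/shards` output by the node holding each shard copy.

    Returns {node_name: [(index, shard_number, "p" or "r"), ...]}.
    Unassigned copies have no ip/node columns and are grouped under
    the key "(unassigned)".
    """
    placement = defaultdict(list)
    for line in cat_shards_text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep = parts[0], parts[1], parts[2]
        # Node names may contain spaces, so join everything after the ip.
        node = " ".join(parts[7:]) if len(parts) >= 8 else "(unassigned)"
        placement[node].append((index, int(shard), prirep))
    return dict(placement)


# Hypothetical sample in the assumed column layout:
# estimate-2015-10-01 0 p STARTED 100 1mb 127.0.0.1 es1
# estimate-2015-10-01 0 r STARTED 100 1mb 127.0.0.1 es2
```

If each node shows only some of the shard copies (`p` for primary, `r` for replica), the data is distributed rather than copied in full to all three nodes, even though every node reports the same indices and document counts.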