First, apologies if the category is incorrect; I think this question is half ES and half Kibana.
Scenario:
I have 10+ machines (on the same subnet), each running its own elasticsearch instance (plus logstash and logstash-forwarder). On each of these 10+ servers I'm using logstash to ingest HTTP proxy logs (locally, on each machine). On another subnet I have a single ELK server that receives logs from those 10+ machines via logstash-forwarder. The logs I'm sending via logstash-forwarder are not the HTTP proxy logs; I don't want to ship the proxy logs that way because of bandwidth (the proxy logs can easily reach 4GB+ in a single day).
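For context, here is roughly what the logstash-forwarder config looks like on each of the 10+ machines (the hostname, certificate path, and log paths below are placeholders, not my real values); note that the proxy log paths are deliberately not listed under "files":

    {
      "network": {
        "servers": [ "elk-server.example.local:5043" ],
        "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt",
        "timeout": 15
      },
      "files": [
        {
          "paths": [ "/var/log/syslog", "/var/log/auth.log" ],
          "fields": { "type": "syslog" }
        }
      ]
    }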
Goal:
From my single ELK server I would like to be able to query the elasticsearch instances on the 10+ machines using Kibana. Even though the ELK server is on a different subnet, I can still reach the 10+ machines running elasticsearch/logstash/logstash-forwarder. My HTTP proxy logs contain user-agent, URL, IP, etc., all in JSON format, and I would like to be able to run a search from the ELK server (Kibana) that runs against the 10+ machines to look for, say, a specific "user-agent".
TL;DR -- I want to search for data across 10+ independent elasticsearch instances/nodes from a single Kibana interface. What is the best way to accomplish this?
Would a Tribe node be required in my scenario, since I don't have different clusters? Could the ELK server and the 10+ machines be part of the same cluster? If so, any pointers on how to make that work?
Edit: Also, is there any documentation on how to configure/create a "tribe node"?
From what I read, the single ELK server on the other subnet that he mentioned is only for storing the Kibana index, and he wants that Kibana to be able to query data from the 10+ ES nodes (10+ single-node ES clusters).
@sck: 4GB+ a day is not a lot of traffic; I'm not sure why you don't want to store all logs in one single ES cluster. In our prod environment, we send about 200GB of logs per day from multiple servers into one ES cluster.
While the 10+ servers are on the same subnet, they are geographically dispersed. So yes, 4GB is typically not a lot of traffic, but in this scenario it is not feasible. Sounds like the tribe node is the best way to go. Now I'm trying to find some info on how to set up a tribe node. Would the tribe node be set up on my ELK server?
Given that you can use tribe nodes, will the search latency be acceptable when one ES node has to go to 10+ ES instances separated geographically to get data?
Do you need to search data from the 10+ nodes in real time? If not, I would still prefer using something to slowly ship data from the 10+ nodes to a central ES node, where we can do whatever we want at high speed. It would be easier than having to maintain separate ES nodes.
Tribe Node is good when you want to do federated search across indices in multiple clusters.
If you have multiple indices in one cluster, you can simply use a Client Node. Point Kibana to this Client Node and it should work.
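For example, a Client Node on the ELK server would just be a node in the same cluster with the master and data roles disabled in elasticsearch.yml, something like this (cluster name and hosts are only examples):

    # elasticsearch.yml on the ELK server (ES 1.x/2.x style client node)
    cluster.name: proxy-logs          # must match the existing cluster
    node.master: false                # not eligible to become master
    node.data: false                  # holds no data, only coordinates requests
    discovery.zen.ping.unicast.hosts: ["10.0.1.11", "10.0.1.12"]

Then point Kibana at that node, e.g. elasticsearch_url: "http://localhost:9200" in kibana.yml.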
If you have to use the Tribe Node, you need to create the .kibana index manually (or something similar) so Kibana knows where to put its data. Under the hood, Kibana stores its data in an index in ES (in case you did not know that). With the Tribe Node setup, Kibana gets confused about where to put its own index, which is why you need to do it "manually".
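A tribe node is configured in a similar way, except each remote cluster gets its own block under "tribe" in elasticsearch.yml. A rough sketch for the ELK server (cluster names and IPs are made up; you would have one block per single-node cluster):

    # elasticsearch.yml on the ELK server acting as a tribe node
    tribe:
      proxy01:
        cluster.name: proxy-logs-01
        discovery.zen.ping.unicast.hosts: ["10.0.1.11"]
      proxy02:
        cluster.name: proxy-logs-02
        discovery.zen.ping.unicast.hosts: ["10.0.1.12"]
      # ...one entry per remote cluster

Because the tribe node itself cannot create new indices, the .kibana index has to already exist in one of the clusters. One option is to create it directly on a cluster before starting Kibana against the tribe node, e.g.:

    curl -XPUT 'http://10.0.1.11:9200/.kibana'

or point Kibana at a normal node once so it creates the index itself, then switch elasticsearch_url back to the tribe node.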