I am trying to learn Elasticsearch and have some questions, please answer if and when you get a chance:
The scenario:
3 master nodes (2 needed for quorum)
3 data nodes
The questions:
Let’s say I need to forward Apache httpd logs using Logstash Forwarder. Do I forward from all web servers to one data node, or do I distribute them across the data nodes?
Are indices created for each log type on each data node, and are they replicated to the other data nodes?
What are shards? (A simple explanation would suffice.)
If I use Kibana for visualization, do I connect it to “any one” data node or to “any one” master node?
Is it bad practice to keep the ES nodes off the public network?
What framework do you recommend for Hadoop/ES integration? I have Hadoop+Hive configured, so any starting points would be helpful. (I have gone through the elastic.co docs and cannot find how to actually send Hive data to ES.)
Thanks a LOT in advance. Regards,