Some basic questions regarding ES


#1

I am trying to learn Elasticsearch and have some questions. Please answer if and when you get a chance:

The scenario:
3 master nodes (2 needed for quorum)
3 data nodes
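
For reference, the "2 for quorum" figure follows from the usual majority rule, floor(n / 2) + 1. A minimal sketch of that arithmetic is below; the function name `master_quorum` is made up for illustration and is not an Elasticsearch API. In versions that use the `discovery.zen.minimum_master_nodes` setting, this is the value it would typically be set to.

```python
def master_quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of the master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    # The scenario above: 3 master nodes.
    print(master_quorum(3))
```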

The questions:
Let’s say I need to forward Apache httpd logs using Logstash Forwarder. Do I forward from all web servers to one data node, or do I distribute them across the data nodes?

Are indices created for each log type per data node, and are they replicated on the other data nodes?
What are shards? (A simple explanation would suffice.)

If I use Kibana for visualization, do I connect it to “any one” data node or to “any one” master node?

Is it not bad practice to not have the ES nodes on public network?

What framework do you recommend for Hadoop/ES integration? I have Hadoop+Hive configured, so any starting points would be helpful (I have gone through the elastic.co docs and am unable to find how to actually send Hive data to ES).

Thanks a LOT in advance. Regards,


(Mark Walkom) #2

Use Filebeat instead; LSF (Logstash Forwarder) is deprecated.

One index for it all. Indices are replicated by default.

Shards are basically partitions of an index, which allow it to be distributed across nodes.
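
To make the partitioning idea concrete, here is a rough sketch of how documents could be spread across the primary shards of one index by hashing a routing key and taking it modulo the shard count. This is an illustration only: CRC32 is a stand-in hash chosen for the example (Elasticsearch uses its own hash function internally), and the names `pick_shard` and `distribute` are made up, not part of any Elasticsearch client.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (e.g. a document ID) to a primary shard number."""
    # Assumption: CRC32 stands in for the hash Elasticsearch really uses.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one would land on."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    # Twelve hypothetical log documents spread over 3 primary shards.
    layout = distribute([f"log-{i}" for i in range(12)], num_primary_shards=3)
    for shard, docs in layout.items():
        print(shard, docs)
```

Because the shard is derived from the key and the shard count, the same document always maps to the same shard, and each shard (with its replicas) can live on a different data node.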

Not the masters; connect Kibana to a data node.

That's a lot of negatives. The takeaway is: don't expose ES to the internet, and even then make sure it's protected, just like any other datastore.


#3

Thank you very much

