X-Pack with Hadoop

Hi,

I'm trying to use X-Pack with Elastic 6.x to monitor our Hadoop cluster operations. We want to forecast Hadoop cluster behavior over a specified time period. I'm looking for help setting up the architecture.

Should it be: Metricbeat on all cluster nodes --> ELK --> X-Pack ML jobs --> forecasting?

I'm not sure whether there is a different design or a different set of components for Hadoop. Can someone help here?

Collecting the data is the right first step. Things to consider:

  • what do you care about: which metrics do you want to collect, and what do you want to forecast later?
  • collect data carefully: Hadoop has monitoring built in. Instead of installing Metricbeat on all nodes, you might be better off getting the required data from Hadoop's APIs, which has the additional benefit of giving you access to Hadoop counters and many more internal metrics, more than Metricbeat can offer. Another reason not to use Metricbeat on worker nodes: be careful not to introduce a performance bottleneck, especially network load. Hadoop already aggregates what you need, so start with that
  • which data is potentially useful: e.g. if the Hadoop cluster is an internal service, it might be useful to collect users, which could serve as a partition field (a typical case is the 'end of quarter' analysis, when suddenly everyone kicks off jobs)
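
To illustrate the partition field idea, here is a minimal Python sketch that builds the request bodies for an ML job and a forecast. The field names `job_count` and `user` are hypothetical placeholders, and the layout is my understanding of the 6.x job format (`PUT _xpack/ml/anomaly_detectors/<job_id>` and `POST _xpack/ml/anomaly_detectors/<job_id>/_forecast`), so check it against the documentation for your exact version.

```python
import json


def build_job_config(metric_field, partition_field, bucket_span="15m"):
    """Build an ML job body that models one metric separately per partition value.

    metric_field and partition_field are whatever your indexed documents
    contain; the values used below are hypothetical examples.
    """
    return {
        "description": "Mean of %s split by %s" % (metric_field, partition_field),
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    "function": "mean",
                    "field_name": metric_field,
                    # One independent model (and forecast) per distinct value.
                    "partition_field_name": partition_field,
                }
            ],
            "influencers": [partition_field],
        },
        "data_description": {"time_field": "@timestamp"},
    }


def build_forecast_request(duration="7d"):
    """Build the body for a forecast request covering the given duration."""
    return {"duration": duration}


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    print(json.dumps(build_job_config("job_count", "user"), indent=2))
    print(json.dumps(build_forecast_request("14d")))
```

With a partition field, each user (or host, or service) gets its own baseline, so an end-of-quarter spike from one user does not distort the model for the others.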

Overall I think this is an interesting use case, but there is nothing special about it compared to other cases. As I said, you should be careful about how you collect the data. I suggest going with Hadoop's out-of-the-box metric APIs first.
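
As a rough illustration of reading metrics from Hadoop itself, here is a minimal Python sketch, assuming the YARN ResourceManager REST endpoint `/ws/v1/cluster/metrics` is reachable (port 8088 is the usual default, but check your setup). The function names, the `cluster` field and the `yarn` namespace are my own choices for illustration, not anything Hadoop or Elastic prescribes.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """GET the cluster-wide metrics from the ResourceManager as a dict."""
    url = rm_url.rstrip("/") + "/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_document(payload, cluster_name, now=None):
    """Turn the ResourceManager response into one timestamped document.

    Only numeric values are kept, so every field is usable as an ML metric.
    """
    metrics = payload["clusterMetrics"]
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    numeric = {
        key: value
        for key, value in metrics.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
    return {"@timestamp": timestamp, "cluster": cluster_name, "yarn": numeric}


if __name__ == "__main__":
    # Hypothetical host name; replace with your ResourceManager address.
    payload = fetch_cluster_metrics("http://resourcemanager.example.com:8088")
    print(json.dumps(to_document(payload, "my-cluster")))
```

Run on a schedule from a single machine, something like this polls one endpoint instead of putting an agent on every worker node. The resulting document can then be indexed into Elasticsearch by whatever means you prefer.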

For more information, search for "hadoop metrics"; this should turn up fairly detailed how-tos as well as Hadoop's own documentation.

Thanks, Hendrik, for your reply.

Good question about what we need and what we want to forecast. I started off looking into Ambari metrics and, believe me, it's noisy. For every application ID, AMS generates thousands of metrics, and it's difficult to find correlations among them. Other challenges are missing data annotation and aligning metrics by host and by service.

I'm hoping Metricbeat can give us system-level metrics (most of them are well-known ones). We want to forecast the load by service on each host. I agree we need to evaluate the performance impact of Metricbeat.

Any further pointers will be appreciated!