GSoC 2018: Interested in contributing to Beats

Hi
I'm Amitosh Swain Mahapatra, a 3rd year CS undergrad from CET, Bhubaneswar, India. I'm interested in working on the idea "Beats: Monitor Your Java Applications with JavaBeat".

I have a good deal of experience with Go, and knowledge of JVM internals and various Java performance monitoring and instrumentation APIs. I also have prior experience with Beats and the ELK stack.

I have set up my development environment for Beats, and I'm currently familiarizing myself with the codebase.

Currently, I'm working on my first PR addressing (https://github.com/elastic/beats/issues/5175), and I would like to know a bit more about what we are aiming to solve in this issue.

Glad to hear you are interested in contributing to Beats. For https://github.com/elastic/beats/issues/5175 the main goal is to look at the existing logging, potentially adjust the log levels, or introduce new log messages so that more information is available on what exactly is happening.

For the Javabeat we should discuss which metrics we can gather and how. Potentially it makes most sense to create a module for it in Metricbeat. We can directly continue this discussion here.

I have made a PR!

While we continue to iterate on the PR, I want to take up another issue with some more flexibility in order to get more comfortable with Metricbeat. Any suggestions?

I am thinking of #6267

As for metric collection for Javabeat, I think JVisualVM gives a rough idea of what can be collected from a running JVM. The JMX APIs should allow us to hook into a running JVM and collect data.

  1. Info like CPU usage and RAM usage is available from the OS
  2. Heap size
  3. GC logs and timings
  4. JIT Info, Compilation timings (?)

However, we need to call Java APIs, so I guess we will need a component in Java that collects the data and forwards it to the Metricbeat module.

Elastic APM does a similar thing (but for Python and Node.js only), so we could also draw some inspiration from it.

We already have a generic jolokia/jmx in Metricbeat. Perhaps we can build on top of this?

APM is working on a Java agent and I could definitely see them also collecting similar metrics. This does not mean we can't have them in both, as I think these are two different use cases. Either you monitor your app from outside with Metricbeat, or you have an agent running inside that pushes out metrics.

Here is some discussion on how to use JMX to get structured events: https://github.com/elastic/beats/issues/3585

(Sorry for the radio silence, I was a little busy with school stuff)

I read through Jolokia documentation and tried running it on a local VM. I could get memory and GC info, and a bunch of other stuff.

I also read through the Jolokia Metricbeat plugin code, and it appears we are sending the names of the MBeans as advertised by the JVM. As highlighted by #3585, we want to define a standard subset of JMX metrics for common values like memory and GC. We can definitely build this on top of the existing Jolokia integration.
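
To make the "standard subset" idea a bit more concrete, here is a minimal Python sketch of what a single Jolokia read looks like on the wire and how the reply could be flattened into an event. The output field names (`jvm.memory.heap.*`) and the sample numbers are made up for illustration. Only the request and response shape follows Jolokia's JSON protocol, and this is not the code of the existing Metricbeat module.

```python
import json


def build_read_request(mbean, attribute):
    """Build the JSON body of a single Jolokia "read" operation."""
    return {"type": "read", "mbean": mbean, "attribute": attribute}


def heap_event(response):
    """Flatten a HeapMemoryUsage response into an event dict.

    The output field names are hypothetical, not an agreed schema.
    """
    if response.get("status") != 200:
        raise ValueError("Jolokia returned status %r" % response.get("status"))
    usage = response["value"]
    return {
        "jvm.memory.heap.used.bytes": usage["used"],
        "jvm.memory.heap.committed.bytes": usage["committed"],
        "jvm.memory.heap.max.bytes": usage["max"],
    }


# Made-up sample reply, shaped like a Jolokia response to the request above.
SAMPLE_RESPONSE = json.loads("""
{
  "request": {"type": "read", "mbean": "java.lang:type=Memory",
              "attribute": "HeapMemoryUsage"},
  "value": {"init": 134217728, "committed": 128974848,
            "max": 1908932608, "used": 25165824},
  "timestamp": 1520000000,
  "status": 200
}
""")

request_body = json.dumps(
    build_read_request("java.lang:type=Memory", "HeapMemoryUsage"))
event = heap_event(SAMPLE_RESPONSE)
```

In a real setup `request_body` would be POSTed to the Jolokia agent's endpoint. The sketch leaves out the HTTP call so that it runs without a JVM.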

Correct me if I am wrong, I'm a bit confused about the scope of the project here.

I think you got a pretty good understanding of the current state.

The scope of the project is not fully defined and we should agree on it together. First thing we should probably agree on is if jolokia/jmx is the technology we should use for this or if there are other / better ways.

In case we go with JMX, we should set out a scope for the data we want to collect as part of this project. I think you already started a pretty good list in your post above. I would be more than happy if you could share from your Java experience which metrics you think are most relevant to collect.

I did a bit of weekend research on the alternatives to Jolokia.

  1. Implementing Java RMI in Go to directly hook into JVM:
    RMI is highly Java-centric and it is overwhelmingly difficult to create a Go RMI client.

  2. Java bridge to fetch JMX:
    This is basically what we are doing now, using Jolokia as JMX proxy.

However, I have found a few counter-arguments to Jolokia in many forums:

  1. Jolokia requires a servlet container, which is cumbersome to deploy at some places.
  2. Jolokia needs to be configured separately as well, and XML-based configurations scare away a lot of people.

An alternative could be an embedded Jolokia or a similar JAR that is shipped as part of the "Javabeat" and can be configured directly through the Metricbeat configuration.

What to measure

From my experience, these metrics are often requested:

  1. Memory Info
  2. Class loader and compilation times
  3. GC info
  4. Thread info
  5. VM Uptime
  6. Application-specific data
  7. Some support for JMX notifications

Therefore, we may also provide a means of mapping any application-specific MBeans to custom metrics. One of my projects involved monitoring a Solr cluster for update handler info exposed under solr.updateHandler.

We should continue to provide the existing method of logging all JMX metadata; this addition should be under a different module name.

Jmx4perl, as suggested in the GitHub issues, has some application specific MBeans that we may include as examples for the user mapping.

I would expect the complexity of creating a Java RMI client in Go to be very high. It would be more of a separate project that also needs maintenance, which I don't think we can do from our side. So I would steer away from this option.

I had similar concerns about Jolokia when we got started but it seems we are hitting a good balance with it as it's available in lots of places and configuration does not seem to be a major issue.

Should we set 1-5 as our initial goal? With 6, are you referring to specific Java services which always provide the same structure, or to in-house built apps? Can you elaborate on 7?

BTW once you've written your project proposal, we're happy to take a look at it — share a secret gist or write it on Google Docs only accessible for us (see Regarding Kibana : Calendar Visualization and Filtering).
The relevant people for this proposal would be ruflin, steffen.siering, carlos, philipp at elastic dot co

I had similar concerns about Jolokia when we got started but it seems we are hitting a good balance with it as it's available in lots of places and configuration does not seem to be a major issue.

It's safe to go with the Jolokia route then.

Should we set 1-5 as our initial goal?

Yes, I'm planning to set this as my initial goal. I'll be putting the rest as stretch goals.
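
As a rough sketch of what goals 1-5 could translate to: each item corresponds to one of the platform MBeans that a JVM registers under the `java.lang` domain, and Jolokia accepts a JSON array of read operations in one POST. The grouping keys and the choice of attributes below are my assumptions for illustration, not an agreed scope. Reading the garbage collector MBeans through a `name=*` pattern relies on Jolokia's pattern support for reads, and the Metricbeat side may need to handle that differently.

```python
# Platform MBeans for goals 1-5. Keys and attribute selection are
# illustrative assumptions, not a final list.
STANDARD_MBEANS = {
    "memory": ("java.lang:type=Memory",
               ["HeapMemoryUsage", "NonHeapMemoryUsage"]),
    "classloading": ("java.lang:type=ClassLoading",
                     ["LoadedClassCount", "TotalLoadedClassCount",
                      "UnloadedClassCount"]),
    "compilation": ("java.lang:type=Compilation",
                    ["TotalCompilationTime"]),
    "gc": ("java.lang:type=GarbageCollector,name=*",
           ["CollectionCount", "CollectionTime"]),
    "threading": ("java.lang:type=Threading",
                  ["ThreadCount", "PeakThreadCount", "DaemonThreadCount"]),
    "runtime": ("java.lang:type=Runtime", ["Uptime"]),
}


def build_bulk_request(mbeans=STANDARD_MBEANS):
    """Return a list of Jolokia read operations, one per MBean attribute.

    Serialized as a JSON array, the list forms the body of one bulk POST.
    """
    operations = []
    for mbean, attributes in mbeans.values():
        for attribute in attributes:
            operations.append(
                {"type": "read", "mbean": mbean, "attribute": attribute})
    return operations


bulk = build_bulk_request()
```

Sending one bulk request per collection period keeps the number of HTTP round trips per JVM at one, whatever the final list of attributes turns out to be.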

With 6, are you referring to specific Java services which always provide the same structure, or to in-house built apps?

Both. Apps like JBoss, Solr, and Tomcat have their own "documented" MBeans, and the same goes for in-house apps. My idea is to provide a means of mapping a custom MBean name to a custom metric name. The current Jolokia module implementation provides this option as jmx.mappings.
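
Roughly, the mapping idea works like this. The sketch below is in Python and only illustrates the lookup; it is not how the Metricbeat module implements it. The mapping entries mirror the mbean / attr / field layout used by jmx.mappings, while `com.example:type=OrderQueue`, its `Depth` attribute, and all sample values are hypothetical stand-ins for an in-house app.

```python
# Mapping entries in the same mbean / attr / field layout as jmx.mappings.
# The second MBean is a hypothetical in-house example.
MAPPINGS = [
    {"mbean": "java.lang:type=Runtime",
     "attributes": [{"attr": "Uptime", "field": "uptime"}]},
    {"mbean": "com.example:type=OrderQueue",
     "attributes": [{"attr": "Depth", "field": "orders.queue.depth"}]},
]


def apply_mappings(mappings, responses):
    """Rename Jolokia read results to user-chosen field names.

    Responses whose MBean attribute has no mapping, or whose status is
    not 200, are skipped.
    """
    lookup = {
        (m["mbean"], a["attr"]): a["field"]
        for m in mappings
        for a in m["attributes"]
    }
    event = {}
    for response in responses:
        request = response.get("request", {})
        key = (request.get("mbean"), request.get("attribute"))
        field = lookup.get(key)
        if field is not None and response.get("status") == 200:
            event[field] = response["value"]
    return event


# Made-up responses; each echoes the request it answers, as Jolokia does.
RESPONSES = [
    {"request": {"type": "read", "mbean": "java.lang:type=Runtime",
                 "attribute": "Uptime"},
     "value": 123456, "status": 200},
    {"request": {"type": "read", "mbean": "com.example:type=OrderQueue",
                 "attribute": "Depth"},
     "value": 7, "status": 200},
    {"request": {"type": "read", "mbean": "java.lang:type=Threading",
                 "attribute": "ThreadCount"},
     "value": 42, "status": 200},
]

mapped = apply_mappings(MAPPINGS, RESPONSES)
```

The same table-driven lookup would serve both cases from the question: a shipped mapping for a known service and a user-written one for an in-house app.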

Can you elaborate on 7?

JMX notifications allow MBeans to fire events. Those are generally some application specific alerts.

But I feel that supporting alerts is out of our scope here. Even Jolokia lacks proper support for JMX notifications.

For number 6, with the documented MBeans for some services, it becomes very interesting to build potential modules for them based on Jolokia. For example, there could be a Tomcat, JBoss, or Solr module.

For 7: If Jolokia does not properly support it, this definitely becomes tricky, but it is very interesting to know.

Do you need anything else from our side to create a proposal?

Does Elastic have any format for proposals? Do you expect me to include any specific piece of information?

The project goals at elastic/gsoc mention creating Kibana dashboards. Though I have a fair knowledge of Kibana, I have never created a templated JSON Kibana dashboard.

What elements should the dashboard contain and how should we visualize them? Are there any pointers to a similar pre-built dashboard?

@xeraa @ruflin

I have written a draft of my proposal and shared it with you.

Creating dashboards with Kibana is fairly easy. Here is the related Beats dev guide: https://www.elastic.co/guide/en/beats/devguide/current/build-dashboards.html. The best way to start learning how to build the dashboards is to send some Metricbeat data to ES / Kibana and start playing around with it, either by creating a new dashboard or by modifying the existing ones.

@agathver Thanks for the proposal, will try to have a look in the next few days and comment on it.

Thanks for this resource.

For this project, I am using the Docker and NGINX dashboards as a reference.

I'm thinking about the following kinds of dashboards:

  1. An overview of all running Java apps, similar to the container overview dashboard in Docker.
  2. A dashboard for a specific app. This one is a template, though: the user has to manually clone it and modify the queries.
  3. Templates for the other app-specific MBeans we are considering implementing.

This is probably going too far for the proposal, but just as an idea: In the [Metricbeat System] Overview dashboard you have links to the specific hosts.

If you click on one of these links, the [Metricbeat System] Host overview dashboard filters down to that specific host.

Maybe we could use something similar for Java applications.
