Seeing Indexing Rejections

monitoring

(Nandanrao) #1

In multiple places in the documentation, it is recommended that we monitor indexing rejections through Marvel.

"In such a scenario, monitor Elasticsearch (through Marvel or other plugins) and keep an eye on bulk processing. Look at the percentage of documents being rejected" -- https://www.elastic.co/guide/en/elasticsearch/hadoop/master/performance.html

"If you are using Marvel, you can see the rejection counts under the THREAD POOLS - BULK section of the Node Statistics Dashboard. " -- https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing

But it seems like this feature was removed from Marvel? Or moved somewhere? There does not seem to be any documentation on what I can see in Marvel, so I assume I'm just supposed to click around and everything is there, but I can't find a good way to see the rejection percentage or the number of rejections in bulk indexing. Help!


(Bohyun Kim) #2

Hi Nandanrao -

Are you currently using ES 1.x or 2.x? The reference you quoted above, about seeing the rejection counts under the THREAD POOLS - BULK section, is still valid for ES 1.x and Marvel 1.x.

If you're on 2.x, you will quickly find that Marvel has changed significantly, due to the many changes in ES 2.0 and the corresponding changes in Kibana 4. Because of these changes, Marvel 2.0 had to be completely rewritten from the ground up (release note), and we decided to make Marvel lighter and simpler so that you can understand your Elasticsearch clusters more quickly.

Please visit https://www.elastic.co/guide/en/marvel/current/index.html for the latest Marvel documentation.

Hope this helps,
Bohyun


(Nandanrao) #3

Yes, I'm using 2.x. Simpler and lighter sounds wonderful! However, now that Marvel has changed, is there a new recommended way to see those statistics and therefore tune bulk indexing?


(Chris Earle) #4

Hi nandanrao,

Marvel 2.x does not currently display anything about the thread pools beyond CPU usage. CPU usage is correlated with thread pool activity, of course, but you are limited to the data that Marvel chooses to display.

Now, while we do not like forcing users to do more work, we do collect the number of bulk rejections in the .marvel-es-1-* index under the node_stats type (the field is node_stats.thread_pool.bulk.rejected).

Given that, you can graph it in Kibana against that index. The problem you will likely run into is that the value is simply a counter of the number of rejections per node since the node started. So if it's 1 the last time you read it, it will be at least 1 the next time you read it (unless the node was restarted). With Elasticsearch, you can use a derivative pipeline aggregation to get the change between buckets, but Kibana does not yet support pipeline aggregations, so there you'll just have to visualize the raw count.
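For anyone who wants the per-interval change directly from Elasticsearch rather than from Kibana, here is a rough sketch of what such a search body could look like, built as a plain Python dict. Only the field node_stats.thread_pool.bulk.rejected comes from this thread; the time field name (timestamp), the node name field (source_node.name), and the function name are assumptions that should be checked against the documents in your own .marvel-es-1-* index.

```python
def bulk_rejection_rate_query(node_name, interval="1m"):
    """Build a search body reporting new bulk rejections per time bucket.

    Hypothetical helper: it only builds the request body and sends nothing.
    """
    return {
        "size": 0,  # only aggregation results are needed, not the hits
        "query": {
            "bool": {
                # Assumed field name; restricts the search to a single node.
                "filter": {"term": {"source_node.name": node_name}}
            }
        },
        "aggs": {
            "per_interval": {
                # "timestamp" is an assumed name for the time field.
                "date_histogram": {"field": "timestamp", "interval": interval},
                "aggs": {
                    # Highest counter value seen within the bucket.
                    "max_rejected": {
                        "max": {"field": "node_stats.thread_pool.bulk.rejected"}
                    },
                    # Change in that value compared with the previous bucket.
                    "new_rejections": {
                        "derivative": {"buckets_path": "max_rejected"}
                    },
                },
            }
        },
    }
```

The resulting dict can be serialized to JSON and sent as a search request against the .marvel-es-1-* index. Note that the derivative will be negative for the bucket right after a node restart, because the counter starts again from zero.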

Note: every document contains the source_node information, so you can filter by the node's name.

So, the next question is: How does this work in Kibana until it gets visualized in Marvel?

  1. You can create a visualization (probably want points, not bars or a pie chart).
  2. For the top-level value (the Y value), use the max of node_stats.thread_pool.bulk.rejected (this gives the highest rejection count seen in each bucket, which is the most important value within that bucket).
  3. For the bottom value (the X value), aggregate by a date histogram. Bucket however you want (you probably want auto, but you can use minute, hour, whatever).

You would then also want to filter on the node's name to limit the graph to an individual node. Otherwise it operates against all nodes, in which case max is not necessarily what you want unless you sub-aggregate the X value by node.

This will give the same effect as you got in Marvel 1.x.
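Since the graphed value is a cumulative counter, it may also help to turn the per-bucket maximums into per-bucket increases outside of Kibana. The following is only a sketch under a few assumptions: the input is a list of the max values for one node in time order, None stands for a bucket with no data, and any drop in the counter is treated as a node restart. The function name is made up for illustration.

```python
def rejections_per_bucket(counter_values):
    """Convert cumulative rejection counts into per-bucket increases."""
    deltas = []
    previous = None
    for value in counter_values:
        if value is None:
            # Empty bucket: nothing to compare, keep the last known reading.
            deltas.append(None)
            continue
        if previous is None:
            # First reading: there is no baseline to subtract from.
            deltas.append(None)
        elif value < previous:
            # Counter went backwards, so assume the node restarted and
            # everything counted so far happened since the restart.
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas
```

For example, the readings [1, 1, 4, None, 6, 2, 5] become [None, 0, 3, None, 2, 2, 3]: the drop from 6 to 2 is treated as a restart, so that bucket reports 2 new rejections.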

Hope that helps,
Chris

