How to identify message causing error in bulk request

I'm using BulkProcessor. Something is causing a problem:

[elasticsearch[moo][transport_client_worker][T#19]{New I/O worker #84}] ERROR - Failed to index record: MapperParsingException[failed to parse [_source]]; nested: ElasticsearchParseException[Failed to parse content to map]; nested: JsonParseException[Unexpected character (':' (code 58)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name#012 at [Source: [B@541daa55; line: 1, column: 1555]];

I read this as a JSON parse error in the bulk request. Is that right? How do I turn this into information I can use? It doesn't tell me which index the failure happened in, and it doesn't tell me what the message looked like.

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        if (response.hasFailures()) {
            for (BulkItemResponse item : response.getItems()) {
                if (item.isFailed()) {
                    logger.error("Failed to index record: " + item.getFailureMessage());
                }
            }
        }
    }

Can I get any context about the message that causes the failure in BulkItemResponse, BulkResponse, or BulkRequest? I tried pulling the "payloads" out of the BulkItemResponse, but this didn't seem to correspond to any kind of message body so I can identify where the malformed message is.

Thanks for any help that can be offered.

Something like this ought to do:

if (false == response.hasFailures()) {
  return;
}
for (int i = 0; i < response.getItems().size(); i++) {
  if (false == response.getItems().get(i).isFailed()) {
    continue;
  }
  logger.error("Failed to index [" + request.requests().get(i) +  "]: [" + response.getItems().get(i).getFailureMessage() + "]");
}

Warning: I wrote this inside a little text box on a web page and didn't run it. It is almost certainly wrong. My only goal was to make it obvious that the requests and responses are kept in the same order.
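To make the "same order" point concrete, here is a toy, self-contained illustration (not the ES API; all names are made up): responses come back in the same order as the requests that produced them, so index i in one list identifies the matching entry in the other.

```java
import java.util.List;

public class SameOrder {
    // Stand-in for a per-item bulk response
    record Item(boolean failed, String message) {}

    // Walk the responses; the first failed one is paired with the
    // request at the same index.
    static String firstFailureSource(List<String> requests, List<Item> responses) {
        for (int i = 0; i < responses.size(); i++) {
            if (responses.get(i).failed()) {
                return requests.get(i);   // the request that caused the failure
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> reqs = List.of("{\"a\":1}", "{bad json}", "{\"c\":3}");
        List<Item> resps = List.of(new Item(false, "ok"),
                                   new Item(true, "parse error"),
                                   new Item(false, "ok"));
        System.out.println(firstFailureSource(reqs, resps)); // prints {bad json}
    }
}
```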


It's almost certainly OK to skip the first check - it just iterates the items and looks for one with a failure, so the check doesn't save any time and it makes the code longer.

Thanks, I will give this a try and get back to you.

Thanks! I am hoping this will be very helpful, I got significantly more feedback this way. There were some minor Array vs. ArrayList-isms to get it to work for me, but otherwise not bad for pseudocode.

What ultimately worked for me:

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        if (response.hasFailures()) {
            for (int i = 0; i < response.getItems().length; i++) {
                BulkItemResponse item = response.getItems()[i];
                if (item.isFailed()) {
                    IndexRequest ireq = (IndexRequest) request.requests().get(i);
                    logger.error("Failed while indexing to " + item.getIndex() + " type " + item.getType() + " " +
                                 "request: [" + ireq + "]: [" + item.getFailureMessage() + "]");
                }
            }
        }
    }

I now get:

[elasticsearch[Dyna-Mite][transport_client_worker][T#5]{New I/O worker #70}] ERROR - Failed while indexing to data-2016.02.26 type datatype request: [index {[data-2016.02.26][datatype][null], source[{"json_obj"}]: [MapperParsingException[failed to parse [_source]]; nested: ElasticsearchParseException[Failed to parse content to map]; nested: JsonParseException[Unexpected character (':' (code 58)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name#012 at [Source: [B@5f09799a; line: 1, column: 1779]]; ]
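Since Jackson still reports a line and column in the exception, one more trick that may help is slicing a window of the raw source around that offset to see the bad character in context. A rough, untested sketch in plain Java (the class and method names are made up):

```java
public class ParseErrorContext {
    // Return a small window of `source` centered on the 1-based `column`
    // that JsonParseException reports, so the offending character is visible.
    static String contextAround(String source, int column, int window) {
        int center = column - 1;                         // convert to 0-based offset
        int start = Math.max(0, center - window);
        int end = Math.min(source.length(), center + window + 1);
        return source.substring(start, end);
    }

    public static void main(String[] args) {
        // The stray second ':' sits at 1-based column 15 of this document
        String source = "{\"ok\":1,\"bad\"::2}";
        System.out.println(contextAround(source, 15, 4)); // prints ad"::2}
    }
}
```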

I am hopeful that this will be a great help in troubleshooting this problem, thanks.

Just wanted to say thanks again. This was a real help for me and allowed me to move forward in my work rather than being frustrated and wondering what was going wrong.

Sure! I'm glad I could help!

I am a bit frustrated that the Client interface mixes lists and arrays arbitrarily. We'll get a real Java client soon-ish without all of Elasticsearch core's dependencies, and I'll try to do some of the code reviews for it so I can make sure it is consistent about which one it uses.

As if working with JSON will make it any easier! :slight_smile:

I am also curious whether the new Java HTTP client will just throw JSON over the fence, or if it will parse JSON into a new improved ES API.

I believe their goal is to have a consistent API between all their clients.
My guess is JSON.

The existing binary API simply throws Map<String, Object> over the wire, and no one has really written a good ORM for Elasticsearch. Everyone is using Jackson databinding or GSON. Since the client will be standalone, I wonder if ES will continue using Jackson.

Ivan