How to identify message causing error in bulk request


#1

I'm using BulkProcessor. Something is causing a problem:

[elasticsearch[moo][transport_client_worker][T#19]{New I/O worker #84}] ERROR - Failed to index record: MapperParsingException[failed to parse [_source]]; nested: ElasticsearchParseException[Failed to parse content to map]; nested: JsonParseException[Unexpected character (':' (code 58)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name#012 at [Source: [B@541daa55; line: 1, column: 1555]];

I read this as a JSON parse error in the bulk request. Is that right? How do I turn this into information I can use? It doesn't tell me which index the failure happened in, and it doesn't tell me what the message looked like.

    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        if (response.hasFailures()) {
            for (BulkItemResponse item : response.getItems()) {
                if (item.isFailed()) {
                    logger.error("Failed to index record: " + item.getFailureMessage());
                }
            }
        }
    }

Can I get any context about the message that causes the failure from BulkItemResponse, BulkResponse, or BulkRequest? I tried pulling the "payloads" out of the BulkItemResponse, but they didn't seem to correspond to any kind of message body, so I can't identify where the malformed message is.

Thanks for any help that can be offered.


(Nik Everett) #2

Something like this ought to do:

if (false == response.hasFailures()) {
  return;
}
for (int i = 0; i < response.getItems().size(); i++) {
  if (false == response.getItems().get(i).isFailed()) {
    continue;
  }
  logger.error("Failed to index [" + request.requests().get(i) + "]: [" + response.getItems().get(i).getFailureMessage() + "]");
}

Warning: I wrote this inside a little text box on a web page and didn't run it. It is almost certainly wrong. My only goal was to make it obvious that the requests and responses are kept in the same order.


(Nik Everett) #3

It's almost certainly OK to skip the first hasFailures check - it just iterates over the items looking for one with a failure, so the check doesn't save any time and it makes the code longer.
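The same-order guarantee is the whole trick, so here's a dependency-free toy illustrating it (Item and firstFailure are stand-ins I made up for this sketch, not the real Elasticsearch classes):

```java
import java.util.List;

public class BulkCorrelation {
    // Stand-in for BulkItemResponse: did this item fail, and why?
    public record Item(boolean failed, String failureMessage) {}

    // Requests and responses are parallel lists: the item at position i
    // describes the outcome of the request at position i.
    public static String firstFailure(List<String> requests, List<Item> items) {
        for (int i = 0; i < items.size(); i++) {
            if (items.get(i).failed()) {
                return "Failed to index [" + requests.get(i) + "]: ["
                        + items.get(i).failureMessage() + "]";
            }
        }
        return null; // no failures in this bulk
    }

    public static void main(String[] args) {
        List<String> requests = List.of("{\"ok\":1}", "{bad json}");
        List<Item> items = List.of(new Item(false, null),
                                   new Item(true, "JsonParseException"));
        System.out.println(firstFailure(requests, items));
        // → Failed to index [{bad json}]: [JsonParseException]
    }
}
```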


#4

Thanks, I will give this a try and get back to you.


#5

Thanks! I am hoping this will be very helpful, I got significantly more feedback this way. There were some minor Array vs. ArrayList-isms to get it to work for me, but otherwise not bad for pseudocode.

What ultimately worked for me:

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        if (response.hasFailures()) {
            for (int i = 0; i < response.getItems().length; i++) {
                BulkItemResponse item = response.getItems()[i];
                if (item.isFailed()) {
                    IndexRequest ireq = (IndexRequest) request.requests().get(i);
                    logger.error("Failed while indexing to " + item.getIndex() + " type " + item.getType() + " " +
                                 "request: [" + ireq + "]: [" + item.getFailureMessage() + "]");
                }
            }
        }
    }

I now get:

[elasticsearch[Dyna-Mite][transport_client_worker][T#5]{New I/O worker #70}] ERROR - Failed while indexing to data-2016.02.26 type datatype request: [index {[data-2016.02.26][datatype][null], source[{"json_obj"}]: [MapperParsingException[failed to parse [_source]]; nested: ElasticsearchParseException[Failed to parse content to map]; nested: JsonParseException[Unexpected character (':' (code 58)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name#012 at [Source: [B@5f09799a; line: 1, column: 1779]]; ]

I am hopeful that this will be a great help in troubleshooting this problem, thanks.


#6

Just wanted to say thanks again. This was a real help for me and allowed me to move forward in my work rather than being frustrated and wondering what was going wrong.


(Nik Everett) #7

Sure! I'm glad I could help!

I am a bit frustrated that the Client interface mixes lists and arrays arbitrarily. We'll get a real Java client soon-ish, without all the dependencies of Elasticsearch's core, and I'll try to do some of the code reviews for it so I can make sure it's consistent about which one it uses.


(Ivan Brusic) #8

As if working with JSON will make it any easier! :slight_smile:


(Jörg Prante) #9

I am also curious whether the new Java HTTP client will just throw JSON over the fence, or whether it will parse JSON into a new, improved ES API.


(Ivan Brusic) #10

I believe their goal is to have a consistent API between all their clients.
My guess is JSON.

The existing binary API simply throws Map<String, Object> over the wire, and no one has really written a good ORM for Elasticsearch. Everyone is using Jackson databinding or GSON. Since the client will be standalone, I wonder if ES will continue using Jackson.

Ivan

