Missing documents

I'm still in the process of reproducing the problem, so I don't have too
many details. However, I wanted to ask if anyone has faced an issue with
missing documents?

One of our .NET web services makes 2000 inserts into Elasticsearch, where
the average document size is 20KB. On the test machine, only 50-70% of the
inserted documents ended up in the index. I'm unable to repro the problem
locally, even though the setup is identical. The service also inserts the
same documents into SQL, and all of those end up there.

My inserts don't check the response, for higher throughput (I assume that
they all make it), but I think I will need to so that I can retry. What
kind of failure messages should I look for from Elasticsearch if an insert
fails? So far, I'm only seeing success messages. For the test, I'm going to
invert that check and log anything else I encounter.

However, I'd like to understand what the correct protocol is and why all
the documents might not be making it. The machine has 16 cores and 16 GB
memory and there are no resource issues.

Thanks,
Nick

--

Elasticsearch should return a message like this if the record was indexed
correctly:

{"ok":true,"_index":"test","_type":"doc","_id":"1","_version":1}

and the HTTP response should have status 200. A typical error would look
like this:

{"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('{' (code 123)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name\n at [Source: [B@4eff1d61; line: 1, column: 229]]; ","status":400}

You should definitely check for errors. An insert can fail for several
reasons: network issues, malformed JSON, data that does not match the
mapping, a wrong date format, etc.

If you are providing record IDs, make sure that they do indeed uniquely
identify your documents.
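
A minimal sketch of such a check, written in Python for brevity rather than
.NET. The function name `check_index_response` is made up for illustration,
and treating any 2xx status without an "error" key as success is an
assumption. The sample bodies follow the two responses above, with the error
text shortened.

```python
import json

def check_index_response(status, body):
    """Return (ok, detail) for a single index response.

    status is the HTTP status code, body is the raw response text.
    Assumption: any 2xx status without an "error" key means success.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        # The body is not JSON at all, for example a proxy error page.
        return False, "unparseable response: %r" % body[:100]
    if not isinstance(payload, dict):
        return False, "unexpected response: %r" % body[:100]
    if 200 <= status < 300 and "error" not in payload:
        return True, payload.get("_id")
    return False, payload.get("error", "HTTP status %d" % status)

success = '{"ok":true,"_index":"test","_type":"doc","_id":"1","_version":1}'
failure = '{"error":"MapperParsingException[Failed to parse]","status":400}'

print(check_index_response(200, success))  # (True, '1')
print(check_index_response(400, failure))
```

Anything that comes back with ok set to False is worth logging together with
the document ID, so that the failed insert can be retried or inspected later.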

On Friday, November 9, 2012 4:01:09 PM UTC-5, TheOutlander wrote:

--

Thanks, I forgot to mention a few details related to what you said:

  1. We're inserting the same two documents in the test, 1000 times each. We
    know that they're not malformed.
  2. The record IDs are unique GUIDs. That was one of my concerns as well,
    but we've validated that by sticking them in a dictionary.
  3. Also, everything is on a single machine for this test, so I don't think
    the network was an issue.
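
The dictionary check from point 2 can be sketched roughly as follows (Python
for brevity; `find_duplicate_ids` is a hypothetical helper, not code from the
service). It matters because indexing a second document under an existing ID
overwrites the first one and bumps `_version` instead of adding a document.

```python
import uuid
from collections import Counter

def find_duplicate_ids(ids):
    """Return the IDs that occur more than once, mapped to their counts."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}

# 2000 freshly generated GUIDs should contain no duplicates.
ids = [str(uuid.uuid4()) for _ in range(2000)]
assert find_duplicate_ids(ids) == {}

# A repeated ID is reported with the number of times it was seen.
print(find_duplicate_ids(["a", "b", "a"]))  # {'a': 2}
```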

Since I added code to wait for a response, the testers haven't seen any
failed inserts, but we will have to validate that it's working with
several more tests.

I didn't realize that it returned other error codes if an insert failed.
Those should be caught in the catch block now that we have more logging.
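
For reference, a rough sketch of the wait-and-retry idea, in Python rather
than .NET. Everything here is illustrative: `index_with_retry` and the `send`
callback are hypothetical names, and treating every 4xx status as permanent
(no retry) is a simplifying assumption.

```python
import time

def index_with_retry(send, doc_id, doc, max_attempts=3, delay_seconds=0.0):
    """Call send(doc_id, doc) until it succeeds or attempts run out.

    send is supplied by the caller, returns (status, body) and may raise
    OSError on connection problems. Returns (ok, failures), where failures
    lists every failed attempt so it can be logged.
    """
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = send(doc_id, doc)
            if 200 <= status < 300:
                return True, failures
            failures.append((attempt, "HTTP %d: %s" % (status, body)))
            if 400 <= status < 500:
                break  # the request itself is bad; retrying will not help
        except OSError as exc:
            failures.append((attempt, "transport error: %s" % exc))
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return False, failures

# Fake sender that fails twice with a connection error, then succeeds.
calls = []
def flaky_send(doc_id, doc):
    calls.append(doc_id)
    if len(calls) < 3:
        raise ConnectionError("connection refused")
    return 200, '{"ok":true}'

ok, failures = index_with_retry(flaky_send, "1", {"title": "test"})
print(ok, len(failures))  # True 2
```

Passing the sender in as a function keeps the retry logic independent of the
HTTP client, so it can be exercised without a running node.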

Thanks,
Nick

On Friday, November 9, 2012 6:06:29 PM UTC-8, Igor Motov wrote:

--

In addition to checking for errors as Igor suggested, if you're
checking the count quickly after insertion, you need to make sure to
refresh the index first so that your newly created documents are
visible to the count.

http://www.elasticsearch.org/guide/reference/api/admin-indices-refresh.html
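
A small sketch of refreshing before counting, in Python for brevity. The base
URL `http://localhost:9200` and the helper names are assumptions for
illustration; `_refresh` and `_count` are the index-level REST endpoints.

```python
import json
import urllib.request

def http_json(method, url):
    """Send a request with no body and decode the JSON reply."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

def refreshed_count(index, base_url="http://localhost:9200", http=http_json):
    """Refresh the index, then return the number of documents in it."""
    # Make recently indexed documents visible before counting them.
    http("POST", "%s/%s/_refresh" % (base_url, index))
    return http("GET", "%s/%s/_count" % (base_url, index))["count"]

# Against a running node (assumed to hold an index named "test"):
# print(refreshed_count("test"))
```

The `http` parameter exists only so that the two calls can be swapped for a
fake in a test; with the default it talks to the node directly.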

-Drew

--

Interesting. Thanks for the suggestion.
We were checking counts several minutes/hours later so I don't think
refresh was the issue. We validated that by using curl to insert additional
documents and checking the count to make sure it went up.

-Nick

On Sunday, November 11, 2012 7:07:56 PM UTC-8, Drew Raines wrote:

--