Missing documents

TheOutlander_2 · November 9, 2012, 9:01pm

I'm still in the process of reproducing the problem, so I don't have many
details yet. However, I wanted to ask whether anyone has run into missing
documents.

One of our .NET web services makes 2000 inserts into Elasticsearch, with an
average document size of 20KB. On the test machine, only 50-70% of the
inserted documents ended up in the index. I'm unable to reproduce the problem
locally, even though the setup is identical. The service also inserts the
same documents into SQL, and all of those end up there.

To get higher throughput, my inserts don't check the response (they assume
every insert succeeds), but I think I will need to check it so that I can
retry. What kind of failure messages should I look for from Elasticsearch if
an insert fails? So far, I'm only seeing success messages. For the test, I'm
going to negate that check and log anything I encounter.

However, I'd like to understand what the correct protocol is and why some of
the documents might not be making it. The machine has 16 cores and 16 GB of
memory, and there are no resource issues.

Thanks,
Nick

--

Igor_Motov · November 10, 2012, 2:06am

Elasticsearch should return a message like this if the record was indexed
correctly:

{"ok":true,"_index":"test","_type":"doc","_id":"1","_version":1}

and the HTTP response should have status 200. A typical error looks like
this:

{"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('{' (code 123)): was expecting either valid name character (for unquoted name) or double-quote (for quoted) to start field name\n at [Source: [B@4eff1d61; line: 1, column: 229]]; ","status":400}

You should definitely check for errors. An insert can fail for many
reasons: network issues, malformed JSON, stored data that doesn't match the
mapping, a wrong date format, etc.
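As a rough illustration of that check, here is a minimal Python sketch that treats a response as successful only when the HTTP status is 2xx and the JSON body has no "error" field. The function name is_index_success is hypothetical. It follows the response shapes shown above; accepting any 2xx status rather than only 200 is an assumption (some versions may answer 201 for newly created documents), and newer Elasticsearch releases return different success bodies, so adapt the field checks to your version.

```python
import json


def is_index_success(status: int, body: str) -> bool:
    """Return True if an index response looks successful.

    Assumes the response shapes shown in this thread: a success body
    carries "ok": true, while a failure body carries an "error" field.
    """
    if not 200 <= status < 300:
        # Non-2xx status codes (e.g. 400 for a parse error) are failures.
        return False
    try:
        parsed = json.loads(body)
    except ValueError:
        # An unparseable body is treated as a failure so it gets logged.
        return False
    if not isinstance(parsed, dict):
        return False
    # An explicit error field or "ok": false both count as failures.
    return "error" not in parsed and parsed.get("ok", True) is not False


ok_body = '{"ok":true,"_index":"test","_type":"doc","_id":"1","_version":1}'
err_body = '{"error":"MapperParsingException[Failed to parse]","status":400}'
print(is_index_success(200, ok_body))   # True
print(is_index_success(400, err_body))  # False
```

Checking the body as well as the status costs little and catches a failure that is reported inside an otherwise successful-looking response.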

If you are providing record IDs, make sure that they indeed uniquely
identify your documents.
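Indexing a document under an ID that already exists replaces the earlier document (and bumps its _version) instead of adding a new one, so reused IDs silently lower the final count. A minimal sketch of checking a batch before sending it, in the spirit of the dictionary check mentioned later in the thread; find_duplicate_ids and expected_doc_count are hypothetical helpers.

```python
import uuid
from collections import Counter


def find_duplicate_ids(ids):
    """Return {id: occurrences} for every ID that appears more than once."""
    counts = Counter(ids)
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


def expected_doc_count(ids):
    """Each distinct ID ends up as one document, since reused IDs overwrite."""
    return len(set(ids))


# 2000 freshly generated GUIDs, as in the test described in this thread.
ids = [str(uuid.uuid4()) for _ in range(2000)]
assert not find_duplicate_ids(ids)
print(expected_doc_count(ids))   # 2000

# A batch that accidentally reuses one ID yields one fewer document.
bad = ["a", "b", "a"]
print(find_duplicate_ids(bad))   # {'a': 2}
print(expected_doc_count(bad))   # 2
```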

--

TheOutlander_2 · November 10, 2012, 5:08am

Thanks. I forgot to mention two details related to what you said:

We're inserting the same two documents in the test, 1000 times each. We
know that they're not malformed.

The record IDs are unique GUIDs. That was one of my concerns as well, but
we've validated it by putting them in a dictionary.

Also, everything is on a single machine for this test, so I don't think the
network was an issue.

Since I added code to wait for a response, the testers haven't seen any
failed inserts, but we will have to run several more tests to confirm that
it's working.

I didn't realize that it returned other status codes when an insert fails.
Now that we have more logging, those should be caught in the catch block.
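A minimal sketch of the check-and-retry approach being discussed, written against a hypothetical send(doc_id, doc) callable that performs one index request and returns (status, body), or raises OSError on a network failure. The retry policy is an illustrative assumption, not part of any Elasticsearch client: network errors and 5xx responses are retried with a growing delay, while other failures (such as the 400 parse error above) are logged once and not retried, because resending the same document would fail the same way.

```python
import json
import logging
import time

log = logging.getLogger("indexer")


def _is_success(status, body):
    """2xx status and a JSON object body without an "error" field."""
    if not 200 <= status < 300:
        return False
    try:
        parsed = json.loads(body)
    except ValueError:
        return False
    return isinstance(parsed, dict) and "error" not in parsed


def index_with_retry(send, doc_id, doc, attempts=3, base_delay=0.5):
    """Index one document via the hypothetical `send(doc_id, doc)` callable.

    Returns True on success, False once the insert is given up on.
    """
    for attempt in range(1, attempts + 1):
        try:
            status, body = send(doc_id, doc)
        except OSError as exc:
            # Network-level failure: log it and fall through to a retry.
            log.warning("insert %s raised %r (attempt %d)", doc_id, exc, attempt)
        else:
            if _is_success(status, body):
                return True
            log.warning("insert %s failed, HTTP %s: %s", doc_id, status, body)
            if status < 500:
                return False  # Client-side error: retrying won't help.
        if attempt < attempts:
            time.sleep(base_delay * attempt)  # Wait a bit longer each time.
    log.error("giving up on %s after %d attempts", doc_id, attempts)
    return False


# Example: a fake sender that is unavailable once, then succeeds.
responses = iter([(503, '{"error":"unavailable","status":503}'),
                  (200, '{"ok":true,"_id":"1"}')])
print(index_with_retry(lambda i, d: next(responses), "1", {}, base_delay=0))  # True
```

Documents that still return False after this loop can be written to a log or queue and re-sent later, so that none are dropped silently.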

Thanks,
Nick

--

drewr · November 12, 2012, 3:07am

In addition to checking for errors as Igor suggested, if you're checking
the count right after inserting, make sure to refresh the index first so
that your newly created documents are visible.
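A minimal sketch of refreshing before counting. It assumes a hypothetical request(method, path) helper that sends one HTTP request to the cluster and returns the parsed JSON body. POST /<index>/_refresh and GET /<index>/_count are the Elasticsearch refresh and count endpoints, and the count response reports the total in its "count" field. The FakeCluster class only stands in for a real cluster so the sketch runs offline.

```python
def refreshed_count(request, index):
    """Refresh `index`, then return how many documents it holds."""
    # Make documents indexed since the last refresh visible to count/search.
    request("POST", "/%s/_refresh" % index)
    return request("GET", "/%s/_count" % index)["count"]


def missing_after_insert(request, index, expected):
    """Number of expected documents that the refreshed count does not show."""
    return expected - refreshed_count(request, index)


class FakeCluster:
    """Offline stand-in: documents only become countable after a refresh."""

    def __init__(self, indexed):
        self.indexed, self.visible = indexed, 0

    def __call__(self, method, path):
        if path.endswith("/_refresh"):
            self.visible = self.indexed
            return {"ok": True}
        if path.endswith("/_count"):
            return {"count": self.visible}
        raise ValueError("unexpected request: %s %s" % (method, path))


print(missing_after_insert(FakeCluster(indexed=2000), "test", 2000))  # 0
```

If the refreshed count is still short, as in the case described in this thread, refresh timing can be ruled out and the per-request responses are the next place to look.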

-Drew

--

TheOutlander_2 · November 20, 2012, 11:09pm

Interesting, thanks for the suggestion.
We were checking the counts minutes to hours later, so I don't think
refresh was the issue. We validated that by using curl to insert additional
documents and checking that the count went up.

-Nick
