Call to "DocumentExists" triggers an exception (NEST / C#)


(Thomas) #1

I have the following call:

var Exists = Driver.DocumentExists<Mydata>(Id).Exists;

It checks whether a document with the given Id exists. The call behaves as expected, but it throws exceptions that are caught internally and still appear in the debugger on every call where a document is not found:

Exception: Exception thrown: 'System.Net.WebException' in System.dll
("The remote server returned an error: (404) Not Found.").
Exception thrown: 'System.Net.WebException' in System.dll
("The remote server returned an error: (404) Not Found.")

(this is how it looks: http://imgur.com/a/kFIZP, pages and pages of this)

I have to do this for tens of thousands of documents. Essentially, I want to delete a large number of documents, but I don't have all the IDs at once, and since I need to perform other operations at the same time, I can't really batch the calls. Deleting without first checking whether the document exists triggers the same exception.

The exceptions are making the code run insanely slow in the debugger. Am I missing something, or is this the expected behavior?


(Russ Cam) #2

Hey @ThomasD3,

I answered your same question on Stack Overflow, but will add here for posterity.

The default IConnection implementation (the type that makes the actual HTTP request) in NEST for desktop CLR uses HttpWebRequest internally, which by default throws a WebException when a 404 HTTP status code is returned.

Since the default behaviour of the client is not to throw exceptions, and some endpoints quite rightly can return a 404 (NEST knows which ones these are), the exception is caught internally with a try/catch, but it can still show up in the debugger. As far as I am aware, the most recent stable HttpClient implementation for desktop CLR also uses HttpWebRequest under the covers in the HttpClientHandler used to make the request, so although HttpClient itself does not throw on 404s, it too catches the exception internally in a try/catch.
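
To illustrate the behaviour described above, here is a minimal sketch that uses plain HttpWebRequest directly. It is not NEST's actual connection code; the ExistsProbe class and the URL in the usage note are hypothetical and only show why a handled 404 still produces an "Exception thrown" line in the debugger output.

```csharp
using System;
using System.Net;

static class ExistsProbe
{
    // Returns true for a 2xx response and false for a 404.
    // Any other failure (timeout, 5xx, DNS error, ...) is left to propagate.
    public static bool Exists(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                var status = (int)response.StatusCode;
                return status >= 200 && status < 300;
            }
        }
        catch (WebException ex)
            when ((ex.Response as HttpWebResponse)?.StatusCode == HttpStatusCode.NotFound)
        {
            // GetResponse() reports the 404 by throwing. The exception is handled
            // here, but an attached debugger still logs it as a first-chance exception.
            ex.Response.Close();
            return false;
        }
    }
}
```

A call might look like `ExistsProbe.Exists("http://localhost:9200/myindex/mydata/1")`, where the host, index, and type names are assumptions. With a debugger attached, every miss goes through the throw and catch path, which is where the slowdown comes from.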

Deleting tens of thousands of documents is, relatively speaking, the worst performing operation in Elasticsearch, due to the way in which Lucene works under the covers. If you can, it is good practice to partition documents into separate indices and delete an entire index at once, but depending on the use case, this is not always possible.
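
As a rough sketch of the partitioning idea, assuming documents are written to one index per partition (the "mydata-2017.01" style name below is hypothetical) and using the DeleteIndex method as I recall it from NEST 5.x, dropping a whole partition is then a single call rather than one delete per document:

```csharp
using System;
using Nest;

static class PartitionCleanup
{
    // Deletes a whole partition index in one request.
    // indexName is a hypothetical per-partition index such as "mydata-2017.01".
    public static bool DropPartition(IElasticClient client, string indexName)
    {
        var response = client.DeleteIndex(indexName);

        // IsValid is false when the request failed, e.g. because the index does not exist.
        if (!response.IsValid)
        {
            Console.WriteLine("Could not delete index " + indexName);
        }

        return response.IsValid;
    }
}
```

This only helps when the documents to remove line up with index boundaries; it is worth checking the method signature against the NEST version in use.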

Are you running with the debugger attached in production?


(Thomas) #3

Hi,
yes, I saw your reply on Stack Overflow.
The debugger is not attached in production, but we're doing some heavy testing with a lot of data and it is making the tests incredibly slow.
Since we can't partition the data into multiple indices, would it make sense to have a "deleted" field, so that we would just update that field instead of deleting the documents, knowing that all queries would then have to filter on that field to make sure only live docs are returned?
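
A minimal sketch of what that could look like with NEST 5.x, assuming a hypothetical Deleted property on Mydata and the default camel-cased field name "deleted" (the SoftDelete class and both method names are made up for illustration, and the client calls should be checked against the NEST version in use):

```csharp
using System;
using Nest;

public class Mydata
{
    public string Id { get; set; }
    public bool Deleted { get; set; }   // hypothetical soft-delete flag
}

static class SoftDelete
{
    // Marks a document as deleted with a partial update; only the flag is sent.
    public static bool MarkDeleted(IElasticClient client, string id)
    {
        var response = client.Update<Mydata, object>(id, u => u
            .Doc(new { deleted = true }));
        return response.IsValid;
    }

    // Wraps any query so that documents flagged as deleted are excluded.
    public static ISearchResponse<Mydata> SearchLive(
        IElasticClient client,
        Func<QueryContainerDescriptor<Mydata>, QueryContainer> query)
    {
        return client.Search<Mydata>(s => s
            .Query(q => q.Bool(b => b
                .Must(query)
                .MustNot(mn => mn.Term(t => t.Field(f => f.Deleted).Value(true))))));
    }
}
```

Using must_not on deleted = true, rather than requiring deleted = false, means documents indexed before the field existed are still treated as live.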


(Russ Cam) #4

What version of Elasticsearch are you running against?


(Thomas) #5

We're using 5.0.2.

