var Exists = Driver.DocumentExists<Mydata>(Id).Exists;
It checks whether a document exists with the Id passed as a parameter. The call behaves as expected; however, it throws exceptions that are caught internally but still appear in the debugger on every call where a document is not found:
Exception: Exception thrown: 'System.Net.WebException' in System.dll
("The remote server returned an error: (404) Not Found.").
Exception thrown: 'System.Net.WebException' in System.dll
("The remote server returned an error: (404) Not Found.")
I have to do this for tens of thousands of documents. Essentially, I want to delete a large number of documents, but I don't have all the IDs at once to make batch calls, and since I need to perform other operations at the same time, I can't really batch things. Deleting without checking whether the document exists triggers the same exception.
The exceptions are making the code run insanely slow in the debugger. Am I missing something, or is it the expected behavior?
The default IConnection implementation (the type that makes the actual HTTP request) in NEST for desktop CLR uses HttpWebRequest internally, which by default throws a WebException when a 404 HTTP status code is returned.
Since the default behaviour of the client is not to throw exceptions, and some endpoints can quite rightly return a 404 (NEST knows which ones these are), the exception is caught internally with a try/catch, but it can still show up in the debugger. As far as I am aware, the most recent stable HttpClient implementation for desktop CLR also uses HttpWebRequest under the covers in the HttpClientHandler used to make the request, so although it does not throw on 404s, it too catches the exception internally in a try/catch.
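To make the mechanism concrete, here is a minimal sketch of the pattern described above. It is not NEST's actual code; the class and method names are hypothetical and it only assumes the standard System.Net types. A 404 surfaces as a WebException, which is caught and turned into an ordinary return value, yet the debugger still reports it as a first-chance exception.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only (not part of NEST).
static class NotFoundProbe
{
    // Returns true when the server answers 200, false on 404; rethrows anything else.
    public static bool Exists(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (WebException ex)
        {
            // HttpWebRequest reports a 404 by throwing; the response is attached to the exception.
            var response = ex.Response as HttpWebResponse;
            if (response != null && response.StatusCode == HttpStatusCode.NotFound)
            {
                response.Close();
                return false;
            }
            throw;
        }
    }
}
```

Called as `NotFoundProbe.Exists("http://localhost:9200/myindex/mydata/1")` (a made-up URL), a missing document yields `false`, while the debugger output window still logs the WebException.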
Deleting tens of thousands of documents is, relatively speaking, the worst performing operation in Elasticsearch, due to the way in which Lucene works under the covers (a delete only marks the document as deleted; the space is reclaimed later, when segments are merged). If you can, it is good practice to partition documents into separate indices and delete the entire index, but depending on the use case, this is not always possible.
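As a rough sketch of the "delete the entire index" option, the following sends a plain HTTP DELETE to the index endpoint. It deliberately uses HttpWebRequest rather than a NEST call, since the NEST method shape differs between versions; the helper name, base URL and index name are assumptions for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
static class IndexCleanup
{
    // Sends DELETE /{indexName}, which removes the index and every document in it.
    public static HttpStatusCode DeleteIndex(string baseUrl, string indexName)
    {
        var request = (HttpWebRequest)WebRequest.Create(baseUrl.TrimEnd('/') + "/" + indexName);
        request.Method = "DELETE";
        // GetResponse throws a WebException if the index does not exist (404).
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return response.StatusCode;
        }
    }
}
```

For example, `IndexCleanup.DeleteIndex("http://localhost:9200", "mydata-2016-01")` would drop one month's partition in a single request, assuming documents had been indexed into per-month indices.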
Are you running with the debugger attached in production?
Hi,
yes, I saw your reply on Stack Overflow.
The debugger is not attached in production, but we're doing some heavy testing with a lot of data and it is making the tests incredibly slow.
Since we can't partition the data into multiple indices, would it make sense to add a "deleted" field, so that we just update that field instead of deleting the documents, knowing that every query would then have to filter on that field to make sure only live docs are returned?
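To illustrate the idea, here is a minimal sketch of the two request bodies it would involve: a partial update that sets the flag, and a wrapper that excludes flagged documents from any query. The field name "deleted" comes from the question; the class name, method names and the example match query are hypothetical.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class SoftDelete
{
    // Body for POST /{index}/{type}/{id}/_update: a partial update that flags the document.
    public const string MarkDeletedBody = @"{ ""doc"": { ""deleted"": true } }";

    // Wraps a query in a bool query that excludes documents where deleted == true.
    // Documents that have no "deleted" field at all still match.
    public static string LiveOnly(string innerQueryJson)
    {
        return @"{ ""query"": { ""bool"": { ""must"": " + innerQueryJson +
               @", ""must_not"": { ""term"": { ""deleted"": true } } } } }";
    }

    static void Main()
    {
        Console.WriteLine(MarkDeletedBody);
        Console.WriteLine(LiveOnly(@"{ ""match"": { ""title"": ""example"" } }"));
    }
}
```

Using must_not rather than requiring deleted to be false means existing documents do not need to be reindexed with the field. The flagged documents still take up space in the index until they are actually removed.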