I would like to get all of the documents for a user in Elastic Enterprise Search using the @elastic/enterprise-search npm package.
The only way I had some success was using client.app.search() with an empty string for the query, but it seems unreliable, and I am trying to find another way to do it.
The Elastic version currently in use is 8.12.1; I can upgrade if needed.
The only way I had some success was using client.app.search() with an empty string for the query, but it seems unreliable
Can you expand on what makes it seem unreliable? The App Search search query should return all documents, ordered by ID, by default.
If documents are being added, deleted, or updated during the search operation and the results are paginated, the pages can be inconsistent. You can resolve that by using the Point in Time (PIT) API, but to do so you'd need to use the Elasticsearch JS client to send requests directly to Elasticsearch instead of going through Enterprise Search.
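The PIT-based pagination described above could be sketched like this with the @elastic/elasticsearch v8 client. The node URL, auth, and index name are placeholders (recent App Search versions store engine documents in a hidden index, which you can confirm via the Search Explain API mentioned later in this thread):

```javascript
// Sketch: paginate through ALL documents using a point-in-time (PIT),
// so concurrent writes don't shift pages mid-scroll. Assumes the
// @elastic/elasticsearch v8 client; index name is a placeholder.

// Pure helper: builds the search request for one PIT page.
function buildPitPage(pitId, searchAfter, size = 1000) {
  const request = {
    size,
    query: { match_all: {} },
    pit: { id: pitId, keep_alive: '1m' },
    sort: [{ _shard_doc: 'asc' }], // tie-breaker sort required by search_after
  };
  if (searchAfter) request.search_after = searchAfter;
  return request;
}

// Usage sketch (requires a live cluster):
// const { Client } = require('@elastic/elasticsearch');
// const client = new Client({ node: 'https://localhost:9200', auth: { apiKey: '...' } });
async function fetchAllDocs(client, indexName) {
  const { id: pitId } = await client.openPointInTime({ index: indexName, keep_alive: '1m' });
  const docs = [];
  let searchAfter;
  while (true) {
    const res = await client.search(buildPitPage(pitId, searchAfter));
    const hits = res.hits.hits;
    if (hits.length === 0) break;
    docs.push(...hits.map((h) => h._source));
    searchAfter = hits[hits.length - 1].sort; // cursor for the next page
  }
  await client.closePointInTime({ id: pitId });
  return docs;
}
```

Because every page is served from the same PIT snapshot, documents added or deleted mid-scan won't cause duplicates or gaps in the results.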
What I was actually thinking: is there an easier way to find all documents by a property? Elasticsearch has deleteByQuery; can we use that in conjunction with Elastic Enterprise Search?
Anything that is officially supported in the HTTP Search API will work in the body field of client.app.search(). If there's nothing there that can be used to find the specific documents for a user in your setup, then the Elasticsearch client may be better suited.
But first, going back to the original question:
I would like to get all of the documents for a user in elastic enterprise search
How are the documents differentiated per user? Is this who uploaded the document, or is there a user ID associated with the document, or something else?
@nfeekery there is a userId field defined in the document.
Regarding the Elasticsearch client: I am having issues using deleteByQuery since I can't figure out the index name backing my engine, or maybe I am doing something wrong.
Also, you can use the Search Explain API, which shows you the exact queries Enterprise Search runs against Elasticsearch. This will also show you the index name.
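To avoid guessing client method names, the Search Explain endpoint can also be called directly over HTTP. A sketch, with the host, engine name, and API key as placeholders (the response includes the raw Elasticsearch query, from which the backing index name can be read):

```javascript
// Sketch: call the App Search Search Explain API directly to see the
// Elasticsearch query (and target index) behind a search. Base URL,
// engine name, and key below are placeholders.

// Pure helper: assembles the request without sending it.
function buildExplainRequest(baseUrl, engineName, privateKey, query) {
  return {
    url: `${baseUrl}/api/as/v1/engines/${engineName}/search_explain`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${privateKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ query }),
    },
  };
}

// Usage sketch (Node 18+ ships a global fetch):
async function explainSearch(baseUrl, engineName, privateKey, query) {
  const { url, options } = buildExplainRequest(baseUrl, engineName, privateKey, query);
  const res = await fetch(url, options);
  return res.json(); // inspect the returned query for the index name
}
```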
@nfeekery I have used it like this and it works really well. I have one follow-up question; it might even warrant a separate thread.
If we are deleting thousands or even tens of thousands of documents, can it time out? How would we know, and what should we do in those cases? Should we split the query into smaller batches?
If you're concerned about timeouts for large payloads, you can run deleteByQuery asynchronously by passing the flag wait_for_completion=false. The API will then return a task ID whose status you can check. Here's the documentation for that.
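The async deleteByQuery plus task polling could be sketched like this with the @elastic/elasticsearch v8 client. The index name and the exact query shape are placeholders (the `term` query assumes userId is indexed as a keyword; adjust to your mapping):

```javascript
// Sketch: run deleteByQuery asynchronously and poll the resulting task,
// so large deletions can't time out the HTTP request. Assumes the
// @elastic/elasticsearch v8 client; index name is a placeholder.

// Pure helper: the deleteByQuery request for one user's documents.
function buildDeleteRequest(indexName, userId) {
  return {
    index: indexName,
    wait_for_completion: false,   // return immediately with a task id
    query: { term: { userId } },  // assumes userId is a keyword field
  };
}

// Usage sketch (requires a live cluster):
// const { Client } = require('@elastic/elasticsearch');
// const client = new Client({ node: 'https://localhost:9200', auth: { apiKey: '...' } });
async function deleteUserDocs(client, indexName, userId) {
  const { task } = await client.deleteByQuery(buildDeleteRequest(indexName, userId));
  // Poll the Tasks API until the deletion finishes.
  while (true) {
    const status = await client.tasks.get({ task_id: task });
    if (status.completed) return status; // includes counts of deleted docs
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2s between polls
  }
}
```

With this pattern there is no need to split the deletion into batches yourself; Elasticsearch batches internally, and the task status reports progress.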