Get all of the documents for a user in Elastic Enterprise Search

I would like to get all of the documents for a user in Elastic Enterprise Search using the @elastic/enterprise-search npm package.

The only way I had some success was using client.app.search() with an empty string for the query, but it seems unreliable and I am looking for another way to do it.

The Elastic version currently in use is 8.12.1. I can upgrade if needed.

Thanks in advance

Hi @CTO_Servefast

The only way I had some success was using client.app.search() with an empty string for the query, but it seems unreliable

Can you expand on what makes it seem unreliable? The App Search search query should return all docs ordered by ID by default.

If docs are being added, deleted, or updated during the search operation and the search result is paginated, there could be some inconsistencies in the results. You can resolve that by using the point in time API, but to do this you'd need to use the Elasticsearch JS client to send the requests to Elasticsearch instead of Enterprise Search.
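As a rough sketch of the point in time approach: the function below opens a PIT, pages through it with search_after, and closes the PIT at the end. It assumes `esClient` is a Client from @elastic/elasticsearch 8.x; the function name, the `index` and `query` parameters, and the page size are illustrative and not from this thread.

```javascript
// Sketch only: read every matching hit from a consistent snapshot of the index.
// `esClient` is assumed to be an @elastic/elasticsearch 8.x Client.
async function searchAllWithPit(esClient, index, query, pageSize = 1000) {
  // Open a point in time so concurrent writes do not affect pagination.
  const pit = await esClient.openPointInTime({ index, keep_alive: '1m' });
  let pitId = pit.id;
  const hits = [];
  let searchAfter;
  try {
    while (true) {
      const response = await esClient.search({
        size: pageSize,
        query,
        // No `index` here: the PIT already pins the target index.
        pit: { id: pitId, keep_alive: '1m' },
        sort: [{ _shard_doc: 'asc' }],
        ...(searchAfter ? { search_after: searchAfter } : {}),
      });
      // Elasticsearch may hand back a refreshed PIT id; keep using the latest one.
      pitId = response.pit_id ?? pitId;
      const page = response.hits.hits;
      if (page.length === 0) break;
      hits.push(...page);
      // The sort values of the last hit mark where the next page starts.
      searchAfter = page[page.length - 1].sort;
    }
  } finally {
    await esClient.closePointInTime({ id: pitId });
  }
  return hits;
}
```

A call might look like `await searchAllWithPit(esClient, indexName, { match: { user_id: userId } })`, where `indexName` is the index behind your engine.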

What I was actually wondering is whether there is an easier way to find all documents by a property. Elasticsearch has deleteByQuery; can we use that in conjunction with Elastic Enterprise Search?

Anything that is officially supported in the HTTP Search API will work in the body field for client.app.search(). If there's nothing there that can be used to find the specific docs for a user based on your setup, then perhaps the Elasticsearch client will be more suited.
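For illustration, here is a minimal sketch of staying within client.app.search(): an empty query plus a value filter, paged until the last page. It assumes the documents carry a `user_id` field that can be filtered on, and that `client` is a Client from @elastic/enterprise-search; the function name and page size are made up for the example.

```javascript
// Sketch only: collect every App Search document whose user_id matches.
// `client` is assumed to be created along these lines:
//   const { Client } = require('@elastic/enterprise-search');
//   const client = new Client({ url: '<enterprise-search-url>', auth: { token: '<private-key>' } });
async function getDocumentsForUser(client, engineName, userId, pageSize = 100) {
  const documents = [];
  let current = 1;
  let totalPages = 1;
  do {
    const response = await client.app.search({
      engine_name: engineName,
      body: {
        query: '',                        // empty query matches all documents
        filters: { user_id: userId },     // assumed field name, adjust to your schema
        page: { current, size: pageSize },
      },
    });
    documents.push(...response.results);
    totalPages = response.meta.page.total_pages;
    current += 1;
  } while (current <= totalPages);
  return documents;
}
```

Keep in mind that App Search limits how deep you can page (as far as I know, results are capped at 10,000), so for very large result sets the Elasticsearch client is likely the better fit.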

But first, going back to the original question:

I would like to get all of the documents for a user in elastic enterprise search

How are the documents differentiated per user? Is this who uploaded the doc, or is there a user id associated to the doc, or something else?

@nfeekery there is a userId defined in the document.

Regarding the Elasticsearch client, I am having issues using deleteByQuery because I can't get the index that my engine uses, or maybe I am doing something wrong.

The index of the engine should be .ent-search-engine-documents-<engine_name>. Is that the index you're trying to use deleteByQuery against?

EDIT: if the engines are considerably older they may be .app-search-<engine_name>
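If you'd rather not hard-code which of the two patterns applies, one option is to probe for both. This is only a sketch: it assumes `esClient` is an @elastic/elasticsearch 8.x Client (where indices.exists resolves to a boolean) and the helper name is hypothetical.

```javascript
// Sketch only: return whichever of the two index names mentioned above exists.
async function resolveEngineIndex(esClient, engineName) {
  const candidates = [
    `.ent-search-engine-documents-${engineName}`, // current naming
    `.app-search-${engineName}`,                  // considerably older engines
  ];
  for (const index of candidates) {
    if (await esClient.indices.exists({ index })) {
      return index;
    }
  }
  return null; // neither index found
}
```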

Also, you can use the search explain API which should show you the exact queries Enterprise Search will run against Elasticsearch. This will also show the index name for you.
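Here's a rough sketch of reading the index name from search explain over plain HTTP. It assumes the endpoint is POST /api/as/v0/engines/<engine_name>/search_explain and that the response carries a `query_string` such as "GET <index>/_search"; please check both against the docs for your version. The function name and parameters are hypothetical, and `fetchImpl` defaults to the built-in fetch in Node 18+.

```javascript
// Sketch only: ask search explain which index an engine queries.
// Endpoint path and response shape are assumptions to verify for your version.
async function getEngineIndexName({ baseUrl, engineName, apiKey, fetchImpl = globalThis.fetch }) {
  const url = `${baseUrl}/api/as/v0/engines/${encodeURIComponent(engineName)}/search_explain`;
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ query: '' }),
  });
  if (!res.ok) {
    throw new Error(`search_explain failed with HTTP ${res.status}`);
  }
  const explained = await res.json();
  // Expecting something like "GET .ent-search-engine-documents-my-engine/_search".
  const match = /^\s*[A-Z]+\s+\/?([^\/\s]+)\/_search/.exec(explained.query_string || '');
  return match ? match[1] : null;
}
```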

This is perfect, thank you. I will try it. It would also be great if we could get the index name for the engine.

@nfeekery I have used it like this and it works really well. I have one follow-up question, which might even belong in a separate thread.

If we are deleting thousands or even tens of thousands of documents, can the request time out? How would we know, and what should we do in those cases? Should we split the query into smaller batches somehow?

Sample code:

    const index = `.ent-search-engine-documents-${engineName}`;
    const deleteResponse = await elasticSearchClient.deleteByQuery({
      index: index,
      query: {
        match: {
          user_id: userId,
        },
      },
    });

Best Regards

@CTO_Servefast glad to hear that it's working!

If you're concerned about timeouts for large payloads, you can run deleteByQuery asynchronously by setting the flag wait_for_completion=false. The API will then return a task ID whose status you can check. Here's the documentation for that.
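To make that concrete, here is a minimal sketch that starts the delete as a background task and polls the tasks API until it finishes. It assumes `esClient` is an @elastic/elasticsearch 8.x Client and reuses the `user_id` match query from the sample above; the function name and the polling interval are illustrative.

```javascript
// Sketch only: run deleteByQuery as a task and wait for it by polling.
async function deleteUserDocuments(esClient, index, userId, pollMs = 2000) {
  // With wait_for_completion=false the call returns right away with a task id.
  const { task } = await esClient.deleteByQuery({
    index,
    query: { match: { user_id: userId } },
    wait_for_completion: false,
  });
  while (true) {
    const status = await esClient.tasks.get({ task_id: String(task) });
    if (status.completed) {
      if (status.error) {
        throw new Error(`deleteByQuery task failed: ${JSON.stringify(status.error)}`);
      }
      // The final summary, including the `deleted` count and any `failures`.
      return status.response;
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}
```

The returned summary is worth checking for a non-empty `failures` array or version conflicts before treating the delete as done.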
