Retrieve documents matching criteria along with missing documentIds

priyakc · September 19, 2022, 7:31am

I have a list of document IDs. When I query Elasticsearch, I need the result to contain the document IDs that match my criteria as well as the document IDs for which no document exists in Elasticsearch.

Background on use case
I need to search for missing reports for sites. The documentId of a report is the siteId.
A report is missing if its reportState is NEW (not submitted) or if it doesn't exist in Elasticsearch.

Expected Request and Response

Request: all sites, i.e. the documentIds sent in the request to Elasticsearch
[ABC1, ABC2, ABC3]

Documents in Elasticsearch
[ABC1 with reportState NEW, ABC2 with reportState SUBMITTED, ABC3 has no document in Elasticsearch]

Now I want to build a query X such that the response contains the documentIds below.
[ABC1, ABC3]

{
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "reportState": "NEW"
                    }
                },
                {
                    //Build something like below specifying document doesn't exist
                    "must_not": {
                        "exists": {
                            "field": "reportState"
                        }
                    }
                }
            ]
        }
    }
}

Available Alternative
I found an approach using mget in the Stack Overflow question "Check documents not existing at elasticsearch", but I need to paginate over the results, so I am looking for a query instead.
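For reference, a minimal Python sketch of how the mget alternative could be post-processed on the client. It assumes the standard multi-get response shape (a `docs` array whose entries carry `_id`, `found` and, when found, `_source`). The function name `missing_from_mget` and the sample response are hypothetical, and no request is actually sent here.

```python
from typing import Any, Dict, List


def missing_from_mget(response: Dict[str, Any]) -> List[str]:
    """Return IDs whose document is absent or whose reportState is NEW.

    `response` is assumed to be the parsed JSON body of a multi-get call
    such as GET /<index>/_mget with a body like {"ids": ["ABC1", "ABC2", "ABC3"]}.
    """
    missing = []
    for doc in response["docs"]:
        if not doc.get("found", False):
            # No document with this ID exists in the index.
            missing.append(doc["_id"])
        elif doc.get("_source", {}).get("reportState") == "NEW":
            # The document exists but the report was never submitted.
            missing.append(doc["_id"])
    return missing


# Hypothetical response for the three sites in the question.
sample_response = {
    "docs": [
        {"_id": "ABC1", "found": True, "_source": {"reportState": "NEW"}},
        {"_id": "ABC2", "found": True, "_source": {"reportState": "SUBMITTED"}},
        {"_id": "ABC3", "found": False},
    ]
}
result = missing_from_mget(sample_response)
```

This keeps the order of the requested IDs, but it does not paginate, which is the limitation mentioned above.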

anime_lover · September 19, 2022, 11:04am

Can you please write the question with an example, so that we can also try it in Kibana to find a solution?

Christian_Dahlqvist · September 19, 2022, 11:53am

A query can never return documents that do not exist, so I do not see how you can build a query to return what you describe. If you know that you expect these two documents, you could write a query to return the first one and deduce in the client application that the other one is not present.

priyakc · September 19, 2022, 12:08pm

I updated the question with sample query.

priyakc · September 19, 2022, 12:13pm

I cannot do that. With your logic:

Request: [ABC1, ABC2, ABC3]

Elasticsearch response: [ABC1]
Deducing that the other ones [ABC2, ABC3] are not present, the computed response becomes [ABC1, ABC2, ABC3],
whereas the expected response is [ABC1, ABC3].

The issue is that if I run a query with the single criterion reportState=NEW, I cannot tell whether ABC2 failed to match the query or does not exist at all.

Christian_Dahlqvist · September 19, 2022, 12:23pm

A query cannot return a document that does not exist, so you need to find a way to handle this in your application logic or perhaps run multiple queries.
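One way the application-logic route could look, as a hedged Python sketch: instead of asking for the NEW reports, ask for the requested IDs whose report is not NEW, then subtract those from the request. Whatever is left is either NEW or absent. The sketch uses the `ids` query and a `bool` with `must_not`; the helper names are hypothetical, `reportState` is assumed to be mapped as a keyword field, and the response dict below is a hand-written stand-in for a real search response.

```python
from typing import Any, Dict, List


def build_submitted_query(site_ids: List[str]) -> Dict[str, Any]:
    """Search body matching requested IDs whose reportState is not NEW."""
    return {
        "_source": False,       # only the _id of each hit is needed
        "size": len(site_ids),  # assumes this stays within index.max_result_window
        "query": {
            "bool": {
                "filter": [{"ids": {"values": list(site_ids)}}],
                "must_not": [{"term": {"reportState": "NEW"}}],
            }
        },
    }


def missing_site_ids(site_ids: List[str], search_response: Dict[str, Any]) -> List[str]:
    """Requested IDs that did not come back, i.e. NEW or non-existent."""
    present = {hit["_id"] for hit in search_response["hits"]["hits"]}
    return [site_id for site_id in site_ids if site_id not in present]


requested = ["ABC1", "ABC2", "ABC3"]
body = build_submitted_query(requested)

# Hand-written stand-in: only ABC2 exists with a state other than NEW.
fake_response = {"hits": {"hits": [{"_id": "ABC2"}]}}
result = missing_site_ids(requested, fake_response)
```

With the example from the question, only ABC2 is returned by the search, so the difference is [ABC1, ABC3]. Note that a document that exists but has no reportState at all would be treated as submitted by this sketch.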

priyakc · September 19, 2022, 12:31pm

Of course I can handle this in application logic, but it is going to be a heavy operation.

I am looking for an optimised approach to achieve this. Using mget is the only way I have found so far, but I also need pagination over the missing reports.
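If pagination has to happen on the client, one possible shape is a generator that walks the site IDs in batches and emits fixed-size pages of missing IDs. This is only a sketch: `fetch_states` is a hypothetical callable standing in for whatever lookup is used (mget or a search), and it is assumed to return a mapping from each existing ID in the batch to its reportState, with absent IDs simply left out.

```python
from typing import Callable, Dict, Iterator, List


def iter_missing_pages(
    site_ids: List[str],
    fetch_states: Callable[[List[str]], Dict[str, str]],
    page_size: int = 2,
    batch_size: int = 100,
) -> Iterator[List[str]]:
    """Yield pages of site IDs whose report is NEW or does not exist."""
    page: List[str] = []
    for start in range(0, len(site_ids), batch_size):
        batch = site_ids[start:start + batch_size]
        states = fetch_states(batch)  # one lookup per batch
        for site_id in batch:
            state = states.get(site_id)  # None means no document exists
            if state is None or state == "NEW":
                page.append(site_id)
                if len(page) == page_size:
                    yield page
                    page = []
    if page:
        yield page  # final, possibly shorter, page


# In-memory stand-in for the index, for illustration only.
index = {"ABC1": "NEW", "ABC2": "SUBMITTED", "ABC4": "SUBMITTED"}


def fake_fetch(batch: List[str]) -> Dict[str, str]:
    return {i: index[i] for i in batch if i in index}


pages = list(iter_missing_pages(
    ["ABC1", "ABC2", "ABC3", "ABC4", "ABC5"], fake_fetch, page_size=2, batch_size=2))
```

Because the generator is lazy, a caller that only needs the first page triggers only as many lookups as it takes to fill it.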

Christian_Dahlqvist · September 19, 2022, 1:40pm

Could you perhaps initially create documents for all the reports you expect, and give these a special field that indicates that the report does not yet exist? When a report is submitted, you overwrite the placeholder with the proper document. This way you might be able to adjust your search criteria to return these placeholders together with the matching documents.
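A rough Python sketch of how such placeholders could be prepared for the bulk API. It uses the `create` action, which does not overwrite a document that already exists under the same ID. The index name `reports`, the `placeholder` field and the helper name are assumptions made for illustration; giving placeholders reportState NEW is also an assumption, chosen so that a plain term query on NEW would return them.

```python
import json
from typing import List


def placeholder_bulk_body(site_ids: List[str], index: str = "reports") -> str:
    """Build an NDJSON bulk body that creates one placeholder per site ID."""
    lines = []
    for site_id in site_ids:
        # Action line: "create" is rejected for an existing _id, so real
        # reports are never overwritten by a placeholder.
        lines.append(json.dumps({"create": {"_index": index, "_id": site_id}}))
        # Source line: hypothetical marker field plus the NEW state.
        lines.append(json.dumps({"reportState": "NEW", "placeholder": True}))
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(lines) + "\n"


bulk_body = placeholder_bulk_body(["ABC1", "ABC3"])
parsed = [json.loads(line) for line in bulk_body.splitlines()]
```

The resulting string would be sent as the body of a `_bulk` request. Items for IDs that already exist come back as per-item conflicts rather than failing the whole request.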

priyakc · September 22, 2022, 10:33am

We use DynamoDB as our primary storage, so we would need to build a complex mechanism to create default documents for our use case. That would be the last option for us. We are looking to solve this using Elasticsearch itself.

Christian_Dahlqvist · September 22, 2022, 10:51am

As far as I know, that is not possible.

Mahi13 · September 25, 2022, 3:01am

Hi,

Try wrapping the must_not clause in another bool query, because must_not only works inside a bool query.

{
    "query": {
        "bool": {
            "should": [
                {
                    "term": {
                        "reportState": "NEW"
                    }
                },
                {
                    // Matches existing documents that have no reportState field
                    "bool": {
                        "must_not": {
                            "exists": {
                                "field": "reportState"
                            }
                        }
                    }
                }
            ]
        }
    }
}

system · October 23, 2022, 3:02am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
