Help me improve this query before it explodes!

Hey,

I have a working request, but it kinda sucks! I was wondering if a kind soul could help me get to something better...

The problem

Let's say, for simplicity's sake, that I have two kinds of documents:

  • Documents of type square and oval have a uuid field, which acts as a primary key.

  • Documents of type shape_info use the same uuid field, but they are optional: a given uuid may or may not have one.

      [
          {
              "type": "square",
              "uuid": "xxxx"
          },
          {
              "type": "oval",
              "uuid": "zzzz"
          },
          {
              "type": "square",
              "uuid": "xxxx"
          },
          {
              "type": "square",
              "uuid": "xxxx"
          },
          {
              "type": "oval",
              "uuid": "zzzz"
          },
          {
              "type": "shape_info",
              "uuid": "zzzz"
          }
      ]

Now, I would like to find all documents whose uuid does not already have a shape_info document.

For instance, with the following document:

{
    "type": "shape_info",
    "uuid":  "zzzz"
}

I would like all documents with uuid zzzz to be excluded, because that uuid has a shape_info document.
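To pin down the expected result, here is a minimal in-memory sketch in Python over the sample documents above. It is not Elasticsearch code; it only illustrates the intended semantics (an "anti-join" on uuid). The function name exclude_with_shape_info is hypothetical.

```python
from typing import Dict, List

Doc = Dict[str, str]


def exclude_with_shape_info(docs: List[Doc]) -> List[Doc]:
    """Keep only documents whose uuid has no shape_info document (illustration only)."""
    # Pass 1: collect every uuid that has at least one shape_info document.
    covered = {d["uuid"] for d in docs if d["type"] == "shape_info"}
    # Pass 2: drop every document, of any type, whose uuid was collected.
    return [d for d in docs if d["uuid"] not in covered]


# The sample documents from the question.
docs = [
    {"type": "square", "uuid": "xxxx"},
    {"type": "oval", "uuid": "zzzz"},
    {"type": "square", "uuid": "xxxx"},
    {"type": "square", "uuid": "xxxx"},
    {"type": "oval", "uuid": "zzzz"},
    {"type": "shape_info", "uuid": "zzzz"},
]

result = exclude_with_shape_info(docs)
print(result)  # only the three square documents with uuid xxxx remain
```

The two passes mirror the two requests described next: the first finds the "covered" uuids, the second filters everything else against that set.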

What I did

To achieve this, I had to run two requests:

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "type.keyword": ["shape_info"]
                    }
                }
            ]
        }
    },
    "aggs": {
        "uuids": {
            "composite": {
                "size": 50,
                "sources": [
                    {"myfield": {"terms": {"field": "uuid.keyword"}}}
                ]
            }
        }
    }
}

=> This returns the list of uuids that have a shape_info document. Now I need to get all records whose uuid is NOT in that list...

GET /datasets-1.3.0/_search
{
  "query": {
    "bool": {
      "must": [
       #not relevant
      ],
      "must_not": [
        { "term": { "uuid.keyword": "zzzz" } }
      ]
    }
  }
}

I put the list of UUIDs returned by the previous request into the must_not section, and it works...
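One detail worth checking when building that list: with "size": 50, the composite aggregation returns at most 50 buckets per response, so collecting every uuid means paging with the after_key from each response passed back as "after". Below is a minimal Python sketch of that loop. The search parameter is a hypothetical callable (for example a thin wrapper around your client) that takes a request body and returns the parsed response; fake_search and its keys are made-up stand-ins so the sketch runs without a cluster.

```python
from typing import Any, Callable, Dict, List, Optional

Body = Dict[str, Any]


def collect_shape_info_uuids(search: Callable[[Body], Body], page_size: int = 50) -> List[str]:
    """Page through the composite aggregation and return every uuid bucket key."""
    uuids: List[str] = []
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Body = {
            "size": page_size,
            "sources": [{"myfield": {"terms": {"field": "uuid.keyword"}}}],
        }
        if after is not None:
            # Resume right after the last bucket of the previous page.
            composite["after"] = after
        body: Body = {
            "size": 0,
            "query": {"bool": {"must": [{"terms": {"type.keyword": ["shape_info"]}}]}},
            "aggs": {"uuids": {"composite": composite}},
        }
        agg = search(body)["aggregations"]["uuids"]
        uuids.extend(b["key"]["myfield"] for b in agg["buckets"])
        after = agg.get("after_key")
        # An empty page (or a missing after_key) means there is nothing left.
        if not agg["buckets"] or after is None:
            break
    return uuids


# Hypothetical stand-in for a real search call, for illustration only.
ALL_KEYS = ["aaaa", "bbbb", "zzzz"]


def fake_search(body: Body) -> Body:
    comp = body["aggs"]["uuids"]["composite"]
    start = ALL_KEYS.index(comp["after"]["myfield"]) + 1 if "after" in comp else 0
    page = ALL_KEYS[start:start + comp["size"]]
    agg: Body = {"buckets": [{"key": {"myfield": k}, "doc_count": 1} for k in page]}
    if page:
        agg["after_key"] = {"myfield": page[-1]}
    return {"aggregations": {"uuids": agg}}


print(collect_shape_info_uuids(fake_search, page_size=2))  # prints ['aaaa', 'bbbb', 'zzzz']
```

This only makes the first step complete; the resulting list still has to be shipped back in the second request.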

But once the first query returns thousands of uuids, the must_not section will contain thousands of UUIDs too. My god...

I don't think that's any good performance-wise, and I would like to find a way to merge the two queries so that I don't have to pass a huge array of strings to the second query!
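If the two-request approach stays, the exclusion can at least be written as a single terms query on uuid.keyword instead of one clause per uuid. Below is a minimal Python sketch that builds such a request body; build_exclusion_query is a hypothetical helper, and the question's own "must" clauses are left empty because they were omitted. It does not merge the two requests. Note that a terms query is capped by the index.max_terms_count setting (65,536 terms by default).

```python
from typing import Any, Dict, List


def build_exclusion_query(excluded_uuids: List[str]) -> Dict[str, Any]:
    """Build the second request body, excluding every uuid in one terms clause."""
    return {
        "query": {
            "bool": {
                # The question's own "must" clauses go here; they were omitted there.
                "must": [],
                "must_not": [
                    # One terms query instead of one clause per uuid; it matches
                    # exact keyword values, like the uuid.keyword aggregation did.
                    {"terms": {"uuid.keyword": excluded_uuids}}
                ],
            }
        }
    }


body = build_exclusion_query(["zzzz"])
print(body["query"]["bool"]["must_not"])  # prints [{'terms': {'uuid.keyword': ['zzzz']}}]
```

The list still travels over the wire, so this keeps the request smaller and simpler rather than removing the round trip.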

Any help is appreciated,
Thx!
