Not in other index

Considering these two indices:

## Mappings
PUT breeds
{
  "mappings": {
    "properties": {
      "size": { "type": "keyword" },
      "guarding_skills": { "type": "keyword" }
    }
  }
}

PUT dogs
{
  "mappings": {
    "properties": {
      "breed": { "type": "keyword" },
      "name": { "type": "keyword" },
    }
  }
}

## Datas
PUT breeds/_doc/german_sheperd
{
  "size":   "large",
  "guarding_skills": "high"
}

PUT breeds/_doc/poodle
{
  "size":   "small",
  "guarding_skills": "low"
}

PUT dogs/_doc/1
{
  "breed":   "german_sheperd",
  "name":   "cooper",
}

PUT dogs/_doc/2
{
  "breed":   "poodle",
  "name":   "duke",
}

PUT dogs/_doc/3
{
  "breed":   "beagle",
  "name":   "toby",
}

I make an aggregation on /dogs index to get unique list of breeds like this:

GET /dogs/_search
{
"size": 0,
"aggs": {
    "urls": {
      "terms": {
        "field": "breed"
      }
    }
  }
}

But I also would like to filter out breeds witch aren't in the breeds index, in this example, I would only get beagle.

I can make a query first to get a list of breeds from the breeds index and then use a must_not with terms like this:

GET /dogs/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "breed": [
              "german_sheperd",
              "poodle"
            ]
          }
        }
      ]
    },
    "aggs": {
      "urls": {
        "terms": {
          "field": "breed"
        }
      }
    }
  }
}

But I wish there is a more elegant way in a single query. Besides, my case involves a large amount of values (~10k) to use in the must_not clause.

I'm pretty sure this kind of question has been asked many times, but I didn't find any similar case.

Bonjour Loic and welcome :hugs: .

What you want to do would mean doing "joins" in Elasticsearch, which is not really supported. From this page: Joining queries | Elasticsearch Guide [8.11] | Elastic

Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally.

You could look at Join field type | Elasticsearch Guide [8.11] | Elastic

But, the best option is to do the join at index time if doable.

1 Like

Thanks for this answer.
What do you mean by ?

Something like:

PUT dogs/_doc/1
{
  "breed":  { 
    "type": "german_sheperd",
    "size":   "large",
    "guarding_skills": "high"
  },
  "name":   "cooper"
}

Ok, I see, thanks.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.