Queries with nested fields and "having" clauses

I have this index with a nested field:

{
  "test-nested": {
    "mappings": {
      "properties": {
        "indicator": {
          "type": "nested",
          "properties": {
            "id": {
              "type": "keyword"
            },
            "value": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

where I inserted some docs like this:

POST test-nested/_doc
{
  "indicator": [
    { "id": "topic", "value": "plants" },
    { "id": "topic", "value": "animals" },
    { "id": "topic", "value": "rivers" },
    { "id": "country", "value": "FR" }
  ]
}

Now, I want to get all the docs having at least two indicators whose "id" is "topic".
Is it possible?

Welcome!

This might be doable but I think that the best/fastest approach is to compute that at index time, i.e, indexing something like:

{
  "indicator": {
    "topic": {
      "values: ["plants", "animals" , "rivers"],
      "size": 3
    },
    "country": {
      "values: ["FR"],
      "size": 1
    }
  }
}

Then finding a topic with more than 2 in indicator.topic.size is easier.

Thank you for your answer.
Unfortunately this wouldn't work in my case, as I have no control on the indexing. I'm dealing with already existing indexes controlled by third parties.
You said that my approach "is doable". Do you have any idea how this can be done?
Thank you.

Here is a trick:

DELETE nested
PUT nested
{
  "mappings": {
    "properties": {
      "indicator": {
        "type": "nested",
        "properties": {
          "id": {
            "type": "keyword"
          },
          "value": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
POST nested/_doc
{
  "indicator": [
    { "id": "topic", "value": "plants" },
    { "id": "topic", "value": "animals" },
    { "id": "topic", "value": "rivers" },
    { "id": "country", "value": "FR" }
  ]
}
POST nested/_doc
{
  "indicator": [
    { "id": "topic", "value": "plants" },
    { "id": "topic", "value": "rivers" },
    { "id": "country", "value": "FR" }
  ]
}
POST nested/_doc
{
  "indicator": [
    { "id": "topic", "value": "rivers" },
    { "id": "country", "value": "FR" }
  ]
}
GET nested/_search
{
  "query": {
    "nested": {
      "path": "indicator",
      "score_mode": "sum",
      "query": {
        "constant_score": {
          "filter": {
            "term": {
              "indicator.id": {
                "value": "topic"
              }
            }
          },
          "boost": "1"
        }
      }
    }
  },
  "min_score": 2
}

Basically, set the _score to 1 for each matching nested doc, use sum to combine the _score of the nested docs and ask Elasticsearch to retrieve only documents where _score is >= to 2.

1 Like

Great!
It works like a charm.
Thank you.