What's the problem with this query?

Hello, I have this index:

{
    "my_collection_data": {
        "mappings": {
            "document_data_index": {
                "properties": {
                    "abstract": {
                        "type": "keyword"
                    },
                    "bibliographic_citation": {
                        "type": "keyword"
                    },
                    "contributor": {
                        "type": "keyword"
                    },
                    "datasets": {
                        "type": "nested",
                        "properties": {
                            "identifier": {
                                "type": "keyword"
                            },
                            "keywords": {
                                "type": "keyword"
                            },
                            "title": {
                                "type": "keyword"
                            }
                        }
                    },
                    "date_available": {
                        "type": "date"
                    },
                    "date_created": {
                        "type": "date"
                    },
                    "doc_format": {
                        "type": "keyword"
                    },
                    "file": {
                        "type": "keyword"
                    },
                    "identifier": {
                        "type": "keyword"
                    },
                    "license": {
                        "type": "keyword"
                    },
                    "subject": {
                        "type": "keyword"
                    },
                    "title": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

I need to get all documents with datasets=XXXX, so I tried this query:

{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "post_filter": {
    "terms": {
      "datasets": [
        "Tidbits"
      ],
      "boost": 1
    }
  },
  "_source": {
    "includes": [],
    "excludes": []
  },
  "aggregations": {
    "_filter_datasets": {
      "filter": {
        "match_all": {
          "boost": 1
        }
      },
      "aggregations": {
        "datasets": {
          "terms": {
            "field": "datasets",
            "size": 999,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_term": "asc"
              }
            ]
          }
        }
      }

    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}

It doesn't return any records...

What am I doing wrong here?

regards

Your post filter is executing a terms query on the "datasets" field, but
that field is defined as a nested field in your mapping. You perhaps mean
datasets.keywords, datasets.title, etc. Besides correcting the field, you
would need to wrap the terms query in a nested query.
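
To see which field names can actually be queried, here is a minimal sketch (not tested against a cluster) that walks a trimmed copy of the mapping from the first post and lists each leaf field together with the nested path it lives under. The helper name leaf_fields is hypothetical and not part of any Elasticsearch client.

```python
def leaf_fields(properties, prefix="", nested_path=None):
    """Yield (full_field_name, type, enclosing_nested_path) for each leaf field."""
    for name, spec in properties.items():
        full_name = prefix + name
        if "properties" in spec:
            # Object or nested field: descend, remembering the path if it is nested.
            path = full_name if spec.get("type") == "nested" else nested_path
            yield from leaf_fields(spec["properties"], full_name + ".", path)
        else:
            yield full_name, spec.get("type"), nested_path


# Trimmed copy of the mapping from the first post (most top-level fields omitted).
properties = {
    "title": {"type": "keyword"},
    "datasets": {
        "type": "nested",
        "properties": {
            "identifier": {"type": "keyword"},
            "keywords": {"type": "keyword"},
            "title": {"type": "keyword"},
        },
    },
}

fields = list(leaf_fields(properties))
for name, field_type, path in fields:
    print(name, field_type, "nested under " + path if path else "top level")
```

Note that "datasets" itself never appears as a leaf: only datasets.identifier, datasets.keywords and datasets.title carry values, and each of them sits under the nested path "datasets".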

Hi @Ivan

Thanks for your reply, but I really don't understand... can you give me an example?

P.S.: This query is generated using elasticsearch-dsl for Python... maybe something is wrong in how I build my query.

If you provided an example document that should match (with the mapping
defined in the first post), perhaps we can help further.

Your filter is essentially:

"terms": {
  "datasets": [
    "Tidbits"
  ]
}

but your mapping defines the "datasets" field as nested:

"datasets": {
  "type": "nested",
  "properties": {
    ...
  }
}

Once again, example documents will help.

Cheers,

Ivan

Hi @Ivan
Here is the result with no filter:

{
    "title": [
        {
            "title": "doc1",
            "abstract": "doc1",
            "subject": "doc1",
            "contributor": "fellipe",
            "date_created": "2017-08-22T03:00:00Z",
            "date_available": "2017-08-22T03:00:00Z",
            "identifier": "doc1",
            "doc_format": "OTH",
            "file": "/media/documents/admin_at_admin_com-20170822160014.pdf",
            "license": "",
            "bibliographic_citation": "",
            "datasets": [
                {
                    "identifier": "Tidbits",
                    "title": "Tidbits"
                },
                {
                    "identifier": "170503 Hellen",
                    "title": "Aves do Brasil"
                },
                {
                    "identifier": "PeixesSulo",
                    "title": "Peixes do Sul do Brasil"
                },
                {
                    "identifier": "TesteCRT",
                    "title": "CRT"
                }
            ]
        }
    ],
    "facets": {
        "datasets": []
    }
}

Since you are filtering on a nested object, you will need to use a nested
query:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html

Since the fields of the datasets nested object are not analyzed (keyword),
the use of terms is correct, although a simple term query would work as
well. The query should look something like (not tested):

"post_filter": {
  "nested": {
    "path": "datasets",
    "query": {
      "terms": {
        "datasets.identifier": [
          "Tidbits"
        ]
      }
    }
  }
}

There is a nested query on the "datasets" object and the terms query within
is on the datasets.identifier field.
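
Since the request is built from Python, the same filter can be sketched as a plain dict. This is only an illustration of the structure, not verified against a live cluster; the helper name nested_terms_filter is hypothetical, and the rest of the body is cut down from the original request.

```python
import json


def nested_terms_filter(path, field, values):
    """Build a nested query that wraps a terms query on `path.field`."""
    return {
        "nested": {
            "path": path,
            "query": {"terms": {f"{path}.{field}": list(values)}},
        }
    }


# Cut-down version of the original request with the corrected post_filter.
body = {
    "from": 0,
    "size": 10,
    "query": {"match_all": {}},
    "post_filter": nested_terms_filter("datasets", "identifier", ["Tidbits"]),
}

print(json.dumps(body, indent=2))
```

The printed JSON can be compared with what elasticsearch-dsl generates to check whether the nested wrapper and the datasets.identifier field name are present.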

Cheers,

Ivan
