What's the problem with this query?


(Fellipe Henrique) #1

Hello, I have this index:

{
    "my_collection_data": {
        "mappings": {
            "document_data_index": {
                "properties": {
                    "abstract": {
                        "type": "keyword"
                    },
                    "bibliographic_citation": {
                        "type": "keyword"
                    },
                    "contributor": {
                        "type": "keyword"
                    },
                    "datasets": {
                        "type": "nested",
                        "properties": {
                            "identifier": {
                                "type": "keyword"
                            },
                            "keywords": {
                                "type": "keyword"
                            },
                            "title": {
                                "type": "keyword"
                            }
                        }
                    },
                    "date_available": {
                        "type": "date"
                    },
                    "date_created": {
                        "type": "date"
                    },
                    "doc_format": {
                        "type": "keyword"
                    },
                    "file": {
                        "type": "keyword"
                    },
                    "identifier": {
                        "type": "keyword"
                    },
                    "license": {
                        "type": "keyword"
                    },
                    "subject": {
                        "type": "keyword"
                    },
                    "title": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}

I need to get all documents with datasets=XXXX, so I tried this query:

{
  "from": 0,
  "size": 10,
  "query": {
    "match_all": {
      "boost": 1
    }
  },
  "post_filter": {
    "terms": {
      "datasets": [
        "Tidbits"
      ],
      "boost": 1
    }
  },
  "_source": {
    "includes": [],
    "excludes": []
  },
  "aggregations": {
    "_filter_datasets": {
      "filter": {
        "match_all": {
          "boost": 1
        }
      },
      "aggregations": {
        "datasets": {
          "terms": {
            "field": "datasets",
            "size": 999,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_term": "asc"
              }
            ]
          }
        }
      }

    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}

It doesn't return any records...

What am I doing wrong here?

regards


(Ivan Brusic) #2

Your post filter is executing a terms query on the "datasets" field, but
that field is defined as a nested field in your mapping. You probably mean
one of its subfields: datasets.identifier, datasets.keywords, or
datasets.title. Besides correcting the field, you would need to wrap the
terms query in a nested query.


(Fellipe Henrique) #3

Hi @Ivan

Thanks for your reply, but I really don't understand... can you give me an example?

P.S.: This query is generated using elasticsearch-dsl for Python... maybe something is wrong in how I build my query.


(Ivan Brusic) #4

If you provide an example document that should match (with the mapping
defined in the first post), we can perhaps help further.

Your filter is essentially:

"terms": {
  "datasets": [
    "Tidbits"
  ]
}

but your mapping defines the "datasets" field as nested:

"datasets": {
  "type": "nested",
  "properties": {
    ...
  }
}
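
To make the mismatch concrete, here is a small Python sketch that lists which fields in a mapping are nested. The helper name nested_paths is my own, and the properties dict is a trimmed copy of the mapping from the first post, so treat it as an illustration rather than anything Elasticsearch provides. Any field it reports has to be queried through a nested query on one of its subfields, not directly.

```python
def nested_paths(properties, prefix=""):
    """Yield the dotted path of every field mapped with type "nested"."""
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "nested":
            yield path
        # Object and nested fields carry their own "properties"; recurse into them.
        if "properties" in spec:
            yield from nested_paths(spec["properties"], path + ".")


# Trimmed copy of the mapping from the first post (assumption: two fields
# are enough to show the difference between a plain and a nested field).
properties = {
    "title": {"type": "keyword"},
    "datasets": {
        "type": "nested",
        "properties": {
            "identifier": {"type": "keyword"},
            "keywords": {"type": "keyword"},
            "title": {"type": "keyword"},
        },
    },
}

print(list(nested_paths(properties)))
```

Here "datasets" is reported as nested, while "title" is not, which is why a plain terms filter on "datasets" has nothing to match against.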

Once again, example documents will help.

Cheers,

Ivan


(Fellipe Henrique) #5

Hi @Ivan
Here is the result with no filter:

{
    "title": [
        {
            "title": "doc1",
            "abstract": "doc1",
            "subject": "doc1",
            "contributor": "fellipe",
            "date_created": "2017-08-22T03:00:00Z",
            "date_available": "2017-08-22T03:00:00Z",
            "identifier": "doc1",
            "doc_format": "OTH",
            "file": "/media/documents/admin_at_admin_com-20170822160014.pdf",
            "license": "",
            "bibliographic_citation": "",
            "datasets": [
                {
                    "identifier": "Tidbits",
                    "title": "Tidbits"
                },
                {
                    "identifier": "170503 Hellen",
                    "title": "Aves do Brasil"
                },
                {
                    "identifier": "PeixesSulo",
                    "title": "Peixes do Sul do Brasil"
                },
                {
                    "identifier": "TesteCRT",
                    "title": "CRT"
                }
            ]
        }
    ],
    "facets": {
        "datasets": []
    }
}

(Ivan Brusic) #6

Since you are filtering on a nested object, you will need to use a nested
query:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html

Since the fields of the datasets nested object are keyword fields (not
analyzed), the use of terms is correct, although a simple term query would
work as well for a single value. The query should look something like this
(not tested):

"post_filter": {
  "nested": {
    "path": "datasets",
    "query": {
      "terms": {
        "datasets.identifier": [
          "Tidbits"
        ]
      }
    }
  }
}

There is a nested query on the "datasets" object and the terms query within
is on the datasets.identifier field.
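
Since you are building the request from Python, here is a minimal sketch of the same idea using plain dicts. It has not been run against a cluster. The helper name nested_terms and the aggregation name "identifiers" are made up for illustration. I am also assuming you want the facet to count by datasets.identifier: a terms aggregation directly on the nested "datasets" field would, as far as I can tell, come back empty for the same reason the filter does, so the sketch wraps it in a nested aggregation.

```python
import json


def nested_terms(path, field, values):
    """Wrap a terms query on <path>.<field> in a nested query on <path>."""
    return {
        "nested": {
            "path": path,
            "query": {"terms": {f"{path}.{field}": list(values)}},
        }
    }


body = {
    "from": 0,
    "size": 10,
    "query": {"match_all": {}},
    # Same filter as the JSON above, built by the helper.
    "post_filter": nested_terms("datasets", "identifier", ["Tidbits"]),
    "aggs": {
        "datasets": {
            # Step into the nested documents before bucketing on a subfield.
            "nested": {"path": "datasets"},
            "aggs": {
                "identifiers": {
                    "terms": {"field": "datasets.identifier", "size": 999}
                }
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search body. With elasticsearch-dsl, Search.from_dict(body) should accept the same dict, though I have not checked how your code builds its filters.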

Cheers,

Ivan

