Query with terms does not work properly


(Andreas Jung) #1

I have this document in Elasticsearch (1.6):

{
"_index": "onkopedia",
"_type": "document_",
"_id": "0afa26afc2d1440a8ed03dac0e8511fc",
"_version": 1,
"_score": null,
"_source": {
"description": "",
"contributors": [ ],
"metaTypeName": "Connector",
"sortableTitle": "mammakarzinom der frau",
"subject": [ ],
"authorizedUsers": [
"Anonymous"
],
"language": "",
"title": "Mammakarzinom der Frau",
"url": "http://dev1.veit-schiele.de:9080/onkopedia/de/onkopedia/guidelines/mammakarzinom-der-frau",
"author": "ajung",
"modified": "2015-05-11T05:21:14",
"metaType": "xmldirector.plonecore.connector",
"content": " Mammakarzinom der Frau Stand: Januar 2013 Autoren der aktuellen .....",
"authorName": "ajung",
"created": "2015-05-11T05:21:14",
"review_state": "published"
},
"sort": [
null
]
}

containing a key

'authorizedUsers': ['Anonymous']

The following query is supposed to return the document above, but it does not:

 {
  "sort": [
    "_score"
  ], 
  "from": 0, 
  "fields": [
    "url", 
    "title", 
    "description", 
    "metaType", 
    "metaTypeName", 
    "author", 
    "authorName", 
    "contributors", 
    "modified", 
    "subject", 
    "review_state", 
    "language", 
    "content"
  ], 
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "or", 
              "metaType": [
                "Document", 
                "FormFolder", 
                "Collection", 
                "Discussion Item", 
                "News Item", 
                "xmldirector.plonecore.connector", 
                "CaptchaField"
              ]
            }
          }, 
          {
            "terms": {
              "execution": "or", 
              "authorizedUsers": [
                "Manager", 
                "Authenticated", 
                "Anonymous", 
                "user:ajung"
              ]
            }
          }
        ]
      }, 
      "query": {
        "query_string": {
          "query": "mammakarzinom", 
          "default_operator": "AND", 
          "fields": [
            "title^3", 
            "contributors^2", 
            "subject^2", 
            "description", 
            "content"
          ]
        }
      }
    }
  }, 
  "highlight": {
    "fields": {
      "content": {
        "fragment_size": 250, 
        "number_of_fragments": 3
      }, 
      "description": {
        "fragment_size": 250, 
        "number_of_fragments": 2
      }, 
      "title": {
        "number_of_fragments": 0
      }
    }
  }, 
  "size": 15
}

The query without the filter on 'authorizedUsers' (below) does return the document. Why? 'Anonymous' is one of the values listed for 'authorizedUsers' in the first query, so I would expect the first query to find the document as well.

{
  "sort": [
    "_score"
  ], 
  "from": 0, 
  "fields": [
    "url", 
    "title", 
    "description", 
    "metaType", 
    "metaTypeName", 
    "author", 
    "authorName", 
    "contributors", 
    "modified", 
    "subject", 
    "review_state", 
    "language", 
    "content"
  ], 
  "query": {
    "filtered": {
      "filter": {
        "and": [
          {
            "terms": {
              "execution": "or", 
              "metaType": [
                "Document", 
                "FormFolder", 
                "Collection", 
                "Discussion Item", 
                "News Item", 
                "xmldirector.plonecore.connector", 
                "CaptchaField"
              ]
            }
          }

Both subqueries for 'authorizedUsers' and 'metaType' must be ANDed. 'authorizedUsers' itself is a multi-value field.


(Igor Motov) #2

The most likely reason for this issue is that the field authorizedUsers is mapped as analyzed, with an analyzer that lowercases the value of this field. So, during indexing, the value Anonymous becomes anonymous. The terms filter doesn't analyze the terms, so it searches for the term Anonymous (with an upper-case A), which doesn't exist in the index. Judging from the usage, the fields authorizedUsers and metaType should be reindexed as not_analyzed fields. I would recommend checking the Mapping and Analysis section of The Definitive Guide for more details.
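
To make the mismatch concrete, here is a small Python simulation. It does not talk to Elasticsearch at all: analyze(), index_field() and terms_filter() are made-up helper names, and the "analyzer" is only a rough stand-in (split on whitespace, lowercase) for what an analyzed string field does at index time.

```python
def analyze(value):
    """Rough stand-in for an analyzer: split on whitespace and lowercase."""
    return value.lower().split()


def index_field(values, analyzed=True):
    """Return the set of terms that would end up in the index for a field."""
    terms = set()
    for value in values:
        if analyzed:
            terms.update(analyze(value))
        else:
            terms.add(value)  # not_analyzed: the value is stored verbatim
    return terms


def terms_filter(indexed_terms, wanted):
    """A terms filter compares the given terms as-is, without analyzing them."""
    return any(term in indexed_terms for term in wanted)


wanted_users = ["Manager", "Authenticated", "Anonymous", "user:ajung"]

analyzed_terms = index_field(["Anonymous"], analyzed=True)   # {"anonymous"}
raw_terms = index_field(["Anonymous"], analyzed=False)       # {"Anonymous"}

print(terms_filter(analyzed_terms, wanted_users))  # False
print(terms_filter(raw_terms, wanted_users))       # True

# metaType happens to be all lowercase already, so it matches either way,
# which would explain why the query without the authorizedUsers filter works.
meta_terms = index_field(["xmldirector.plonecore.connector"], analyzed=True)
print(terms_filter(meta_terms, ["xmldirector.plonecore.connector"]))  # True
```

The real analyzer does more than this sketch, but the lowercasing step alone is enough to explain why 'Anonymous' is not found.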


(Andreas Jung) #3

Not exactly right, but your thoughts put me on the right track.

What was happening here:

  • I have some document_ mappings for holding documents by language
  • some documents did not have a language property and got indexed into 'document_', which
    had no predefined mapping definition, so the ES defaults applied
  • all documents indexed under document_ could not be found properly
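
For anyone hitting the same problem: a minimal sketch of what an explicit mapping for the 'document_' type could look like, following the not_analyzed suggestion from #2. It only builds and prints the JSON body in Python; the variable name and the choice of exactly these two fields are assumptions, not the mapping actually used in this project.

```python
import json

# Hypothetical mapping body for the 'document_' type (ES 1.x syntax).
# Only the two fields used in the terms filters are listed here.
document_mapping = {
    "document_": {
        "properties": {
            # Keep the values verbatim so 'Anonymous' stays 'Anonymous'.
            "authorizedUsers": {"type": "string", "index": "not_analyzed"},
            "metaType": {"type": "string", "index": "not_analyzed"},
        }
    }
}

print(json.dumps(document_mapping, indent=2, sort_keys=True))
```

The mapping has to be in place before the documents are indexed; documents already indexed with the default mapping would need to be reindexed.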
