Indexing of empty arrays affecting search results

Hi all,

We use Elasticsearch to search the content on our website, which consists primarily of musician names.

The primary search field is "name", but the search also includes a number of secondary index fields, such as an aliases field ("entity_aliases") for matching a musician's stage name rather than their real name. Since a musician can have several aliases, this field is stored as an array, e.g. Sean Combs has the aliases Puff Daddy and P. Diddy.

Note that "entity_aliases" is part of the "secondary_index_field" object.

A problem arises when a document is indexed whose "name" field contains a full stop, e.g. "Portugal. The Man", and whose aliases field is an empty array, as in the example below:

```json
{
  "_index": "jaxsta-20211124-231212",
  "_type": "_doc",
  "_id": "d0140c45-3122-4934-9d83-f4429c048069",
  "_version": 2,
  "_score": 0,
  "_source": {
    "jaxsta_uuid": "d0140c45-3122-4934-9d83-f4429c048069",
    "name": "Portugal. The Man",
    "secondary_index_field": {
      "entity_aliases": [],
      "identifiers": [
        "167649475",
        "4kI8Ie27vjvonwaB2ePh8T"
      ]
    }
  }
}
```

When I run the following search query, the document above is not returned:

```
GET jaxsta/_search
{
  "size": 2000,
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "Portugal. The Man",
          "fields": [
            "name",
            "secondary_index_field.entity_aliases",
            "secondary_index_field.identifiers"
          ],
          "type": "cross_fields"
        }
      }
    }
  }
}
```

However, if I remove "secondary_index_field.entity_aliases" from the fields list, the document is returned as expected.
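To make the comparison explicit, here is a minimal Python sketch that builds both request bodies from one helper, so the only difference between them is the `fields` list. The helper names `build_query` and `search_body` are hypothetical. The printed bodies are plain Query DSL and can be pasted into Kibana Dev Tools under `GET jaxsta/_search`; the inner `"query"` part alone could also be sent to `GET jaxsta/_validate/query?explain=true` to see how Elasticsearch rewrites each `cross_fields` query.

```python
import json

# Fields copied from the search request above.
ALL_FIELDS = [
    "name",
    "secondary_index_field.entity_aliases",
    "secondary_index_field.identifiers",
]


def build_query(fields):
    """Hypothetical helper: the function_score/multi_match clause from the search."""
    return {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "Portugal. The Man",
                    "fields": list(fields),
                    "type": "cross_fields",
                }
            }
        }
    }


def search_body(fields, size=2000):
    """Hypothetical helper: full _search body (the validate API accepts only "query")."""
    return {"size": size, "query": build_query(fields)}


# Failing case: all three fields.
with_aliases = search_body(ALL_FIELDS)
# Working case: same request without the aliases field.
without_aliases = search_body(
    [f for f in ALL_FIELDS if f != "secondary_index_field.entity_aliases"]
)

if __name__ == "__main__":
    print(json.dumps(with_aliases, indent=2))
    print(json.dumps(without_aliases, indent=2))
```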

For these fields, we are using the following mapping:

```
"name" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
},
"secondary_index_field" : {
  "properties" : {
    "entity_aliases" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    }
  }
}
```
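For anyone wanting to reproduce this, below is a minimal Python sketch that prints Kibana Dev Tools requests to create a scratch index with the mapping above and index the example document. The index name `jaxsta-repro` and the helper `text_with_keyword` are hypothetical. "identifiers" is deliberately left out of the mapping because its real mapping is not shown here, so Elasticsearch's default dynamic mapping (text with a keyword sub-field) will apply to it, which may not match the production index.

```python
import json

INDEX = "jaxsta-repro"  # hypothetical scratch index name


def text_with_keyword():
    """Hypothetical helper: text field with a keyword sub-field, as in the mapping above."""
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


mapping = {
    "mappings": {
        "properties": {
            "name": text_with_keyword(),
            "secondary_index_field": {
                "properties": {
                    # "identifiers" is omitted on purpose: its real mapping was not posted,
                    # so it falls back to dynamic mapping in this reproduction.
                    "entity_aliases": text_with_keyword(),
                }
            },
        }
    }
}

# The example document from above, with an empty aliases array.
doc = {
    "jaxsta_uuid": "d0140c45-3122-4934-9d83-f4429c048069",
    "name": "Portugal. The Man",
    "secondary_index_field": {
        "entity_aliases": [],
        "identifiers": ["167649475", "4kI8Ie27vjvonwaB2ePh8T"],
    },
}

if __name__ == "__main__":
    # Create the index, then index the document and refresh so it is searchable.
    print(f"PUT {INDEX}")
    print(json.dumps(mapping, indent=2))
    print(f"PUT {INDEX}/_doc/{doc['jaxsta_uuid']}?refresh=true")
    print(json.dumps(doc, indent=2))
```

Running the search request from earlier against `jaxsta-repro`, with and without the aliases field, should show whether the behaviour reproduces outside the production index.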

Can anyone suggest what we are doing wrong?

Thanks,

Michael Stone
