Elasticsearch: supporting both case-sensitive and case-insensitive access to the same field

Setup: Elasticsearch 6.3

I have an index that represents the product catalog.

Every document contains one product's data.

One of the fields, categories, is an array of strings listing the product's relevant categories.

99.9% of the queries are of the form "give me the products that match categories A, B and C". This matching must be case-insensitive, so the categories mapping looks like:

"categories": {
    "type": "keyword",
    "normalizer": "lowercase_normalizer"
}
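The lowercase_normalizer referenced above must also be defined in the index settings. Below is a minimal sketch of a complete create-index body for Elasticsearch 6.x, written as a Python dict so it can be printed as JSON. It assumes the single mapping type is named _doc (6.x still requires a type name under mappings). It also adds a case-preserving sub-field called raw. Both names are illustrative choices, not part of the original mapping.

```python
import json


def create_index_body(type_name="_doc"):
    """Create-index body for Elasticsearch 6.x (hypothetical helper).

    Assumptions: the mapping type is called "_doc" and the case-preserving
    sub-field is called "raw"; both names are illustrative.
    """
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    # Custom normalizer that lowercases the whole keyword value
                    "lowercase_normalizer": {
                        "type": "custom",
                        "char_filter": [],
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            # 6.x indices still need exactly one mapping type name here
            type_name: {
                "properties": {
                    "categories": {
                        # Main field: lowercased, used for case-insensitive filtering
                        "type": "keyword",
                        "normalizer": "lowercase_normalizer",
                        # Multi-field: the same value indexed again without a normalizer
                        "fields": {"raw": {"type": "keyword"}},
                    }
                }
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON is the body of a "PUT <index>" request
    print(json.dumps(create_index_body(), indent=2))
```

A raw sub-field can also be added to an existing index through the put mapping API. However, documents indexed before that change should only get values in categories.raw once they are reindexed, for example with _update_by_query.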

For reporting (0.1% of all queries), I need to return a list of all distinct categories with their original case!

Consider the following documents:

"_id": "product1",
"_source": {
    "categories": [
        "WOMEN",
        "Footwear"
    ]
}

"_id": "product2",
"_source": {
    "categories": [
        "Men",
        "Footwear"
    ]
}

Running the following query:

{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "categories",
        "size": 100
      }
    }
  }
}

returns:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 40453,
    "max_score": 0,
    "hits": [

    ]
  },
  "aggregations": { 
    "sterms#categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 12453,
      "buckets": [
        {
          "key": "men",
          "doc_count": 27049
        },
        {
          "key": "women",
          "doc_count": 21332
        },
       .........
      ]
    }
  }
}
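The lowercase keys are expected. The normalizer runs at index time, and the terms aggregation builds its buckets from the indexed (normalized) values rather than from _source. The small Python simulation below reproduces that effect on the two sample documents. It uses str.lower as a rough stand-in for the lowercase filter and a hypothetical terms_agg helper, and it also shows what a field without a normalizer would return instead.

```python
from collections import Counter

# The two sample documents from the question (categories only).
DOCS = {
    "product1": ["WOMEN", "Footwear"],
    "product2": ["Men", "Footwear"],
}


def lowercase_normalizer(value):
    # Rough stand-in for Lucene's "lowercase" filter; Python's str.lower()
    # is not guaranteed to match it for every Unicode character.
    return value.lower()


def terms_agg(docs, normalizer=None):
    """Hypothetical helper: count documents per distinct indexed term."""
    counts = Counter()
    for values in docs.values():
        # A set, because a document counts at most once per bucket
        terms = {normalizer(v) if normalizer else v for v in values}
        counts.update(terms)
    return dict(counts)


print(terms_agg(DOCS, lowercase_normalizer))  # what the normalized field stores
print(terms_agg(DOCS))                        # what a field without a normalizer stores
```

With the un-normalized field, spellings that differ only in case (for example "Men" and "MEN") end up in separate buckets.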

Is there a way to return the categories with their original casing (as stored in the documents)? In this query's result I want to see ["WOMEN", "Men"], not ["women", "men"].

Thanks,
Itay

StackOverflow question

You can index the same field several times, once per use case (multi-fields):

  • One for search, using a text type, or a keyword type with the lowercase normalizer you defined.
  • One for aggregation, using a keyword datatype with no normalizer.
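Applied to the categories field from the question, the two uses could look like the request bodies below, sketched as Python dicts. The sketch assumes a categories.raw keyword sub-field without a normalizer next to the normalized categories field. The name raw is only an example. Filtering uses match clauses inside a bool filter, because match queries on a keyword field apply the field's normalizer to the input. Reporting aggregates on the raw sub-field.

```python
import json


def search_body(categories):
    """Products that have ALL the given categories, matched case-insensitively.

    Hypothetical helper; assumes "categories" is the keyword field with the
    lowercase normalizer, so match queries lowercase their input too.
    """
    return {
        "query": {
            "bool": {
                # One clause per category: a product must match every one (A AND B AND C)
                "filter": [{"match": {"categories": c}} for c in categories]
            }
        }
    }


def report_body(size=100):
    """All distinct categories with their original casing.

    Assumes a "categories.raw" keyword sub-field without a normalizer.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "categories": {"terms": {"field": "categories.raw", "size": size}}
        },
    }


print(json.dumps(search_body(["women", "FOOTWEAR"]), indent=2))
print(json.dumps(report_body(), indent=2))
```

The size parameter caps how many distinct categories come back. A non-zero sum_other_doc_count, as in the response above, means some categories were left out and size needs to be raised.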

Thanks @dadoonet. I'm not sure I understand.
Can you please show me what the mapping for this field should look like?

DELETE index
PUT index
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
POST index/_doc?refresh
{
  "foo": "FEMALE"
}
GET index/_search
{
  "query": {
    "match": {
      "foo": "female"
    }
  },
  "aggs": {
    "foo": {
      "terms": {
        "field": "foo.keyword"
      }
    }
  }
}

gives:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "nzhywm8ByvWqfRN45-FK",
        "_score" : 0.2876821,
        "_source" : {
          "foo" : "FEMALE"
        }
      }
    ]
  },
  "aggregations" : {
    "foo" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "FEMALE",
          "doc_count" : 1
        }
      ]
    }
  }
}
