Case-sensitive search

I'm trying to build an index for case-sensitive search. The index settings I'm using are at the bottom (including defining a custom "caseSensitive" analyzer).

If I perform a query like

GET /emails_text/_search
{
  "fields": [
    "text"
  ],
  "query": {
    "match": {
      "text": {
        "query": "begin",
        "analyzer": "caseSensitive"
      }
    }
  }
}

then the query itself seems to be case-sensitive (searching for capitalized terms gives no results), but the actual index seems to be all lowercased: a lowercase query gives results for both lowercase and uppercase terms.

What am I doing wrong here? Am I querying wrong or should I be taking a different approach?

My index settings at creation time (ignore the stuff about shingles):

{
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 5
        }
      },
      "analyzer": {
        "shingles": {
          "tokenizer": "standard",
          "filter": [
            "shingle_filter"
          ]
        },
        "caseSensitive": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "stop"
          ]
        }
      }
    },
    "mappings": {
      "email": {
        "properties": {
          "sent": {
            "type": "date",
            "format": "epoch_millis"
          },
          "text": {
            "type": "string",
            "index_analyzer": "caseSensitive",
            "search_analyzer": "caseSensitive",
            "term_vector": "with_positions_offsets_payloads",
            "store": true,
            "fields": {
              "shingle": {
                "type": "string",
                "index_analyzer": "shingles",
                "search_analyzer": "caseSensitive"
              }
            }
          }
        }
      }
    }
  }
}

I pasted your settings into Sense and was flabbergasted for a while, until I tried this:

GET /test/_mapping

And the response was simply {}.

Well, it turns out that your "mappings" is nested under "settings". It needs to be a sibling of "settings", as in:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "caseSensitive": {
          "type": "custom",
          "tokenizer": "whitespace"      
        }
      }
    }
  },
  "mappings": {
    "email": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "caseSensitive",
          "search_analyzer": "caseSensitive"
        }
      }
    }
  }
}
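
For what it's worth, the symptoms in the question follow from the ignored mapping: without it, "text" is mapped dynamically and indexed with the default standard analyzer, which lowercases tokens, while the query still analyzes its input with "caseSensitive". Here is a rough Python simulation of that mismatch. The two token functions are simplified stand-ins I made up for illustration, not Elasticsearch's real analyzers, and the sample documents are invented:

```python
def index_tokens_default(text):
    # Stand-in for the default analyzer applied at index time:
    # split on whitespace and lowercase every token.
    return {tok.lower() for tok in text.split()}


def query_tokens_case_sensitive(text):
    # Stand-in for the custom "caseSensitive" analyzer applied at query time:
    # split on whitespace and keep the original case.
    return set(text.split())


def matches(query, document):
    # A match query hits when any query token equals an indexed token.
    return bool(query_tokens_case_sensitive(query) & index_tokens_default(document))


# Invented sample documents, one with a capitalized term and one without.
docs = ["Begin the meeting", "we begin at noon"]

lower_hits = [d for d in docs if matches("begin", d)]
upper_hits = [d for d in docs if matches("Begin", d)]

print(lower_hits)  # both documents: the index only holds lowercased tokens
print(upper_hits)  # empty: "Begin" never appears in the lowercased index
```

Once "mappings" sits next to "settings", the same analyzer is used on both sides and the two token sets line up again.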

Well, problem solved.

Definitely one of my dumber mistakes, although I wonder why Elasticsearch silently accepts an error like that. Was I changing some other setting that didn't matter to this issue?

I've done this before. I agree it should be treated as an error. At the very least, this specific common mistake should be caught.
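
Until something like that exists on the server side, a small client-side check can catch it before the request goes out. This is a minimal sketch in Python, assuming the create-index body is available as a dict. `misplaced_mappings` is a hypothetical helper of my own, not part of any Elasticsearch client:

```python
def misplaced_mappings(body):
    """Return True if "mappings" is nested under "settings" in a create-index body."""
    settings = body.get("settings", {})
    # Only the direct nesting from this thread is checked; other layouts are ignored.
    return isinstance(settings, dict) and "mappings" in settings


# Shape of the request from the question: "mappings" inside "settings".
broken = {"settings": {"analysis": {}, "mappings": {"email": {}}}}

# Shape of the corrected request: "mappings" as a sibling of "settings".
fixed = {"settings": {"analysis": {}}, "mappings": {"email": {}}}

print(misplaced_mappings(broken))
print(misplaced_mappings(fixed))
```

It only covers this one mistake, but that is the one people in this thread keep making.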

Please feel free to raise an issue on GitHub for this, then we can fix it! :)