Elasticsearch find "%40" in email text field


I have several emails in our ES index that have %40 instead of the @ (e.g. john%40test.com). I'm trying to search for them directly with a regexp query, but I can't get it to match these cases. I've tried searching for both %40 and \u0025, but neither produced any hits. This is the general search query I've been using:

ES.search( "db", body={"query": {"regexp": {"emails": {"value": "\u0025"}}}})

Any help would be much appreciated.


This question cannot be answered without more insight into how the field that stores the email addresses is mapped. With the default (standard) analyzer, the value is tokenized like this:

GET _analyze
{
  "text": "john%40test.com"
}

# response
{
  "tokens" : [
    {
      "token" : "john",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "40test.com",
      "start_offset" : 5,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

This means the percent sign is removed during analysis, so it is really hard, if not impossible, to search for it in the analyzed field. If speed is not an issue, you could use a script filter or search the keyword field, but you should still fix the mapping.
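Here is a small Python sketch that illustrates both points. It reuses the two tokens from the _analyze response above and uses Python's `re.fullmatch` as a stand-in for Lucene regexp matching, which must match an entire term (Lucene patterns have no ^/$ anchors). That stand-in only holds for simple patterns like these, which mean the same thing in both syntaxes. The query builders assume that the default dynamic mapping created an `emails.keyword` sub-field. That is an assumption about your setup, and the Painless script is an untested sketch.

```python
import re

# Tokens copied from the _analyze response (standard analyzer).
ANALYZED_TOKENS = ["john", "40test.com"]
# A keyword field would index the original value as one single term.
KEYWORD_TERMS = ["john%40test.com"]


def any_term_matches(pattern, terms):
    """Approximate Lucene regexp semantics: the pattern must match a whole term."""
    return any(re.fullmatch(pattern, term) is not None for term in terms)


def keyword_field_queries(field="emails.keyword"):
    """Build request bodies that look for a literal "%40" in a keyword field.

    ASSUMPTION: the field name is the .keyword sub-field from default
    dynamic mapping; adjust it to match your actual mapping.
    """
    wildcard = {"query": {"wildcard": {field: {"value": "*%40*"}}}}
    regexp = {"query": {"regexp": {field: {"value": ".*%40.*"}}}}
    # Slower alternative: a script filter evaluated against doc values.
    # It loops over every value so multi-valued fields are covered.
    script = {
        "query": {
            "bool": {
                "filter": {
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": (
                                "for (def v : doc[params['field']]) {"
                                " if (v.contains('%40')) { return true; } }"
                                " return false;"
                            ),
                            "params": {"field": field},
                        }
                    }
                }
            }
        }
    }
    return {"wildcard": wildcard, "regexp": regexp, "script": script}


if __name__ == "__main__":
    # In a Python string literal, "\u0025" is just a literal percent sign.
    print("\u0025" == "%")
    # Neither pattern matches any token of the analyzed field...
    print(any_term_matches("%", ANALYZED_TOKENS))
    print(any_term_matches(".*%40.*", ANALYZED_TOKENS))
    # ...but the whole-term pattern matches the un-analyzed value.
    print(any_term_matches(".*%40.*", KEYWORD_TERMS))
```

Any of these bodies can be passed the same way as in the original query, e.g. `ES.search("db", body=keyword_field_queries()["wildcard"])`, on client versions that accept `body=`. A leading `*` in a wildcard or regexp query is expensive on large indices. The script filter is slower still, so a keyword mapping for the field is the better long-term fix.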

