LIKE query on a string field


(Ashish Tiwari) #1

Hi guys,

Specification:
Elasticsearch version : 6.2.2

I have data with an "emailaddress" field. Suppose one of the values is "user@example.com". The search input might be "@exam", "user@", "user", ".com", etc.

I have tried to achieve this using query_string with a `*term*` wildcard pattern. I got the right results, but it seems slow. I also cannot use the uax_url_email tokenizer, because the user is free to give any input; I have used the default standard tokenizer.

With this, suppose I have two email addresses, "user@example.com" and "example@user.com". If I search for "user@", both results are returned, which is not what I want.

Is there any specific tokenizer or method by which I can achieve this? It should behave like MySQL's LIKE query with %@user%.
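For reference, the LIKE semantics I want are plain substring matching. A quick Python sketch (illustrative only, not Elasticsearch code) of what `%@user%` should match:

```python
from fnmatch import fnmatch

# SQL LIKE '%@user%' corresponds to the glob pattern '*@user*':
# it should match only addresses where "@user" appears as a contiguous substring.
emails = ["user@example.com", "example@user.com"]
matches = [e for e in emails if fnmatch(e, "*@user*")]
print(matches)  # only "example@user.com" contains the substring "@user"
```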

After this I used an edge_ngram token filter, which gives partially better output. Below is my analyzer:

"analysis": {
  "filter": {
    "emailtoken": {
      "type": "edge_ngram",
      "min_gram": 1,
      "max_gram": 255,
      "token_chars": [
        "letter",
        "digit",
        "symbol",
        "punctuation"
      ]
    }
  },
  "analyzer": {
    "email": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "emailtoken"
      ]
    }
  }
}

With this, suppose I have an email address like "name.sirname@example.com". The above analyzer also fails for searches like ".sirname" or "@example".
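To illustrate the failure, here is a rough Python simulation (the real standard tokenizer follows Unicode word-boundary rules; splitting on "@" is an approximation of its behavior on email addresses):

```python
def edge_ngrams(token, min_gram=1, max_gram=255):
    """All prefixes of `token` between min_gram and max_gram characters,
    mimicking an edge_ngram filter (edge n-grams are prefixes only)."""
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}

# Rough stand-in for the standard tokenizer: it drops the "@" but keeps
# dotted runs like "name.sirname" together, so the index never sees "@".
tokens = "name.sirname@example.com".lower().split("@")

indexed = set()
for t in tokens:
    indexed |= edge_ngrams(t)

print(".sirname" in indexed)   # False: edge n-grams are prefixes only
print("@example" in indexed)   # False: "@" was removed at tokenization time
print("name.sir" in indexed)   # True: it is a prefix of "name.sirname"
```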

Thanks


(Florian Kelbert) #2

Hi @ashishtiwari1993,

As you described, `*term*` wildcards do work, but they give poor performance. This is due to how Elasticsearch indexes its documents internally: a leading wildcard cannot use the inverted index efficiently and forces a scan over many terms.

Another approach is to index your strings multiple times (using multi-fields) with different custom analyzers. In particular, you might want to look into the NGram tokenizer.

You might then also want to consider combining NGrams with a prefix query.
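A minimal sketch of why plain n-grams help here (illustrative Python, not the actual Lucene implementation): with n-grams over the whole string, every substring up to max_gram becomes a searchable term, so "user@" only matches the address that really contains it:

```python
def ngrams(text, min_gram=2, max_gram=10):
    """All substrings of length min_gram..max_gram, like an ngram tokenizer."""
    return {text[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(text) - n + 1)}

a = ngrams("user@example.com")
b = ngrams("example@user.com")

print("user@" in a)  # True  - "user@" is a substring of "user@example.com"
print("user@" in b)  # False - "example@user.com" contains "user." but not "user@"
print("@exam" in a)  # True  - special characters survive, unlike with standard
```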


(Ashish Tiwari) #3

@fkelbert Thanks for the quick reply. I have edited my question; I forgot to mention the edge_ngram setup. With edge_ngram I am facing issues with special characters like "." and "@".


(Florian Kelbert) #4

Hi @ashishtiwari1993, consider using NGrams instead of Edge-NGrams.


(Florian Kelbert) #5

Further to that, you probably want to use the keyword tokenizer, which does not actually tokenize the string (i.e., it is a noop and keeps the string as-is).
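Putting the two suggestions together, the settings might look something like this (a sketch only, not a tested mapping; the filter name and gram sizes are placeholders, shown here as a Python dict):

```python
# Hypothetical index settings combining the keyword tokenizer (keeps the
# whole email as one token, "@" and "." included) with an ngram filter.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "email_ngram": {           # assumed filter name
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 10,        # a large max_gram inflates the index
                }
            },
            "analyzer": {
                "email": {
                    "type": "custom",
                    "tokenizer": "keyword",  # noop: no splitting on "@" or "."
                    "filter": ["lowercase", "email_ngram"],
                }
            },
        }
    }
}
```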


(Ashish Tiwari) #6

@fkelbert That's cool, it works :slight_smile: Thanks for the quick help. Just one doubt: I have lots of records containing email addresses, and the ngram tokenizer will now be applied to each of them. Would that cause a delay in write performance? I have multiple fields on which I want to apply the same method.


(Florian Kelbert) #7

Great to hear :slight_smile:

I'm afraid you'll need to experiment with performance. You can increase write throughput by configuring your index to use more primary shards.
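The primary shard count is fixed at index-creation time, so it would be set when (re)indexing. A sketch of the relevant settings (the values are examples, not recommendations):

```python
# Primary shard count is set at index creation and cannot be changed later
# without reindexing; more primaries can raise indexing throughput.
index_body = {
    "settings": {
        "number_of_shards": 3,    # example value only
        "number_of_replicas": 1,
    }
}
```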

Out of curiosity: how much faster are the ngrams compared to the `*term*` wildcard method for you?


(Florian Kelbert) #8

Another note: think twice about whether you actually need this kind of method for many of your fields. For search, the more standard analyzers do a pretty good job in 95% of the use cases.


(Ashish Tiwari) #9

Right now I am still using the same `*term*` wildcard approach. I need to reindex the data to measure performance, but it will definitely be faster than regex and wildcard queries. I will share my results here :slight_smile: . Thanks for the suggestion @fkelbert. I will definitely take care of the use case.


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.