"." is behaving like a special character in ES 2.3


#1

I sent a GET request to the following endpoint on each of our Elasticsearch (ES) environments to see how the analyzers work:

http://[elastic search endpoint]/_analyze?analyzer=standard&text=eee.fe.Esddasdae.ds64.Cl

It returns 2 tokens:
"eee.fe.Esddasdae.ds64" and "Cl"

The very last ".Cl" is considered a different token than the rest. This should not happen. "." does not need to be escaped in ES, as it is not considered a special character (see: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-query-string-query.html#_reserved_characters)

After playing around with this more, I've noticed that when the character immediately before a "." is a digit, the string is tokenised at that point.

So running http://[elastic search endpoint]/_analyze?analyzer=standard&text=eee.fe.Esddasdae.ds.Cl

returns only 1 token which is: eee.fe.Esddasdae.ds.Cl
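
For anyone who wants to reproduce the two requests above, here is a minimal Python sketch. It assumes a cluster reachable at `http://localhost:9200` (a placeholder, not the real endpoint), and the helper names `analyze_url` and `fetch_tokens` are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder endpoint; substitute your own cluster URL.
ES_URL = "http://localhost:9200"


def analyze_url(base_url, text, analyzer="standard"):
    """Build the _analyze GET URL for the given text and analyzer."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{base_url.rstrip('/')}/_analyze?{query}"


def fetch_tokens(url):
    """Call _analyze and return only the token strings from the response."""
    with urlopen(url) as resp:
        body = json.load(resp)
    return [entry["token"] for entry in body["tokens"]]


# The two inputs from this post: a digit before the last dot, and no digit.
with_digit = analyze_url(ES_URL, "eee.fe.Esddasdae.ds64.Cl")
without_digit = analyze_url(ES_URL, "eee.fe.Esddasdae.ds.Cl")
```

Calling `fetch_tokens(with_digit)` and `fetch_tokens(without_digit)` against a live cluster should make it easy to compare the two token lists side by side.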

Why does this happen, and how can we avoid this?

Thank you


(David Pilato) #2

What is the expected result you would like to have?

eee.fe.Esddasdae.ds.Cl ?

If so, why not just using a type keyword instead?
If not, can you be more specific?


#3

Hi David,

What is the expected result you would like to have?

eee.fe.Esddasdae.ds.Cl ?

To give you a bit more context: I am simply trying to look up a document given a field-value pair in a query string query. The value could be anything, but a real-life example would be a URL such as www.bloop.com, or a package name such as com.bloop.bleep2.www

I've come across this unexpected behavior when I search for fields with values in ES (which are stored as strings). It seems that if the last character before a "." is a digit, the dot is treated as a token boundary. This causes documents not to be returned when they should be.

If so, why not just using a type keyword instead?
If not, can you be more specific?

What do you mean by the type "keyword"? Do you mean a type of filter, a tokenizer, or an analyzer setting in an index?


(David Pilato) #4

I meant this: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/keyword.html


#5

I am using Elasticsearch 2.3. It seems that this isn't available in 2.3.

Additionally, I would like to be able to perform wildcard searches on the string fields I'm searching for. Looks like keywords are only for exact matches which won't work.


(David Pilato) #6

Then use non analyzed sub fields.

Wildcard queries are not analyzed so that should work against non analyzed fields.
That said, wildcard queries are super slow. I'm not all for using them.
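
As a rough sketch of the sub-field suggestion above, using ES 2.x mapping syntax and written as Python dicts that can be sent as JSON request bodies: the type name `doc`, the field name `url`, and the sub-field name `raw` are hypothetical, chosen only for illustration.

```python
import json

# Assumption: "doc", "url" and "raw" are illustrative names, not from this thread.
# ES 2.x mapping: an analyzed string field plus a not_analyzed sub field.
index_body = {
    "mappings": {
        "doc": {
            "properties": {
                "url": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }
}

# Wildcard query against the not_analyzed sub field. The pattern is matched
# against the single, untokenised term stored in "url.raw".
wildcard_body = {
    "query": {
        "wildcard": {"url.raw": "*discuss.elastic2*"}
    }
}

# Serialise to JSON strings ready to use as request bodies.
index_json = json.dumps(index_body)
wildcard_json = json.dumps(wildcard_body)
```

Because `url.raw` is not analyzed, the stored term keeps its original case and punctuation, so the wildcard pattern has to match the value exactly as it was indexed.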


#7

I am running a query_string query with a value that contains wildcards. This is a requirement, as I want to return results if anything the user enters is found within that field. All my other searches for strings that don't have a number next to a dot work without issues.

Does a query_string query make the query use an analyzer? I understand that if there is no analyzer defined for my index, then by default it would use standard. I know you can specify an analyzer in the query string query, but can you turn off the analyzer?


(David Pilato) #8

I'm sorry but I'm a bit lost.

Could you provide a full recreation script as described in

It will help to better understand what you are doing.
Please, try to keep the example as simple as possible.


#9

So this is an example of the type of query I want to perform.

I have defined the following in the settings for my index:

{
	"settings": {
		"analysis": {
			"analyzer": {
				"my_analyzer": {
					"type": "custom",
					"tokenizer": "keyword"
				}
			}
		}
	},
	"index": {
		"analysis": {
			"analyzer": {
				"default_search": {
					"analyzer": "my_analyzer"
				},
				"default_index": {
					"filter": [
						"standard",
						"lowercase",
						"stop",
						"asciifolding"
					],
					"tokenizer": "whitespace"
				}
			}
		}
	}
}

When I perform this query directly against ES:

{
	"query": {
		"query_string": {
			"analyzer": "my_analyzer",
			"use_dis_max": "true",
			"default_operator": "AND",
			"query": "*www.discuss.elastic2.com*",
			"default_field": "somevaluethatexists"
		}
	},
	"from": 0,
	"size": 10,
	"sort": {
		"creationDate": {
			"order": "desc"
		}
	}
}

This fails to return the documents I expect. However, "query": "www.discuss.elastic2" does return my document, as does any substring of that query value. I have tested my analyzer, and it DOES NOT create two tokens for the example in my first post.

My 2.3 ES instance is hosted on AWS as a managed service. I have opened a ticket with AWS asking them for information about the OS version and plugins, and I am still waiting to hear back (which is why I haven't created a bug report in ES's GitHub).


(David Pilato) #10

Could you provide a full recreation script?

