'dot' analyzer


(Rich Kroll) #1

I am indexing some log data, and just realized that the class/package names
are not being indexed. For example "com.java.util.List" could only be
searched using "list". Is there a way to modify the analyzer to tokenize
on the 'dot' as well as whitespace?

Regards,
Rich

--
“We can't solve problems by using the same kind of thinking we used when we
created them.” ~ Albert Einstein


(Lukáš Vlček) #2

Hi,

This depends on the analyzer. By default, the standard analyzer does not
split text on a dot unless the dot is followed by whitespace.

Take the following text as an example:
"one.two.three. four"

Using the default analyzer:

curl -XGET 'http://localhost:9200/twitter/_analyze?text=one.two.three.+four&pretty=1'

{
  "tokens" : [ {
    "token" : "one.two.three",
    "start_offset" : 0,
    "end_offset" : 14,
    "type" : "",
    "position" : 1
  }, {
    "token" : "four",
    "start_offset" : 15,
    "end_offset" : 19,
    "type" : "",
    "position" : 2
  } ]
}
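To make the rule concrete, here is a rough Python sketch of the behaviour described above: a dot only ends a token when whitespace follows it. This is an approximation for illustration, not the actual Lucene tokenizer grammar, and the function name `split_like_standard` is made up.

```python
def split_like_standard(text):
    """Approximate the behaviour described above (NOT the real Lucene grammar).

    Dots inside a run of non-whitespace characters are kept; a dot that is
    followed by whitespace ends up at the edge of a chunk and is stripped.
    """
    tokens = []
    for chunk in text.split():          # split on whitespace only
        chunk = chunk.strip(".")        # drop dots left at the chunk edges
        if chunk:
            tokens.append(chunk.lower())  # the standard analyzer also lowercases
    return tokens


print(split_like_standard("one.two.three. four"))
print(split_like_standard("com.java.util.List"))
```

Under this approximation a dotted class name stays a single token, so a search for just one of its parts would not match it.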

But if you use a different analyzer, for example the 'simple' analyzer
(http://www.elasticsearch.com/docs/elasticsearch/index_modules/analysis/analyzer/simple/),
then you can get what you want:

curl -XGET 'http://localhost:9200/twitter/_analyze?analyzer=simple&text=one.two.three.+four&pretty=1'

{
  "tokens" : [ {
    "token" : "one",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "two",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "three",
    "start_offset" : 8,
    "end_offset" : 13,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "four",
    "start_offset" : 15,
    "end_offset" : 19,
    "type" : "word",
    "position" : 4
  } ]
}
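The simple analyzer splits on every non-letter character and lowercases the result, which is why the dots disappear here. A minimal Python sketch of that behaviour follows. It is an approximation: the regular expression and the helper name `simple_tokens` are assumptions made for illustration, not Elasticsearch code.

```python
import re

# Runs of letters only: word characters minus digits and underscore.
# This approximates "split on anything that is not a letter".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def simple_tokens(text):
    """Return (token, start_offset, end_offset) for each lowercased letter run."""
    return [(m.group().lower(), m.start(), m.end())
            for m in LETTER_RUN.finditer(text)]


for token in simple_tokens("one.two.three. four"):
    print(token)

for token in simple_tokens("com.java.util.List"):
    print(token)
```

One consequence worth keeping in mind for log data: because digits are not letters, a name such as "log4j" would come out as the two tokens "log" and "j" under this rule.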

Regards,
Lukas

On Tue, Dec 7, 2010 at 12:57 AM, Rich Kroll kroll.rich@gmail.com wrote:

> Is there a way to modify the analyzer to tokenize
> on the 'dot' as well as whitespace?

