'dot' analyzer

I am indexing some log data, and just realized that the class/package names
are not being indexed. For example "com.java.util.List" could only be
searched using "list". Is there a way to modify the analyzer to tokenize
on the 'dot' as well as whitespace?


“We can't solve problems by using the same kind of thinking we used when we
created them.” ~ Albert Einstein


This depends on the analyzer. By default, the standard analyzer does not
split text on a dot unless the dot is followed by whitespace.

Take the following text as an example:
"one.two.three. four"

Using the default analyzer:

curl -XGET '

"tokens" : [ {
"token" : "one.two.three",
"start_offset" : 0,
"end_offset" : 14,
"type" : "",
"position" : 1
}, {
"token" : "four",
"start_offset" : 15,
"end_offset" : 19,
"type" : "",
"position" : 2
} ]

But if you use a different analyzer, one that also splits on the dot,
then you can get what you want:

curl -XGET '

"tokens" : [ {
"token" : "one",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "two",
"start_offset" : 4,
"end_offset" : 7,
"type" : "word",
"position" : 2
}, {
"token" : "three",
"start_offset" : 8,
"end_offset" : 13,
"type" : "word",
"position" : 3
}, {
"token" : "four",
"start_offset" : 15,
"end_offset" : 19,
"type" : "word",
"position" : 4
} ]
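
To make the difference concrete, here is a rough Python sketch of letter-based tokenization, which is the kind of splitting the second listing shows. It is only an illustration of the idea, not Elasticsearch or Lucene code: the function name letter_tokens and the regular expression are my own assumptions, and a real analyzer may differ in details such as how digits or underscores are handled.

```python
import re

# Hypothetical helper (not part of Elasticsearch): emulate an analyzer that
# treats every run of letters as a token and lowercases it.
def letter_tokens(text):
    tokens = []
    # [^\W\d_]+ matches one or more letters (word characters minus digits
    # and underscore), so dots and whitespace both act as separators.
    for position, match in enumerate(re.finditer(r"[^\W\d_]+", text), start=1):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

if __name__ == "__main__":
    for tok in letter_tokens("one.two.three. four"):
        print(tok)
    # The class name from the original question splits the same way.
    print([t["token"] for t in letter_tokens("com.java.util.List")])
```

With splitting like this, each package segment becomes its own term, so a search for "util" or "java" would match a log line containing "com.java.util.List".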


On Tue, Dec 7, 2010 at 12:57 AM, Rich Kroll kroll.rich@gmail.com wrote:

I am indexing some log data, and just realized that the class/package names
are not being indexed. For example "com.java.util.List" could only be
searched using "list". Is there a way to modify the analyzer to tokenize
on the 'dot' as well as whitespace?

