Custom analyzer not applied on property in query

Filip · May 10, 2012, 10:03am

Hi,

I'm using a custom Lucene analyzer, made available through a plugin.
In my index mapping file, I specified this analyzer using:

{
  "article" : {
    "_all" : {
      "indexAnalyzer" : "foo_analyzer",
      "searchAnalyzer" : "foo_analyzer"
    },
    "properties" : {
      "title" : {
        "type" : "string",
        "index" : "analyzed"
      },
      "content" : {
        "type" : "string",
        "index" : "analyzed"
      }
      ...
}

It turns out that the analyzer is used properly when searching over
the entire article, e.g.
http://localhost:9200/development_articles/_search?q=foo-bar

But the analyzer is not used when I search on a specific property; the
default analyzer seems to be used instead:
http://localhost:9200/development_articles/_search?q=someproperty:foo-bar

The goal of this analyzer is to make sure that words joined by a
hyphen (foo-bar) are treated as one word, and that the query doesn't
match on only one part of the word (foo or bar separately).

Any idea?

kimchy · May 13, 2012, 9:44am

You only specified your analyzer on the _all field, and not on the other
fields (title, content). If you want it to apply to all (string) fields,
you can simply name it "default".
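
As a rough sketch of the first option (setting the analyzer on each field), the mapping might look something like the following. This assumes the plugin registers the analyzer under the name foo_analyzer, as in the original mapping, and that the "analyzer" setting on a string field is used for both indexing and searching. Only the two fields shown in the question are included; the fields elided there, and whatever "someproperty" stands for in the query, would need the same setting.

```json
{
  "article" : {
    "_all" : {
      "indexAnalyzer" : "foo_analyzer",
      "searchAnalyzer" : "foo_analyzer"
    },
    "properties" : {
      "title" : {
        "type" : "string",
        "index" : "analyzed",
        "analyzer" : "foo_analyzer"
      },
      "content" : {
        "type" : "string",
        "index" : "analyzed",
        "analyzer" : "foo_analyzer"
      }
    }
  }
}
```

Documents indexed before the mapping change keep their old tokens, so the index would have to be rebuilt for this to affect existing data.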

Filip · May 14, 2012, 12:52pm

I tried specifying my analyzer on each field individually, but that
didn't work (same effect as with _all). I have also tried setting it
as the default analyzer in elasticsearch.yml:

index.analysis.analyzer.default.type: my.elasticsearch.FooAnalyzerProvider

But then I get this NPE when recreating the index for each store:

java.lang.NullPointerException
    at org.elasticsearch.index.analysis.FieldNameAnalyzer.reusableTokenStream(FieldNameAnalyzer.java:60)
    at org.elasticsearch.common.lucene.all.AllTokenStream.allTokenStream(AllTokenStream.java:38)
    at org.elasticsearch.common.lucene.all.AllField.tokenStreamValue(AllField.java:64)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:111)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerIndex(RobinEngine.java:567)
    at org.elasticsearch.index.engine.robin.RobinEngine.index(RobinEngine.java:479)
    at org.elasticsearch.index.shard.service.InternalIndexShard.index(InternalIndexShard.java:323)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:206)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:532)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:430)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

kimchy · May 15, 2012, 8:41pm

That NPE is strange. Is there a chance you could put together a recreation?

Filip · May 16, 2012, 10:28am

Yes, I tried recreating the index. First I destroy the current one using:

curl -X DELETE http://localhost:9200/_all

and then I run the program that indexes the articles, but I get the NPE
on every indexing attempt.