Custom analyzer in ES


(Sisu Alexandru) #1

Hi,

I need some help setting up a custom analyzer.
I want to apply a particular kind of analysis to my "posting"-type
documents.

Here's my setup:

[elasticsearch.yml]

index:
  analysis:
    analyzer:
      default:
        type: com.buzzbuzz.elasticsearch.analysis.CustomAnalyzerAugmentedProvider

The custom analysis classes are placed in the lib directory:
customanalysis.zip

[CustomAnalyzerAugmentedProvider]
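package com.buzzbuzz.elasticsearch.analysis;

import org.apache.lucene.util.Version;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.analysis.AbstractIndexAnalyzerProvider;
import org.elasticsearch.index.settings.IndexSettings;

public class CustomAnalyzerAugmentedProvider extends
        AbstractIndexAnalyzerProvider {

    private CustomAnalyzerAugmented analyzer;

    @Inject
    public CustomAnalyzerAugmentedProvider(Index index, @IndexSettings Settings indexSettings,
            @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name, settings);
        // Debug output: confirms the provider is instantiated at startup.
        System.out.println("{CustomAnalyzerAugmentedProvider} initialized");
        this.analyzer = new CustomAnalyzerAugmented(Version.LUCENE_31);
    }

    @Override
    public CustomAnalyzerAugmented get() {
        System.out.println("get() called");
        return this.analyzer;
    }

    @Override
    public String name() {
        System.out.println("name() called");
        return "escustomanalyzer";
    }
}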

[schema]
SCHEMA = {
    "posting": {
        "properties": {
            "impact_factor": {"type": "integer", "store": "yes"},
            "username": {"type": "string", "store": "yes"},
            "body": {"type": "string", "store": "yes"},
        }
    }
}

[settings]
SETTINGS = {
    "index": {
        "number_of_shards": 1
    }
}
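For reference, here is a minimal sketch of how SCHEMA and SETTINGS could be combined into a single create-index request from Python, using only the standard library. It assumes a node on localhost:9200, the index name testindex (taken from the log below), and that the node accepts a body of the form {"settings": ..., "mappings": ...} on PUT /<index>. The helper name build_create_index_request is hypothetical. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Mapping and settings as defined above.
SCHEMA = {
    "posting": {
        "properties": {
            "impact_factor": {"type": "integer", "store": "yes"},
            "username": {"type": "string", "store": "yes"},
            "body": {"type": "string", "store": "yes"},
        }
    }
}

SETTINGS = {"index": {"number_of_shards": 1}}


def build_create_index_request(base_url, index_name, settings, mappings):
    """Build (but do not send) a PUT request that creates an index.

    Assumption: the node accepts {"settings": ..., "mappings": ...}
    as the body of PUT /<index>.
    """
    body = json.dumps({"settings": settings, "mappings": mappings})
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("http://localhost:9200", "testindex", SETTINGS, SCHEMA)
# Passing req to urllib.request.urlopen() would actually send it.
```

Keeping settings and mappings in one request means the index never exists without its mapping. This matters when a custom default analyzer is expected to apply from the very first document.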

[Elasticsearch version] 0.18.4

I start Elasticsearch and can see the println() output.
Then I try to index some documents from Python.
The errors I got are at the end of this mail.

Additional information:
I tried to investigate the problem by debugging the code (using the
latest version from git, not the 0.18.4 release).
There the problem also occurs, but in a different method:
return getAnalyzer(fieldName).reusableTokenStream(fieldName, reader);
For the field "_all", the analyzers map contains no analyzer, and,
interestingly, the defaultAnalyzer in the FieldNameAnalyzer class is
also null.
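The lookup described above can be pictured with a small Python model. This is a hypothetical simplification for illustration only, not Elasticsearch's FieldNameAnalyzer. The class name FieldNameAnalyzerModel and its methods are assumptions, and analyzers are represented as plain strings. The model shows why a field with no entry in the analyzers map ends up with no analyzer at all when the default is also null.

```python
from typing import Dict, Optional


class FieldNameAnalyzerModel:
    """Hypothetical model (NOT Elasticsearch code) of a per-field
    analyzer lookup that falls back to a default analyzer."""

    def __init__(self, analyzers: Dict[str, str], default_analyzer: Optional[str]):
        self.analyzers = analyzers
        self.default_analyzer = default_analyzer

    def get_analyzer(self, field_name: str) -> Optional[str]:
        # Use the field-specific analyzer if one is registered,
        # otherwise fall back to the default (which may be None).
        return self.analyzers.get(field_name, self.default_analyzer)

    def token_stream(self, field_name: str, text: str) -> str:
        analyzer = self.get_analyzer(field_name)
        if analyzer is None:
            # In Java, calling a method on the null analyzer is what
            # surfaces as a NullPointerException.
            raise RuntimeError(f"no analyzer for field {field_name!r} and no default")
        return f"{analyzer}:{text}"


# State seen in the debugger: no entry for "_all" and no default.
broken = FieldNameAnalyzerModel({"body": "standard"}, None)
# With a default registered, "_all" falls back to it.
working = FieldNameAnalyzerModel({"body": "standard"}, "custom")
```

In this model, a missing entry for "_all" is harmless as long as a default exists. The lookup only fails when both the per-field entry and the default are missing, which matches the state observed in the debugger.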

The same problem occurs when I leave elasticsearch.yml unchanged and
instead create the index through REST calls, using these settings:
SETTINGS = {
    "index": {
        "number_of_shards": 1,
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "com.buzzbuzz.elasticsearch.analysis.CustomAnalyzerAugmentedProvider"
                }
            }
        }
    }
}
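Both configurations fail in the same way, so it may help to confirm that they declare the same setting. Elasticsearch keeps settings as flat, dot-separated keys. Flattening both nested structures therefore makes the comparison direct. The flatten helper below is a hypothetical illustration, not Elasticsearch's own settings loader.

```python
from typing import Any, Dict

PROVIDER = "com.buzzbuzz.elasticsearch.analysis.CustomAnalyzerAugmentedProvider"

# The elasticsearch.yml block above, written as a nested dict.
YML_SETTINGS = {
    "index": {"analysis": {"analyzer": {"default": {"type": PROVIDER}}}}
}

# The settings sent through REST above.
REST_SETTINGS = {
    "index": {
        "number_of_shards": 1,
        "analysis": {"analyzer": {"default": {"type": PROVIDER}}},
    }
}


def flatten(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested settings into dot-separated keys (illustrative helper)."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections, extending the key path.
            flat.update(flatten(value, full_key + "."))
        else:
            flat[full_key] = value
    return flat


KEY = "index.analysis.analyzer.default.type"
```

Both structures produce the same value under index.analysis.analyzer.default.type. That is consistent with the two approaches failing identically.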

Other questions:

  1. I want to make this work: what is the problem, and how can I
    solve it?

  2. Is configuring elasticsearch.yml (plus writing providers that
    extend AbstractIndexAnalyzerProvider) the only way to provide
    custom analysis?

  3. I took a look at the ICU plugin, which shows one way of plugging
    in custom analysis. If I write a custom analysis plugin like ICU,
    what is the difference (performance, memory consumption, threads
    used) between that approach and extending
    AbstractIndexAnalyzerProvider?

[Error]

[2011-11-22 15:23:52,031][DEBUG][action.index ] [Mentus] [testindex][0], node[sz3-xLAfQk61NHnuIKQbgA], [P], s[STARTED]: Failed to execute [index {[testindex][posting][Gh4_DfM3Tt-aerr129kVOg], source[{
"username":"Username_Alex",
"body":"some very important text to index",
"impact_factor":100,
}]}]
java.lang.NullPointerException
    at org.elasticsearch.index.analysis.FieldNameAnalyzer.getOffsetGap(FieldNameAnalyzer.java:66)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:196)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2067)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerCreate(RobinEngine.java:460)
    at org.elasticsearch.index.engine.robin.RobinEngine.create(RobinEngine.java:353)
    at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:293)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:193)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:487)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:400)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

