Possible solution found Re: custom analyzer in ES

Sisu_Alexandru · November 24, 2011, 4:13pm

... but I'm not sure it's the correct one.
After looking at the AnalysisService class and at how the providers
and analyzers are instantiated,
I changed the name() method to return "default", and after redeploying it worked.

@Override
public String name() {
    System.out.println("name() called");
    return "default";
}

Cheers,

Alex

On Tue, Nov 22, 2011 at 3:46 PM, alex sisu.eugen@gmail.com wrote:

Hi,

I need some help setting up a custom-written analyzer.
I want to apply my own analysis to my "posting"-type
documents.

Here's my setup:

[elasticsearch.yml]

index:
  analysis:
    analyzer:
      default:
        type: com.buzzbuzz.elasticsearch.analysis.CustomAnalyzerAugmentedProvider

The custom analysis classes are placed in lib:
customanalysis.zip

[CustomAnalyzerAugmentedProvider]
public class CustomAnalyzerAugmentedProvider extends AbstractIndexAnalyzerProvider {
    private CustomAnalyzerAugmented analyzer;

    @Inject
    public CustomAnalyzerAugmentedProvider(Index index, @IndexSettings Settings indexSettings,
            @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name, settings);
        System.out.println("{CustomAnalyzerAugmentedProvider} initialized");
        this.analyzer = new CustomAnalyzerAugmented(Version.LUCENE_31);
    }

    @Override
    public CustomAnalyzerAugmented get() {
        System.out.println("get() called");
        return this.analyzer;
    }

    @Override
    public String name() {
        System.out.println("name() called");
        return "escustomanalyzer";
    }
}

[schema]
SCHEMA = {
    "posting": {
        "properties": {
            "impact_factor": {"type": "integer", "store": "yes"},
            "username": {"type": "string", "store": "yes"},
            "body": {"type": "string", "store": "yes"},
        }
    }
}

[settings]
SETTINGS = {
    "index": {
        "number_of_shards": 1
    }
}

[version of Elasticsearch] 0.18.4

I start Elasticsearch and can see the println() statements
in the output.
I then try to index some documents from Python.
The errors I get are at the end of this mail.

Additional information:
I've tried to investigate the problem by debugging the code (note: the
latest version from git).
There the problem still occurs, but in a different method:
return getAnalyzer(fieldName).reusableTokenStream(fieldName, reader);
For the field "_all", the analyzers map contains no analyzer, and
interestingly, defaultAnalyzer in the
FieldNameAnalyzer class is also null.
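
To make the failure mode concrete, here is a minimal, self-contained sketch of how such a NullPointerException can arise. It is a toy model, not Elasticsearch or Lucene code: ToyAnalyzer and ToyFieldNameAnalyzer are hypothetical stand-ins, and the sketch assumes the wrapper falls back to defaultAnalyzer when the per-field map has no entry, which is what the debugging notes above suggest.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: these are NOT the Elasticsearch or Lucene classes.
class FieldAnalyzerFallbackSketch {

    /** Hypothetical stand-in for an analyzer; only one call is modelled. */
    static class ToyAnalyzer {
        int getOffsetGap(String fieldName) {
            return 0;
        }
    }

    /** Hypothetical per-field wrapper with a default fallback (assumed behaviour). */
    static class ToyFieldNameAnalyzer {
        private final Map<String, ToyAnalyzer> analyzers;
        private final ToyAnalyzer defaultAnalyzer; // may be null, as seen in the debugger

        ToyFieldNameAnalyzer(Map<String, ToyAnalyzer> analyzers, ToyAnalyzer defaultAnalyzer) {
            this.analyzers = analyzers;
            this.defaultAnalyzer = defaultAnalyzer;
        }

        ToyAnalyzer getAnalyzer(String fieldName) {
            ToyAnalyzer analyzer = analyzers.get(fieldName);
            return analyzer != null ? analyzer : defaultAnalyzer;
        }

        int getOffsetGap(String fieldName) {
            // Throws NullPointerException when there is neither a field analyzer nor a default.
            return getAnalyzer(fieldName).getOffsetGap(fieldName);
        }
    }

    /** Returns true if asking for the "_all" field ends in a NullPointerException. */
    static boolean failsOnAllField(boolean withDefaultAnalyzer) {
        Map<String, ToyAnalyzer> perField = new HashMap<String, ToyAnalyzer>(); // no "_all" entry
        ToyAnalyzer fallback = withDefaultAnalyzer ? new ToyAnalyzer() : null;
        ToyFieldNameAnalyzer wrapper = new ToyFieldNameAnalyzer(perField, fallback);
        try {
            wrapper.getOffsetGap("_all");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without default: NPE = " + failsOnAllField(false));
        System.out.println("with default:    NPE = " + failsOnAllField(true));
    }
}
```

In this model the "_all" lookup only fails when no default analyzer was registered, which matches the observation that both the map entry and defaultAnalyzer were missing.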

The same problem occurs when I leave the elasticsearch.yml
file unchanged and instead create the index through REST calls.
The settings I use are:
SETTINGS = {
    "index": {
        "number_of_shards": 1,
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "com.buzzbuzz.elasticsearch.analysis.CustomAnalyzerAugmentedProvider"
                }
            }
        }
    }
}

Other questions:

I want to make it work. What is the problem, and how can I solve it?

Is configuring elasticsearch.yml (plus writing providers that
extend AbstractIndexAnalyzerProvider) the only way of
providing custom analysis?

I took a look at the ICU plugin, which shows one way of providing
custom analysis. If I write a custom analysis plugin like ICU, what is
the difference (performance, memory consumption, threads used) between
that approach and extending AbstractIndexAnalyzerProvider?

[Error]

[2011-11-22 15:23:52,031][DEBUG][action.index ] [Mentus] [testindex][0], node[sz3-xLAfQk61NHnuIKQbgA], [P], s[STARTED]: Failed to execute [index {[testindex][posting][Gh4_DfM3Tt-aerr129kVOg], source[{
"username":"Username_Alex",
"body":"some very important text to index",
"impact_factor":100,
}]}]
java.lang.NullPointerException
    at org.elasticsearch.index.analysis.FieldNameAnalyzer.getOffsetGap(FieldNameAnalyzer.java:66)
    at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:196)
    at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:278)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2067)
    at org.elasticsearch.index.engine.robin.RobinEngine.innerCreate(RobinEngine.java:460)
    at org.elasticsearch.index.engine.robin.RobinEngine.create(RobinEngine.java:353)
    at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:293)
    at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:193)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:487)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:400)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

kimchy · November 24, 2011, 6:42pm

Or just don't override the name method in your custom analyzer provider.
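
A minimal sketch of why both fixes (returning "default" from name(), or not overriding name() at all) plausibly have the same effect. This is a toy model, not the Elasticsearch API: BaseProvider, OverridingProvider, and the registry map are hypothetical, and the sketch assumes that analyzers are registered under whatever the provider's name() reports and that the default analyzer is then looked up under the key "default".

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Elasticsearch classes.
class AnalyzerNameSketch {

    /** Hypothetical base provider that simply reports the name it was configured with. */
    static class BaseProvider {
        private final String name;

        BaseProvider(String name) {
            this.name = name;
        }

        String name() {
            return name;
        }
    }

    /** Mirrors the provider in the question: name() is overridden with a fixed string. */
    static class OverridingProvider extends BaseProvider {
        OverridingProvider(String name) {
            super(name);
        }

        @Override
        String name() {
            return "escustomanalyzer";
        }
    }

    /** Registers the provider under its reported name, then looks up "default" (assumed behaviour). */
    static BaseProvider lookupDefault(BaseProvider provider) {
        Map<String, BaseProvider> registry = new HashMap<String, BaseProvider>();
        registry.put(provider.name(), provider);
        return registry.get("default");
    }

    public static void main(String[] args) {
        // Configured as "default" and name() left alone: the lookup succeeds.
        System.out.println(lookupDefault(new BaseProvider("default")) != null);
        // Configured as "default" but name() overridden: the lookup returns null.
        System.out.println(lookupDefault(new OverridingProvider("default")) != null);
    }
}
```

Under these assumptions, the overridden name() hides the configured name, so nothing is ever found under "default"; leaving name() alone keeps the configured name and avoids hard-coding it.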

