Baseform plugin not working for me

vineeth_mohan_2 · March 12, 2014, 1:33pm

Hello,

I have been trying to get the baseform plugin to work, but it's not working
for me.

I tried it with the _analyze API endpoint, but rather than returning both
variants of the word, it returns the same word twice.

For example:

curl -XGET 'localhost:9200/xyz/_analyze?tokenizer=letter&filters=baseform&pretty' -d 'sweltering'
{
  "tokens" : [ {
    "token" : "sweltering",
    "start_offset" : 0,
    "end_offset" : 10,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "sweltering",
    "start_offset" : 0,
    "end_offset" : 10,
    "type" : "word",
    "position" : 1
  } ]
}

Here I expected "sweltering" to be reduced to "swelter", but "sweltering"
appears twice and the base form does not appear at all.
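
Both tokens share position 1 and have identical text, so the filter added nothing distinguishable from the original word. As a quick diagnostic, here is a small Python sketch (not part of Elasticsearch or the plugin) that groups the tokens of an _analyze response by position and reports whether any position gained a second, different token. It relies only on the tokens, token and position fields shown in the output above.

```python
import json
from typing import Dict, Set


def tokens_by_position(analyze_response: str) -> Dict[int, Set[str]]:
    """Group the token strings of an _analyze response by their position."""
    grouped: Dict[int, Set[str]] = {}
    for tok in json.loads(analyze_response)["tokens"]:
        grouped.setdefault(tok["position"], set()).add(tok["token"])
    return grouped


def filter_added_new_form(analyze_response: str) -> bool:
    """True if some position carries more than one distinct token string,
    i.e. the filter injected a form that differs from the original word."""
    return any(len(forms) > 1 for forms in tokens_by_position(analyze_response).values())


# The response from this post: both tokens are "sweltering" at position 1.
observed = """{"tokens": [
  {"token": "sweltering", "start_offset": 0, "end_offset": 10, "type": "word", "position": 1},
  {"token": "sweltering", "start_offset": 0, "end_offset": 10, "type": "word", "position": 1}
]}"""
print(filter_added_new_form(observed))  # False
```

If the filter had produced "swelter" as a second token at position 1, the same check would return True.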

I tried this on both Elasticsearch 0.90 and 1.x, and I see the same wrong
output.

Is something wrong with how I have set up the plugin, or is it an issue on
the plugin side?

Thanks
Vineeth

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAGdPd5m4S2UydYTEJeiAp75se%3DK1OB6RQvs1sZRnfaq6NmfGhA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

jprante · March 12, 2014, 6:07pm

I assume you have correctly set the English language, as in the example.
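
For reference, a minimal Python sketch of index settings that declare a baseform filter with the language set explicitly. It assumes the plugin registers a token filter of type "baseform" that accepts a "language" parameter, following the plugin README example; the names my_baseform and my_baseform_analyzer are hypothetical.

```python
import json


def baseform_index_settings(language: str = "en") -> dict:
    """Index settings declaring a custom baseform token filter.

    Assumptions: the plugin provides a token filter of type "baseform" that
    takes a "language" parameter; "my_baseform" and "my_baseform_analyzer"
    are hypothetical names chosen for this example.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_baseform": {"type": "baseform", "language": language}
                },
                "analyzer": {
                    "my_baseform_analyzer": {
                        "type": "custom",
                        "tokenizer": "letter",
                        "filter": ["my_baseform"],
                    }
                },
            }
        }
    }


# Print the request body; save it as settings.json to send it with curl.
print(json.dumps(baseform_index_settings("en"), indent=2))
```

Saved as settings.json, this body could be used to create a fresh index with curl -XPUT 'localhost:9200/xyz' -d @settings.json and then tested with curl -XGET 'localhost:9200/xyz/_analyze?tokenizer=letter&filters=my_baseform&pretty' -d 'sweltering'. Referring to the filter by its own name makes it unambiguous that the request uses the definition with the language set explicitly.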

The baseform plugin is based on training data for the English language, so
it is possible that "sweltering" is not recognized.

You can add missing words to the training data file in the plugin source:

https://github.com/jprante/elasticsearch-analysis-baseform/blob/master/src/main/resources/en-lemma-utf8.txt

and recompile. Patches are welcome!
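
To illustrate why an unrecognized word can come back unchanged, here is a simplified Python model of a dictionary-based base-form filter. It is not the plugin's code: the tab-separated "inflected<TAB>base" line format, the plain dict lookup, and the fallback that re-emits the original word when no entry exists are all assumptions, the last one chosen because it reproduces the duplicated token seen in this thread.

```python
from typing import Dict, Iterable, List, Tuple


def load_lemmas(lines: Iterable[str]) -> Dict[str, str]:
    """Parse lines of the assumed form 'inflected<TAB>base' into a lookup table."""
    lemmas: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if "\t" not in line:
            continue  # skip blank or malformed lines
        inflected, base = line.split("\t", 1)
        lemmas[inflected] = base
    return lemmas


def baseform_tokens(word: str, position: int, lemmas: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return the original token plus its base form at the same position.

    Assumed fallback: a word missing from the table maps to itself, so the
    output holds the same token twice.
    """
    return [(word, position), (lemmas.get(word, word), position)]


# Without an entry for "sweltering", the word is simply repeated.
print(baseform_tokens("sweltering", 1, load_lemmas([])))
# [('sweltering', 1), ('sweltering', 1)]

# After adding a (hypothetical) line to the data, the base form appears.
print(baseform_tokens("sweltering", 1, load_lemmas(["sweltering\tswelter"])))
# [('sweltering', 1), ('swelter', 1)]
```

In this model, adding the missing line to the data and rebuilding is what turns the duplicate into a distinct base form.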

Jörg

vineeth_mohan_2 · March 12, 2014, 6:49pm

Hello Joerg,

I took an example from the txt file you pointed to, and I am still seeing
the same result. Please check:

curl -XGET 'localhost:9200/relations/_analyze?tokenizer=letter&filters=baseform&pretty' -d 'sweets'
{
  "tokens" : [ {
    "token" : "sweets",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "sweets",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}

Thanks
Vineeth

Related topics

Topic	Category	Replies	Views	Activity
[ANN] Elasticsearch Analysis Baseform plugin 1.1.0	Elasticsearch	9	441	July 6, 2017
[Ann] Elasticsearch Analysis Baseform Plugin 1.0.0	Elasticsearch	3	481	July 6, 2017
Changing tokenizer from whitespace to standard	Elasticsearch	4	2559	July 6, 2017
Stemmer is not working	Elasticsearch	2	1286	July 5, 2017
Stempel Polish Analysis Plugin doesn't work	Elasticsearch	1	676	July 5, 2017