Determine if search term is a noun?


(Eric Greene) #1

Wondering if there is a way to determine if search terms are nouns.

Could some sort of dictionary list be put together and stored, and then used
to give some extra weight to items in this list?

Has anyone ever done anything like this?
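
One way to picture the dictionary idea is at query time: look each search term up in a noun list and give the matches a higher boost. Below is a minimal sketch in Python. The noun set, the field name `body`, and the boost values are all hypothetical placeholders, not something taken from an existing plugin.

```python
# Hypothetical noun list; in practice it would be loaded from a dictionary file.
NOUNS = {"car", "engine", "wheel"}


def build_query(search_text, field="body", noun_boost=2.0, default_boost=1.0):
    """Build a bool query body in which terms found in NOUNS get a higher boost."""
    clauses = []
    for term in search_text.lower().split():
        # Terms in the dictionary are weighted up; everything else keeps the default.
        boost = noun_boost if term in NOUNS else default_boost
        clauses.append({"match": {field: {"query": term, "boost": boost}}})
    return {"query": {"bool": {"should": clauses}}}


query_body = build_query("fast car")
```

The resulting dictionary could be sent as the body of a search request. Splitting on whitespace is a simplification; a real setup would probably want the same analysis as the indexed field.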

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/77141403-c119-49c2-9952-ca98e60252b0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

This process is easier (but still not easy) if you pre-process your data on
the client side at indexing time. You can mark your terms with their
respective parts of speech using the delimited payload token filter:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-delimited-payload-tokenfilter.html
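
As a rough illustration of the client-side step, the sketch below appends a numeric payload to each token in the `term|payload` form that the delimited payload filter splits on. The noun set, the weights, and the names `pos_payloads` and `pos_analyzer` are assumptions made for the example. The filter type name is the one used in the documentation of that era and may differ in other Elasticsearch versions.

```python
def to_payload_text(tokens, nouns, noun_weight=2.0, other_weight=1.0, delimiter="|"):
    """Join tokens into 'term|weight' pairs, giving nouns a larger weight."""
    parts = []
    for token in tokens:
        weight = noun_weight if token.lower() in nouns else other_weight
        parts.append(f"{token}{delimiter}{weight}")
    return " ".join(parts)


# Assumed index settings: a whitespace tokenizer keeps 'term|weight' together,
# then the payload filter strips the weight and stores it as a float payload.
INDEX_SETTINGS = {
    "analysis": {
        "filter": {
            "pos_payloads": {
                "type": "delimited_payload_filter",
                "delimiter": "|",
                "encoding": "float",
            }
        },
        "analyzer": {
            "pos_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["pos_payloads"],
            }
        },
    }
}

text = to_payload_text(["the", "red", "car"], nouns={"car"})
# text == "the|1.0 red|1.0 car|2.0"
```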

You can then access the payload via scripts and calculate a score
accordingly:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html#_term_positions_offsets_and_payloads
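
The scoring step can be modelled outside Elasticsearch to see what such a script would compute. The sketch below is plain Python, not the scripting API: it assumes each stored position carries a float payload and simply sums the payloads of positions that match a query term. The function names and the summing rule are illustrative choices.

```python
def parse_payload_text(text, delimiter="|"):
    """Turn 'the|1.0 car|2.0' into [('the', 1.0), ('car', 2.0)]."""
    pairs = []
    for chunk in text.split():
        term, _, payload = chunk.rpartition(delimiter)
        pairs.append((term, float(payload)))
    return pairs


def payload_score(doc_tokens, query_terms):
    """Sum the payloads of every position whose term matches a query term."""
    wanted = {term.lower() for term in query_terms}
    return sum(payload for term, payload in doc_tokens if term.lower() in wanted)


doc = parse_payload_text("the|1.0 red|1.0 car|2.0 car|2.0")
score = payload_score(doc, ["red", "car"])
# score == 5.0 (1.0 for "red" plus 2.0 for each of the two "car" positions)
```

A real script would read the payloads from the index for each document being scored, so nouns would contribute more than other matched terms in the same way.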

Using payloads is more common in Lucene, but I have never seen anyone do it
in Elasticsearch.

--
Ivan

On Wed, Aug 27, 2014 at 1:10 PM, Eric Greene ericdgreene@gmail.com wrote:

Wondering if there is a way to determine if search terms are nouns.

Could some sort of dictionary list be put together and stored, then gives
some weight to items in this list?

Anyone ever done anything such as this?



(Jörg Prante) #3

In my plugin, I also provide an English noun file:

https://github.com/jprante/elasticsearch-analysis-baseform/blob/master/src/main/resources/baseform/en-nouns-lemma-utf8.txt?raw=true

With a bit of modification to the FST used in the analysis, you could create
a tagger for English nouns (or, more precisely, for the words in the given
file).

Then, as Ivan said, with the delimited payload token filter and a function
score query, it would be possible to implement English noun boosting.
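
To make the tagging idea concrete without touching the plugin's FST, here is a small stand-in in Python that loads a word list into a set and flags matching tokens. The file layout is an assumption (one entry per line, with the word as the first whitespace-separated field), so the actual layout of the linked file should be checked before relying on this. The function names are hypothetical.

```python
def load_nouns(lines):
    """Collect the first field of each non-empty, non-comment line into a set."""
    nouns = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        nouns.add(line.split()[0].lower())
    return nouns


def tag_nouns(tokens, nouns):
    """Pair each token with True if it appears in the noun set, else False."""
    return [(token, token.lower() in nouns) for token in tokens]


# Any iterable of lines works here, including an open file object.
sample_lines = ["# sample entries", "car", "engine", "", "wheel"]
tagged = tag_nouns(["The", "car", "runs"], load_nouns(sample_lines))
# tagged == [("The", False), ("car", True), ("runs", False)]
```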

Jörg

On Wed, Aug 27, 2014 at 11:09 PM, Ivan Brusic ivan@brusic.com wrote:

This process is easier (but still not easy) if you pre-process your data
on the client side at indexing time. You can mark your terms with their
respective Parts of Speech using a payload filter:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-delimited-payload-tokenfilter.html

You can then access the payload via scripts and calculate a score
accordingly:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html#_term_positions_offsets_and_payloads

Using payloads is more common in Lucene, but I have never seen anyone do
it in Elasticsearch.

--
Ivan


