Is Elasticsearch able to index n-gram phrases?


(Lord Artois) #1

Hi, I am wondering whether Elasticsearch can do multi-level indexing. For
example, say I want to index text like "Lookup by an arbitrary identifier in MSMQ".
With the standard analyzer, it would be tokenized into something like "Lookup",
"by", "an", etc. But what if, in addition to indexing each token (word), I also
want to index n-gram phrases, like "Lookup by an", "by an arbitrary", "an
arbitrary identifier", etc.?
I saw that Elasticsearch has some n-gram options, but it looks like those
n-grams are made up of n characters ("Loo", "ook", "oku", "kup", etc.),
which are not n-gram phrases made up of n tokens (words).
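To make the distinction concrete, here is a minimal Python sketch (not Elasticsearch code) contrasting character n-grams of a single token with word-level n-grams over the whole token stream. The helper names `char_ngrams` and `word_ngrams` are hypothetical, and whitespace splitting is only a crude stand-in for the standard analyzer (which would also lowercase the tokens).

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Character n-grams of one token, like the "Loo", "ook", ... example."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def word_ngrams(tokens: list[str], n: int) -> list[str]:
    """Word-level n-grams: runs of n consecutive tokens joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


text = "Lookup by an arbitrary identifier in MSMQ"
tokens = text.split()  # crude whitespace tokenization, standing in for an analyzer

print(char_ngrams("Lookup", 3))  # ['Loo', 'ook', 'oku', 'kup']
print(word_ngrams(tokens, 3))
# ['Lookup by an', 'by an arbitrary', 'an arbitrary identifier',
#  'arbitrary identifier in', 'identifier in MSMQ']
```

The second kind of output, token n-grams rather than character n-grams, is what the question is asking Elasticsearch to index.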

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Adrien Grand) #2

Hi,

You should have a look at the shingle filter[1], which seems to do what you
are looking for.

[1]
http://www.elasticsearch.org/guide/reference/index-modules/analysis/shingle-tokenfilter/
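For reference, a minimal sketch, assuming the create-index request body is written here as a Python dict: the names `my_shingle_filter` and `my_shingle_analyzer` are hypothetical, while `min_shingle_size`, `max_shingle_size` and `output_unigrams` are options of the shingle filter (check the docs for your Elasticsearch version). The `shingles` helper is a hypothetical pure-Python approximation of the terms such a filter emits, meant only to show what ends up in the index; it is not how Lucene implements it.

```python
import json

# Hypothetical index settings; only the shingle filter's option names are real.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                    "output_unigrams": True,  # keep single words as well
                }
            },
            "analyzer": {
                "my_shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_shingle_filter"],
                }
            },
        }
    }
}


def shingles(tokens, min_size=2, max_size=3, output_unigrams=True):
    """Rough approximation of shingle-filter output (hypothetical helper).

    For every start position, emit the single token (if output_unigrams)
    followed by each run of min_size..max_size consecutive tokens.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


print(json.dumps(settings, indent=2))  # request body for index creation
tokens = "Lookup by an arbitrary identifier in MSMQ".lower().split()
print(shingles(tokens))
```

Keeping `output_unigrams` enabled means the individual words are indexed alongside the 2- and 3-word phrases, which matches the request to index each token as well as the n-gram phrases.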

On Mon, Sep 9, 2013 at 3:39 PM, Lord Artois duketristan@gmail.com wrote:


--
Adrien Grand



(Lord Artois) #3

Great, that looks like exactly what I am looking for.
Thank you, Adrien.

On Monday, September 9, 2013 9:47:24 PM UTC+8, Adrien Grand wrote:


