Facet on phrases


(Kartavya) #1

Hi

I am looking for a way to facet on phrases.
For example, suppose I have a field "text" with the values "abc xyz def" and "abc xyz ghi def".

Now I need help getting facet counts like "abc xyz" - count: 2, "abc" - count: 2, "xyz" - count: 2, "def" - count: 2.

Is this possible with Elasticsearch? Elasticsearch is an awesome product, so I think there must be some way to do this.

Thanks in advance.

Pulkit Agrawal


(Matt Weber) #2

Yes, tokenize the field into shingles:

http://www.elasticsearch.org/guide/reference/index-modules/analysis/shingle-tokenfilter.html

Thanks,
Matt Weber

(In reply to Pulkit Agrawal, Wed, Sep 7, 2011 at 8:32 AM)

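To make the shingle suggestion concrete, here is a minimal Python sketch (a simulation, not Elasticsearch code) of what a shingle filter with max_shingle_size = 2 and output_unigrams = true would emit for the two example values. It then counts each term once per document, the way a terms facet counts documents. It assumes plain whitespace tokenization plus lowercasing, which simplifies a real analyzer chain, and the function names are hypothetical.

```python
from collections import Counter


def shingles(text, min_size=2, max_size=2, output_unigrams=True):
    """Word n-grams of `text`, roughly what a shingle token filter emits.

    Assumes whitespace tokenization plus lowercasing (a simplification of
    a real analyzer chain).
    """
    tokens = text.lower().split()
    terms = list(tokens) if output_unigrams else []
    for n in range(min_size, max_size + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def facet_counts(docs, **shingle_opts):
    """Count how many documents contain each term, like a terms facet."""
    counts = Counter()
    for doc in docs:
        counts.update(set(shingles(doc, **shingle_opts)))  # once per doc
    return counts


docs = ["abc xyz def", "abc xyz ghi def"]
for term, count in facet_counts(docs).most_common():
    print(count, "-", term)
```

With these settings "abc xyz", "abc", "xyz" and "def" each get a count of 2, matching the requested output; the remaining terms (such as "xyz def" or "ghi") appear once.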

(Kartavya) #3

Thanks, Matt.

Could you please give me an example of how to implement this?
That would be great.

Regards,
Pulkit Agrawal

(In reply to Matt Weber, Wed, Sep 7, 2011 at 10:01 PM)

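For the implementation example requested above, here is a hedged sketch of the request bodies, written as Python dicts so they can be printed as JSON. It assumes a recent Elasticsearch (7.x or later), which uses typeless mappings and the terms aggregation; the 2011-era versions discussed in this thread used a terms facet and the "string" field type instead. The names "phrase_shingles" and "shingle_analyzer" are hypothetical.

```python
import json


def phrase_facet_index_body(max_shingle_size=3):
    """Index settings and mapping for a 'text' field analyzed into shingles."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "phrase_shingles": {  # hypothetical filter name
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": max_shingle_size,
                        "output_unigrams": True,  # keep single words too
                    }
                },
                "analyzer": {
                    "shingle_analyzer": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "phrase_shingles"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "text": {
                    "type": "text",
                    "analyzer": "shingle_analyzer",
                    "fielddata": True,  # required to aggregate on a text field
                }
            }
        },
    }


def phrase_facet_query(size=20):
    """Search body that counts the most common shingles across documents."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {"phrases": {"terms": {"field": "text", "size": size}}},
    }


print(json.dumps(phrase_facet_index_body(), indent=2))
print(json.dumps(phrase_facet_query(), indent=2))
```

The first body would be sent when creating the index and the second to that index's _search endpoint; each bucket in the "phrases" aggregation then carries a term and its doc_count. Enabling fielddata on a text field can use a lot of heap, and if max_shingle_size is set much higher than min_shingle_size, the index.max_shingle_diff setting may also need raising.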

(Ivan Brusic) #4

Wouldn't the shingles need to be the same size?

(In reply to Matt Weber, Wed, Sep 7, 2011 at 12:31 PM)


(Matt Weber) #5

Not sure what you mean by the same size. If you have "a b c" and "a b d c" and
set max_shingle_size = 4 and output_unigrams = true, you should get these
facets:

2 - a
2 - b
2 - c
2 - a b
1 - d
1 - b c
1 - b d
1 - d c
1 - a b c
1 - a b d
1 - b d c
1 - a b d c

Thanks,
Matt Weber

(In reply to Ivan Brusic, Thu, Sep 8, 2011 at 3:09 PM)

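The list above can be checked with a short, self-contained Python sketch. It is again only a simulation of the shingle filter and assumes whitespace tokenization; the helper name is hypothetical. Every shingle size from 2 up to max_shingle_size, plus the unigrams, ends up in the same field, so terms of different lengths are simply counted side by side.

```python
from collections import Counter


def shingle_terms(text, max_size, min_size=2, output_unigrams=True):
    """Distinct terms a shingle filter would produce for one document."""
    words = text.split()
    sizes = ([1] if output_unigrams else []) + list(range(min_size, max_size + 1))
    # A set, so each term is counted at most once per document.
    return {" ".join(words[i:i + n]) for n in sizes for i in range(len(words) - n + 1)}


counts = Counter()
for doc in ["a b c", "a b d c"]:
    counts.update(shingle_terms(doc, max_size=4))

# Highest count first, then shorter terms, then alphabetical.
for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0])):
    print(n, "-", term)
```

The printed counts should match the twelve lines in the list above.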

(Ivan Brusic) #6

I guess the word "max" in "max_shingle_size" should have given me a
clue about how it operates! The example confused me since it did not list
the shingles with one term.

I need to play around with it; I might have a use for it as well.

--
Ivan

(In reply to Matt Weber, Thu, Sep 8, 2011 at 8:11 PM)

