Facet on phrases

Hi

I am looking for a way to facet on phrases.
For example, suppose I have a field "text" with the values "abc xyz def" and
"abc xyz ghi def".

Now I need some help getting facet counts like "abc xyz" - count: 2,
"abc" - count: 2, "xyz" - count: 2, "def" - count: 2.

Is this possible with Elasticsearch? I think Elasticsearch is an awesome
product, so there must be a way to do this.

Thanks in advance.

Pulkit Agrawal

Yes, tokenize the field into shingles with the shingle token filter.
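
A rough sketch of what that could look like, written as Python dicts that serialize to the JSON request bodies. It assumes the facet-era API this thread is about (newer Elasticsearch versions replaced facets with aggregations and the "string" type with "text"). The names "phrase_shingles", "shingle_analyzer", "doc" and "phrases" are made up for illustration; the max_shingle_size and output_unigrams values are the ones used later in this thread.

```python
import json

# Index creation body: a custom analyzer that emits single words plus
# shingles (word n-grams) of up to 4 words.
# "phrase_shingles", "shingle_analyzer" and the "doc" type are hypothetical names.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "phrase_shingles": {
                    "type": "shingle",
                    "max_shingle_size": 4,
                    "output_unigrams": True,
                }
            },
            "analyzer": {
                "shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "phrase_shingles"],
                }
            },
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "text": {"type": "string", "analyzer": "shingle_analyzer"}
            }
        }
    },
}

# Search body: a terms facet on the shingled field ("phrases" is a hypothetical facet name).
search_body = {
    "query": {"match_all": {}},
    "facets": {"phrases": {"terms": {"field": "text", "size": 20}}},
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The first body would be sent when creating the index and the second to the search endpoint; check both against the documentation for the version you are running.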

Thanks,
Matt Weber

On Wed, Sep 7, 2011 at 8:32 AM, Pulkit Agrawal pulkitdotcom@gmail.com wrote:

I am looking for a way to facet on phrases.

Thanks Matt.

Could you please give me an example of how to implement this?
That would be great.

Regards,
Pulkit Agrawal

On Wed, Sep 7, 2011 at 10:01 PM, Matt Weber matt@mattweber.org wrote:

Yes, tokenize the field into shingles.

Wouldn't the shingles need to be the same size?

Not sure what you mean by same size. If you have "a b c" and "a b d c" and
set max_shingle_size = 4 and output_unigrams = true, you should get these
facets:

2 - a
2 - b
2 - c
2 - a b
1 - d
1 - b c
1 - b d
1 - d c
1 - a b c
1 - a b d
1 - b d c
1 - a b d c
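
To double-check the counts above, here is a small Python simulation. It is only a sketch: it splits on whitespace rather than running a real analyzer, and it assumes the facet counts each distinct shingle once per document. The function names are made up for illustration.

```python
from collections import Counter


def shingles(text, min_size=2, max_size=4, output_unigrams=True):
    """Return the distinct shingles (word n-grams) for one field value."""
    tokens = text.split()
    sizes = list(range(min_size, max_size + 1))
    if output_unigrams:
        sizes.insert(0, 1)  # also emit the single words
    result = set()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            result.add(" ".join(tokens[i:i + n]))
    return result


def facet_counts(docs, **kwargs):
    """Count in how many documents each shingle appears."""
    counts = Counter()
    for doc in docs:
        counts.update(shingles(doc, **kwargs))
    return counts


counts = facet_counts(["a b c", "a b d c"])

# Highest count first, then shorter shingles first, then alphabetical.
for term, count in sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0]), kv[0])):
    print(f"{count} - {term}")
```

Running the same function on "abc xyz def" and "abc xyz ghi def" gives a count of 2 for "abc xyz", "abc", "xyz" and "def", which is what the original question asked for.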

Thanks,
Matt Weber

On Thu, Sep 8, 2011 at 3:09 PM, Ivan Brusic ivan@brusic.com wrote:

Wouldn't the shingles need to be the same size?

I guess the word "max" in "max_shingle_size" should have given me a
clue on how it operates! The example confused me since it did not list
the shingles with one term.

I need to play around with it; I might have a use for it as well.

--
Ivan

On Thu, Sep 8, 2011 at 8:11 PM, Matt Weber matt@mattweber.org wrote:

Not sure what you mean by same size.