TermFacet REVERSE_COUNT, above some minCount


(Ridvan Gyundogan) #1

Hello All,
I am experimenting with the TermFacet, especially the REVERSE_COUNT
ordering.

This puts terms which occur in only 1 document at the top. Can
I somehow tell Elasticsearch to return only terms which occur
in more than minCount documents?
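One workaround that needs no server-side support is to request more terms than needed and drop the rare ones on the client. The sketch below is only an illustration of that idea: it builds a terms facet request body with "reverse_count" ordering and filters the facet section of a response. The facet name "tag", the helper filter_above_min_count, and the sample response are made up for the example, and the filter can only see terms the facet actually returned, so "size" has to be large enough.

```python
import json

# Request body for a terms facet ordered by ascending count.
# The facet name "tag" and the size of 100 are illustrative choices.
request_body = {
    "query": {"match_all": {}},
    "facets": {
        "tag": {
            "terms": {"field": "tag", "size": 100, "order": "reverse_count"}
        }
    },
}


def filter_above_min_count(facet, min_count):
    """Keep only facet entries that occur in more than min_count documents."""
    return [entry for entry in facet["terms"] if entry["count"] > min_count]


# Hypothetical facet section of a search response, for illustration only.
sample_facet = {
    "_type": "terms",
    "terms": [
        {"term": "rare", "count": 1},
        {"term": "odd", "count": 2},
        {"term": "common", "count": 7},
    ],
}

if __name__ == "__main__":
    print(json.dumps(request_body, indent=2))
    print(filter_above_min_count(sample_facet, 1))
```

The filtering happens after the response arrives, so it does not reduce the work done on the shards.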


(Shay Banon) #2

We can potentially add a min count as an additional parameter, but it will
need to be applied at the shard level, to filter docs in / out of the computation.
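Why the shard level matters can be seen in a toy model. This is not Elasticsearch code, just an assumed simplification in which each shard reports per-term counts and a coordinating step sums them; the shard data and function names are invented. Applying the threshold on each shard drops a term whose per-shard counts are small even though its total would pass.

```python
from collections import Counter

# Hypothetical per-shard term counts (toy data, not real Elasticsearch output).
shard_counts = [
    Counter({"java": 3, "lucene": 6}),
    Counter({"java": 3, "lucene": 1}),
]


def merge(counters):
    """Sum per-shard counts, as the coordinating step does in this toy model."""
    total = Counter()
    for counts in counters:
        total.update(counts)
    return total


def min_count_after_merge(counters, min_count):
    """Apply the threshold to the merged (global) counts."""
    return {t: c for t, c in merge(counters).items() if c >= min_count}


def min_count_per_shard(counters, min_count):
    """Apply the threshold on each shard first, then merge what survives."""
    kept = [
        Counter({t: c for t, c in counts.items() if c >= min_count})
        for counts in counters
    ]
    return dict(merge(kept))


if __name__ == "__main__":
    print(min_count_after_merge(shard_counts, 5))
    print(min_count_per_shard(shard_counts, 5))
```

In this data "java" has 6 documents overall but only 3 on each shard, so a per-shard threshold of 5 loses it, and "lucene" is undercounted.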

On Wed, Aug 24, 2011 at 4:59 PM, Ridvan Gyundogan ridvansg@gmail.com wrote:

Hello All,
I am experimenting with the TermFacet especially the REVERSE_COUNT
ordering.

This gives me at the top terms which are available in 1 document. Can
I somehow tell Elastic Search to return me terms which are available
in more than minCount documents?


(Ridvan Gyundogan) #3

Hi Shay,
it was not my intention to push this as a feature request. The
TermFacet is good; we are just testing how far we can go with it.

I see that you have the script field and the _index variable. I am not sure
exactly how they work, but is it possible to do something like:

        "terms" : {
            "field" : "tag",
            "size" : 10,
            "script" : "_index > 5 ? true : false"
        }

What would be even more interesting for us to know: is it possible
for a TermFacet ordered by COUNT to specify the maximum percentage of
documents a term may occur in?
For example, if I have 100 documents, I want to see terms which
occur in less than 30% of the documents. I suppose this is not
very easy, and I haven't seen this feature anywhere yet.
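The percentage variant can be approximated on the client, assuming the search response gives both the total number of matching documents and the facet counts. This is a minimal sketch, not an Elasticsearch feature: the function name terms_below_percent and the sample numbers are invented, and integer arithmetic is used so that a term at exactly 30% is not let through by rounding.

```python
def terms_below_percent(facet_terms, total_docs, max_percent):
    """Return entries that occur in strictly less than max_percent of docs."""
    if total_docs <= 0:
        return []
    return [
        entry
        for entry in facet_terms
        # count / total_docs < max_percent / 100, without floating point
        if entry["count"] * 100 < max_percent * total_docs
    ]


# Hypothetical facet entries for an index of 100 documents.
sample_terms = [
    {"term": "elasticsearch", "count": 80},
    {"term": "shard", "count": 30},
    {"term": "facet", "count": 29},
    {"term": "script", "count": 5},
]

if __name__ == "__main__":
    for entry in terms_below_percent(sample_terms, 100, 30):
        print(entry["term"], entry["count"])
```

As with any client-side filter, it only considers the terms that the facet returned within its "size".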

On Aug 24, 9:27 pm, Shay Banon kim...@gmail.com wrote:

We can potentially add min count as an additional parameter, but, it will
need to be applied on the shard level to filter in / out docs computation.



(Shay Banon) #4

Where did you see the _index notion? You can use the script to fetch a field
value and then check on it, for example: "doc['my_field'].value > 5 ? true :
false".
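Placed in a request, that script might look like the sketch below. The facet name "tag" and the helper terms_facet_with_script are placeholders for illustration; the idea that a boolean script result controls whether an entry is included follows the terms facet documentation of that time, so check it against the version you run.

```python
import json


def terms_facet_with_script(facet_name, field, script, size=10):
    """Build a search body with one terms facet that carries a script."""
    return {
        "query": {"match_all": {}},
        "facets": {
            facet_name: {
                "terms": {"field": field, "size": size, "script": script}
            }
        },
    }


# The script string is the one quoted in the reply above.
body = terms_facet_with_script(
    "tag", "tag", "doc['my_field'].value > 5 ? true : false"
)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```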

Anything that needs global state is tricky. Checking against 30% of a shard
might be possible (and, assuming an even distribution of missing terms, it
might be ok), but I need to think about it.
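A toy comparison shows where the per-shard check and the global check agree and where they do not. Everything here is an assumption made for illustration: the shard layout, the term "tag", and the function names are invented, and real shards do not expose their data this way.

```python
def below_percent(count, docs, max_percent):
    """True if count is strictly less than max_percent of docs."""
    return count * 100 < max_percent * docs


def global_decision(shards, term, max_percent):
    """Decide using totals over all shards (needs global state)."""
    total_docs = sum(s["docs"] for s in shards)
    total_count = sum(s["counts"].get(term, 0) for s in shards)
    return below_percent(total_count, total_docs, max_percent)


def per_shard_decisions(shards, term, max_percent):
    """Decide on each shard using only that shard's own numbers."""
    return [
        below_percent(s["counts"].get(term, 0), s["docs"], max_percent)
        for s in shards
    ]


# Hypothetical shards: the same 20 occurrences, spread evenly or skewed.
even = [{"docs": 50, "counts": {"tag": 10}}, {"docs": 50, "counts": {"tag": 10}}]
skewed = [{"docs": 50, "counts": {"tag": 20}}, {"docs": 50, "counts": {"tag": 0}}]

if __name__ == "__main__":
    print(global_decision(even, "tag", 30), per_shard_decisions(even, "tag", 30))
    print(global_decision(skewed, "tag", 30), per_shard_decisions(skewed, "tag", 30))
```

With the even spread every shard reaches the same answer as the global check; with the skewed spread one shard sees 40% and would reject a term that is at 20% overall.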

On Wed, Aug 24, 2011 at 11:47 PM, Ridvan Gyundogan ridvansg@gmail.com wrote:

Hi Shay,
it was not my intention to push this as a feature request. The
TermFacet is good we are just trying how far we can go with it.

I see that you have the script field and _index variable not sure
exactly how they work but is it possible to do something like:

       "field" : "tag",
           "size" : 10,
           "script" : "_index>5? true : false"
       }

What will be even more interesting for us is to know is it possible
for a TermFacet ordered by COUNT to specify the maximum percent of
documents it is contained in.
For example if I have 100 documents I want to see terms which are
available in less than 30% of the documents. I suppose this is not
very easy and I haven't seen this feature anywhere yet.



(Ridvan Gyundogan) #5

Regarding "Where did you see the _index notion?": it is at the bottom of
http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html

On Aug 25, 1:51 am, Shay Banon kim...@gmail.com wrote:

Where did you see the _index notion? You can use the script to fetch a field
value and then check on it, for example: "doc['my_field'].value > 5 ? true :
false".

Anything that has a global state is tricky, 30% of a shard (and assuming
even distribution of missing terms, it might be ok) might be possible (need
to think about it).



(Shay Banon) #6

Right, that means that you can specify _index as the field name to facet on,
which will then simply return counts per index (for a multi-index search).
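As a rough sketch of what that means: the request names "_index" as the facet field, and the result is a count of hits per index. The facet name "per_index", the index names, and the sample hits below are invented, and the counting function only imitates the idea on the client; it is not how Elasticsearch computes the facet.

```python
from collections import Counter

# Terms facet on the special field name "_index"; "per_index" is an arbitrary label.
request_body = {
    "query": {"match_all": {}},
    "facets": {"per_index": {"terms": {"field": "_index"}}},
}


def count_hits_per_index(hits):
    """Count hits by the index each one came from."""
    return dict(Counter(hit["_index"] for hit in hits))


# Hypothetical hits from a search across two indices.
sample_hits = [
    {"_index": "blog", "_id": "1"},
    {"_index": "blog", "_id": "2"},
    {"_index": "wiki", "_id": "1"},
]

if __name__ == "__main__":
    print(count_hits_per_index(sample_hits))
```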

On Thu, Aug 25, 2011 at 9:16 AM, Ridvan Gyundogan ridvansg@gmail.com wrote:

Where did you see the _index notion?

http://www.elasticsearch.org/guide/reference/api/search/facets/terms-facet.html
at the bottom.

On Aug 25, 1:51 am, Shay Banon kim...@gmail.com wrote:

Where did you see the _index notion? You can use the script to fetch a field
value and then check on it, for example: "doc['my_field'].value > 5 ? true :
false".

Anything that has a global state is tricky, 30% of a shard (and assuming
even distribution of missing terms, it might be ok) might be possible (need
to think about it).


