Terms facet, getting all the results?


(Adrian Gaudebert) #1

Hi there!

Long story short:

Is there a way to retrieve every result of a terms facet? Something like
"size" : "_all"? Or do I have to cheat and use "size" : 9999999?

Long story long:

I have a bunch of documents that look like this:
{
    "signature" : "something",
    "os" : "linux" // or "windows" or "mac"...
}

The signature field is not analyzed. There are other fields that don't
matter here. Documents can share the same signature, and that is the point.

What I am trying to do is get the distinct signatures and, for each
signature, the total count and the count by OS. My solution so far is to
query a bunch of documents (I actually search on the other fields, but that
doesn't matter here) and apply terms facets.

Here is what my JSON query looks like:

{
    "size" : 0,
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "sign" : {
            "terms" : {
                "field" : "signature"
            }
        },
        "sign_win" : {
            "terms" : {
                "field" : "signature"
            },
            "facet_filter" : {
                "term" : {
                    "os_name" : "windows"
                }
            }
        },
        ...
    }
}

This works. This gives me almost the data I want. The last problem is that I
need to get only a part of this big set of data. I want to be able to do
something similar to a "size" / "from" but for the facet. And if, as I
believe, it is not possible with ES, I would like to retrieve every result
of the facets, and then apply the size / from in my code.
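
A minimal Python sketch of that fallback (fetch a large facet, then apply size / from in client code). Everything here is an assumption made for illustration: the helper names `build_request` and `page_of_signatures` are hypothetical, the per-OS facets are named `sign_<os>`, and the response is assumed to list each facet's entries under `"terms"` as `{"term": ..., "count": ...}` objects.

```python
# Hypothetical helpers: the facet names and the response shape are assumptions.

def build_request(os_names, facet_size):
    """Build a search body with one overall terms facet plus one per OS."""
    facets = {"sign": {"terms": {"field": "signature", "size": facet_size}}}
    for os_name in os_names:
        facets["sign_" + os_name] = {
            "terms": {"field": "signature", "size": facet_size},
            "facet_filter": {"term": {"os_name": os_name}},
        }
    return {"size": 0, "query": {"match_all": {}}, "facets": facets}


def page_of_signatures(response, os_names, offset, limit):
    """Merge per-OS counts into the overall counts, then slice one page."""
    facets = response["facets"]
    # Index each per-OS facet by term so lookups below are O(1).
    per_os = {
        os_name: {t["term"]: t["count"] for t in facets["sign_" + os_name]["terms"]}
        for os_name in os_names
    }
    rows = []
    for entry in facets["sign"]["terms"]:
        row = {"signature": entry["term"], "total": entry["count"]}
        for os_name in os_names:
            # A signature absent from an OS facet has no documents for that OS.
            row[os_name] = per_os[os_name].get(entry["term"], 0)
        rows.append(row)
    # Client-side equivalent of "from" / "size".
    return rows[offset:offset + limit]
```

The facet size still has to be at least as large as the number of distinct signatures for the page to be exact, so this only moves the "big number" into one place.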

Am I doing it wrong? Is there something I don't know about that would solve
my problem? What is the Ultimate Question of Life, the Universe, and
Everything?

Thanks for your help!

--
Adrian Gaudebert
WebDev Intern @ Mozilla
http://adrian.gaudebert.fr


(Kun Niu) #2

Does setSize(int size) work for you?
It's a method of SearchRequestBuilder if you're using Java.



(Adrian Gaudebert) #3

I'm using Python. But what I want to know is whether there is a "good" way
of doing that, or of solving my problem.

I guess Python also has some way to get the max integer value, which I
could use if there is nothing better to do...



(Shay Banon) #4

There isn't an option to say "bring me all terms back". You can open an issue and it can be added. Still, you should be careful with responses that return a large amount of data.



(David Pilato) #5

Is there any way to get the total count of all the other results?

For example, say I want only the top 2, so I set the size of the facet to 2:
Windows : 5
Linux : 3
Others : 10

The first two are the main entries.
The last one covers all the other entries.

Total count across all entries = 5 + 3 + 10

Thanks

David



(Shay Banon) #6

Not sure if I understood what exactly you are after, but you can have "count" facets using the query or filter facets.
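
As a rough sketch of what such "count" facets could look like from Python, here is a request body with one filter facet per OS. The helper names `build_count_request` and `read_counts`, the `count_<os>` facet names, and the assumption that each filter facet comes back with a single `"count"` value are illustrative only.

```python
# Hypothetical helpers: the facet names and the response shape are assumptions.

def build_count_request(os_names):
    """Build a search body with one filter facet per OS name."""
    facets = {
        "count_" + os_name: {"filter": {"term": {"os_name": os_name}}}
        for os_name in os_names
    }
    return {"size": 0, "query": {"match_all": {}}, "facets": facets}


def read_counts(response, os_names):
    """Extract the per-OS document counts from the facet section."""
    return {
        os_name: response["facets"]["count_" + os_name]["count"]
        for os_name in os_names
    }
```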



(David Pilato) #7

Sorry for my English! :frowning:

I have docs with terms term1 through term7:

Term1 : 1 time
Term2 : 2 times
Term3 : 3 times
Term4 : 4 times
Term5 : 5 times
Term6 : 6 times
Term7 : 7 times

When I use a terms facet, I get:
Term3 : 3
Term4 : 4
Term5 : 5
Term6 : 6
Term7 : 7

What I would like to have in the same facet results:
Term3 : 3
Term4 : 4
Term5 : 5
Term6 : 6
Term7 : 7
Others : 3

Others is the sum of term1 count (1) and term2 count (2).
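
The arithmetic can be sketched client-side in Python, assuming the full term counts are already available (for example from a facet requested with a large size). The function name `top_terms_with_others` is hypothetical, and this is not something the facet does on its own.

```python
def top_terms_with_others(counts, size):
    """Return the `size` most frequent terms plus the summed count of the rest."""
    # Sort by count, highest first, the way a facet ordered by count would.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:size]
    others = sum(count for _, count in ranked[size:])
    return top, others


counts = {"term1": 1, "term2": 2, "term3": 3, "term4": 4,
          "term5": 5, "term6": 6, "term7": 7}
top, others = top_terms_with_others(counts, 5)
# top holds term7 down to term3; others is 1 + 2 = 3
```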

If it is still not clear, I will provide a full user story or test case in the next few days.

Thanks
David :wink:



(Shay Banon) #8

Ahh, I see. Makes sense. Can you open an issue for this?



(David Pilato) #9

Here we go: https://github.com/elasticsearch/elasticsearch/issues/1029

BTW, it seems I found an issue while creating the test case.
I will open a new thread with a gist for that.

Thanks

