Position in facet


(Mats Norén) #1

Hi,
I'm trying to use date_histogram to calculate a user's rank/position based on a script value. I sense there is a solution, but I can't quite get there.

For each date I have a document like this:

{
"id": 1,
"user_id": 7,
"points": 4500,
"games": 2,
"date": "2011-07-24"
}

What I would like to do is calculate, for each day, the user's position in the overall standings based on his average points per game.

To obtain a user's position for a specific day, I used search_type=count with a script filter:

script: "(doc['points'].value / doc['games'].value) > 1000"

This required that I first obtain the user's own score for the same date.

Is there a way to use a query/filter in the script clause of another?

And does anyone have an idea how to solve my use case?

Btw, Shay, thank you for a fantastic piece of software!

Regards,
Mats


(Shay Banon) #2

I am not sure I followed the question properly, but you can combine
several filters using and/or filters. Is that what you are after? By the way,
for performance reasons, I would index the points / games value as another
field and use a range filter on it; that will be much faster.
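
As a rough illustration of the suggestion above, the sketch below precomputes the ratio before indexing and builds a range filter on it. The field name `points_per_game` and both helper names are assumptions made for illustration, and the filter body assumes the range filter syntax (`gt`) of the Elasticsearch query DSL of that period.

```python
def with_points_per_game(doc):
    """Return a copy of the document with a precomputed points_per_game field.

    The field name is an assumption; any name works as long as the
    range filter below refers to the same one.
    """
    enriched = dict(doc)
    games = doc["games"]
    # Guard against division by zero for users with no games played.
    enriched["points_per_game"] = doc["points"] / games if games else 0.0
    return enriched


def higher_score_filter(threshold):
    """Build a range filter matching documents whose score exceeds threshold."""
    return {"range": {"points_per_game": {"gt": threshold}}}


doc = with_points_per_game(
    {"id": 1, "user_id": 7, "points": 4500, "games": 2, "date": "2011-07-24"}
)
print(doc["points_per_game"])  # 2250.0
print(higher_score_filter(1000))
```

The enriched document is what would be sent to the index; the range filter then replaces the script filter in the count query.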



(Mats Norén) #3

Hi, Shay
Not really. I would like to use a count with a filter/query to get a
user's position for each interval in a date_histogram.

For instance, if for each day in a month I have an average score of
3400 points and there are two other users with averages of 3500 and 3300,
I would like to get position/rank 2 out of a total of 3 back from a
date_histogram facet.

As I tried to describe in my first post, I've used search_type=count
to get a user's position for a certain day:

http://localhost:9200/games/measure/_search?search_type=count
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "and": [
          {
            "term": {
              "date": "2011-07-01"
            }
          },
          {
            "script": {
              "script": "(doc['points'].value / doc['games'].value) > 1000"
            }
          }
        ]
      }
    }
  }
}

Here 1000 stands for the user's average score for the day, so a separate
query is needed for each day.
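
The two-step approach described here (first fetch the user's own score, then count the documents with a higher one) could be sketched roughly as follows. `search` is a hypothetical callable that sends a request body to the `_search` endpoint and returns the parsed JSON response; the response layout (`hits.hits[]._source`, an integer `hits.total`) is assumed to match the Elasticsearch versions of that time.

```python
def position_for_day(search, user_id, day):
    """Return the 1-based position of user_id on the given day.

    `search` is a hypothetical callable: it takes a request body (dict)
    and returns the parsed JSON response of a search request.
    """
    # Query 1: fetch the user's own document for the day.
    own = search({
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": {"and": [
                {"term": {"date": day}},
                {"term": {"user_id": user_id}},
            ]},
        }},
    })
    source = own["hits"]["hits"][0]["_source"]
    score = source["points"] / source["games"]

    # Query 2: count the documents with a strictly higher average that day.
    script = "(doc['points'].value / doc['games'].value) > %r" % score
    higher = search({
        "query": {"filtered": {
            "query": {"match_all": {}},
            "filter": {"and": [
                {"term": {"date": day}},
                {"script": {"script": script}},
            ]},
        }},
    })
    # Assumes hits.total is a plain integer in the response.
    return higher["hits"]["total"] + 1
```

This costs two round trips per day, which is exactly the overhead the question is trying to avoid.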

Basically, what I'm after is some kind of facet that, for each date (or
interval, if that's possible), calculates the user's position/rank based
on a field or a script.

Pseudocode:

"date_histogram": {
  "key_field": "date",
  "value_script": "doc['points'].value / doc['games'].value",
  "interval": "day",
  "count_query": {
    "term": { "user_id": 1 },
    "operator": "gt"
  }
},

Here the count_query is expanded for each date_histogram entry: the
entry's key_field value (the date) is appended to it as an and-clause,
and the value compared is the same as the value_script of the
date_histogram.

The result would be something like:

points_per_game: {
    _type: date_histogram,
    entries: [
        {
            time: 1122336000000
            count: 3
            position: 2 <--- New field
            min: 1149.076923076923
            max: 1467.8333333333333
            total: 3865.710256410256
            total_count: 3
            mean: 1288.5700854700854
        },
        {
            time: 1122422400000
            count: 3
            position: 1
            min: 1039.8181818181818
            max: 1700
            total: 3934.818181818182
            total_count: 3
            mean: 1311.6060606060607
        }
    ]
}

Is this possible to achieve in another way?

/M



(Shay Banon) #4

So you want a "two level" histogram, first broken down by date and then by
position? You can't do that. You will need to use something like a terms
facet and create one facet per date range (with the date range as a filter
for the facet).



(Mats Norén) #5

Hi, Shay

Hmm... I'm not sure I quite understand your proposed solution.

I don't want it broken down by position (I think), but I would like to:

  1. get a specific user's points_per_game for a specific date (interval)
  2. count how many records have a higher points_per_game for that date
  3. do this for a given range of dates
  4. return a list of counts (i.e. positions)
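
The four steps above can be sketched on the client side, over records that have already been fetched. This is only an in-memory illustration of the intended result, not an Elasticsearch feature; the function name is hypothetical, and it assumes one document per user per day with dates stored as ISO strings, as in the sample document.

```python
from collections import defaultdict


def positions_by_date(records, user_id, start, end):
    """Return {date: {"position": n, "count": total}} for one user.

    Dates are ISO strings (YYYY-MM-DD), so plain string comparison
    orders them correctly.
    """
    by_date = defaultdict(list)
    for record in records:
        # Step 3: restrict to the requested range; skip zero-game records.
        if start <= record["date"] <= end and record["games"] > 0:
            by_date[record["date"]].append(record)

    result = {}
    for date in sorted(by_date):
        scores = {r["user_id"]: r["points"] / r["games"] for r in by_date[date]}
        if user_id not in scores:
            continue  # the user has no record for this day
        own = scores[user_id]                                 # step 1
        higher = sum(1 for s in scores.values() if s > own)   # step 2
        result[date] = {"position": higher + 1, "count": len(scores)}
    return result                                             # step 4


records = [
    {"user_id": 1, "points": 6800, "games": 2, "date": "2011-07-01"},
    {"user_id": 2, "points": 7000, "games": 2, "date": "2011-07-01"},
    {"user_id": 3, "points": 6600, "games": 2, "date": "2011-07-01"},
]
print(positions_by_date(records, 1, "2011-07-01", "2011-07-31"))
# {'2011-07-01': {'position': 2, 'count': 3}}
```

The sample data mirrors the earlier example of averages 3400, 3500 and 3300, which gives position 2 out of 3.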

I've tried to come up with a proposed syntax for a threshold_histogram
(it's like a facet with a dynamic facet_filter).

I'm back at my computer, so I did a gist instead of pasting code into a
mail :wink:

https://gist.github.com/1106230

Is this possible?

/M



(Shay Banon) #6

I think I understand, and I'm wondering if facet_filter won't do what you
want. Basically, any facet type can be associated with a facet_filter,
which controls which documents are included in the facet calculation.
For example, you can use a term filter or a range filter for that. Here are
the docs for it (at the bottom):
http://www.elasticsearch.org/guide/reference/api/search/facets/.

Or I might just be missing what you are after again :slight_smile:
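
One way to read the facet_filter suggestion: because the threshold differs per day, the client first collects the user's score for each day and then sends a single request with one facet per day, each carrying its own facet_filter. The sketch below builds such a request body and reads the positions back. It assumes the facet API of that period (a `statistical` facet combined with `facet_filter`, `and`, `term` and `range` filters) and a precomputed `points_per_game` field; the helper names and the `higher_` facet-name prefix are made up for illustration.

```python
def build_position_request(user_scores, score_field="points_per_game"):
    """Build a request with one facet per day.

    user_scores maps a date string to the user's score on that day;
    those scores must have been fetched beforehand.
    """
    facets = {}
    for day, score in sorted(user_scores.items()):
        facets["higher_" + day] = {
            "statistical": {"field": score_field},
            # Only documents from this day with a higher score are counted.
            "facet_filter": {
                "and": [
                    {"term": {"date": day}},
                    {"range": {score_field: {"gt": score}}},
                ]
            },
        }
    return {"query": {"match_all": {}}, "facets": facets}


def positions_from_response(response):
    """Turn each facet's count of higher scores into a 1-based position."""
    positions = {}
    for name, facet in response["facets"].items():
        day = name[len("higher_"):]
        positions[day] = facet["count"] + 1
    return positions
```

This still needs an initial query for the user's own scores, but the counting for the whole date range happens in one request instead of one per day.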



(Mats Norén) #7

OK, I'll take a look at facet_filter again.
I just don't see how I can do a count based on a filter that varies by day.

I'm thinking that my points_per_game for a certain query should be in the facet_filter, but I don't see how I can use it on the right-hand side of a boolean expression, i.e.
points_per_game > (points_per_game_for_user_id_1_for_the_day)

Is there an example available that does something similar?

Best regards,
Mats

PS. Thanks again for your patience and for a great piece of software.


