Terms Stats Question/Problem


(danoyoung) #1

I'm trying to run a terms_stats facet to calculate some values with a
grouping, but am curious why the values returned don't add up to what I
expect. I might be missing something about how terms_stats is supposed to
work.

Here's my query:

curl -s -XGET 'http://localhost:9200/my_index/my_type/_search?pretty=true' -d '
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "field": { "campaign_group_id": 187 } },
            { "field": { "optimizer_id": 79 } },
            { "field": { "keyword_id": 638489 } }
          ]
        }
      },
      "filter": {
        "numeric_range": {
          "dw_date_marker_id": {
            "from": 1022, "to": 1022,
            "include_lower": true, "include_upper": true
          }
        }
      }
    }
  },
  "facets": {
    "actual_click_cost_last_24_hours_stats": {
      "terms_stats": {
        "key_field": "keyword_id",
        "value_field": "actual_click_cost_last_24_hours"
      }
    }
  }
}'

I have two records that meet the filter criteria. The two values for
the actual_click_cost_last_24_hours in the documents are as follows:

doc1:
actual_click_cost_last_24_hours: 2.54

doc2:
actual_click_cost_last_24_hours: 1.75

When I run the query, these are the results of the facet:
.....
.....
  "facets" : {
    "actual_click_cost_last_24_hours_stats" : {
      "_type" : "terms_stats",
      "missing" : 0,
      "terms" : [ {
        "term" : 638489,
        "count" : 2,
        "total_count" : 2,
        "min" : 1.0,
        "max" : 2.0,
        "total" : 3.0,
        "mean" : 1.5
      } ]
    }
  }
}

Why wouldn't the total be 2.54 + 1.75 = 4.29 rather than 3.0? It seems that
the .54 and .75 are being lost. How can I prevent the rounding?

Regards,

Dan
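
The facet numbers quoted above are consistent with each value losing its fractional part before aggregation. Rounding to the nearest integer would give 3.0 and 2.0 (total 5.0), so the reported 3.0 looks more like truncation than rounding. Below is a minimal Python sketch of that arithmetic only. It does not call Elasticsearch, and the helper name `stats` is a hypothetical name chosen for illustration.

```python
import math

# The two document values quoted in the post.
values = [2.54, 1.75]


def stats(nums):
    """Compute the same summary fields that the terms_stats output lists."""
    total = sum(nums)
    return {
        "count": len(nums),
        "min": min(nums),
        "max": max(nums),
        "total": total,
        "mean": total / len(nums),
    }


# Stats over the values exactly as written in the documents.
exact = stats(values)

# Stats after dropping the fractional part (2.54 -> 2.0, 1.75 -> 1.0).
truncated = stats([float(math.trunc(v)) for v in values])

print("exact:    ", exact)
print("truncated:", truncated)
```

The truncated statistics match the min, max, total and mean in the facet output above, while the exact ones correspond to the expected 4.29 total.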


(danoyoung) #2

Looks like if I use the _source field the values appear to be correct
now:

curl -s -XGET 'http://localhost:9200/my_index/my_type/_search?pretty=true' -d '
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "field": { "campaign_group_id": 187 } },
            { "field": { "optimizer_id": 79 } },
            { "field": { "keyword_id": 638489 } }
          ]
        }
      },
      "filter": {
        "numeric_range": {
          "dw_date_marker_id": {
            "from": 1022, "to": 1022,
            "include_lower": true, "include_upper": true
          }
        }
      }
    }
  },
  "facets": {
    "actual_click_cost_last_24_hours_stats": {
      "terms_stats": {
        "key_field": "keyword_id",
        "value_script": "_source.actual_click_cost_last_24_hours"
      }
    }
  }
}'

results:

  "facets" : {
    "actual_click_cost_last_24_hours_stats" : {
      "_type" : "terms_stats",
      "missing" : 0,
      "terms" : [ {
        "term" : 638489,
        "count" : 2,
        "total_count" : 2,
        "min" : 1.75,
        "max" : 2.54,
        "total" : 4.29,
        "mean" : 2.145

On Oct 11, 10:31 pm, Dan Young danoyo...@gmail.com wrote:

I'm trying to run a terms_stats facet to calculate some values with a
grouping, but am curious why the values returned don't add up to what I
expect.


(Shay Banon) #3

That should not really be the case, can you gist a recreation (including
indexing sample data)? See http://www.elasticsearch.org/help.

On Wed, Oct 12, 2011 at 6:39 AM, Dan Young danoyoung@gmail.com wrote:

Looks like if I use the _source field the values appear to be correct
now.


(danoyoung) #4

Shay,

Here's a gist.....Let me know if you need anything else. Thank you
for such a great product/project!

As noted, if I reference _source, the results come back correctly.

Regards,

Dan

On Oct 12, 3:14 pm, Shay Banon kim...@gmail.com wrote:

That should not really be the case, can you gist a recreation (including
indexing sample data)? See http://www.elasticsearch.org/help.


(danoyoung) #5

Ok Shay,

Something weird is going on, possibly on my side. I deleted my index,
recreated it and the mapping, and reindexed the data, and now the query
works and returns the correct results.

curl -s -XGET 'http://localhost:9200/daily_ad_network_keywords_2011/daily_ad_network_keyword/_search?pretty=true' -d '
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            { "field": { "campaign_group_id": 187 } },
            { "field": { "optimizer_id": 79 } },
            { "field": { "keyword_id": 638489 } }
          ]
        }
      },
      "filter": {
        "numeric_range": {
          "dw_date_marker_id": {
            "from": 1022, "to": 1022,
            "include_lower": true, "include_upper": true
          }
        }
      }
    }
  },
  "facets": {
    "actual_click_cost_last_24_hours_stats": {
      "terms_stats": {
        "key_field": "keyword_id",
        "value_field": "actual_click_cost_last_24_hours"
      }
    }
  }
}'

Results:

.....
.....
  "facets" : {
    "actual_click_cost_last_24_hours_stats" : {
      "_type" : "terms_stats",
      "missing" : 0,
      "terms" : [ {
        "term" : 638489,
        "count" : 2,
        "total_count" : 2,
        "min" : 1.75,
        "max" : 2.54,
        "total" : 4.29,
        "mean" : 2.145
      } ]
    }
  }
}

This is weird. I'm using the bulk API via the REST interface: I put the
messages on RabbitMQ and then have them consumed via the RabbitMQ river.
I'm going to do some additional testing, but this is strange.
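
One possible explanation, not confirmed anywhere in this thread, is that the field was first mapped as an integer type (for example because a whole number was the first value seen for it). Later decimal values would then be indexed without their fraction, while `_source` kept the original JSON. The Python sketch below is a toy model of that idea only and is not Elasticsearch code. `ToyIndex`, its methods and the `cost` field are hypothetical names.

```python
class ToyIndex:
    """Toy model: a field's type is fixed by the first value indexed into it."""

    def __init__(self):
        self.field_types = {}  # field name -> "long" or "double"
        self.indexed = []      # values as coerced into the index
        self.sources = []      # original documents, kept untouched

    def index(self, doc):
        self.sources.append(dict(doc))
        coerced = {}
        for field, value in doc.items():
            # The first sighting of a field decides its type (assumption).
            kind = self.field_types.setdefault(
                field, "long" if isinstance(value, int) else "double"
            )
            # int() drops the fractional part, like the facet output did.
            coerced[field] = int(value) if kind == "long" else float(value)
        self.indexed.append(coerced)

    def total(self, field, from_source=False):
        docs = self.sources if from_source else self.indexed
        return sum(d[field] for d in docs if field in d)


stale = ToyIndex()
stale.index({"cost": 0})     # whole number seen first: field becomes "long"
stale.index({"cost": 2.54})
stale.index({"cost": 1.75})

fresh = ToyIndex()
fresh.index({"cost": 2.54})  # decimal seen first: field becomes "double"
fresh.index({"cost": 1.75})

print(stale.total("cost"), stale.total("cost", from_source=True))
print(fresh.total("cost"))
```

In this model the "stale" index reproduces both symptoms from the thread (a truncated total from the indexed field, the full total from the source), and the "fresh" index behaves like the recreated one. Comparing the field's type in the live mapping against the intended mapping would be one way to test whether this is what happened.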


(Shay Banon) #6

Strange... well, ping if it happens again. A recreation would help, with
the actual curl requests for what you do, including the mapping and
indexing of the data.

On Thu, Oct 13, 2011 at 5:24 AM, Dan Young danoyoung@gmail.com wrote:

Ok Shay,

Something weird....possibly on my side. I deleted my index, recreated
it and the mapping, reindexed the data and now the query works and
returns the correct results.


