Sub-aggregations not working as expected


(chris Hahn) #1

I'm playing with Elasticsearch for a project.
This is sample data. Every document will have a list of ingredients; some
of the ingredients may be used in different documents and in different
quantities. I would like an aggregation that lists all ingredients for a
search, along with the stats of each ingredient.

I'm looking to this group for suggestions: can I structure my query
differently to get the results I would like (or should I structure my
documents differently)?
Basically I have two constraints: I don't know what the ingredients are
when the query is written, and I would like to list all ingredients and the
average amount of each.

Sample data:
{
  "ingredients": [
    {
      "name": "Rock",
      "quantity": 6,
      "unit": "lb"
    },
    {
      "name": "Dirt",
      "quantity": 6,
      "unit": "lb"
    },
    {
      "name": "Mortar",
      "quantity": 3,
      "unit": "lb"
    }
  ]
}

This query looks like it works, but it doesn't. I'm quite confused as to
where these numbers are coming from.
Query:
POST /concrete/recipe/_search
{
  "query" : { "match_all" : {} },
  "aggs" : {
    "ingredients" : {
      "terms" : {
        "field" : "ingredients.name"
      },
      "aggs" : {
        "pounds" : { "stats" : { "field" : "ingredients.quantity" } }
      }
    }
  }
}

Results:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "concrete",
        "_type": "recipe",
        "_id": "1",
        "_score": 1,
        "_source": {
          "ingredients": [
            {
              "name": "Rock",
              "quantity": 6,
              "unit": "lb"
            },
            {
              "name": "Dirt",
              "quantity": 6,
              "unit": "lb"
            },
            {
              "name": "Mortar",
              "quantity": 3,
              "unit": "lb"
            }
          ]
        }
      },
      {
        "_index": "concrete",
        "_type": "recipe",
        "_id": "2",
        "_score": 1,
        "_source": {
          "ingredients": [
            {
              "name": "Rock",
              "quantity": 8,
              "unit": "lb"
            },
            {
              "name": "Quartz",
              "quantity": 0.5,
              "unit": "lb"
            },
            {
              "name": "Mortar",
              "quantity": 4.5,
              "unit": "lb"
            }
          ]
        }
      }
    ]
  },
  "aggregations": {
    "ingredients": {
      "buckets": [
        {
          "key": "mortar",
          "doc_count": 2,
          "pounds": {
            "count": 5,
            "min": 0,
            "max": 8,
            "avg": 4.2,
            "sum": 21
          }
        },
        {
          "key": "rock",
          "doc_count": 2,
          "pounds": {
            "count": 5,
            "min": 0,
            "max": 8,
            "avg": 4.2,
            "sum": 21
          }
        },
        {
          "key": "dirt",
          "doc_count": 1,
          "pounds": {
            "count": 2,
            "min": 3,
            "max": 6,
            "avg": 4.5,
            "sum": 9
          }
        },
        {
          "key": "quartz",
          "doc_count": 1,
          "pounds": {
            "count": 3,
            "min": 0,
            "max": 8,
            "avg": 4,
            "sum": 12
          }
        }
      ]
    }
  }
}

Thanks for reading,
Chris

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e7ae4039-19f7-4e45-9b72-0465833fbdfa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Paweł Młynarczyk) #2

Hello Chris

Elasticsearch does not treat the objects in your 'ingredients' list as
groups of logically connected fields. So when your top-level aggregation
returns, e.g., rock, your sub-aggregation does not compute stats for the rock
material alone, but for all the materials in the documents that also
include rock.
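
For what it's worth, the numbers in the original results can be reproduced with a small simulation of that behaviour. This is only a sketch in plain Python, not Elasticsearch code. It assumes two things the thread does not state: that `quantity` ended up mapped as an integer type (so 0.5 and 4.5 are truncated to 0 and 4), and that a value repeated within one document is counted once. The function names are made up for illustration.

```python
# Toy model of the two sample recipes after the object array is flattened.
# Assumptions (not confirmed in the thread): quantities are stored as
# integers, and duplicate values inside one document are counted once.
recipes = [
    [("Rock", 6), ("Dirt", 6), ("Mortar", 3)],
    [("Rock", 8), ("Quartz", 0.5), ("Mortar", 4.5)],
]


def flatten(recipe):
    """Split one recipe into two unrelated sets: all names, all quantities."""
    names = {name.lower() for name, _ in recipe}
    quantities = {int(quantity) for _, quantity in recipe}
    return names, quantities


def stats_per_term(docs):
    """For each name, collect every quantity of every document containing it."""
    buckets = {}
    for names, quantities in map(flatten, docs):
        for name in names:
            buckets.setdefault(name, []).extend(sorted(quantities))
    return {
        name: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "sum": sum(values),
        }
        for name, values in buckets.items()
    }


result = stats_per_term(recipes)
for name in sorted(result):
    print(name, result[name])
```

Under those assumptions the mortar and rock buckets come out as count 5, min 0, max 8, avg 4.2, sum 21, which is what the original post shows. Each bucket sees every quantity in the matching documents, not only the quantity that belongs to its own ingredient.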

You could try to index those materials as child documents for the main
index and then just do aggs over the child type.
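
A rough sketch of what that restructuring could look like, written as plain Python that only builds the data and the request body. The helper `to_child_docs` and the `parent`/`source` keys are hypothetical names for illustration; how the child type is declared and indexed is not shown in this thread and is left out here.

```python
import json

# The two sample recipes from the original post, keyed by document id.
recipes = {
    "1": [
        {"name": "Rock", "quantity": 6, "unit": "lb"},
        {"name": "Dirt", "quantity": 6, "unit": "lb"},
        {"name": "Mortar", "quantity": 3, "unit": "lb"},
    ],
    "2": [
        {"name": "Rock", "quantity": 8, "unit": "lb"},
        {"name": "Quartz", "quantity": 0.5, "unit": "lb"},
        {"name": "Mortar", "quantity": 4.5, "unit": "lb"},
    ],
}


def to_child_docs(recipes):
    """Emit one child document per ingredient, tagged with its parent recipe id."""
    return [
        {"parent": recipe_id, "source": ingredient}
        for recipe_id, ingredients in recipes.items()
        for ingredient in ingredients
    ]


# Same aggregation as in the original post, but run against the child
# documents, where "name" and "quantity" are top-level fields of one ingredient.
agg_request = {
    "size": 0,
    "aggs": {
        "ingredients": {
            "terms": {"field": "name"},
            "aggs": {"pounds": {"stats": {"field": "quantity"}}},
        }
    },
}

child_docs = to_child_docs(recipes)
print(json.dumps(child_docs[0], indent=2))
print(json.dumps(agg_request, indent=2))
```

Because every child document holds exactly one name and one quantity, the stats sub-aggregation can no longer pick up quantities from other ingredients.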
You may also want to read this:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

Best regards
Paweł Młynarczyk


(chris Hahn) #3

Thanks for the reply,
Parent child did return the desired results.
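
For anyone following along, here is the arithmetic I was after, worked out by hand in plain Python rather than copied from Elasticsearch output. It is a sketch that assumes each ingredient is its own document and that `quantity` keeps its fractional part; `stats_by_name` is just an illustrative helper.

```python
from collections import defaultdict

# One document per ingredient, taken from the two sample recipes.
child_docs = [
    {"name": "Rock", "quantity": 6},
    {"name": "Dirt", "quantity": 6},
    {"name": "Mortar", "quantity": 3},
    {"name": "Rock", "quantity": 8},
    {"name": "Quartz", "quantity": 0.5},
    {"name": "Mortar", "quantity": 4.5},
]


def stats_by_name(docs):
    """Group quantities by lowercased ingredient name and compute basic stats."""
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["name"].lower()].append(doc["quantity"])
    return {
        name: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "sum": sum(values),
        }
        for name, values in grouped.items()
    }


expected = stats_by_name(child_docs)
for name in sorted(expected):
    print(name, expected[name])
```

With name and quantity kept together, rock averages 7 over two documents and mortar 3.75, instead of both reporting 4.2.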

I probably won't go this route because I am hoping to accomplish more
join-like functions, but I did want to reply and let you know your advice
was correct.

Chris

On Monday, April 7, 2014 1:46:05 AM UTC-5, Paweł Młynarczyk wrote:

Hello Chris

Elasticsearch does not treat the objects in your 'ingredients' list as
groups of logically connected fields. So when your top-level aggregation
returns, e.g., rock, your sub-aggregation does not compute stats for the rock
material alone, but for all the materials in the documents that also
include rock.

You could try to index those materials as child documents for the main
index and then just do aggs over the child type.
You may also want to read this:
http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

Best regards
Paweł Młynarczyk

