Elasticsearch aggregation doesn't work with nested-type fields


(Oleg Anashkin) #1

Somehow I can't get an Elasticsearch aggregation with a filter to work on nested fields. The relevant part of the mapping looks like this:

"mappings": {
  "rb": {
    "properties": {
      "project": {
        "type": "nested",
        "properties": {
          "age": {
            "type": "long"
          },
          "name": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}

Essentially, the "rb" object contains a nested field called "project", which has two subfields: "name" and "age". The query I'm running:

"aggs": {
  "root": {
    "aggs": {
      "group": {
        "aggs": {
          "filtered": {
            "aggs": {
              "order": {
                "percentiles": {
                  "field": "project.age",
                  "percents": ["50"]
                }
              }
            },
            "filter": {
              "range": {
                "last_updated": {
                  "gte": "2015-01-01",
                  "lt": "2015-07-01"
                }
              }
            }
          }
        },
        "terms": {
          "field": "project.name",
          "min_doc_count": 5,
          "order": {
            "filtered>order.50": "asc"
          },
          "shard_size": 10,
          "size": 10
        }
      }
    },
    "nested": {
      "path": "project"
    }
  }
}

This query is supposed to return the top 10 projects (by the project.name field) that match the date filter, sorted by their median age, ignoring projects with fewer than 5 mentions in the database. The median should be calculated only for projects matching the filter (date range).

Despite there being more than a hundred thousand objects in the database, this query returns an empty list. No errors, just an empty response. I've tried it on both ES 1.6 and ES 2.0-beta. What am I doing wrong?


(Ramy) #2

Try doing it like this:

{
  "query": {
    "filtered": {
      "query": {...}
    }
  },
  "filter": {...},
  "aggregations": {
    "root": {
      "nested": {
        "path": "project"
      },
      "aggregations": {
        "group": {
          "filter": {...},
          "aggregations": {
            "project_name": {
              "terms": {
                "field": "project.name",
                ...
              }
            }
          }
        }
      }
    }
  }
}

(Oleg Anashkin) #3

Not exactly like this, but it worked when I moved the filter aggregation to the top level. I also had to increase shard_size to 100; otherwise I got fewer than 10 values back:

"aggs": {
  "filtered": {
    "filter": {
      "range": {
        "last_updated": {
          "gte": "2015-01-01",
          "lt": "2015-07-01"
        }
      }
    },
    "aggs": {
      "root": {
        "nested": {
          "path": "project"
        },
        "aggs": {
          "group": {
            "terms": {
              "field": "project.name",
              "min_doc_count": 5,
              "order": {
                "order.50": "asc"
              },
              "shard_size": 100,
              "size": 10
            },
            "aggs": {
              "order": {
                "percentiles": {
                  "field": "project.age",
                  "percents": ["50"]
                }
              }
            }
          }
        }
      }
    }
  }
}
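
A likely explanation for why the first version came back empty, offered as a sketch rather than a confirmed diagnosis: inside a nested aggregation the buckets hold the hidden nested "project" documents, and those carry only the project.* fields, so a range filter on the root-level last_updated has nothing to match. The toy Python model below mimics that scoping. It is not Elasticsearch code, and the sample documents and helper names (in_range, nested) are made up for illustration.

```python
# Toy model of nested-document scoping; NOT Elasticsearch code.
# The sample documents and helper names are invented for illustration.
from statistics import median

docs = [
    {"last_updated": "2015-03-10",
     "project": [{"name": "alpha", "age": 4}, {"name": "beta", "age": 9}]},
    {"last_updated": "2015-05-02",
     "project": [{"name": "alpha", "age": 6}]},
    {"last_updated": "2014-11-20",
     "project": [{"name": "alpha", "age": 50}]},
]

def in_range(doc):
    # ISO dates compare correctly as strings; a missing field never matches.
    value = doc.get("last_updated")
    return value is not None and "2015-01-01" <= value < "2015-07-01"

def nested(parents, path):
    # Entering a nested context: only the nested objects are visible,
    # and they carry none of the parent's fields (such as last_updated).
    return [{f"{path}.{key}": val for key, val in child.items()}
            for parent in parents for child in parent[path]]

# Original layout: nested first, date filter inside, so nothing matches.
filter_inside = [d for d in nested(docs, "project") if in_range(d)]

# Working layout: date filter on the root documents, then nested.
filter_outside = nested([d for d in docs if in_range(d)], "project")

alpha_ages = [d["project.age"] for d in filter_outside
              if d["project.name"] == "alpha"]

print(len(filter_inside))   # 0
print(len(filter_outside))  # 3
print(median(alpha_ages))   # 5.0
```

In the model, the out-of-range document (age 50) is excluded only when the date filter runs before the nested step, which matches what the reordered query does.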

(Oleg Anashkin) #4

But now I have a slightly trickier question involving two nested fields. I now have another field, "comment", with exactly the same mapping as "project": the same nested format and the same two subfields, "name" and "age".

I need to change the query above to compute the median on "project.age" but order by "comment.name", or vice versa. The purpose is to get the top projects with the lowest median comment time. But since both fields are nested and I can only specify one of them in the nested aggregation, when I change the order field from "project.age" to "comment.age" I don't get anything back.

Do I need to somehow use Elasticsearch 2.0 sibling aggregations here, or is there a simpler solution?
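
One approach that may work without sibling aggregations, given here as an untested sketch: a reverse_nested aggregation steps from the nested "project" buckets back to the root documents, and from there a second nested aggregation can be opened on "comment". The request body is written as a Python dict so the assumptions can be commented. The aggregation names by_project, to_root, comments and median_age are placeholders, and ordering the terms buckets through that path has not been verified against ES 1.6 or 2.0.

```python
import json

# Untested sketch of a request body; the aggregation names are placeholders.
date_filter = {
    "range": {"last_updated": {"gte": "2015-01-01", "lt": "2015-07-01"}}
}

# Median of comment.age, computed from inside a project.name bucket.
median_comment_age = {
    "to_root": {
        "reverse_nested": {},  # leave the "project" scope, back to root docs
        "aggs": {
            "comments": {
                "nested": {"path": "comment"},  # enter the "comment" scope
                "aggs": {
                    "median_age": {
                        "percentiles": {"field": "comment.age",
                                        "percents": [50]}
                    }
                },
            }
        },
    }
}

body = {
    "aggs": {
        "filtered": {
            "filter": date_filter,
            "aggs": {
                "root": {
                    "nested": {"path": "project"},
                    "aggs": {
                        "by_project": {
                            "terms": {
                                "field": "project.name",
                                "min_doc_count": 5,
                                "shard_size": 100,
                                "size": 10,
                                # Assumed order path through the sub-aggregations.
                                "order": {
                                    "to_root>comments>median_age.50": "asc"
                                },
                            },
                            "aggs": median_comment_age,
                        }
                    },
                }
            },
        }
    }
}

print(json.dumps(body, indent=2))
```

One caveat with this shape: reverse_nested returns to whole root documents, so the median covers every comment on the documents that mention a project, not comments tied to one particular project entry.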

