ElasticSearch spark esRDD not returning the aggregate values in aggregated query

siva_pradeep · September 17, 2014, 1:13pm

Hi,

I have a query that filters rows and then applies an aggregation. When I
run it in "Sense", it returns the expected result. But when I run the same
query through elasticsearch-spark_2.10, I get the rows matched by the query
but not the aggregation result. I am sure I am missing something, but I
cannot figure out what.

Here is the query:

GET _search
{
  "query": {
    "bool": {
      "must": [
        {
          "filtered": {
            "query": {
              "range": {
                "@timestamp": {
                  "from": "2014-09-03T01:40:37.437Z",
                  "to": "2014-09-03T01:45:11.437Z"
                }
              }
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "fields": ["cid", "entity"],
  "aggs": {
    "cid": {
      "terms": {
        "field": "cid",
        "min_doc_count": 2,
        "size": 100
      },
      "aggs": {
        "tn": {
          "terms": {
            "field": "entity"
          }
        }
      }
    }
  }
}

Query Result:

{
  "took": 10005,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 2430,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "cid": {
      "buckets": [
        {
          "key": " 01abcecc9a20cd3d6ae6be3509d014ba@76.96.107.168",
          "doc_count": 2,
          "tn": {
            "buckets": [
              {
                "key": "15052563268",
                "doc_count": 2
              }
            ]
          }
        }
      ]
    }
  }
}

Spark program:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object PresenceFilter extends App {

  // Same query as above, in a triple-quoted string so the inner quotes need no escaping.
  val query: String =
    """{
      |  "query": {
      |    "bool": {
      |      "must": [
      |        { "filtered": { "query": { "range": { "@timestamp": {
      |          "from": "2014-09-03T01:40:37.437Z",
      |          "to": "2014-09-03T01:45:11.437Z"
      |        } } } } }
      |      ]
      |    }
      |  },
      |  "size": 0,
      |  "fields": ["cid", "entity"],
      |  "aggs": {
      |    "cid": {
      |      "terms": { "field": "cid", "min_doc_count": 2, "size": 100 },
      |      "aggs": { "tn": { "terms": { "field": "entity" } } }
      |    }
      |  }
      |}""".stripMargin

  val sparkConf = new SparkConf()
    .setAppName("PresenceAnalysis")
    .setMaster("local[4]")
    .set("es.nodes", "prs-wch-10.sys.comcast.net")
    .set("es.port", "9200")
    .set("es.resource", "presence-2014.09.03/presence")
    .set("es.endpoint", "_search")
    // .set("es.query", query)
  val sc = new SparkContext(sparkConf)

  val rowCount = sc.esRDD.count // returns 2430 rows
}

How do I get the aggregation part of the result (shown below) into the
program?

"aggregations": {
  "cid": {
    "buckets": [
      {
        "key": " 01abcecc9a20cd3d6ae6be3509d014ba@76.96.107.168",
        "doc_count": 2,
        "tn": {
          "buckets": [
            {
              "key": "15052563268",
              "doc_count": 2
            }
          ]
        }
      }
    ]
  }
}

Please advise.

Thanks,
Siva P

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d75bf6db-58c5-4ca8-b8d2-f45a3dd0a339%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jeff_Steinmetz · October 17, 2014, 3:46pm

Siva,

Try the latest build of elasticsearch-hadoop, version 2.1.0 Beta 2.

esRDD has been changed to return a Spark pair RDD, so PairRDDFunctions apply:
https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions

The RDD now contains key/value tuples that look like (String, Map[String,
AnyRef]),

so you could start to walk the JSON key/value hierarchy with something like:

esRDD.flatMap { args => args._2.get("aggregations") }

(The syntax above is not exact, since your specific query result may have a
different key/value pair as its first object.)
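
To make the tuple walking concrete, here is a minimal plain-Scala sketch. It uses an ordinary Seq as a stand-in for the pair RDD so it runs without Spark or Elasticsearch; the object name WalkPairs, the helper names, and the sample values are all made up for illustration, and only the field names "cid" and "entity" come from the query in this thread. As far as I can tell, esRDD yields one tuple per matching document rather than the raw search response, so an "aggregations" key may simply not be present. For that case the second helper sketches one way to compute similar bucket counts on the client side. It is an approximation of the nested terms aggregation, not the connector's own mechanism.

```scala
object WalkPairs {
  // Hypothetical stand-in for one element of the pair RDD: (document id, field map).
  type Doc = (String, Map[String, AnyRef])

  // Sample data invented for illustration only.
  val docs: Seq[Doc] = Seq(
    ("1", Map[String, AnyRef]("cid" -> "a@host", "entity" -> "111")),
    ("2", Map[String, AnyRef]("cid" -> "a@host", "entity" -> "111")),
    ("3", Map[String, AnyRef]("cid" -> "b@host", "entity" -> "222")),
    ("4", Map[String, AnyRef]("entity" -> "333")) // no "cid" field
  )

  // Same shape as esRDD.flatMap { args => args._2.get(...) }:
  // Map.get returns an Option, so tuples without the key are dropped.
  def fieldValues(rows: Seq[Doc], field: String): Seq[String] =
    rows.flatMap { case (_, source) => source.get(field).map(_.toString) }

  // Client-side approximation of the nested terms aggregation:
  // collect (cid, entity) pairs from documents that have both fields,
  // keep cids with at least minDocCount such documents,
  // then count documents per entity inside each remaining cid.
  def bucketCounts(rows: Seq[Doc], minDocCount: Int): Map[String, Map[String, Int]] = {
    val pairs = rows.flatMap { case (_, source) =>
      for {
        cid <- source.get("cid")
        entity <- source.get("entity")
      } yield (cid.toString, entity.toString)
    }
    pairs.groupBy(_._1)
      .filter { case (_, group) => group.size >= minDocCount }
      .map { case (cid, group) =>
        cid -> group.groupBy(_._2).map { case (entity, g) => entity -> g.size }
      }
  }

  def main(args: Array[String]): Unit = {
    println(fieldValues(docs, "cid"))
    println(bucketCounts(docs, minDocCount = 2))
  }
}
```

On a real RDD the same flatMap pattern should carry over, though the grouping step would more likely be written with reduceByKey or a similar pair-RDD operation than with groupBy on a local collection.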

Best,
Jeff Steinmetz
Director of Data Science
Ekho, Inc.

@jeffsteinmetz

