Curl XGET not returning all the documents


(sami) #1

I have 20 million documents in an index (the _stats output below confirms that), and there are 914 distinct toll_amt_full values (confirmed from the Hive table below).
But the curl aggregation query is only returning 10 buckets. Why?

hive> select toll_amt_full,count(*) from pa_lane_txn_es group by toll_amt_full;
Query ID = root_20170802225401_c1f1147b-79b4-48b9-a2e6-e0bc4661737d
Total jobs = 1
Launching Job 1 out of 1

28.0 57968
29.0 1
.
.
.
56100.0 1
Time taken: 1547.462 seconds, Fetched: 914 row(s)
hive>

[elasticsearch@hadoop5 config]$ curl hadoop5:9200/lanetxn/_stats?pretty
{
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "_all" : {
    "primaries" : {
      "docs" : {
        "count" : 20000020,
        "deleted" : 0
      },

[elasticsearch@hadoop5 config]$ curl -XPOST 'hadoop5:9200/lanetxn/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_toll_amt_full": {
      "terms": {
        "field": "toll_amt_full"
      }
    }
  }
}'
{
  "took" : 348,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 20000020,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_toll_amt_full" : {
      "doc_count_error_upper_bound" : 96654,
      "sum_other_doc_count" : 6107485,
      "buckets" : [
        {
          "key" : 132.0,
          "doc_count" : 2797710
        },
        {
          "key" : 50.0,
          "doc_count" : 1843111
        },
        {
          "key" : 140.0,
          "doc_count" : 1831116
        },
        {
          "key" : 0.0,
          "doc_count" : 1354803
        },
        {
          "key" : 100.0,
          "doc_count" : 1257437
        },
        {
          "key" : 125.0,
          "doc_count" : 1173364
        },
        {
          "key" : 82.0,
          "doc_count" : 1103776
        },
        {
          "key" : 79.0,
          "doc_count" : 987945
        },
        {
          "key" : 75.0,
          "doc_count" : 775867
        },
        {
          "key" : 70.0,
          "doc_count" : 767406
        }
      ]
    }
  }
}


(Mark Walkom) #2

Have a read of https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-from-size.html
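The from / size covered there are request-level parameters that page through the hits, e.g. (a sketch reusing the index from above):

```shell
curl -XPOST 'hadoop5:9200/lanetxn/_search?pretty' -d '
{
  "from": 0,
  "size": 100,
  "query": { "match_all": {} }
}'
```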


(sami) #3

Maybe I am not using the right syntax. I added the from clause, but I am still getting only 10 buckets.

curl -XPOST 'hadoop5:9200/lanetxn/_search?pretty' -d '
{
  "from" : 0, "size" : 1000,
  "size": 0,
  "aggs": {
    "group_by_toll_amt_full": {
      "terms": {
        "field": "toll_amt_full"
      }
    }
  }
}'


(David Pilato) #4

Move the size:1000 inside the terms agg
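That is, something like this sketch, reusing the index and field from above (1000 comfortably covers your 914 distinct values):

```shell
curl -XPOST 'hadoop5:9200/lanetxn/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_toll_amt_full": {
      "terms": {
        "field": "toll_amt_full",
        "size": 1000
      }
    }
  }
}'
```

The size inside the terms block controls how many buckets are returned; the top-level "size": 0 just suppresses the hits.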


(sami) #5

I moved it inside, but it is still not working:

curl -XPOST 'hadoop5:9200/lanetxn/_search?pretty' -d '
{
  "size" : 0,
  "aggs" : {
    "from" : 0, "size" : 1000,
    "txn_over_time" : {
      "date_histogram" : {
        "field" : "txn_process_date",
        "interval" : "month"
      }
    }
  }
}
'


(David Pilato) #6

You changed the aggregation: it was terms and is now date_histogram.
And you didn't move what I said to where I said.

Is that a new problem you want to solve?


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.