I'm struggling to get over the hump of becoming competent with elasticsearch_dsl for Python.
I'm working on my end-of-year (2020) reports, and I have a set of indexes that total about three quarters of a billion records. I need all the unique values of a field and the count of each value.
Using the dev interface it's a simple query, but a terms aggregation only returns the top "size" buckets, so you can't easily get all the results:
POST /lookout-hp*/_search?size=0
{
  "size": 0,
  "aggs" : {
    "langs" : {
      "terms" : { "field" : "password.keyword", "size" : 50000 }
    }
  }
}
Results:
{
  "took" : 4221,
  "timed_out" : false,
  "_shards" : {
    "total" : 115,
    "successful" : 115,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "langs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 376068,
      "buckets" : [
        {
          "key" : "password",
          "doc_count" : 349639
        },
        {
          "key" : "admin",
          "doc_count" : 254823
        },
        {
          "key" : "123456",
          "doc_count" : 200632
        },
        {
          "key" : "",
          "doc_count" : 186228
        },
        {
          "key" : "1234",
          "doc_count" : 110466
        },
        {
          "key" : "root",
          "doc_count" : 92418
        },
        ...
How do I do this with elasticsearch_dsl in Python so that I get all the results? I can't find any good examples.
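One direction that might work is a composite aggregation, which can be paged with its "after" key instead of being capped at a single "size". Below is a rough sketch of that idea, not a tested solution for these indexes. It assumes the same index pattern and field as the query above. The names `AGG_NAME`, `SOURCE_NAME`, `iter_value_counts`, `make_fetch_page`, and the page size of 1000 are made up for illustration. The elasticsearch_dsl import sits inside `make_fetch_page` so that the paging loop can be tried without a cluster.

```python
from typing import Callable, Iterator, Optional, Tuple

AGG_NAME = "unique_passwords"  # hypothetical aggregation name
SOURCE_NAME = "password"       # hypothetical name of the composite source


def iter_value_counts(
    fetch_page: Callable[[Optional[dict]], dict],
) -> Iterator[Tuple[str, int]]:
    """Yield (value, doc_count) pairs, one page of buckets at a time.

    fetch_page(after) must return the composite aggregation result as a
    plain dict with "buckets" and, usually, "after_key".
    """
    after = None
    while True:
        page = fetch_page(after)
        buckets = page.get("buckets", [])
        if not buckets:
            return  # an empty page means every bucket has been seen
        for bucket in buckets:
            yield bucket["key"][SOURCE_NAME], bucket["doc_count"]
        # Resume after the last key; fall back to the last bucket's key
        # if the response carries no "after_key".
        after = page.get("after_key", buckets[-1]["key"])


def make_fetch_page(client, index="lookout-hp*", field="password.keyword",
                    page_size=1000):
    """Build a fetch_page function backed by elasticsearch_dsl.

    client is assumed to be an elasticsearch.Elasticsearch instance.
    """
    from elasticsearch_dsl import A, Search  # lazy import, see lead-in

    def fetch_page(after):
        # size=0: only the aggregation is wanted, not the matching documents
        search = Search(using=client, index=index).extra(size=0)
        params = {
            "sources": [{SOURCE_NAME: A("terms", field=field)}],
            "size": page_size,
        }
        if after is not None:
            params["after"] = after
        search.aggs.bucket(AGG_NAME, "composite", **params)
        return search.execute().to_dict()["aggregations"][AGG_NAME]

    return fetch_page
```

Usage would be along the lines of `for value, count in iter_value_counts(make_fetch_page(client)): ...`, writing each pair out as it arrives rather than holding everything in memory. Composite buckets come back ordered by key rather than by count, so sorting by count would have to happen afterwards.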
Are there any other good resources, online or in a book?
Thank you.