Get unique users in db


(Wouter Durnez) #1

Hey everyone,

I'm trying to get a list of unique users in my db. I'm using python. I've looked at other, similar questions, but I still can't manage to get what I want. My buckets are topped off at a length of 10. I tried inserting a size argument, but nothing changed the result.

I'm using the elasticsearch python package. Below is the format of the query I'm using.

# ID query
body = {
    "query": {
        "match_all": {}
    },
    "aggs": {
        "unique_id": {
            "terms": {
                "field": "id.keyword"
            }
        }
    }
}


# Search
res = es.search(index="my_index",
                body=body,
                request_timeout=60,
                #size=10000,  # Get first 1000 results
                #scroll='30s',  # Get scroll id to get next results
                doc_type="appevents")

This is what I get:

{   '_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5},
    'aggregations': {   'unique_id': {   'buckets': [   {   'doc_count': 89065,
                                                            'key': 'ea3c3d88-46c3-42b5-a89f-73ceccbaf1bc'},
                                                        {   'doc_count': 87323,
                                                            'key': '65cb588c-2a83-4cae-9199-bb9398c0dc39'},
                                                        {   'doc_count': 86311,
                                                            'key': '760fd726-30cc-48f0-b17e-722ab29e681b'},
                                                        {   'doc_count': 83260,
                                                            'key': 'a0e930b4-4e88-4eb1-b391-6e6b7436c639'},
                                                        {   'doc_count': 80307,
                                                            'key': 'fec988a2-6eba-49e0-8327-a89f25143ccf'},
                                                        {   'doc_count': 79968,
                                                            'key': '1159bcf7-ae5b-4d04-b702-01ea77dab259'},
                                                        {   'doc_count': 79051,
                                                            'key': '80934959-40aa-4159-8205-b50ae594a015'},
                                                        {   'doc_count': 62620,
                                                            'key': 'bb870b62-b008-4481-aa6b-6beafccdceb4'},
                                                        {   'doc_count': 61819,
                                                            'key': '69cf854a-71e8-48a8-ba13-cc32fcac1fe1'},
                                                        {   'doc_count': 61755,
                                                            'key': '89ff43b4-ec81-4f59-bdcd-860b23c48690'}],
                                         'doc_count_error_upper_bound': 49741,
                                         'sum_other_doc_count': 47391981}}, # Plus some other stuff I'm not interested in.

Any idea what I'm doing wrong? Or where exactly I should put the size argument? Nothing I tried worked, so far.


(Mark Harwood) #2

That size property is for the number of hits (documents) you want back. That should be zero as you're not interested in seeing any docs.
The size setting you want is in the terms aggregation to define the number of buckets (user ids) you want back. A note of warning though - this could be very expensive to attempt if you have a lot of users.
If you only want a count back consider using the cardinality aggregation


(Wayne Taylor) #3

Does this work. Here is an example we use

{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "tail_number:N522VA AND origination_airport:KBOS AND destination_airport:KSFO AND airline:VRD",
"analyze_wildcard": true
}
},
{
"range": {
"@timestamp": {
"gte": "now-24h"
}
}
}
],
"must_not": []
}
},
"_source": {
"excludes": []
},
"aggs": {
"unique_users": {
"cardinality": {
"field": "mac.keyword"
}
}
}
}


(Wayne Taylor) #4

Sorry for formatting will fix when I get in office


(Mark Harwood) #5

Looks legit.


(Wouter Durnez) #6

Hey guys,

Based on what Mark and Wayne replied, I finally found something that worked for me:

# ID query
body = {
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "unique_id": {
            "terms": {
                "field": "id.keyword",
                "size": 1000000
            }
        }
    }
}

res = es.search(index="myindex",
                body=body,
                request_timeout=240,
                doc_type="appevents")

# Get the ids
ids = {}

# Go over buckets and get
for b in res['aggregations']['unique_id']['buckets']:

    ids[b['key']] = b['doc_count']

return ids

This gives my all ids (got about 5400 now) with the associated doc counts. Thanks for the help!


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.