Get unique users in db

Wouter_Durnez · June 1, 2018, 10:47am

Hey everyone,

I'm trying to get a list of unique users in my db. I'm using python. I've looked at other, similar questions, but I still can't manage to get what I want. My buckets are topped off at a length of 10. I tried inserting a size argument, but nothing changed the result.

I'm using the elasticsearch python package. Below is the format of the query I'm using.

# ID query
body = {
    "query": {
        "match_all": {}
    },
    "aggs": {
        "unique_id": {
            "terms": {
                "field": "id.keyword"
            }
        }
    }
}


# Search
res = es.search(index="my_index",
                body=body,
                request_timeout=60,
                #size=10000,  # Get first 1000 results
                #scroll='30s',  # Get scroll id to get next results
                doc_type="appevents")

This is what I get:

{   '_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5},
    'aggregations': {   'unique_id': {   'buckets': [   {   'doc_count': 89065,
                                                            'key': 'ea3c3d88-46c3-42b5-a89f-73ceccbaf1bc'},
                                                        {   'doc_count': 87323,
                                                            'key': '65cb588c-2a83-4cae-9199-bb9398c0dc39'},
                                                        {   'doc_count': 86311,
                                                            'key': '760fd726-30cc-48f0-b17e-722ab29e681b'},
                                                        {   'doc_count': 83260,
                                                            'key': 'a0e930b4-4e88-4eb1-b391-6e6b7436c639'},
                                                        {   'doc_count': 80307,
                                                            'key': 'fec988a2-6eba-49e0-8327-a89f25143ccf'},
                                                        {   'doc_count': 79968,
                                                            'key': '1159bcf7-ae5b-4d04-b702-01ea77dab259'},
                                                        {   'doc_count': 79051,
                                                            'key': '80934959-40aa-4159-8205-b50ae594a015'},
                                                        {   'doc_count': 62620,
                                                            'key': 'bb870b62-b008-4481-aa6b-6beafccdceb4'},
                                                        {   'doc_count': 61819,
                                                            'key': '69cf854a-71e8-48a8-ba13-cc32fcac1fe1'},
                                                        {   'doc_count': 61755,
                                                            'key': '89ff43b4-ec81-4f59-bdcd-860b23c48690'}],
                                         'doc_count_error_upper_bound': 49741,
                                         'sum_other_doc_count': 47391981}}, # Plus some other stuff I'm not interested in.

Any idea what I'm doing wrong? Or where exactly I should put the size argument? Nothing I tried worked, so far.

Mark_Harwood · June 1, 2018, 12:51pm

That size property is for the number of hits (documents) you want back. That should be zero as you're not interested in seeing any docs.
The size setting you want is in the terms aggregation to define the number of buckets (user ids) you want back. A note of warning though - this could be very expensive to attempt if you have a lot of users.
If you only want a count back consider using the cardinality aggregation

Wayne_Taylor · June 1, 2018, 12:53pm

Does this work. Here is an example we use

{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "tail_number:N522VA AND origination_airport:KBOS AND destination_airport:KSFO AND airline:VRD",
"analyze_wildcard": true
}
},
{
"range": {
"@timestamp": {
"gte": "now-24h"
}
}
}
],
"must_not": []
}
},
"_source": {
"excludes": []
},
"aggs": {
"unique_users": {
"cardinality": {
"field": "mac.keyword"
}
}
}
}

Wayne_Taylor · June 1, 2018, 12:54pm

Sorry for formatting will fix when I get in office

Mark_Harwood · June 1, 2018, 12:56pm

Looks legit.

Wouter_Durnez · June 1, 2018, 2:55pm

Hey guys,

Based on what Mark and Wayne replied, I finally found something that worked for me:

# ID query
body = {
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "unique_id": {
            "terms": {
                "field": "id.keyword",
                "size": 1000000
            }
        }
    }
}

res = es.search(index="myindex",
                body=body,
                request_timeout=240,
                doc_type="appevents")

# Get the ids
ids = {}

# Go over buckets and get
for b in res['aggregations']['unique_id']['buckets']:

    ids[b['key']] = b['doc_count']

return ids

This gives my all ids (got about 5400 now) with the associated doc counts. Thanks for the help!

system · June 29, 2018, 2:55pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Get list with all users name? Elasticsearch	10	6350	July 6, 2017
How to get Unique Records Elasticsearch	16	3555	December 10, 2020
Elasticsearch query for unique value Elasticsearch	3	553	August 1, 2019
How do I get all the distinct field values that satisfy a particular query Elasticsearch	2	687	July 6, 2017
Get distinct data Elasticsearch	4	399	July 6, 2017

Get unique users in db

Related topics