Retrieve unique key names from "property" field

mashuma · June 16, 2020, 9:45am

Hello everyone,

Quick question: I've got the following documents.

[
    {
        "aid": 1,
        "ameta": {
            "City": "Brussels",
            "Fleet": "",
            "Works": "Building",
            "Status": "new",
            "Country": "Belgium"
        }
    },
    {
        "aid": 2,
        "ameta": {
            "City": "Amsterdam",
            "Works": "Home",
            "Country": "Netherlands"
        }
    },
    {
        "aid": 3,
        "ameta": {
            "City": "Berlin",
            "Status": "ready",
            "Timezone": "CET",
            "Country": "Germany"
        }
    }
]

I'm looking for an aggregation query that returns all unique key names ("terms") of the ameta field. Something like:

City
Fleet
Works
Status
Country
Timezone

In PostgreSQL I can do select distinct skeys(ameta) from ... (given ameta is an hstore), but I'm struggling to find an equivalent approach in Elasticsearch.
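
To make the intended semantics concrete, here is a minimal client-side sketch in Python of the result I am after. It is not an Elasticsearch query; the helper name unique_keys is made up for illustration, and the documents are the ones from the example above.

```python
def unique_keys(docs, field="ameta"):
    """Collect the distinct key names found under `field`, in first-seen order."""
    seen = {}  # a dict keeps insertion order, so it doubles as an ordered set
    for doc in docs:
        for key in doc.get(field, {}):
            seen.setdefault(key, True)
    return list(seen)


docs = [
    {"aid": 1, "ameta": {"City": "Brussels", "Fleet": "", "Works": "Building",
                         "Status": "new", "Country": "Belgium"}},
    {"aid": 2, "ameta": {"City": "Amsterdam", "Works": "Home",
                         "Country": "Netherlands"}},
    {"aid": 3, "ameta": {"City": "Berlin", "Status": "ready",
                         "Timezone": "CET", "Country": "Germany"}},
]

print(unique_keys(docs))
# ['City', 'Fleet', 'Works', 'Status', 'Country', 'Timezone']
```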

Thanks in advance!

gab.bernasconi · June 16, 2020, 10:44am

I think you are looking for the Terms Aggregation

Given your example dataset, I would do something like this:

GET /<YOUR_INDEX_HERE>/_search
{
  "aggs": {
    "city": {
      "terms": { "field": "ameta.City" }
    }
  }
}

You'll get an output with a bucket for each city, and the buckets would be unique. The doc_count key will give you the count of how many times that particular city is found, given your search criteria.
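
As a rough sketch of how that output could be read on the client side, assuming the search response has already been parsed into a Python dict: the helper name bucket_counts and the sample response are made up for illustration, and only the buckets, key and doc_count names come from the terms aggregation response itself.

```python
def bucket_counts(response, agg_name):
    """Return {bucket key: doc_count} for one terms aggregation in a search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written response fragment for illustration, not real cluster output.
sample_response = {
    "aggregations": {
        "city": {
            "buckets": [
                {"key": "Amsterdam", "doc_count": 1},
                {"key": "Berlin", "doc_count": 1},
                {"key": "Brussels", "doc_count": 1},
            ]
        }
    }
}

print(bucket_counts(sample_response, "city"))
# {'Amsterdam': 1, 'Berlin': 1, 'Brussels': 1}
```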

mashuma · June 16, 2020, 11:31am

Thanks for the feedback, but the expected output is not the unique values for 'ameta.city' or any other field values under 'ameta', but the unique key names across all 'ameta' fields.

The query or aggregation I'm looking for should produce something like

[ "City", "Fleet", "Works", "Status", "Country", "Timezone"]

from the example input in my first post.

The actual format of the output does not really matter, as long as I can extract the key names from it later in the code that consumes it.

mashuma · June 16, 2020, 11:33am

The whole issue is that we want to provide a list of possible keys for the user to choose from. We do not know beforehand how many different keys exist under "ameta" in the index.

What we used to do is fetch all fields and their values and derive the keys from that. However, after upgrading our labs from 6.8.x to 7.6.x we run into this error:

field expansion matches too many fields, limit: 1024, got: 1672

gab.bernasconi · June 16, 2020, 11:38am

OK, sorry, I misunderstood your question. Would the Get mapping API be useful in this case?
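
A minimal sketch of what I mean, assuming the GET /<index>/_mapping response has been parsed into a Python dict and has the typeless 7.x shape (index name, then mappings, then properties). The index name "assets", the helper name and the sample mapping are made up for illustration.

```python
def keys_from_mapping(mapping_response, index, parent="ameta"):
    """List the sub-field names mapped under `parent` for one index."""
    properties = mapping_response[index]["mappings"]["properties"]
    return sorted(properties.get(parent, {}).get("properties", {}))


# Hand-written mapping fragment for illustration, not real cluster output.
sample_mapping = {
    "assets": {
        "mappings": {
            "properties": {
                "aid": {"type": "long"},
                "ameta": {
                    "properties": {
                        "City": {"type": "text"},
                        "Country": {"type": "text"},
                        "Timezone": {"type": "text"},
                    }
                },
            }
        }
    }
}

print(keys_from_mapping(sample_mapping, "assets"))
# ['City', 'Country', 'Timezone']
```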

mashuma · June 16, 2020, 11:43am

Yes and no.

The mapping API would solve this if we needed the keys for the whole index. In that case I could look at the mapping and get the fields from there. This is more or less what we have been doing so far by grabbing all the data, as mentioned in my previous reply.

The actual use case involves retrieving the list of "ameta" fields (or key names) for a subset of records, such as aid between 100 and 200 or some other criteria.

I've looked around quite a bit and am starting to feel that this use case is not supported in Elasticsearch.

gab.bernasconi · June 16, 2020, 11:53am

Maybe the _field_names field could help you in this case, but again, I don't think it can be coupled with search criteria.

mashuma · June 16, 2020, 2:33pm

Unfortunately it is a hard requirement to have this aggregation done for a subset of documents, as each targeted "segment" of data may have very different keys available to choose from.

Vinayak_Sapre · June 16, 2020, 7:18pm

@mashuma
I am not sure how well it will perform, but you can achieve this with a scripted metric aggregation:

{
  "size": 0,
  "aggs": {
    "ameta_fields": {
      "scripted_metric": {
        "init_script": "state.propsMap = new HashMap()",
        "map_script": "def m = params['_source']['ameta']; if (m != null) { for (String p : m.keySet()) { state.propsMap.put(p, 1); } }",
        "combine_script": "return state.propsMap;",
        "reduce_script": "Map combined = new HashMap(); for (s in states) { for (k in s.keySet()) { combined.put(k, 1); } } return combined"
      }
    }
  }
}
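
Since aggregations only see the documents matched by the query, the subset requirement can probably be met by adding a query next to the aggregation. Below is a hedged Python sketch of that idea; the helper names are made up, the range filter assumes aid is mapped as a numeric field, and the sample response is hand-written. The key names are read from the value entry that a scripted metric aggregation returns.

```python
def restrict_to_aid_range(search_body, low, high):
    """Return a copy of the search body limited to documents with low <= aid <= high."""
    body = dict(search_body)  # shallow copy so the caller's dict is left untouched
    body["query"] = {"range": {"aid": {"gte": low, "lte": high}}}
    return body


def key_names(response, agg_name="ameta_fields"):
    """Extract the sorted key names from the scripted metric result map."""
    return sorted(response["aggregations"][agg_name]["value"])


# Placeholder body: put the scripted_metric aggregation from this post under "aggs".
search_body = {"size": 0, "aggs": {"ameta_fields": {"scripted_metric": {}}}}
request = restrict_to_aid_range(search_body, 100, 200)

# Hand-written response fragment for illustration, not real cluster output.
sample_response = {
    "aggregations": {
        "ameta_fields": {"value": {"City": 1, "Status": 1, "Timezone": 1, "Country": 1}}
    }
}

print(key_names(sample_response))
# ['City', 'Country', 'Status', 'Timezone']
```

The request dict would then be sent as the body of a _search call on the index.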

system · July 14, 2020, 7:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
