Retrieve unique key names from "property" field

Hello everyone,

Quick question: I've got the following documents.

[
    {
        "aid": 1,
        "ameta": {
            "City": "Brussels",
            "Fleet": "",
            "Works": "Building",
            "Status": "new",
            "Country": "Belgium"
        }
    },
    {
        "aid": 2,
        "ameta": {
            "City": "Amsterdam",
            "Works": "Home",
            "Country": "Netherlands"
        }
    },
    {
        "aid": 3,
        "ameta": {
            "City": "Berlin",
            "Status": "ready",
            "Timezone": "CET",
            "Country": "Germany"
        }
    }
]

I'm looking for an aggregation query that returns all unique "terms" (key names) for the ameta field. Something like:

City
Fleet
Works
Status
Country
Timezone

In PostgreSQL I can do select distinct skeys(ameta) from ... (given ameta is an hstore), but I'm struggling to find an equivalent approach in Elasticsearch.
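To make the requirement concrete, here is a minimal Python sketch of the result I'm after. It is not Elasticsearch code; `unique_keys` is a made-up helper and the documents are simply the ones above held in memory.

```python
# Illustration only: the example documents as plain Python dicts.
docs = [
    {"aid": 1, "ameta": {"City": "Brussels", "Fleet": "", "Works": "Building",
                         "Status": "new", "Country": "Belgium"}},
    {"aid": 2, "ameta": {"City": "Amsterdam", "Works": "Home",
                         "Country": "Netherlands"}},
    {"aid": 3, "ameta": {"City": "Berlin", "Status": "ready",
                         "Timezone": "CET", "Country": "Germany"}},
]


def unique_keys(documents):
    """Collect the distinct key names under 'ameta', in first-seen order."""
    seen = {}
    for doc in documents:
        for key in doc.get("ameta", {}):
            seen.setdefault(key, True)  # dicts preserve insertion order
    return list(seen)


print(unique_keys(docs))
```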

Thanks in advance!

I think you are looking for the Terms Aggregation.

Given your example dataset, I would do something like this:

GET /<YOUR_INDEX_HERE>/_search
{
  "aggs": {
    "city": {
      "terms": { "field": "ameta.City" }
    }
  }
}

You'll get an output with one bucket per distinct city. The doc_count key in each bucket tells you how many documents matching your search criteria contain that city.
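As a rough picture of what those buckets look like, here is a small Python sketch that mimics the bucket list for the City values. It only imitates the shape of the result (a `key` and a `doc_count` per distinct value, most frequent first); `terms_buckets` and the sample documents are made up for illustration and nothing here talks to Elasticsearch.

```python
from collections import Counter

# Made-up sample data: two documents share the same city.
sample_docs = [
    {"aid": 1, "ameta": {"City": "Brussels"}},
    {"aid": 2, "ameta": {"City": "Amsterdam"}},
    {"aid": 3, "ameta": {"City": "Brussels"}},
    {"aid": 4, "ameta": {"Country": "Germany"}},  # no City, so not counted
]


def terms_buckets(documents, key):
    """Count distinct values of ameta[key], most frequent first."""
    counts = Counter(
        doc["ameta"][key]
        for doc in documents
        if key in doc.get("ameta", {})
    )
    return [{"key": value, "doc_count": n} for value, n in counts.most_common()]


for bucket in terms_buckets(sample_docs, "City"):
    print(bucket)
```

Note that the buckets hold field values, not field names.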

Thanks for the feedback, but the expected output is not the unique values of 'ameta.City' or of any other field under 'ameta'. What I need is the unique key names that appear under 'ameta' across all documents.

The query or aggregation I'm looking for should produce something like

[ "City", "Fleet", "Works", "Status", "Country", "Timezone"]

from the example input of

[
    {
        "aid": 1,
        "ameta": {
            "City": "Brussels",
            "Fleet": "",
            "Works": "Building",
            "Status": "new",
            "Country": "Belgium"
        }
    },
    {
        "aid": 2,
        "ameta": {
            "City": "Amsterdam",
            "Works": "Home",
            "Country": "Netherlands"
        }
    },
    {
        "aid": 3,
        "ameta": {
            "City": "Berlin",
            "Status": "ready",
            "Timezone": "CET",
            "Country": "Germany"
        }
    }
]

The actual format of the output does not really matter, as long as I can extract the key names from it later in the code that consumes it.

The whole issue is that we want to provide a list of possible keys for the user to choose from. We do not know beforehand how many different "keys" there are under "ameta" in the index.

What we used to do is get all fields and their values and extract the key names from that. However, after upgrading our labs from 6.8.x to 7.6.x, we ran into this error:

field expansion matches too many fields, limit: 1024, got: 1672

OK, sorry, I misunderstood your question. Would the Get mapping API be useful in this case?

Yes and no.

The Mapping API would solve this if we needed the keys for the whole index. In that case I could look at the mapping and read the fields from there. This is more or less what we have been doing so far by grabbing all the data, as mentioned in my previous reply.
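For the whole-index case, reading the names out of the mapping could look roughly like the Python sketch below. It assumes the typeless 7.x response shape (index name, then `mappings`, then `properties`); `ameta_fields_from_mapping`, the index name `my-index` and the field types are made up for illustration, and the response dict is hand-written rather than fetched from a cluster.

```python
# Hand-written stand-in for a GET /my-index/_mapping response (7.x shape assumed).
mapping_response = {
    "my-index": {
        "mappings": {
            "properties": {
                "aid": {"type": "long"},
                "ameta": {
                    "properties": {
                        "City": {"type": "text"},
                        "Fleet": {"type": "text"},
                        "Works": {"type": "text"},
                        "Status": {"type": "text"},
                        "Country": {"type": "text"},
                        "Timezone": {"type": "text"},
                    }
                },
            }
        }
    }
}


def ameta_fields_from_mapping(response, index, parent="ameta"):
    """Return the sub-field names mapped under `parent`, sorted by name."""
    properties = response[index]["mappings"]["properties"]
    return sorted(properties.get(parent, {}).get("properties", {}))


print(ameta_fields_from_mapping(mapping_response, "my-index"))
```

This lists every key ever mapped under ameta, regardless of which documents actually contain it.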

The actual use case involves retrieving the list of "ameta" fields (key names) for a subset of records, such as aid between 100 and 200 or some other criteria.
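To show why the subset matters, here is a small Python sketch using the three example documents from earlier in the thread. It is an in-memory illustration only; `keys_for_subset` is a made-up helper, and the aid range 2 to 3 stands in for something like 100 to 200.

```python
# The example documents from earlier in the thread, as plain Python dicts.
docs = [
    {"aid": 1, "ameta": {"City": "Brussels", "Fleet": "", "Works": "Building",
                         "Status": "new", "Country": "Belgium"}},
    {"aid": 2, "ameta": {"City": "Amsterdam", "Works": "Home",
                         "Country": "Netherlands"}},
    {"aid": 3, "ameta": {"City": "Berlin", "Status": "ready",
                         "Timezone": "CET", "Country": "Germany"}},
]


def keys_for_subset(documents, low, high):
    """Key names under 'ameta' for documents with low <= aid <= high."""
    keys = set()
    for doc in documents:
        if low <= doc["aid"] <= high:
            keys.update(doc.get("ameta", {}))
    return sorted(keys)


print(keys_for_subset(docs, 1, 3))  # all documents: includes "Fleet"
print(keys_for_subset(docs, 2, 3))  # subset: "Fleet" is not offered
```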

I've looked around quite a bit and I'm starting to feel that such a use case is not supported in Elasticsearch.

Maybe the _field_names field could help you in this case, but again, I don't think it can be coupled with search criteria.

Unfortunately, it is a hard requirement to have this aggregation done for a subset of documents, as each targeted "segment" of data may have very different keys available to choose from.

@mashuma
I am not sure how well it will perform, but you can achieve this with a scripted metric aggregation:

{
  "size": 0,
  "aggs": {
    "ameta_fields": {
      "scripted_metric": {
        "init_script": "state.propsMap = new HashMap()",
        "map_script": "for (String p : params['_source']['ameta'].keySet()) { state.propsMap.put(p, 1); }",
        "combine_script": "return state.propsMap;",
        "reduce_script": "Map combined = new HashMap(); for (s in states) { for (k in s.keySet()) { combined.put(k, 1); } } return combined"
      }
    }
  }
}
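Restricting this to a subset should just be a matter of putting a query next to the aggregation. Below is a rough Python sketch that builds such a request body and pulls the key names out of the response. The Painless scripts are copied from the request above; `build_request` and `extract_keys` are made-up helper names, the aid range is the example from earlier in the thread, and `sample_response` is hand-written to show the expected shape (the result sits under `aggregations.ameta_fields.value`), not real cluster output. No client call is shown.

```python
import json

# Painless scripts copied from the scripted_metric request above.
INIT_SCRIPT = "state.propsMap = new HashMap()"
MAP_SCRIPT = (
    "for (String p : params['_source']['ameta'].keySet()) "
    "{ state.propsMap.put(p, 1); }"
)
COMBINE_SCRIPT = "return state.propsMap;"
REDUCE_SCRIPT = (
    "Map combined = new HashMap(); "
    "for (s in states) { for (k in s.keySet()) { combined.put(k, 1); } } "
    "return combined"
)


def build_request(low, high):
    """Search body: range filter on aid plus the scripted metric aggregation."""
    return {
        "size": 0,  # only the aggregation result is needed, no hits
        "query": {"range": {"aid": {"gte": low, "lte": high}}},
        "aggs": {
            "ameta_fields": {
                "scripted_metric": {
                    "init_script": INIT_SCRIPT,
                    "map_script": MAP_SCRIPT,
                    "combine_script": COMBINE_SCRIPT,
                    "reduce_script": REDUCE_SCRIPT,
                }
            }
        },
    }


def extract_keys(response):
    """Return the sorted key names from the scripted metric result map."""
    return sorted(response["aggregations"]["ameta_fields"]["value"])


# Hand-written example of the response shape, for illustration only.
sample_response = {
    "aggregations": {
        "ameta_fields": {"value": {"City": 1, "Works": 1, "Country": 1}}
    }
}

print(json.dumps(build_request(100, 200), indent=2))
print(extract_keys(sample_response))
```

The body can be sent to the _search endpoint of the index with whatever client you already use.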
