Mapping to analyze a list with a fixed number of terms

In relation to this question: Top terms in comma separated list

I am wondering if there is a way to skip the scripted fields and have that information natively in the index.

To give a bit more context about the document: it has a foo attribute which is currently of type string. It represents a user's choices for a given service (there is a limited number of choices, say around 80 for the sake of the discussion, and that list changes from time to time).

We're happy to reindex, change the data structure or do whatever is necessary to get native support for this (that's our number one use case). We need to know which combinations of terms were chosen for foo (the order doesn't matter, and an empty list is also a valid combination).

To illustrate the problem a bit more, this query

GET /index-foo/_search
{
  "aggs": {
    "foos": {
      "terms": { "field": "foo" }
    }
  }
}

returns the presence of each individual term (ignoring the fact that foo contains a list of values). This information is interesting and we'd like to keep it, but we'd also like a way to get the distinct lists of values.

Thanks!

If you want to do the array concatenation as outlined in the other topic, it should be pretty easy to do using an Ingest node. There's a join processor built in that should do that for you. This will also keep the records enriched as you index them, with no additional tooling (i.e. Logstash).
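
Something along these lines should do it. This is just a sketch, and the pipeline name and the separator are made up:

PUT _ingest/pipeline/join-foo
{
  "description": "Join the foo array into a single comma-separated string",
  "processors": [
    {
      "join": {
        "field": "foo",
        "separator": ","
      }
    }
  ]
}

You'd then add ?pipeline=join-foo to your index requests (or configure it as a default pipeline, if your version supports that). Note that as written it replaces the foo array with the joined string, so if you want to keep both you'd write the joined value to a separate field instead.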

If I recall correctly, you can reindex against an Ingest node too, but I may be wrong about that. I'm not really versed in the reindex API.
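
If it does work the way I remember, it would look something like this, with the index names here just being placeholders:

POST _reindex
{
  "source": {
    "index": "index-foo"
  },
  "dest": {
    "index": "index-foo-v2",
    "pipeline": "join-foo"
  }
}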

Thanks for the reply!

So the recommendation is to duplicate the data then? One field ingested as a JSON array and one ingested as a single string with all the terms? Since we ingest that attribute as a JSON array, I thought both representations would be available somehow.
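
Just to make sure I understand, would the mapping end up looking something like this? (foo_combination is just a name I made up, and the exact mapping syntax probably needs adjusting for our version.)

PUT /index-foo
{
  "mappings": {
    "properties": {
      "foo": { "type": "keyword" },
      "foo_combination": { "type": "keyword" }
    }
  }
}

In other words, the array wouldn't show up in the mapping at all: foo stays a keyword field that happens to hold several values per document, and foo_combination is a keyword field holding the single joined string.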

Yeah, duplicating data in JSON datastores is pretty common. As CJ pointed out, you can avoid that by using scripted fields, but I assume there's some added overhead since that field needs to be calculated with each query (though I also assume it's cached...).
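
If you do go the duplication route, one way to handle it at ingest time is a variation of the pipeline above: copy the array into a second field with a script processor and join only the copy, so the original array stays available for the per-term aggregation. This is just a sketch that reuses the made-up foo_combination name and assumes foo is always present (even if it's an empty array):

PUT _ingest/pipeline/join-foo
{
  "description": "Keep foo as an array and add foo_combination as a single joined string",
  "processors": [
    {
      "script": {
        "source": "ctx.foo_combination = ctx.foo"
      }
    },
    {
      "join": {
        "field": "foo_combination",
        "separator": ","
      }
    }
  ]
}

Since the order doesn't matter for you, it may also be worth sorting the values before the join, so that the same set of terms always produces the same joined string.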

If the field doesn't change, then duplicating it isn't a big deal. If it's something that DOES change, keeping it in sync is probably going to be a real pain, and you should stick with scripted fields.
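
Once the joined field is there (and mapped as a keyword / not_analyzed field, so the whole string counts as a single term), the distinct combinations should come out of an ordinary terms aggregation on it. Again, foo_combination is just the placeholder name from above:

GET /index-foo/_search
{
  "size": 0,
  "aggs": {
    "foo_combinations": {
      "terms": { "field": "foo_combination" }
    }
  }
}

Depending on how many distinct combinations actually occur in the data, you may need to raise the size parameter of the terms aggregation, since it only returns the top buckets by default.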
