How to combine similar items in aggregations

I'm trying to create a color filter. Most of the colors being returned are normal, but there are a bunch of odd variants. For example, my results look like this:

{ key: "WHITE", doc_count: 300 }, 
{ key: "OFFWHITE", doc_count: 2 }, 
{ key: "SUPER WHITE", doc_count: 1 }

My aggregation looks like:


aggs: {
    Color: {
        terms: {
            field: 'colors.keyword',
            size: 100,
        },
    },
}

I want to combine every value that includes 'white' into one bucket, and do the same for the other top colors. Is this possible, and if so, how?

I don't think you'll be able to do this with a plain terms aggregation alone.

You could deal with the 'special' colours by using a filters aggregation, with one filter per colour that matches all of its variations (for example every value containing WHITE) and gathers them into a single bucket, but the resulting bool queries could get hairy. You'd also still need a normal terms aggregation for the regular colours.
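A minimal sketch of the filters approach, written as a request body in the same JavaScript object style as the question. The list of canonical colours (including BLACK), the bucket names and the wildcard patterns are assumptions for illustration. The exclude regex on the terms aggregation is there so the raw variants are not counted a second time alongside the grouped buckets.

```javascript
// Hypothetical list of colours whose variants should be grouped together.
const canonicalColours = ['WHITE', 'BLACK'];

// Build one named filter per canonical colour. The pattern *WHITE* matches
// WHITE, OFFWHITE and SUPER WHITE on the keyword sub-field.
function buildColourFilters(colours) {
  const filters = {};
  for (const colour of colours) {
    filters[colour] = {
      wildcard: { 'colors.keyword': { value: `*${colour}*` } },
    };
  }
  return filters;
}

const body = {
  size: 0,
  aggs: {
    // Grouped buckets: one per canonical colour, keyed by its name.
    CanonicalColor: {
      filters: { filters: buildColourFilters(canonicalColours) },
    },
    // All remaining colours, still bucketed by exact value.
    Color: {
      terms: {
        field: 'colors.keyword',
        size: 100,
        // Lucene regex matched against the whole term; drops values that
        // are already counted in the CanonicalColor buckets above.
        exclude: canonicalColours.map((c) => `.*${c}.*`).join('|'),
      },
    },
  },
};
```

Wildcard queries with a leading * are relatively expensive on large indices, and every new colour variant means touching the query again, which is part of why this route gets hairy.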

Another option is an ingest pipeline. The pipeline could set a new 'canonical' colour field that the aggregation uses instead. A script processor in the pipeline would copy the value of the colors field into a CanonicalColor field, changing it to WHITE if the original value contains white. You'd still have the original value in the colors field if you need it.
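A minimal sketch of that pipeline, again as JavaScript objects. It assumes the source field is colors (a string or an array of strings, as the question's colors.keyword suggests) and only normalises WHITE; the pipeline body would be sent with PUT _ingest/pipeline/ and a name of your choosing. The plain-JS canonicalColor function is a hypothetical helper that mirrors the Painless rule so you can check it against sample values locally.

```javascript
// Painless source for the script processor. It leaves `colors` untouched
// and writes the normalised value(s) into CanonicalColor.
const painlessSource = `
if (ctx.colors != null) {
  def values = ctx.colors instanceof List ? ctx.colors : [ctx.colors];
  def out = [];
  for (def c : values) {
    out.add(c.toUpperCase().contains('WHITE') ? 'WHITE' : c);
  }
  ctx.CanonicalColor = out;
}
`;

// Pipeline body with a single script processor.
const canonicalColorPipeline = {
  description: 'Adds CanonicalColor, grouping every WHITE variant under WHITE',
  processors: [{ script: { lang: 'painless', source: painlessSource } }],
};

// Hypothetical local mirror of the Painless rule above.
function canonicalColor(colors) {
  if (colors == null) return undefined;
  const values = Array.isArray(colors) ? colors : [colors];
  return values.map((c) => (c.toUpperCase().includes('WHITE') ? 'WHITE' : c));
}

// The aggregation then targets the new field. With default dynamic mapping,
// CanonicalColor gets a .keyword sub-field just like colors did.
const aggs = {
  Color: {
    terms: { field: 'CanonicalColor.keyword', size: 100 },
  },
};
```

Existing documents only get the new field once they pass through the pipeline again, for example with _update_by_query and its pipeline parameter; setting the pipeline as the index's index.default_pipeline covers documents indexed afterwards.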

I'm sorry this isn't an easy answer. If someone else has a more straightforward way of solving this, I'd be interested in hearing it.
