Aggregation across multiple indexes/indices - significant terms

Hi, we have indexes that are split by date for manageability but contain the same mappings.

I'm currently trying to use the significant terms aggregation to identify new terms by specifying the foreground as, e.g., the last day and the background as the rest of history. When the aggregation is confined to one index it works as expected. With multiple indexes, however, the background frequencies do not include the same fields across all the indices; I imagine each background frequency only covers the index that the term was found in. Is this expected behaviour?
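For reference, a minimal sketch of the kind of request described above, written as a Python dict so the shape is easy to see. The field names (`message.keyword`, `@timestamp`), the aggregation name `new_terms` and the helper `build_significant_terms_request` are assumptions for illustration, not taken from the real mappings. The range query defines the foreground; with no `background_filter` set, the background defaults to all documents in the index.

```python
import json


def build_significant_terms_request(field, timestamp_field="@timestamp",
                                    foreground_from="now-1d/d"):
    """Build a search body whose query is the foreground set.

    All names passed in here are hypothetical examples.
    """
    return {
        "size": 0,  # only the aggregation result is of interest, not the hits
        "query": {
            # Foreground: documents from the last day.
            "range": {timestamp_field: {"gte": foreground_from}}
        },
        "aggregations": {
            "new_terms": {
                # Background defaults to every document in the index.
                "significant_terms": {"field": field}
            }
        },
    }


body = build_significant_terms_request("message.keyword")
print(json.dumps(body, indent=2))
```

The body would be sent to the `_search` endpoint of either a single index or an index pattern covering the daily indices; the difference in results between those two cases is what the question is about.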

Could you explain more about this?
Do background frequencies contain other fields from other indices?

Why do you think so? I suppose it's reasonable for the background frequency to be calculated over all indices, regardless of whether the index contains the term.

Hi Anthony.
Yes, this is expected behaviour.
Background frequency checks on potentially millions of candidate terms are expensive, so the implementation works with the local stats found in a shard.
The goal of finding “what is significant today?” is just not feasible using only day-based indices.

Distributed data is the enemy of a lot of analytical functions, I’m afraid.

Thanks for replying! I was imprecise here: I meant that the multiple indexes do not include the same terms, rather than fields. It seems that statistics are only collated locally (to a shard, it seems) rather than across the whole group of indexes. The bg_count number does seem to include the total across the indices, however, up to a limit.

I found that the background count did not include occurrences of the term in other indices.

Thanks for getting back to me. Sounds reasonable. The background count for each term seems to be an extra search, so it would have to be limited to the local shard; otherwise there could be, in aggregate, a great many terms to search for across all the shards. It's straightforward enough to compile the information another way.
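As a rough illustration of compiling the information another way, the sketch below merges per-index term counts on the client and scores the latest day against the whole history. Everything in it is an assumption: the index names, the counts and the helper functions are made up, the counts are imagined to come from something like one terms aggregation per daily index, and the score is a simplified JLH-style measure (absolute change times relative change), which is my understanding of the default heuristic rather than a copy of the real implementation.

```python
from collections import Counter

# Hypothetical per-index document counts per term (made-up data).
term_counts = {
    "logs-2024.01.01": {"timeout": 5, "login": 50},
    "logs-2024.01.02": {"timeout": 5, "login": 50},
    "logs-2024.01.03": {"timeout": 40, "login": 50, "segfault": 10},
}
# Hypothetical total document count per index.
doc_totals = {name: 100 for name in term_counts}


def merge_counts(per_index_counts):
    """Sum term counts over every index to get global background counts."""
    merged = Counter()
    for counts in per_index_counts.values():
        merged.update(counts)
    return merged


def jlh_style_score(fg_count, fg_total, bg_count, bg_total):
    """Simplified JLH-style score: (fg% - bg%) * (fg% / bg%)."""
    if fg_total == 0 or bg_total == 0 or bg_count == 0:
        return 0.0
    fg_pct = fg_count / fg_total
    bg_pct = bg_count / bg_total
    if fg_pct <= bg_pct:
        return 0.0  # not more common in the foreground than in the background
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


foreground_index = "logs-2024.01.03"          # "today"
background = merge_counts(term_counts)        # all indices, including today
background_total = sum(doc_totals.values())

scores = {
    term: jlh_style_score(count, doc_totals[foreground_index],
                          background[term], background_total)
    for term, count in term_counts[foreground_index].items()
}
# Keep only terms that scored above zero, highest first.
ranked = sorted((t for t in scores if scores[t] > 0),
                key=lambda t: scores[t], reverse=True)
print(ranked)
```

Because the background here is summed over every index, a term's background count no longer depends on which index or shard it happened to be found in.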
