ValuesSourceType in ValuesSourceAggregationBuilder constructors

Can someone explain what the ValuesSourceType parameter denotes in the ValuesSourceAggregationBuilder constructors?

  • ValuesSourceAggregationBuilder(StreamInput in, ValuesSourceType valuesSourceType, ValueType targetValueType)
  • ValuesSourceAggregationBuilder(StreamInput in, ValuesSourceType valuesSourceType)

ValuesSourceType is roughly the type of data that the aggregator expects to work with. For example, MaxAggregatorBuilder uses ValuesSourceType.NUMERIC because it only works with numeric fields. ValuesSourceType.ANY is sort of a wildcard that means the aggregator can accept any type of field. The missing aggregator is a good example: it doesn't really care what the data type is, it's just looking to see if the doc is missing a value or not.
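To make the wildcard idea concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch classes: `SourceType`, the `BYTES` constant, and `accepts` are simplified stand-ins made up for illustration, and the real check is spread across several places rather than living in one method.

```java
/** Simplified stand-in; these are NOT the Elasticsearch classes. */
final class ValuesSourceTypeSketch {

    // Hypothetical mirror of the broad data categories an aggregation can declare.
    enum SourceType { ANY, NUMERIC, BYTES }

    /**
     * Returns true when an aggregation that declared {@code expected}
     * can work with a field whose data falls into {@code fieldKind}.
     * ANY acts as a wildcard; every other value must match exactly.
     */
    static boolean accepts(SourceType expected, SourceType fieldKind) {
        return expected == SourceType.ANY || expected == fieldKind;
    }

    public static void main(String[] args) {
        // A max-style agg declares NUMERIC, so a string-like field is rejected.
        System.out.println(accepts(SourceType.NUMERIC, SourceType.BYTES));   // false
        System.out.println(accepts(SourceType.NUMERIC, SourceType.NUMERIC)); // true
        // A missing-style agg declares ANY, so it takes whatever it is given.
        System.out.println(accepts(SourceType.ANY, SourceType.BYTES));       // true
    }
}
```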

That's a rough approximation. It's a little more complicated because the ValuesSourceType is used in a few different places. The main one is ValuesSourceConfig#resolve(), where it helps decide what kind of field data implementation we should load. Scripting, missing, and value_type parameters on the agg affect this decision too.

And to head off an obvious followup question, ValueType sorta maps the ValuesSourceType, formatter and field data implementation to a specific field type. E.g. a long field and a double field have different ValueTypes, but both of those ValueTypes map to ValuesSourceType.NUMERIC.
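As a rough illustration of that many-to-one relationship, the sketch below models it with two small enums. Everything in it is hypothetical (`FieldValueType`, the formatter strings, the `STRING` entry); it only shows the shape of the mapping, not how the real classes are written.

```java
/** Simplified stand-in; these are NOT the Elasticsearch classes. */
final class ValueTypeSketch {

    // Broad category, as in the previous discussion.
    enum SourceType { ANY, NUMERIC, BYTES }

    // Hypothetical finer-grained type: each entry carries its broad category
    // plus a label standing in for the formatter it would use.
    enum FieldValueType {
        LONG(SourceType.NUMERIC, "integer formatter"),
        DOUBLE(SourceType.NUMERIC, "floating-point formatter"),
        STRING(SourceType.BYTES, "raw string formatter");

        private final SourceType sourceType;
        private final String formatterName;

        FieldValueType(SourceType sourceType, String formatterName) {
            this.sourceType = sourceType;
            this.formatterName = formatterName;
        }

        SourceType sourceType() { return sourceType; }

        String formatterName() { return formatterName; }
    }

    public static void main(String[] args) {
        // LONG and DOUBLE are distinct value types with different formatters,
        // yet both resolve to the same broad NUMERIC category.
        for (FieldValueType type : FieldValueType.values()) {
            System.out.println(type + " -> " + type.sourceType() + " (" + type.formatterName() + ")");
        }
    }
}
```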

It's all a bit confusing and an area we're actively trying to refactor :slight_smile:

Thanks @polyfractal for the explanation.

Can you also explain the use of ValuesSourceAggregationBuilder(StreamInput in, ValuesSourceType valuesSourceType), in which no targetValueType is specified?

Sure. The comments for that stream constructor are helpful in this case:

* Read an aggregation from a stream that serializes its targetValueType. This should only be used by subclasses that override
* {@link #serializeTargetValueType(Version)} to return true.

Aggregations like the terms agg accept ANY as their values source, which means the actual type isn't known at compile time via a generic like the other aggs (max for example). So these aggregations need to serialize/deserialize their targetValueType explicitly from the stream, rather than getting it as a constructor parameter.

I think the only aggs that use ValuesSourceAggregationBuilder(StreamInput in, ValuesSourceType valuesSourceType) are the aggs that specify ANY as their VSType. Everything else uses the other constructor.
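The difference between the two stream constructors can be sketched with plain `java.io` streams. This is a toy model under stated assumptions, not the actual Elasticsearch code: `Builder`, `MaxLike`, `TermsLike`, the enums, and the wire layout are all invented for illustration, and only the `serializeTargetValueType` name is borrowed from the Javadoc quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Toy model; these are NOT the Elasticsearch classes. */
final class StreamConstructorSketch {
    enum SourceType { ANY, NUMERIC }
    enum TargetType { LONG, DOUBLE, STRING }

    abstract static class Builder {
        final String field;
        final SourceType sourceType;
        final TargetType targetType; // may be null when the caller did not set one

        Builder(String field, SourceType sourceType, TargetType targetType) {
            this.field = field;
            this.sourceType = sourceType;
            this.targetType = targetType;
        }

        // Three-argument stream constructor: the subclass already knows the
        // target type, so nothing about it is read from the stream.
        Builder(DataInputStream in, SourceType sourceType, TargetType targetType) throws IOException {
            this.sourceType = sourceType;
            this.targetType = targetType;
            this.field = in.readUTF();
        }

        // Two-argument stream constructor: the target type was written to the
        // stream by writeTo, so it has to be read back here.
        Builder(DataInputStream in, SourceType sourceType) throws IOException {
            this.sourceType = sourceType;
            this.targetType = in.readBoolean() ? TargetType.values()[in.readInt()] : null;
            this.field = in.readUTF();
        }

        // Subclasses using the two-argument constructor override this to return true.
        boolean serializeTargetValueType() { return false; }

        void writeTo(DataOutputStream out) throws IOException {
            if (serializeTargetValueType()) {
                out.writeBoolean(targetType != null);
                if (targetType != null) out.writeInt(targetType.ordinal());
            }
            out.writeUTF(field);
        }
    }

    // NUMERIC-only agg: target type is fixed in code and never serialized.
    static final class MaxLike extends Builder {
        MaxLike(String field) { super(field, SourceType.NUMERIC, TargetType.DOUBLE); }
        MaxLike(DataInputStream in) throws IOException { super(in, SourceType.NUMERIC, TargetType.DOUBLE); }
    }

    // ANY agg: target type is only known at runtime, so it travels on the wire.
    static final class TermsLike extends Builder {
        TermsLike(String field, TargetType targetType) { super(field, SourceType.ANY, targetType); }
        TermsLike(DataInputStream in) throws IOException { super(in, SourceType.ANY); }
        @Override boolean serializeTargetValueType() { return true; }
    }

    static byte[] toBytes(Builder builder) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            builder.writeTo(out);
        }
        return bytes.toByteArray();
    }

    // Writes a TermsLike builder and reads it back, returning the recovered target type.
    static TargetType roundTripTermsLike(TargetType targetType) {
        try {
            byte[] data = toBytes(new TermsLike("user", targetType));
            return new TermsLike(new DataInputStream(new ByteArrayInputStream(data))).targetType;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same round trip for MaxLike; the target type comes from the constructor argument.
    static TargetType roundTripMaxLike() {
        try {
            byte[] data = toBytes(new MaxLike("price"));
            return new MaxLike(new DataInputStream(new ByteArrayInputStream(data))).targetType;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripTermsLike(TargetType.STRING)); // STRING
        System.out.println(roundTripTermsLike(null));              // null
        System.out.println(roundTripMaxLike());                    // DOUBLE
    }
}
```

In this model, mixing the two up (using the two-argument constructor without overriding `serializeTargetValueType`) would make the reader consume bytes the writer never produced, which is the kind of mismatch the Javadoc warning is guarding against.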

Out of curiosity, what're you building? :slight_smile: If it's private/not able to share that's fine, I'm just interested in what aggs community members are making :slight_smile:

Thanks @polyfractal again.

Yes, for now it is private. I am trying to write a custom dedup aggregation plugin for our dataset.

:+1: Sounds interesting! Let me know if you have any more questions :slight_smile:

Just as a side note: the internal API doesn't change too often, but we don't provide any backwards compat guarantees either. So it can be challenging sometimes to maintain a custom plugin since we may accidentally break your code with an internal refactor, etc.

As the plugin API grows (and we remove more instances of Guice injection) we're hoping to have more consistent BWC guarantees, but it's not as rigorous as say the REST API.

Yeah, sure @polyfractal, I will reach out when I have more questions. Thanks again :slight_smile:
