A transform in a nutshell is a task that runs aggregation queries and persists the result in an index. In order to join 2 indices you need compatible mappings for grouping, e.g. a key that is named the same way and has the same type. Note, this can be achieved with scripts or runtime fields, however if you want to run in it continuously at scale, it is advised to use proper ordinary mappings.
For the group_by use terms on the common key.
For the aggregation part you can choose between all kinds of aggregations. Simple numeric fields can use e.g. min or max. For more complicated cases you have to use a script again, some examples are given in this post:
enrich vs. transform
If you wonder weather enrich or transform is the better choice:
Use enrich if 1 source is static or almost static, for example you want to expand an id to a name.
Use transform if all sources are dynamic and can arrive in different order. E.g. for distributed tracing in a larger system you can't say if the database lookup happens 1st or the access to the web server, or it might be interleaved. In such a case - and it sounds like yours - it is better to use transform.
LBNL I assume you want to join the data not just once. For this, have a look at the continuous mode in transform.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.