I'm currently using an Elasticsearch transform to "merge" similar documents. It works very well; however, I've run into an issue when trying to concatenate fields that contain an array of objects.
For example, suppose I have two documents of the form:
{
  "_source": {
    "name": "xxx",
    "values": [
      {
        "label": "xxx-a",
        "otherA": 123,
        "otherB": "abc"
      }
    ]
  }
}
{
  "_source": {
    "name": "xxx",
    "values": [
      {
        "label": "xxx-b",
        "otherA": 456,
        "otherB": "def"
      }
    ]
  }
}
With the following mappings:
{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "values": {
      "properties": {
        "label": {
          "type": "keyword"
        },
        "otherA": {
          "type": "integer"
        },
        "otherB": {
          "type": "keyword"
        }
      }
    }
  }
}
My ideal result of the transform would be a single document in the form:
{
  "name": "xxx",
  "values": [
    {
      "label": "xxx-a",
      "otherA": 123,
      "otherB": "abc"
    },
    {
      "label": "xxx-b",
      "otherA": 456,
      "otherB": "def"
    }
  ]
}
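To pin down the merge semantics I'm after, here is a minimal sketch in plain Java (not Elasticsearch or Painless code). The class `MergeByName` and the helpers `value`, `doc`, and `merge` are hypothetical names made up for this illustration; the point is only that documents are grouped by `name` and their `values` arrays are concatenated with every object kept intact.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeByName {

    // Hypothetical helper: one element of a document's "values" array.
    static Map<String, Object> value(String label, int otherA, String otherB) {
        Map<String, Object> v = new LinkedHashMap<>();
        v.put("label", label);
        v.put("otherA", otherA);
        v.put("otherB", otherB);
        return v;
    }

    // Hypothetical helper: a source document with a name and its values.
    static Map<String, Object> doc(String name, List<Map<String, Object>> values) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("name", name);
        d.put("values", values);
        return d;
    }

    // Groups documents by "name" and concatenates their "values" arrays,
    // keeping each object whole instead of splitting it into separate fields.
    @SuppressWarnings("unchecked")
    static Map<String, List<Map<String, Object>>> merge(List<Map<String, Object>> docs) {
        Map<String, List<Map<String, Object>>> merged = new LinkedHashMap<>();
        for (Map<String, Object> d : docs) {
            List<Map<String, Object>> values = (List<Map<String, Object>>) d.get("values");
            merged.computeIfAbsent((String) d.get("name"), k -> new ArrayList<>()).addAll(values);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = Arrays.asList(
            doc("xxx", Arrays.asList(value("xxx-a", 123, "abc"))),
            doc("xxx", Arrays.asList(value("xxx-b", 456, "def"))));
        System.out.println(merge(docs));
    }
}
```

For the two sample documents, `merge` yields a single entry for `xxx` whose list holds both original objects, unchanged.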
My current transform pivot is:
{
  "pivot": {
    "group_by": {
      "name": {
        "terms": {
          "field": "name"
        }
      }
    },
    "aggregations": {
      "temporary.valuesLabel": {
        "terms": {
          "field": "values.label"
        }
      },
      "temporary.valuesOtherA": {
        "terms": {
          "field": "values.otherA"
        }
      },
      "temporary.valuesOtherB": {
        "terms": {
          "field": "values.otherB"
        }
      }
    }
  }
}
I use a pipeline with the following script to attempt to re-assemble the objects:
// Build one object per bucket key of the label agg.
ctx.values = !ctx.temporary.valuesLabel.keySet().isEmpty() ? ctx.temporary.valuesLabel.keySet().stream().map(label -> ['label': label]).collect(Collectors.toList()) : [];
// Collect the bucket keys of the other two aggs, in whatever order they arrive.
List valuesOtherA = !ctx.temporary.valuesOtherA.keySet().isEmpty() ? ctx.temporary.valuesOtherA.keySet().stream().collect(Collectors.toList()) : [];
List valuesOtherB = !ctx.temporary.valuesOtherB.keySet().isEmpty() ? ctx.temporary.valuesOtherB.keySet().stream().collect(Collectors.toList()) : [];
// Pair the fields up by position; this assumes all three lists share the same order.
for (int i = 0; i < ctx.values.length; i++) {
  // otherA is parsed back to an integer (Integer.parseInt, not Int.parseInt).
  ctx.values[i].otherA = Integer.parseInt(valuesOtherA.get(i));
  ctx.values[i].otherB = valuesOtherB.get(i);
}
This almost works, except that because I run a separate terms agg for each field in the values array, the aggs can return "misaligned" buckets, which results in "mixed" objects.
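To make the misalignment concrete, here is a minimal sketch in plain Java (not Elasticsearch or Painless code). It assumes, purely for illustration, that each terms agg hands back its distinct keys sorted by key; the real bucket order may differ, but any per-field ordering has the same effect. The class `MisalignedBuckets` and the helpers `value` and `zipByIndex` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class MisalignedBuckets {

    // Hypothetical helper: one element of the "values" array.
    static Map<String, Object> value(String label, int otherA, String otherB) {
        Map<String, Object> v = new LinkedHashMap<>();
        v.put("label", label);
        v.put("otherA", otherA);
        v.put("otherB", otherB);
        return v;
    }

    // Mimics three independent terms aggs followed by an index-based zip.
    // Assumption: each agg returns its distinct keys sorted by key, with no
    // link back to the object a key originally belonged to.
    static List<Map<String, Object>> zipByIndex(List<Map<String, Object>> originals) {
        TreeSet<String> labels = new TreeSet<>();
        TreeSet<Integer> otherAs = new TreeSet<>();
        TreeSet<String> otherBs = new TreeSet<>();
        for (Map<String, Object> v : originals) {
            labels.add((String) v.get("label"));
            otherAs.add((Integer) v.get("otherA"));
            otherBs.add((String) v.get("otherB"));
        }
        List<String> l = new ArrayList<>(labels);
        List<Integer> a = new ArrayList<>(otherAs);
        List<String> b = new ArrayList<>(otherBs);

        // Reassemble by position, as the pipeline script does.
        List<Map<String, Object>> rebuilt = new ArrayList<>();
        for (int i = 0; i < l.size(); i++) {
            rebuilt.add(value(l.get(i), a.get(i), b.get(i)));
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> originals = Arrays.asList(
            value("xxx-a", 456, "abc"),
            value("xxx-b", 123, "def"));
        System.out.println("original: " + originals);
        System.out.println("rebuilt:  " + zipByIndex(originals));
    }
}
```

With this input, `xxx-a` comes back paired with `otherA` 123 even though its source object had 456. The sketch would also throw if two objects shared an `otherA` or `otherB` value, since the distinct-key lists would then be shorter than the label list.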
Is there a way to guarantee that the buckets of all three aggs come back in the same, fixed order? Or is there a better way to achieve what I am trying to do?
Thank you for your time.