Hi,
I'm trying to reindex an index of nested records into an index that is more flat and cleaner. Is there a way to handle painless scripts with using the root field name as my column and the nested value as its value?
My records can vary and have multiple nested fields, but I'm only interested in the "fieldname" to "fieldname.data.value" mappings.
This is in Elasticsearch 6.7
Example records that exist in the index
{
"_source" : {
"status": "complete",
"name" : {
"data" : {
"value" : "Jane Doe",
"gender" : "female"
}
},
"state" : {
"data" : {
"value" : "OH",
"prior": "N/A",
}
},
"school" : {
"data" : {
"value" : "wolf high school",
"graduated" " true
}
},
"region" : {
"data" : {
"value" : "IV",
"category" : "Division 4"
}
}
}
},
{
"_source" : {
"status": "complete",
"name" : {
"data" : {
"value" : "John Doe",
"gender" : "male"
}
},
"state" : {
"data" : {
"value" : "PA",
"prior" : "LA"
}
},
"school" : {
"data" : {
"value" : "timber high school",
"graduated" " true
}
},
"region" : {
"data" : {
"value" : "III",
"category" : "Division 3"
}
}
"schoolType" : {
"data" : {
"value" : "public",
"county" : False
}
}
}
}
What my goal is to reindex to a new index with the records being clean up like this:
{
"name" : "Jane Doe",
"state" : "OH",
"school" : "wolf high school",
"region" : "IV",
},
{
"name" : "John Doe",
"state" : "PA",
"school" : "timber high school",
"region" : "III",
"schoolType.data.value" : "Public"
}
My reindex looks currently looks like below. Please bear in mind, I'm just getting started with painless. So I don't know if there is a regex pattern or something better I can use.
There must be a cleaner way to handle this. Because, as the number of fields grow in the first index, it would be preferred not to have spell out each filed in the current script.
Is there a regex or a way to iterate over all the top fields, so I can perform the same call each field? I know I will always want the path fieldname.data.value from each field.
POST _reindex
{
"source": {
"index": "test-student-info",
"_source": {
"includes": [ "*.data.value" ],
"excludes": [ "*.sources" ]
},
"query" : {
"term" : { "status.keyword" : "complete" }
}
},
"dest": {
"index": "test-studen-info2"
},
"script": {
"inline": "ctx._source.name = ctx._source.name.data.value; ctx._source.state = ctx._source.state.data.value; ctx._source.school = ctx._source.school.data.value; ctx._source.region = ctx._source.region.data.value",
"lang": "painless"
}
}