I am indexing academic papers, each of which has a list of authors. Right now I index the authors as a nested field, because we want to match several fields within a single author, such as author_sequence_number: 1 and author_display_name: torrez. We also want full-text search on the author name, so that field is mapped as text. What I am realizing is that some aggregations on this author data are slow because they have to run as nested aggregations.
My idea is to index the author data twice, with the second copy using the flattened data type. That way I have the flexibility to aggregate and query on the flattened version whenever I do not need to combine fields within a single author. It seems like this would be faster, at the cost of extra storage for the second field. Has anybody done something like this, and did it work well? Here is an example of my data:
{
  "fields": {
    "year": [
      2019
    ],
    "authors": [
      {
        "author_display_name.keyword": [
          "jiar kan"
        ],
        "author_id": [
          "209777"
        ],
        "author_display_name": [
          "Jiar Kan"
        ],
        "author_sequence_number": [
          1
        ]
      },
      {
        "author_display_name.keyword": [
          "hui jian"
        ],
        "author_id": [
          "302777"
        ],
        "author_display_name": [
          "Hui Jian"
        ],
        "author_sequence_number": [
          2
        ]
      }
    ],
    "journal.title": [
      "Journal of power electronics"
    ],
    "work_id": [
      "3021658105"
    ],
    "work_title": [
      "Comparison of Three Active-Frequency-Drift Islanding Detection Methods for Single-Phase Grid-Connected Inverters"
    ]
  }
}
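To make the idea concrete, here is a rough Python sketch of the dual-indexing approach. It only builds plain dicts (a mapping, a document, and two aggregation bodies) and does not talk to a cluster. The field name `authors_flat`, the helper `with_flat_authors`, and the shape of the input document are assumptions made up for illustration, not something from my actual index. One thing to keep in mind is that a flattened field indexes every leaf value as a keyword, so full-text search on the name would still have to go through the nested text field.

```python
import copy

# Hypothetical mapping: the existing nested field plus a flattened copy.
MAPPING = {
    "mappings": {
        "properties": {
            "authors": {
                "type": "nested",
                "properties": {
                    "author_id": {"type": "keyword"},
                    "author_sequence_number": {"type": "integer"},
                    "author_display_name": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword"}},
                    },
                },
            },
            # Assumed name for the second copy; leaf values become keywords.
            "authors_flat": {"type": "flattened"},
        }
    }
}

# Keys copied into the flattened field (only those needed for aggregations).
FLAT_KEYS = ("author_id", "author_display_name")


def with_flat_authors(source: dict) -> dict:
    """Return a copy of the document with an extra `authors_flat` field."""
    doc = copy.deepcopy(source)
    doc["authors_flat"] = [
        {key: author[key] for key in FLAT_KEYS if key in author}
        for author in doc.get("authors", [])
    ]
    return doc


# Aggregation that has to step into the nested context first.
NESTED_AGG = {
    "size": 0,
    "aggs": {
        "authors": {
            "nested": {"path": "authors"},
            "aggs": {"top_authors": {"terms": {"field": "authors.author_id"}}},
        }
    },
}

# Same question asked of the flattened copy, with no nested step.
FLAT_AGG = {
    "size": 0,
    "aggs": {"top_authors": {"terms": {"field": "authors_flat.author_id"}}},
}

# Assumed _source shape, based on the sample data above.
paper = {
    "work_id": "3021658105",
    "year": 2019,
    "authors": [
        {"author_id": "209777", "author_display_name": "Jiar Kan", "author_sequence_number": 1},
        {"author_id": "302777", "author_display_name": "Hui Jian", "author_sequence_number": 2},
    ],
}

doc = with_flat_authors(paper)
```

Queries that need to tie fields together within one author (for example sequence number 1 and a given name) would keep using the nested field; only the aggregations that do not need that pairing would switch to `authors_flat`. Whether this is actually faster is exactly what I would want to benchmark.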