Hi all,
I need to ingest a large volume of records from one data source into another. One thing I am currently debating is whether to use a nested field with arrays, or to keep the fields separate and use multi-match to search across multiple fields.
For example, given the following raw document:
{
  "addresses" : {
    "home": {
      "city" : "Miami",
      "zip" : "33314"
    },
    "work": {
      "city" : "New York",
      "zip" : "33314"
    }
  }
}
We have the option to ingest that document as is, or to transform it first and convert the addresses to an array:
{
  "addresses" : [
    {
      "city" : "Miami",
      "zip" : "33314",
      "type" : "home"
    },
    {
      "city" : "New York",
      "zip" : "33314",
      "type" : "work"
    }
  ]
}
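For reference, the transformation between the two layouts could look roughly like the sketch below. The function names are made up for illustration, and it assumes every entry under "addresses" is a flat object keyed by its type; the reverse function is there because the document would need to be returned in the original format.

```python
def addresses_to_array(doc):
    """Convert {"addresses": {"home": {...}}} into an array of typed entries."""
    out = dict(doc)  # shallow copy so the input document is not modified
    out["addresses"] = [
        {**fields, "type": addr_type}
        for addr_type, fields in doc.get("addresses", {}).items()
    ]
    return out


def addresses_to_object(doc):
    """Reverse of addresses_to_array: rebuild the object keyed by address type."""
    out = dict(doc)
    out["addresses"] = {
        entry["type"]: {k: v for k, v in entry.items() if k != "type"}
        for entry in doc.get("addresses", [])
    }
    return out
```

Note that the reverse step assumes each type appears at most once per document; duplicate types would overwrite each other.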
In terms of requirements, the typical search will include one or more fields, and in some instances the type as well. For example, the input New York && 33313 should only match when both values belong to the same address; it should not match a document just because 33313 appears under Miami.
Based on the understanding above, if we ingest the document as is, we would need to use a multi-match query using two separate boolean clauses, and if we store it using arrays, we would need to use nested queries.
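To make the comparison concrete, here is a rough sketch of the two query bodies as I understand them, written as Python dicts. The helper names are hypothetical. For the as-is layout I am assuming one bool clause per address type (rather than a literal multi_match), so that city and zip are checked under the same type; for the array layout I am assuming "addresses" is mapped as a nested field.

```python
def object_layout_query(city, zip_code, address_types=("home", "work")):
    """As-is layout: one bool/must clause per address type, OR-ed together."""
    per_type = [
        {
            "bool": {
                "must": [
                    {"match": {f"addresses.{t}.city": city}},
                    {"match": {f"addresses.{t}.zip": zip_code}},
                ]
            }
        }
        for t in address_types
    ]
    return {"query": {"bool": {"should": per_type, "minimum_should_match": 1}}}


def nested_layout_query(city, zip_code, address_type=None):
    """Array layout: a single nested query; assumes a nested mapping on "addresses"."""
    must = [
        {"match": {"addresses.city": city}},
        {"match": {"addresses.zip": zip_code}},
    ]
    if address_type is not None:
        # Optional filter on the "type" value added by the transformation
        must.append({"match": {"addresses.type": address_type}})
    return {
        "query": {
            "nested": {"path": "addresses", "query": {"bool": {"must": must}}}
        }
    }
```

With the as-is layout, restricting the search to a type just means passing a single entry in address_types, while the number of clauses grows with the number of address types searched.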
Based on various warnings from the Nested documentation, it seems to me that the multi-match option would be better, but I am looking for a second opinion.
In case it helps:
- I am not too worried about indexing performance (as long as it is reasonable)
- Skipping data transformations and indexing as is would be a plus, but my goal is to prioritize search performance whenever possible.
- The final document will be presented using the original format after retrieving it (one more reason to store it as is)
Any help would be appreciated. Thanks!