Impact of JSON modeling on search performance

Assume I have the following JSON document with an array of objects.

{
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    },
    {
      "type": "mobile",
      "number": "123 456-7890"
    }
  ]
}

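What is the impact if I wrap the array in an extra object level, so that "phoneNumbers" becomes an object whose "number" field holds the array (with the inner "number" field renamed to "value"), and change the document to: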

{
  "phoneNumbers": {
    "number": [
      {
        "type": "home",
        "value": "212 555-1234"
      },
      {
        "type": "office",
        "value": "646 555-4567"
      },
      {
        "type": "mobile",
        "value": "123 456-7890"
      }
    ]
  }
}

How much do extra levels of JSON objects cost in terms of search performance? I understand that the less you store, the better, but I wonder what the impact is of adding extra levels of depth.

Hi,

this question is too broad to answer as asked, because the result depends on too many factors. Some questions for you to ponder: Which Elasticsearch version are you running? What do your queries look like? What do your access patterns look like (for starters: is bulk indexing running in parallel)? What query throughput are you targeting (latency is not independent of throughput; search for Little's law)? What hardware do you use? Which OS? How is your cluster configured? What is your network topology? And there are many more questions like these.
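To make the point about Little's law concrete, here is a minimal Python sketch. It only applies the formula L = λ · W (mean requests in flight = throughput × mean latency) to a system in steady state; the function names and the numbers are made up for illustration and are not measurements from any cluster.

```python
def required_concurrency(throughput_per_s: float, mean_latency_s: float) -> float:
    """Little's law: mean requests in flight = arrival rate * mean time in system."""
    return throughput_per_s * mean_latency_s


def max_throughput(concurrent_clients: int, mean_latency_s: float) -> float:
    """Rearranged: with a fixed number of clients, throughput is capped by latency."""
    return concurrent_clients / mean_latency_s


if __name__ == "__main__":
    # Hypothetical target: 200 queries/s at 50 ms mean latency.
    print(required_concurrency(200, 0.05))
    # Hypothetical load generator: 8 clients, 20 ms mean latency.
    print(max_throughput(8, 0.02))
```

So if latency goes up while the number of concurrent clients stays the same, the throughput you can reach goes down, which is why the two cannot be looked at in isolation.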

If you really care, you could write a benchmark. We've just released Rally to help you with that (see also the Rally release blog post). But I guess this change has so little impact that it will completely vanish in the (unavoidable) measurement noise. I'd also urge you to be very careful with your benchmarking methodology (quiet machine, quiet network, multiple trial runs, checking for accidental bottlenecks, etc.).
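As a rough illustration of "multiple trial runs" and "measurement noise", here is a small Python sketch. It is not Rally and does not talk to Elasticsearch: `run_trials` times an arbitrary callable that you would have to supply yourself, and `difference_exceeds_noise` uses a crude rule of thumb that I am assuming here (median difference larger than twice the bigger standard deviation), not a proper statistical test.

```python
import statistics
import time


def run_trials(operation, trials=5, warmup=1):
    """Time `operation` several times and return the per-trial durations in seconds."""
    for _ in range(warmup):
        operation()  # warm-up runs are executed but not recorded
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return timings


def difference_exceeds_noise(baseline, candidate, factor=2.0):
    """Crude check (assumed heuristic): is the median shift larger than the run-to-run spread?"""
    noise = max(statistics.stdev(baseline), statistics.stdev(candidate))
    delta = abs(statistics.median(candidate) - statistics.median(baseline))
    return delta > factor * noise


if __name__ == "__main__":
    # Made-up latencies in ms for the two document layouts, for illustration only.
    flat_layout = [10.0, 10.2, 9.8, 10.1, 9.9]
    nested_layout = [10.1, 10.0, 9.9, 10.2, 10.0]
    print(difference_exceeds_noise(flat_layout, nested_layout))
```

If a check like this cannot tell the two layouts apart across several trial runs, the difference is probably not worth optimizing for.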

Daniel