Internet bandwidth usage of cross-datacenter queries?

Hi all,

My company is about to move to a multi-datacenter Elasticsearch deployment. Each datacenter will have a local cluster that stores the data. We will have tribe nodes (that Kibana connects to) in one or two of the datacenters to combine data from multiple datacenters.

My network team is asking what kind of bandwidth utilization querying across datacenters will take up. Each datacenter will have multiple TBs of data. The network team is worried about someone running a * query over a month of indices and the ES data nodes trying to send that much data over the pipe.

I know there is a limit for recovery in "indices.recovery.max_bytes_per_sec". Does ES cap how many bytes per second are used for queries?
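
For context, that recovery throttle is the only knob I know of. Here's a quick sketch of how we set it via the cluster settings API (the host and value are placeholders, not our real config):

```python
import json
import requests

ES_HOST = "http://localhost:9200"  # placeholder; point at your own cluster

settings = {
    "transient": {
        # Caps shard-recovery traffic per node; it does NOT affect query traffic.
        "indices.recovery.max_bytes_per_sec": "40mb"
    }
}

resp = requests.put(
    ES_HOST + "/_cluster/settings",
    headers={"Content-Type": "application/json"},
    data=json.dumps(settings),
)
print(resp.json())
```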

Thanks for reading

There are no limits you can apply to the network traffic that ES sends; it'll just pull over everything that it needs, which can be quite a bit.

Interesting, thanks for that. Any chance we can add this as a feature request? I assume most people who use Elasticsearch across multiple datacenters do not have a dedicated pipe between the datacenters just for Elasticsearch.

I don't think it'd work. ES has no concept of what may be higher-priority traffic (e.g. cluster state changes), so you'd end up with nodes dropping out of the cluster when someone ran a large query.

That said, feel free to raise the concern on GH, the core team may have other ideas :smile:

In general, the amount of data passed between nodes is limited by the number of documents selected to be returned. Read the chapter called 'Distributed Search Execution' in the official guide: [snip]

I would have posted a link, but the mailing list software does not allow it. Sigh, please bring back Google Groups.

From the deep pagination block:

"Remember that each shard must build a priority queue of length from +
size, all of which need to be passed back to the coordinating node. And the
coordinating node needs to sort through number_of_shards * (from + size)
documents in order to find the correct size documents."

Basically there will be (number_of_shards * (from + size)) documents of data plus overhead flowing on the network. The main limitation in most cases is not the information contained in each query, but the number of queries.
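
To put rough numbers on that formula (every figure below is an assumption for illustration, not from your setup):

```python
# Back-of-envelope count of entries the coordinating node must sort.
# All figures here are assumed purely for illustration.
num_indices = 30         # e.g. one daily index for a month
shards_per_index = 5     # the old Elasticsearch default primary count
from_, size = 0, 1000    # one page of results

number_of_shards = num_indices * shards_per_index      # 150 shards
entries_to_sort = number_of_shards * (from_ + size)    # 150,000 entries
print(entries_to_sort)
```

Note that in the query phase each shard returns only lightweight entries (doc ID plus sort value); the full documents for the final size hits are pulled separately in the fetch phase.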

AFAIK, there is no throttling of query bytes sent.

Cheers,

Ivan

Strange. Sounds like the spam filter decided that you are a new user! So it forbids posting a first post with a link... o_O

Obviously you're not a new user!

That should be fine now...

Thanks Ivan. I just read through the doc you're referring to.

It seems like in the case of Kibana, the documents returned are limited even if the user queries * across 30 days of indices. So the results are sent over the wire 1,000 docs at a time (depending on the Kibana config) as the user scrolls through the results.
Does that sound about right?
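
To make sure I understand, here's roughly how I picture a Kibana-style client paging results (the host, index pattern, and page size are placeholders I made up):

```python
import requests

ES_HOST = "http://localhost:9200"   # placeholder
INDEX_PATTERN = "logstash-*"        # placeholder index pattern
PAGE_SIZE = 1000                    # stand-in for the Kibana fetch size

def fetch_page(page_number):
    """Pull one page of a match_all ('*') query, newest first."""
    body = {
        "query": {"match_all": {}},
        "from": page_number * PAGE_SIZE,   # deeper pages cost more (see above)
        "size": PAGE_SIZE,
        "sort": [{"@timestamp": "desc"}],
    }
    resp = requests.get(ES_HOST + "/" + INDEX_PATTERN + "/_search", json=body)
    return resp.json()["hits"]["hits"]

first_page = fetch_page(0)  # only ~1,000 hits cross the datacenter link
```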

Not sure what the defaults are in Kibana, but that sounds correct. Remember, the number of shards also affects the internode communication, since the coordinating node needs to sort the values.

Of course, your document size is also a factor. I have killed lightweight app servers before with 1 GB result responses that Elasticsearch itself handled with no issues.
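
If it helps to size the pipe, here is a rough per-page payload estimate (both figures are assumptions; measure your own average document size before trusting the result):

```python
# Rough bandwidth estimate for one page of results.
avg_doc_bytes = 2 * 1024      # assume ~2 KB of _source per hit
page_size = 1000              # hits per page, as discussed above
json_overhead = 1.2           # envelope/metadata fudge factor (a guess)

payload_mb = page_size * avg_doc_bytes * json_overhead / (1024 * 1024)
print("%.1f MB per page" % payload_mb)   # ~2.3 MB
```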

Cheers,

Ivan

Would a tiered tribe node architecture minimize the amount of traffic across the datacenter link?

For example, each datacenter would have a dedicated tribe node, and a "master" tribe node would talk to the datacenter tribe nodes.

A request from Kibana would go to the master tribe node, and it would propagate to the datacenter tribe nodes.

All the shard results returned from the data nodes would be sorted by their datacenter tribe node, and only the fully sorted set of size documents would be sent over the datacenter link to the master tribe node.

Does this make sense, or am I complicating it too much?

Never mind, it looks like a tiered tribe node architecture is not supported.