Geo_bounding_box on 10 billion objects

Hello,

How many resources would it take, and what topology/config would I need, to provide for this, if it is possible at all:

Index 10 billion simple objects with the following schema, and provide sub-second geo_bounding_box search:

item:
location: (lon, lat)
color: small_string
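
For reference, here is a minimal sketch of the mapping and query I have in mind. The index, type, and field names, and the coordinates, are placeholders for illustration; the mapping assumes the standard geo_point type:

# Create the index with "location" mapped as a geo_point
# (index/type names are made up for this example).
curl -XPUT 'localhost:9200/items' -d '{
  "mappings": {
    "item": {
      "properties": {
        "location": { "type": "geo_point" },
        "color":    { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

# Count hits inside a bounding box; search_type=count skips fetching
# documents, since only the hit count matters here.
curl -XGET 'localhost:9200/items/item/_search?search_type=count' -d '{
  "query": {
    "filtered": {
      "query":  { "match_all": {} },
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left":     { "lat": 40.8, "lon": -74.1 },
            "bottom_right": { "lat": 40.7, "lon": -73.9 }
          }
        }
      }
    }
  }
}'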

I have been doing some tests, and I can easily get sub-second searches with
10 million objects (I am only interested in the hit count, BTW) on a single
instance with 32GB RAM and 2 vCPUs, with Elasticsearch started with the
options -Xmx30g -Xms30g.

How many nodes, and how much RAM each, would you expect I would need to reach
such a goal? What would I need to learn, tweak, or change from Elasticsearch's
default settings for sharding, indexing, and searching?

Or is Elasticsearch the wrong tool for such a task? Would MongoDB be more
appropriate, given the size of the data set?

Thanks a lot,
Mohamed.


It's hard to say how many nodes you will need. Geo bounding box is CPU-heavy; the more nodes you have, the better your search performance will be (and make sure you have enough shards).
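
The shard count is fixed when an index is created, so it has to be planned up front. A hedged illustration, reusing the index name from the example above (the 50 below is an arbitrary placeholder, not a sizing recommendation; settings and mappings can go in the same create-index call):

# Shard count cannot be changed after creation without reindexing.
curl -XPUT 'localhost:9200/items' -d '{
  "settings": {
    "number_of_shards": 50,
    "number_of_replicas": 1
  }
}'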

Side note: we don't recommend setting ES_HEAP_SIZE to 30GB on a 32GB machine; you want to keep some room for the OS file system cache. Since geo bounding box requires the lat/lon fields to be loaded into memory, you will need heap for that, but I wouldn't go past the 22GB mark on a 32GB box.
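
For example, a sketch of starting a node with a 22GB heap via the ES_HEAP_SIZE environment variable, which the stock startup script reads (the path is illustrative):

# Equivalent to -Xms22g -Xmx22g; leaves ~10GB for the OS file system cache.
ES_HEAP_SIZE=22g ./bin/elasticsearch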


Thanks a lot, Kimchy. This might require more nodes than I can afford for
now :)
