1 and 2 - It'd probably be easiest to try this yourself
3 - not really, you should look into routing.
4 - only the index metadata is stored in memory. However, doing aggregations
will pull the applicable data into memory.
5 - not sure.
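To make the routing suggestion in point 3 a little more concrete: routing sends every document that shares a routing value to the same shard, so a search that supplies that value only has to touch one shard. Below is a rough sketch of the idea only. The function name is made up for illustration, and crc32 is a stand-in hash, not the hash function Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (e.g. a customer id) to a shard number.

    Illustrative only: crc32 is a stand-in hash, NOT Elasticsearch's own.
    The point is that the same routing value always maps to the same shard.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


# Every document indexed with routing="customer-42" lands on one shard,
# so a search with the same routing value can skip all the other shards.
shard_for_customer = pick_shard("customer-42", 5)
```

In practice you would pass the customer id as the `routing` parameter on both index and search requests and let the cluster do the hashing.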
As a general rule, is it better to have one horse-sized index or a hundred
duck-sized indices? I am thinking about the kind of search where you
frequently query only a subset of the data. For example, you could keep a
separate index for every customer, because the app normally restricts itself
to one customer at a time. Or perhaps do a compound split by customer and
year, if your searches rarely go outside the current year.
Thanks.
On Wednesday, April 23, 2014 4:07:58 PM UTC+12, Mark Walkom wrote:
1 and 2 - It'd probably be easiest to try this yourself
3 - not really, you should look into routing.
4 - only the index metadata is stored in memory. However, doing
aggregations will pull the applicable data into memory.
5 - not sure.
It depends on your data set, your queries, and your cluster specs. Having tens
to hundreds of thousands (or millions) of indexes will have a performance
impact that only grows with the number of indexes, so the lower you can keep
that number through planning, the better. But to counter that, the bigger your
indexes are, the longer they take to query and the less agility you have to
manipulate them.
Which is why the answer to a lot of this sort of thing is: it depends.
As an example of planning, it might be better to think ahead if you are
aiming for such large sizes and give your app the ability to talk to
multiple clusters, which will allow you to move customers into high/low
performance/capacity clusters.
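
A minimal sketch of what that planning could look like in the app, assuming a per-customer, per-year index naming scheme like the one asked about. The cluster URLs, customer ids, and the `resolve_target` helper are all hypothetical; nothing here talks to a real cluster.

```python
# Hypothetical cluster endpoints; replace with your own.
DEFAULT_CLUSTER = "http://es-standard.example:9200"
CLUSTER_OVERRIDES = {
    # Customers moved to a higher-performance cluster (made-up id).
    "big-customer": "http://es-highperf.example:9200",
}


def resolve_target(customer_id: str, year: int) -> tuple[str, str]:
    """Return (cluster_url, index_name) for one customer and year.

    Assumed naming scheme: "<customer>-<year>", lowercased because
    Elasticsearch index names must be lowercase.
    """
    cluster = CLUSTER_OVERRIDES.get(customer_id, DEFAULT_CLUSTER)
    index = f"{customer_id.lower()}-{year}"
    return cluster, index


cluster_url, index_name = resolve_target("big-customer", 2014)
```

Keeping the lookup in one place means moving a customer to another cluster is a change to the mapping, not to every query in the app.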