So my question is: how can we improve it? Should we add more machines? If
so, how can we estimate how many machines we need? Or is it possible to
improve things with just these 3 machines?
Each index has 5 shards, and each shard has 2 replicas. There are 670 indices
now, so that is 670 days of data. But the data is not evenly distributed
across indices: a small one holds only 100 KB (old dates), while a big one
may hold more than 1 GB (recent dates).
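To put the numbers above in perspective, here is a rough sketch of how many shard copies the cluster carries. It assumes every index really uses 5 primaries and 2 replicas (the listing shows at least one index with only 1 replica, so treat the result as an upper estimate), and the function name is just illustrative.

```python
# Rough shard-count estimate from the figures quoted in this post.
# Assumption: all 670 indices use 5 primaries and 2 replicas each.

def total_shard_copies(indices, primaries_per_index, replicas_per_primary):
    """Each primary shard exists once, plus one copy per replica."""
    return indices * primaries_per_index * (1 + replicas_per_primary)

indices = 670              # one index per day
primaries_per_index = 5
replicas_per_primary = 2
nodes = 3

total = total_shard_copies(indices, primaries_per_index, replicas_per_primary)
per_node = total / nodes   # assumes shards are balanced evenly across nodes

print(total)     # 10050
print(per_node)  # 3350.0
```

So under these assumptions each of the 3 machines holds on the order of 3,350 shard copies, most of them very small.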
curl 'localhost:9200/_cat/indices/?v'
health index      pri rep docs.count docs.deleted store.size pri.store.size
green  2014-03-22   5   1       1205           11     19.7mb          9.8mb
green  2013-11-08   5   2         58            0      4.8mb          1.6mb
green  2014-09-05   5   2     107055            5      1.3gb        473.7mb
We crawl different types of social media data, then write each document
into the index corresponding to its creation date. Each social media type
uses its own mapping type, so there are several types per index.
We use Kibana to present the data. By default, it searches the latest
month of data. I have noticed that the search queue size (currently 1000)
is not enough.
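A back-of-the-envelope sketch of why the queue may fill up: a search runs on one copy of every shard it targets, and the search queue is per node. The code below assumes a month is 30 daily indices, that shard-level requests spread evenly over the 3 nodes, and it ignores requests already being executed by the search threads. The function names are hypothetical.

```python
import math

def shard_requests_per_search(days, primaries_per_index):
    """One index per day; a search hits one copy of each primary shard."""
    return days * primaries_per_index

def searches_to_fill_queue(queue_size, days, primaries_per_index, nodes):
    """Concurrent searches needed to fill one node's queue, assuming
    shard-level requests are spread evenly across the nodes."""
    per_node = shard_requests_per_search(days, primaries_per_index) / nodes
    return math.floor(queue_size / per_node)

print(shard_requests_per_search(30, 5))        # 150
print(searches_to_fill_queue(1000, 30, 5, 3))  # 20
```

Under these assumptions each one-month search produces 150 shard-level requests, so roughly 20 simultaneous searches (for example, a few dashboards with several panels each) could be enough to fill a queue of 1000 on every node.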
On Tuesday, November 4, 2014 at 2:57:29 PM UTC+8, Telax wrote:
Hi Sonic,
How many days' worth of data do you have in your system currently, and how
many shards are created per daily index?
Is it just one index you create each day or do you have many for different
data types?
When you search - do you search over all indices at once? What does a
common search look like for you?