Hi,
I plan to create a new index every day; the index name could be yyyymmdd, so there could be 30 indexes in a month. Which mechanism should be used to search across indexes? For instance, I need to search "a" in the first 10 days' indexes of a month.
1. Use _all to search all indexes, and specify the date range in the query string. Will it take more time to look up all the indexes?
2. Search multiple indexes in the URL, like http://ip:9200/20130701,20130702,.... This way, it may not be easy to list all the index names one by one?
3. Create an alias for the days, but the search range is not static, so it's not easy to create the exact alias.
What's the optimal way to handle this kind of search?
Depending on the time frame you are searching on, you can select multiple
indices, as you mentioned in your point number 2. That's how it's done by
default in Logstash/Kibana.
The first method you mentioned will be horribly inefficient and kind of
defeats the purpose of using rolling indices.
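Building the comma-separated index list for the URL can be scripted; here is a minimal sketch, assuming the yyyymmdd naming from the question and using `ip` as a placeholder host:

```python
from datetime import date, timedelta

def daily_indices(start, days):
    """Return comma-separated yyyymmdd index names for `days` days from `start`."""
    return ",".join((start + timedelta(n)).strftime("%Y%m%d") for n in range(days))

# First 10 days of July 2013 -> "20130701,...,20130710"
indices = daily_indices(date(2013, 7, 1), 10)
url = "http://ip:9200/%s/_search" % indices
```

This targets only the indices that can contain matching data, rather than _all.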
If there are too many indexes, say 60 days' worth, will performance be
impacted by searching that many indexes?
Is there a better way to search across multiple days?
For example, if I need to search from 2013-03-12T15:23:11 to
2013-05-23T13:23:11, should I add all the indexes into the search URL, like: http://ip:9200/20130312,20130313....20130523/_search
And then, in the JSON body, set something like this:
{
  "query": {
    "query_string": {
      "query": "$1"
    }
  },
  "filter": {
    "range": {
      "LogDate": {
        "from": "2013-03-12T15:25:10",
        "to": "2013-03-12T15:25:13"
      }
    }
  }
}
Basically, your aim should be to reduce the number of shards you're
executing your query on. That's why executing your query on all indices
is a bad thing to do: you'd be running the query on shards where you
already know the data surely doesn't exist. If your indices are daily
indices, listing the relevant ones in the URL is a perfectly fine way of doing it.
An index by default consists of 5 shards and one replica. A shard is a way of
dividing the data into multiple independent units (each shard is a Lucene
index). Shards play an important role in horizontal scaling, as they divide
your index data across multiple nodes. Keeping more shards in the beginning
gives you the flexibility of adding new nodes when required, since the number
of shards cannot be changed for an index once it's created. (Actually, that's
not entirely true, as you could also create one more index and create a
common alias for both indices, and they'll both be equivalent.)
When I said that you should try to execute your queries on the least number
of shards, in a way I meant indices only. I said shards because a shard is
the actual thing that stores the indexed data (a Lucene index), so it's
more explicit.
Having too many shards on a small number of nodes is also pointless, as it
creates overhead to maintain too many Lucene indices on fewer nodes.
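Since the shard count is fixed once an index is created, it has to be chosen up front. A sketch of the settings body you could send when creating a daily index (e.g. via a PUT to http://ip:9200/20130701); the values shown are just the defaults described above:

```json
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}
```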
So specifying which shard an index operation should go to is beneficial
with multiple nodes, but a single node uses 5 shards by default, and it's
not necessary to specify the shard?