Scalability search


(CHI WAI HO) #1

Suppose I have a setup that creates a new index every week to record all
server logs in Elasticsearch (EC). Each index is configured with 2 replicas
and 8 shards. To search the past 4 weeks (1 month), I can issue a REST
request to EC like this:
http://localhost:9200/2011-49,2011-50,2011-51,2011-52/_search?q=servername:tomcat01
What if I want to search for a server over the past 52 weeks
EFFECTIVELY? I know that I can issue
http://localhost:9200/_all/_search?q=servername:tomcat01&alarmlevel:critical
however, since the number of weeks (52) times the number of shards
(8) is large, will EC handle this kind of search effectively? I
am afraid that there will be a large number of open files, etc.,
which will consume a lot of memory and make the search return results slowly.

Is there a best practice for searching this kind of rotating index?
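
As an illustration of the multi-index request in the question, the comma-separated index list can be generated instead of typed by hand. This is only a minimal sketch: it assumes the weekly indices are named by ISO year and week number (the thread shows names like `2011-49` but does not confirm the numbering scheme), and the helper names `weekly_index_names` and `search_url` are hypothetical.

```python
from datetime import date, timedelta
from urllib.parse import quote


def weekly_index_names(end, weeks):
    """Return names like '2011-49' for the `weeks` weeks ending at `end`, oldest first.

    Assumption: indices are named '<ISO year>-<ISO week, zero padded>'.
    """
    names = []
    for offset in range(weeks - 1, -1, -1):
        year, week, _ = (end - timedelta(weeks=offset)).isocalendar()
        names.append(f"{year}-{week:02d}")
    return names


def search_url(host, indices, query):
    """Build a URI search over a comma-separated list of indices."""
    # Keep ':' unescaped so the URL looks like the one in the question.
    return f"{host}/{','.join(indices)}/_search?q={quote(query, safe=':')}"


if __name__ == "__main__":
    last_month = weekly_index_names(date(2011, 12, 28), 4)
    print(search_url("http://localhost:9200", last_month, "servername:tomcat01"))
```

Passing `52` instead of `4` yields the explicit list for a full year, which targets the same weekly indices as `_all` only if the cluster holds nothing else.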


(Shay Banon) #2

There will be many shards to execute the request on, but yes, they will be
executed "effectively" (as far as I understand your definition of
effectively). My question is whether you really need 8 shards per week...

(In reply to CHI WAI HO's message of Wed, Nov 30, 2011 at 12:46 PM.)


(CHI WAI HO) #3

Since there is a suggestion to keep each index/shard at 500K records and
around 800K records are currently imported daily, I calculated 8 shards
(with 1 index per week). I am not sure whether the assumption / estimate
is correct; I would just like to see if anyone here has hit a similar
challenge.

Do you have any idea how EC handles a search over a large number of
indices? Are there any real cases for reference? How does EC perform when
searching across 50+ indices, each containing around 500K docs?

Thanks a lot ^.^

(In reply to Shay Banon's message of Nov 30, 11:13 pm.)

(Shay Banon) #4

How did you come up with 500k records per shard? The number of records a
shard can hold depends on a lot of factors, including the kind and size of
the data indexed, the number of fields, and the hardware and memory of the
nodes it is going to be allocated on.

Regarding how elasticsearch handles searching across 50+ indices, it will
handle it just fine, but if you don't have enough capacity for it,
executing a lot of "per shard" searches on the machines can overload
them and make the search slower.
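
To put rough numbers on the "per shard" searches, here is a small sketch using the figures from the thread (52 weekly indices, 8 shards, 2 replicas). It assumes each search runs against one copy of every shard, primary or replica; the function name `shard_fanout` is hypothetical and the result says nothing about how much load a given cluster can absorb.

```python
def shard_fanout(indices, shards_per_index, replicas):
    """Return (shard-level searches per request, total shard copies in the cluster)."""
    # One copy of each shard is queried per request (assumption stated above).
    searched = indices * shards_per_index
    # Every shard exists once as a primary plus `replicas` replica copies.
    total_copies = searched * (1 + replicas)
    return searched, total_copies


if __name__ == "__main__":
    # Figures from the thread: 52 weeks, 8 shards, 2 replicas.
    print(shard_fanout(52, 8, 2))
    # Same year of data with a single shard per weekly index.
    print(shard_fanout(52, 1, 2))
```

Under these assumptions a one-year search fans out to 416 shard-level searches with 8 shards per week, versus 52 with a single shard per week, which is why the shard count per index is worth questioning.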

(In reply to CHI WAI HO's message of Thu, Dec 1, 2011 at 4:31 AM.)
