For Efficient and High Performance Search of Logs

soodlikesjava · February 6, 2016, 11:18pm

Hi, I have a scenario where I will have logs from a 100-node, horizontally scalable cluster. Which of the options below would give more efficient, higher-performance search?

1) Create one daily index of logs, with 5 primary shards, per node of the cluster, which means 100 indices for 100 nodes.

2) Create one daily index (5 primary shards) per subset of nodes, e.g. one per 10 nodes of the cluster, which means 10 daily indices for the 100-node cluster.
Logs would be searched across all the indices. Please suggest.

magnusbaeck · February 7, 2016, 12:22am

The 100-node cluster you're talking about isn't your ES cluster, right? But rather the cluster whose logs you want to make searchable?

Any particular reason you want to put logs from each machine into an index of its own? The common practice is to store the hostname in a separate field and include that field in queries.
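As a rough sketch of what that looks like, the search body below filters on a host field instead of picking a per-host index. The field name `hostname` and the value `web-042` are assumptions for illustration, and a `term` filter like this assumes the field is indexed as an exact (not analyzed) value.

```python
import json


def host_query(hostname, field="hostname"):
    """Build a search body that restricts results to one source host.

    `field` is a hypothetical field name; use whatever field your
    log shipper stores the host name in.
    """
    return {
        "query": {
            "bool": {
                # A filter clause narrows the results without affecting scoring.
                "filter": [{"term": {field: hostname}}]
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON you would send as the body of a search request.
    print(json.dumps(host_query("web-042"), indent=2))
```

The same body can be sent against a pattern that covers all daily indexes, so one query shape works for any host.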

soodlikesjava · February 7, 2016, 6:47pm

That's correct, the 100-node cluster is not my ES cluster. I am already putting the hostname in a separate field to use it in queries. How should I index the logs from the 100 nodes? How should I create index buckets so that search scales in the future? The index would be a daily index.

magnusbaeck · February 7, 2016, 8:57pm

With daily indexes each index will have a bounded size, but you may want to have a shard count > 1 to keep the size of each shard down (shards of up to a few tens of GB are generally okay). With a multi-node cluster that should also improve performance, since multiple nodes can help out with queries affecting a particular day. On the other hand, queries that span more than one day will probably touch multiple nodes even if the shard count is 1.

The ideal number of shards depends on many factors, including the number of nodes in the ES cluster, the amount of data, and the query patterns. You may have to experiment.
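As a starting point for that experimentation, here is a minimal sketch that turns a daily volume into a primary shard count. The 30 GB target per shard is an assumed reading of "a few tens of GB", not an official limit, and the function only looks at data size; it ignores node count and query patterns.

```python
import math


def shards_for(daily_gb, target_shard_gb=30.0):
    """Estimate primary shards for one daily index from data size alone.

    target_shard_gb is an assumed upper bound per shard; tune it
    after measuring your own indexing and query behaviour.
    """
    if daily_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Round up so that no shard exceeds the target size.
    return max(1, math.ceil(daily_gb / target_shard_gb))


if __name__ == "__main__":
    # Hypothetical daily volumes in GB.
    for volume in (5, 60, 250):
        print(volume, "GB/day ->", shards_for(volume), "primary shard(s)")
```

Replicas are not counted here; they add copies of each primary shard but do not change the size of an individual shard.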

soodlikesjava · February 7, 2016, 10:22pm

Thanks for the quick reply. I have a multi-node cluster with dedicated master-eligible nodes, but one question I have is: should I store logs from all 100 nodes in a single daily index?

soodlikesjava · February 8, 2016, 4:34am

Hi Magnus, I would really appreciate it if you could clarify whether I should store the logs from all 100 nodes in a single daily index, or in multiple smaller index buckets, e.g. the logs of a subset of 10 machines per index?

Christian_Dahlqvist · February 8, 2016, 7:37am

An index in Elasticsearch can handle large amounts of data, so storing data related to hundreds or thousands of servers/devices is generally not a problem at all. If the shards get too big (larger than a few tens of GB), you can increase the number of shards to handle more data. In the same way, you can reduce the number of shards from the default 5 if you have small volumes.
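For illustration, the shard count is controlled by the `number_of_shards` index setting. The sketch below only builds the settings body you would send when creating an index; the helper name and the replica default of 1 are assumptions. The primary shard count is set when an index is created, so with daily indexes a new value would apply from the next day's index.

```python
import json


def index_settings(primary_shards, replicas=1):
    """Build a create-index body with explicit shard and replica counts."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


if __name__ == "__main__":
    # Example: a low-volume index that does not need the default of 5 shards.
    print(json.dumps(index_settings(2), indent=2))
```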

The decision to use multiple indices is generally based more on the nature of the data being stored. If you have different types of data that are never queried together and have very different structure and possibly mapping conflicts, storing this in multiple indices might make sense.

Each index and shard has a certain amount of overhead, so having lots of small indices/shards is generally inefficient.

soodlikesjava · February 8, 2016, 9:39am

Hi Christian, thanks for your detailed response. That clears up my doubt.

soodlikesjava · February 8, 2016, 9:40am

Thanks Magnus and Christian for spending some time to provide your inputs.

soodlikesjava · February 8, 2016, 5:36pm

Hi Christian, I will have 90-100 GB of logs per day, but the type of data will be the same. So will storing the 100 GB of logs in 1 index with 10 shards be fine for getting better search performance?

soodlikesjava · February 10, 2016, 5:22am

Hi Elastic team, please let me know if you have any suggestions on this question so that we can proceed further.

magnusbaeck · February 10, 2016, 6:36am

What results in the best query performance depends on a wide range of factors, but for time-series data like logs that you can't keep around forever the best practice is to use time-series indexes (e.g. daily indexes) and shard each index as necessary to keep the shard size manageable. Sharding the data per source host (e.g. one index per host) is not recommended, but as Christian said earlier it could make sense to shard per type (e.g. one index series for syslog and one for HTTP requests).
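A small sketch of that layout: one index series per log type, one index per day, with the source host kept as a field rather than in the index name. The `<type>-YYYY.MM.DD` naming pattern and the type names are assumptions for illustration; any consistent scheme works.

```python
from datetime import date


def daily_index(log_type, day):
    """Return the index name for one log type on one day.

    The "<type>-YYYY.MM.DD" pattern is an assumed convention.
    """
    return f"{log_type}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    day = date(2016, 2, 10)
    # One index series for syslog and one for HTTP requests.
    for log_type in ("syslog", "http"):
        print(daily_index(log_type, day))
```

With names like these, old data can be dropped by deleting whole daily indexes, and a wildcard such as `syslog-*` searches one series across all days.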
