We use Elasticsearch to aggregate several types of logs: web server logs,
application logs, Windows event logs, statistics, etc.
As far as I understand, I can do one of the following:
1. Send each log to its own index and, when I need to combine them in a query,
specify several indices in the Kibana settings;
2. Send all logs to the same index (we roll it over every day) and give
logs from different sources different document types;
3. Do more or less nothing and push all documents together without
distinguishing them explicitly.
My question is: what are the advantages and disadvantages of each approach? We
have a substantial amount of logs coming in every second, but querying is
rather rare, at least so far.
My advice would be to keep all the logs in a single index, but apply index
tailing.
That is, write the logs of a day or an hour (depending on traffic) to their own
index, like Logstash does.
So the index name would have the format logs-yyyy-MM-dd.
This way you won't be stuck with the fixed-shard problem (the number of primary
shards is set when an index is created), and dynamic horizontal scaling can be
achieved.
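A minimal sketch of the naming scheme described above, assuming event timestamps are bucketed by UTC day. The function name `daily_index_name` and the `prefix` parameter are illustrative and not part of Logstash or Elasticsearch.

```python
from datetime import datetime, timedelta, timezone


def daily_index_name(ts: datetime, prefix: str = "logs") -> str:
    """Return the daily index name (prefix-yyyy-MM-dd) for an event timestamp.

    Assumption: naive timestamps are treated as UTC, and aware timestamps
    are converted to UTC before the day is taken.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return f"{prefix}-{ts.astimezone(timezone.utc).strftime('%Y-%m-%d')}"


if __name__ == "__main__":
    # An event logged late on Sep 10 in UTC-7 already belongs to the Sep 11 UTC index.
    pacific = timezone(timedelta(hours=-7))
    print(daily_index_name(datetime(2014, 9, 10, 21, 27, tzinfo=pacific)))
```

Passing a different `prefix` per log source would give one index per source per day, which corresponds to the first option in the question.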
Also, it would be wise to remove old logs, either by using the TTL facility, by
closing old indices, or by taking a snapshot and then deleting the index.
TTL -
Index Close -
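As a rough illustration of the cleanup step, the sketch below only decides which daily indices have aged out. It assumes names in the logs-yyyy-MM-dd format mentioned above; the function `expired_indices` and its parameters are hypothetical, and the actual close, snapshot, or delete call is left out.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logs-"):
    """Return the names of daily indices older than keep_days, sorted by name.

    Names that do not match prefix + yyyy-MM-dd are ignored rather than
    treated as expired.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y-%m-%d").date()
        except ValueError:
            continue  # not a dated index, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logs-2014-08-20", "logs-2014-09-01", "logs-2014-09-11", "kibana-int"]
    # With a 14-day retention window, only the August 20 index is past the cutoff.
    print(expired_indices(names, today=date(2014, 9, 11), keep_days=14))
```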
Thanks
Vineeth
On Thu, Sep 11, 2014 at 7:39 AM, Konstantin Erman konste@gmail.com wrote:
[...]
It seems we already do everything you described. We name indices in Logstash
style, with the date. Is that what you mean by tailing?
Creating an index per hour would leave hundreds of indices open. I wonder
what the guidelines are regarding the number of indices vs. their size?
We also close the less interesting logs and delete them after a couple of weeks.
BUT STILL, with all that in place, my original question stands:
different document types in the same index, or different indices for
different document types? What are the rules of thumb?
Konstantin
On Wednesday, September 10, 2014 9:27:05 PM UTC-7, vineeth mohan wrote:
[...]
Every index has a minimum of one shard, and multiple types can live in the same shard. Shards carry maintenance overhead, and every extra shard a query touches slows it down. However, if you run a lot of targeted queries, it is easier to cut the number of shards accessed by querying fewer indexes than it is when several log types share one index (multi-tenancy). I could be missing something, but I don't think you can have multiple routing values in a query, and someone may want to query multiple log types.
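To make the shard trade-off concrete, here is a back-of-the-envelope sketch. All numbers are hypothetical (4 log sources, a 7-day query window, replicas ignored), and both layouts are given the same total of 4 primary shards per day so that only the layout differs.

```python
def shards_touched(days: int, indices_per_day: int, shards_per_index: int) -> int:
    """Number of primary shards a query fans out to."""
    return days * indices_per_day * shards_per_index


DAYS = 7      # hypothetical query window
SOURCES = 4   # hypothetical number of log sources

# Layout A: one index per source per day, 1 shard each.
targeted_a = shards_touched(DAYS, indices_per_day=1, shards_per_index=1)
combined_a = shards_touched(DAYS, indices_per_day=SOURCES, shards_per_index=1)

# Layout B: one shared index per day with 4 shards, sources told apart by type.
targeted_b = shards_touched(DAYS, indices_per_day=1, shards_per_index=4)
combined_b = targeted_b  # a shared index is searched the same way for any type

if __name__ == "__main__":
    print("one source, per-source indices:", targeted_a)
    print("one source, shared index:      ", targeted_b)
    print("all sources, per-source indices:", combined_a)
    print("all sources, shared index:      ", combined_b)
```

Under these assumptions a query for a single source reaches 7 shards with per-source indices and 28 with a shared index, while a query across all sources reaches 28 either way.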