What is better - create several document types or several indices?

We use Elasticsearch to aggregate several types of logs - web server logs,
application logs, Windows event logs, statistics, etc.

As far as I understand I can do one of the following:
1. Send each log to its own index and, when I need to combine them in a query,
   specify several indices in Kibana settings;
2. Send all logs to the same index (we turn them over every day) and give
   logs from different sources different document types;
3. Do more or less nothing, push all documents together without
   distinguishing them explicitly.

My question is: what are the advantages and disadvantages of each approach? We
have a substantial amount of logs coming in every second, but querying is
rather rare, at least so far.

Thank you!
Konstantin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e41e4959-6a45-417a-8ba6-856abcd33350%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello,

My advice would be to keep all the logs in a single index, but apply index
tailing.
That is, write the logs of a day or an hour (depending on traffic) to their
own index, like Logstash does.
So the index name would have the format logs-yyyy-MM-dd.
This way, you won't be stuck with the fixed shard count problem, and dynamic
horizontal scaling can be achieved.
Also, it would be wise to remove old logs using the TTL facility, by
closing old indices, or even by taking a snapshot and deleting the index.

TTL -

Index Close -
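
As a rough illustration of the date-based naming and cleanup described above, here is a minimal Python sketch. The `logs-` prefix, the 14-day retention window in the usage example, and the helper names `index_for` and `expired_indices` are assumptions made for the example; nothing here is prescribed by Logstash or Elasticsearch. The code only computes index names and leaves the actual close, snapshot, or delete calls to whatever client you use.

```python
from datetime import date, datetime, timedelta

PREFIX = "logs-"  # assumed prefix, matching the logs-yyyy-MM-dd format above


def index_for(ts: datetime) -> str:
    """Return the daily index name an event with this timestamp would go to."""
    return PREFIX + ts.strftime("%Y-%m-%d")


def expired_indices(existing, today: date, keep_days: int):
    """Return the names in `existing` that fall outside the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing:
        if not name.startswith(PREFIX):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(PREFIX):], "%Y-%m-%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            expired.append(name)
    return expired
```

For example, `expired_indices(names, date.today(), 14)` would list the daily indices that are candidates for closing or deletion under a two-week retention policy.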

Thanks
Vineeth

Vineeth, thank you for your advice!

It seems we already do everything as you said. We name indices in Logstash
style, with the date. Is that what you are referring to as tailing?
Creating an index per hour would lead to hundreds of open indices. I wonder
what the guidelines are regarding the number of indices vs. their size.
We also close the less interesting logs and delete them after a couple of weeks.
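
To put a number on the "hundreds of indices" concern, a quick back-of-the-envelope calculation helps. The 14-day retention below is an assumption based on "a couple of weeks", and the function name `index_count` is made up for this sketch. It counts indices in existence, open or closed.

```python
def index_count(indices_per_day: int, days_kept: int, log_types: int = 1) -> int:
    """Indices in existence at any time for a time-based naming scheme."""
    return indices_per_day * days_kept * log_types


daily = index_count(1, 14)    # one index per day, kept for 14 days
hourly = index_count(24, 14)  # one index per hour, kept for 14 days
print(daily, hourly)
```

With one index per log type on top of that, the count is multiplied again by the number of log types.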

BUT STILL, with all that in place, my original question stands:
different document types in the same index, or different indices for
different document types? What are the rules of thumb?

Konstantin

Every index has at least one shard, and multiple types can live in the same shard. Each shard carries maintenance overhead, and the more shards a query has to hit, the slower it gets. However, if you run a lot of targeted queries, separate indices make it easier to reduce the number of shards accessed (you simply query fewer indices) than a shared, multi-tenant index does. I could be missing something, but I don't think you can specify multiple routing values in a query, and someone may want to query multiple log types.

So it depends.
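
The trade-off can be sketched with a little arithmetic. The snippet below is only an illustration: the `<type>-yyyy.MM.dd` naming, the log type names, and the figure of five shards per index are all assumptions. It builds the comma-separated list of index names that a multi-index search could target and counts the shards such a search would have to visit.

```python
from datetime import date, timedelta


def search_target(log_types, first_day: date, last_day: date) -> str:
    """Comma-separated index names for the given log types and day range."""
    names = []
    day = first_day
    while day <= last_day:
        for log_type in log_types:
            names.append(f"{log_type}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return ",".join(names)


def shards_touched(target: str, shards_per_index: int) -> int:
    """Shards (one copy of each) that a search over `target` has to visit."""
    return len(target.split(",")) * shards_per_index


# Hypothetical log types "iis" and "app", two days of data, 5 shards per index.
one_type = search_target(["iis"], date(2014, 9, 10), date(2014, 9, 11))
both_types = search_target(["iis", "app"], date(2014, 9, 10), date(2014, 9, 11))
print(one_type, shards_touched(one_type, 5))
print(shards_touched(both_types, 5))
```

Under these assumptions, a query for a single log type touches half as many shards as one spanning both types, whereas with a single shared index per day every query visits all of that index's shards regardless of type. The price is that per-type indices multiply the total number of shards the cluster has to maintain.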
