What is better - create several document types or several indices?

Konstantin_Erman · September 11, 2014, 2:09am

We use Elasticsearch to aggregate several types of logs - web server logs,
application logs, windows event logs, statistics, etc.

As far as I understand I can do one of the following:
1. Send each log to its own index and when I need to combine them in a query,
specify several indices in Kibana settings;
2. Send all logs to the same index (we turn them over every day) and give
logs from different sources different document types;
3. Do more or less nothing, push all documents together without
distinguishing them explicitly;
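
To make the three options concrete, here is a minimal sketch of where a single log event would land under each one. The strategy names, the `logs` prefix, and the date suffix format are hypothetical choices for illustration only; nothing here calls Elasticsearch.

```python
from datetime import date
from typing import Tuple


def target(strategy: str, source: str, day: date) -> Tuple[str, str]:
    """Return (index_name, document_type) for one log event.

    All names below are illustrative assumptions, not a required convention.
    """
    suffix = day.strftime("%Y.%m.%d")
    if strategy == "index_per_source":
        # Option 1: every log source gets its own daily index.
        return (f"{source}-{suffix}", "logs")
    if strategy == "type_per_source":
        # Option 2: one shared daily index, one document type per source.
        return (f"logs-{suffix}", source)
    if strategy == "undistinguished":
        # Option 3: one shared daily index, one shared document type.
        return (f"logs-{suffix}", "logs")
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    day = date(2014, 9, 11)
    for strategy in ("index_per_source", "type_per_source", "undistinguished"):
        print(strategy, target(strategy, "iis", day))
```

With option 1 a combined query has to name several indices (or a wildcard pattern), while with option 2 it names one index and optionally filters by type.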

My question is - what are the advantages and disadvantages of each approach? We
have a substantial amount of logs coming in every second, but querying is
rather rare, at least so far.

Thank you!
Konstantin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e41e4959-6a45-417a-8ba6-856abcd33350%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

vineeth_mohan_2 · September 11, 2014, 4:26am

Hello,

My advice would be to keep all the logs in a single index, but apply index
tailing.
That is, write the logs of each day or hour (depending on traffic) to their
own index, like Logstash does.
So the index name would follow the format logs-yyyy-MM-dd.
This way you won't be stuck with the fixed shard problem, and dynamic
horizontal scaling can be achieved.
Also, it would be wise to remove old logs using the TTL facility, by
closing old indices, or even by taking a snapshot and then removing the index.
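
As a rough sketch of the naming and retention idea above: the helper names `daily_index` and `expired_indices` and the `keep_days` / `history_days` parameters are made up for illustration. The code only computes index names; closing, snapshotting, or deleting them is left to whatever tooling you already use.

```python
from datetime import date, timedelta
from typing import List


def daily_index(day: date, prefix: str = "logs") -> str:
    """Build a date-stamped index name in the logs-yyyy-MM-dd format."""
    return f"{prefix}-{day:%Y-%m-%d}"


def expired_indices(today: date, keep_days: int, history_days: int,
                    prefix: str = "logs") -> List[str]:
    """List daily index names that fall outside the retention window.

    The most recent `keep_days` days (today included) are kept; anything
    older, looking back at most `history_days` days, is reported as expired.
    """
    return [
        daily_index(today - timedelta(days=offset), prefix)
        for offset in range(keep_days, history_days)
    ]


if __name__ == "__main__":
    today = date(2014, 9, 11)
    print(daily_index(today))
    print(expired_indices(today, keep_days=2, history_days=4))
```

Because each day gets a fresh index, the shard count can be changed for new indices without reindexing old ones, and retention becomes a matter of dropping whole indices.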

TTL -

Index Close -

Thanks
Vineeth

Konstantin_Erman · September 11, 2014, 12:57pm

Vineeth, thank you for your advice!

It seems we already do everything as you said. We name indices in Logstash
style with the date. Is that what you are referring to as tailing?
Creating an index per hour would lead to hundreds of open indices. I wonder
what the guidelines are regarding the number of indices vs their size?
We also close less interesting logs and delete them after a couple of weeks.

BUT STILL, with all that in place my original question still stands:
different document types in the same index or rather different indices for
different document types. What are the rules of thumb?

Konstantin

smonasco_2 · September 11, 2014, 1:21pm

Every index has a minimum of one shard. Multiple types can live in the same shard. Shards both have maintenance overheads and slow down queries. However, if you have a lot of targeted queries you can more easily reduce the shards accessed by reducing indexes than you could if you had multi-tenancy. I could be missing something but I don't think you can have multiple routing values in a query, but someone may want to query multiple log types.

So it depends.
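
A back-of-the-envelope sketch of the shard arithmetic behind that trade-off. It counts primary shards only and ignores replicas; the function names and the example numbers (4 log sources, 14 days of retention) are assumptions for illustration, not measurements from any cluster.

```python
def open_primary_shards(indices_per_day: int, shards_per_index: int,
                        days_kept: int) -> int:
    """Primary shards the cluster holds for the whole retention window."""
    return indices_per_day * shards_per_index * days_kept


def shards_touched(indices_queried: int, shards_per_index: int) -> int:
    """Primary shards a query fans out to when no routing is used."""
    return indices_queried * shards_per_index


if __name__ == "__main__":
    # One index per source per day: 4 sources, 1 shard each, kept 14 days.
    print(open_primary_shards(indices_per_day=4, shards_per_index=1, days_kept=14))
    # One shared index per day with 4 shards, kept 14 days.
    print(open_primary_shards(indices_per_day=1, shards_per_index=4, days_kept=14))
    # One shared index per hour with 1 shard, kept 14 days.
    print(open_primary_shards(indices_per_day=24, shards_per_index=1, days_kept=14))
    # A query for a single source on a single day under the first two layouts.
    print(shards_touched(indices_queried=1, shards_per_index=1))
    print(shards_touched(indices_queried=1, shards_per_index=4))
```

The point is only that the count of indices and the shards per index multiply: separate indices let a targeted query skip shards it does not need, while every extra index adds shards the cluster has to maintain.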
