To multiple index or not to multiple index...that is the question


(Jason Kopacko) #1

I currently have the following:

  • syslog data coming in from network devices.
  • syslog data coming in from applications
  • syslog data coming in from unix servers
  • syslog data coming in from ESX hosts
  • winlogbeat data from windows event logs

Currently all these sources get collated into ONE index.

I am looking into running Packetbeat on domain controllers for DNS logging as well as a couple other beats for metrics, etc.

My question: I put them all in the same index so I could view them easily in Kibana. Is there any value or performance gain in splitting them out into their own indexes?


(Jymit Singh Khondhu) #2

It depends.
How often does this one index rotate?
How long a time range do your queries span across these indices - one week, one month?


(Jason Kopacko) #3

Currently, they rotate daily.

Here are some stats that may answer your questions and guide me better. Also, keep in mind, I just rebuilt this cluster from scratch, so it only has a few days' worth of data, but the retention period is about 20 days. Eventually I will get more space to increase the retention, but this is all I have at the moment.

{
  "cluster_name" : "########",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 34,
  "active_shards" : 41,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

master name         version role disk.avail heap.max ram.max ram.current cpu uptime jdk
-      DATANODE-01   5.1.2   di       1.6tb   11.9gb  15.6gb      15.5gb  31   1.7d 1.8.0_121
-      DATANODE-02   5.1.2   di       1.6tb   11.9gb  15.6gb      15.5gb  46   1.7d 1.8.0_121
-      DATANODE-03   5.1.2   di       1.6tb   11.9gb  15.6gb      15.2gb  51   1.7d 1.8.0_121
-      DATANODE-04   5.1.2   di       1.6tb   11.9gb  15.6gb      15.4gb  67   1.7d 1.8.0_121
-      KBNANODE-01   5.1.2   -       38.9gb   11.9gb  15.6gb      14.3gb   1   1.7d 1.8.0_121
-      MSTRNODE-01   5.1.2   mi      39.6gb   11.9gb  15.6gb      13.5gb   2   1.7d 1.8.0_121
*      MSTRNODE-02   5.1.2   mi      39.6gb   11.9gb  15.6gb      13.5gb   0   1.7d 1.8.0_121
-      MSTRNODE-03   5.1.2   mi      39.6gb   11.9gb  15.6gb      13.5gb   0   1.7d 1.8.0_121

health status index                          
green  open   logstash-2017.01.23
green  open   logstash-2017.01.24
green  open   logstash-2017.01.25
green  open   logstash-2017.01.26

index                           shard prirep state          ip      node
logstash-2017.01.23             2     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.23             1     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.23             3     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.23             0     p      STARTED        x.x.x.x DATANODE-04
logstash-2017.01.24             2     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.24             1     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.24             3     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.24             0     p      STARTED        x.x.x.x DATANODE-04
logstash-2017.01.25             2     p      STARTED        x.x.x.x DATANODE-04
logstash-2017.01.25             1     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.25             3     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.25             0     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.26             2     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.26             1     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.26             3     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.26             0     p      STARTED        x.x.x.x DATANODE-04

shards disk.indices disk.used disk.avail disk.total disk.percent host    ip      node
    10      225.7gb   328.9gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-01
    10      225.6gb   328.4gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-02
    11      227.1gb   330.3gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-03
    10      227.7gb   330.5gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-04

master name            indexing.index_total indexing.index_current indexing.index_failed indexing.delete_total
-      DATANODE-01               58864840                      0                     0                     0
-      DATANODE-02               58791438                      0                     0                     0
-      DATANODE-03               58715893                      0                    38                     0
-      DATANODE-04               58784564                      0                     0                     0
-      KBNANODE-01                      0                      0                     0                     0
-      MSTRNODE-01                      0                      0                     0                     0
*      MSTRNODE-02                      0                      0                     0                     0
-      MSTRNODE-03                      0                      0                     0                     0

(Mark Walkom) #4

You wouldn't put all of that into a single DB table, would you? By that logic it makes sense to split things out: it keeps things hygienic, prevents mapping explosions, and allows custom retention per source.
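Per-source retention could then be handled with a tool such as Elasticsearch Curator; a minimal sketch of an action file, assuming the 20-day retention mentioned above and an illustrative `logstash-network-` prefix (names and values are examples, not from this thread):

```yaml
# Hypothetical Curator action file: delete daily network-syslog indexes
# older than 20 days. Adjust the prefix and unit_count per source.
actions:
  1:
    action: delete_indices
    description: "Keep logstash-network-* for 20 days"
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-network-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 20
```

With one action per source prefix, each data type can age out on its own schedule, which is not possible when everything shares one daily index.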


(Jason Kopacko) #5

Hahah, well, you are talking to someone who knows just about 0.00001% of what he is doing, or about databases in general.

But I will split them out.

In terms of Kibana, is there an easy way to search multiple indexes in the Discover tab?


(Mark Walkom) #6

Kibana can only search what is covered by an index pattern, so a consistent naming structure, eg logstash-$type-$date, will help.
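As a sketch, assuming each event carries a `type` field set by its input (the field name and host are illustrative), the Logstash elasticsearch output can build that name directly:

```conf
# Hypothetical Logstash output: events tagged type => "network",
# "unix", "esx", etc. land in per-source daily indexes such as
# logstash-network-2017.01.26.
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # illustrative host
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
  }
}
```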


(Jason Kopacko) #7

So like:

syslog data coming in from network devices

  • logstash-network-$date

syslog data coming from applications

  • logstash-application-$date

syslog data coming in from unix servers

  • logstash-unix-$date

syslog data coming in from ESX hosts

  • logstash-esx-$date

winlogbeat data from windows event logs

  • logstash-winlogbeat-$date

?

For the most part, 99% of the stored data is system logs, connection logs, event logs, etc. The reason I had them all in the same index was so that I could punch in an IP address and see every source that referenced it.


(Mark Walkom) #8

Yep, like that.
Then if you use a logstash-* index pattern in Kibana, you can still see all the data while keeping things split out.


(Jason Kopacko) #9

Awesome sauce!!!!! I can't wait to give this a try.

In terms of performance, do you think it will be a boost or a negative?


(Anh) #10

With a pattern that includes indexes of different types, when we filter on a type in Kibana, will only the indexes that have that type be queried, or will all indexes be queried?

Also, will Kibana complain about a data type conflict if the same field has different data types in different indexes? e.g.,

logstash-index1 > type1 > my_field (string)
logstash-index2 > type2 > my_field (long)

(Jason Kopacko) #11

So,
I have one data center cluster using multiple indexes and one cluster using a single index.

From what I can tell, the multi-index cluster seems to be missing some logs (not many) and is definitely SLOWER compared to the single index cluster.


(system) #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.