To multiple index or not to multiple index...that is the question


(Jason Kopacko) #1

I currently have the following:

  • syslog data coming in from network devices.
  • syslog data coming in from applications
  • syslog data coming in from unix servers
  • syslog data coming in from ESX hosts
  • winlogbeat data from windows event logs

Currently all these sources get collated into ONE index.

I am looking into running Packetbeat on domain controllers for DNS logging as well as a couple other beats for metrics, etc.

My question: I put them all in the same index so I could view them easily in Kibana. Is there any value or performance gain in splitting them out into their own indexes?


(Jymit Singh Khondhu) #2

It depends.
How often does this one index rotate?
How long a time range do your queries span across these indices - one week, one month?


(Jason Kopacko) #3

Currently, they rotate daily.

Here are some stats that may answer your questions and guide me better. Also, keep in mind, I just rebuilt this cluster from scratch, so it only has a few days' worth of data, but the retention period is about 20 days. Eventually I will get more space to increase the retention, but this is all I have at the moment.

{
  "cluster_name" : "########",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 34,
  "active_shards" : 41,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

master name         version role disk.avail heap.max ram.max ram.current cpu uptime jdk
-      DATANODE-01   5.1.2   di       1.6tb   11.9gb  15.6gb      15.5gb  31   1.7d 1.8.0_121
-      DATANODE-02   5.1.2   di       1.6tb   11.9gb  15.6gb      15.5gb  46   1.7d 1.8.0_121
-      DATANODE-03   5.1.2   di       1.6tb   11.9gb  15.6gb      15.2gb  51   1.7d 1.8.0_121
-      DATANODE-04   5.1.2   di       1.6tb   11.9gb  15.6gb      15.4gb  67   1.7d 1.8.0_121
-      KBNANODE-01   5.1.2   -       38.9gb   11.9gb  15.6gb      14.3gb   1   1.7d 1.8.0_121
-      MSTRNODE-01   5.1.2   mi      39.6gb   11.9gb  15.6gb      13.5gb   2   1.7d 1.8.0_121
*      MSTRNODE-02   5.1.2   mi      39.6gb   11.9gb  15.6gb      13.5gb   0   1.7d 1.8.0_121
-      MSTRNODE-03   5.1.2   mi      39.6gb   11.9gb  15.6gb      13.5gb   0   1.7d 1.8.0_121

health status index                          
green  open   logstash-2017.01.23
green  open   logstash-2017.01.24
green  open   logstash-2017.01.25
green  open   logstash-2017.01.26

index                           shard prirep state          ip      node
logstash-2017.01.23             2     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.23             1     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.23             3     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.23             0     p      STARTED        x.x.x.x DATANODE-04
logstash-2017.01.24             2     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.24             1     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.24             3     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.24             0     p      STARTED        x.x.x.x DATANODE-04
logstash-2017.01.25             2     p      STARTED        x.x.x.x DATANODE-04
logstash-2017.01.25             1     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.25             3     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.25             0     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.26             2     p      STARTED        x.x.x.x DATANODE-01
logstash-2017.01.26             1     p      STARTED        x.x.x.x DATANODE-02
logstash-2017.01.26             3     p      STARTED        x.x.x.x DATANODE-03
logstash-2017.01.26             0     p      STARTED        x.x.x.x DATANODE-04

shards disk.indices disk.used disk.avail disk.total disk.percent host    ip      node
    10      225.7gb   328.9gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-01
    10      225.6gb   328.4gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-02
    11      227.1gb   330.3gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-03
    10      227.7gb   330.5gb      1.6tb      1.9tb           16 x.x.x.x x.x.x.x DATANODE-04

master name            indexing.index_total indexing.index_current indexing.index_failed indexing.delete_total
-      DATANODE-01               58864840                      0                     0                     0
-      DATANODE-02               58791438                      0                     0                     0
-      DATANODE-03               58715893                      0                    38                     0
-      DATANODE-04               58784564                      0                     0                     0
-      KBNANODE-01                      0                      0                     0                     0
-      MSTRNODE-01                      0                      0                     0                     0
*      MSTRNODE-02                      0                      0                     0                     0
-      MSTRNODE-03                      0                      0                     0                     0

(Mark Walkom) #4

You wouldn't put all of that into a single DB table, would you? By that logic it makes sense to split things out: it keeps things hygienic, prevents mapping explosions, and allows custom retention per source.
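Per-source retention could then be handled with a tool such as Elasticsearch Curator; a minimal sketch of an action file, assuming the 20-day retention mentioned above and an illustrative `logstash-network-` prefix (names and values are examples, not from this thread):

```yaml
# Hypothetical Curator action file: delete daily network-syslog indexes
# older than 20 days. Adjust the prefix and unit_count per source.
actions:
  1:
    action: delete_indices
    description: "Keep logstash-network-* for 20 days"
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-network-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 20
```

With one action per source prefix, each data type can age out on its own schedule, which is not possible when everything shares one daily index.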


(Jason Kopacko) #5

Hahah, well, you are talking to someone who knows just about 0.00001% of what he is doing, or about databases in general.

But I will split them out.

In terms of Kibana, is there an easy way to search multiple indexes in the Discover tab?


(Mark Walkom) #6

Kibana can only search what is covered by an index pattern, so a consistent naming structure, eg logstash-$type-$date, will help.
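As a sketch, assuming each event carries a `type` field set by its input (the field name and host are illustrative), the Logstash elasticsearch output can build that name directly:

```conf
# Hypothetical Logstash output: events tagged type => "network",
# "unix", "esx", etc. land in per-source daily indexes such as
# logstash-network-2017.01.26.
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # illustrative host
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
  }
}
```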


(Jason Kopacko) #7

So like:

syslog data coming in from network devices

  • logstash-network-$date

syslog data coming from applications

  • logstash-application-$date

syslog data coming in from unix servers

  • logstash-unix-$date

syslog data coming in from ESX hosts

  • logstash-esx-$date

winlogbeat data from windows event logs

  • logstash-winlogbeat-$date

?

For the most part, 99% of the stored data is system logs, connection logs, event logs, etc. The reason I had them all in the same index was so that I could punch in an IP address and see every source that referenced it.


(Mark Walkom) #8

Yep, like that.
Then if you use a logstash-* index pattern in Kibana, you can still see all the data while keeping things split out.


(Jason Kopacko) #9

Awesome sauce!!!!! I can't wait to give this a try.

In terms of performance, do you think it will be a boost or a negative?


(Anh) #10

With a pattern that includes indexes of different types, when we filter on a type in Kibana, will only the indexes that have that type be queried, or will all indexes be queried?

Also, will Kibana complain about a data type conflict if the same field has different data types in different indexes? e.g.,

logstash-index1 > type1 > my_field (string)
logstash-index2 > type2 > my_field (long)

(Jason Kopacko) #11

So,
I have one data center cluster using multiple indexes and one cluster using a single index.

From what I can tell, the multi-index cluster seems to be missing some logs (not many) and is definitely SLOWER compared to the single index cluster.


(system) #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.