Currently all these sources get collated into ONE index.
I am looking into running Packetbeat on domain controllers for DNS logging as well as a couple other beats for metrics, etc.
To view them easily in Kibana, I had put them all in the same index. My question is: is there any value or performance increase in splitting these out into their own indexes?
Here are some stats that may help you guide me better. Also, keep in mind that I just rebuilt this cluster from scratch, so it only has a few days' worth of data, but the retention period is about 20 days. Eventually I will get more space to increase the retention, but this is all I have at the moment.
{
"cluster_name" : "########",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 8,
"number_of_data_nodes" : 4,
"active_primary_shards" : 34,
"active_shards" : 41,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
master name version role disk.avail heap.max ram.max ram.current cpu uptime jdk
- DATANODE-01 5.1.2 di 1.6tb 11.9gb 15.6gb 15.5gb 31 1.7d 1.8.0_121
- DATANODE-02 5.1.2 di 1.6tb 11.9gb 15.6gb 15.5gb 46 1.7d 1.8.0_121
- DATANODE-03 5.1.2 di 1.6tb 11.9gb 15.6gb 15.2gb 51 1.7d 1.8.0_121
- DATANODE-04 5.1.2 di 1.6tb 11.9gb 15.6gb 15.4gb 67 1.7d 1.8.0_121
- KBNANODE-01 5.1.2 - 38.9gb 11.9gb 15.6gb 14.3gb 1 1.7d 1.8.0_121
- MSTRNODE-01 5.1.2 mi 39.6gb 11.9gb 15.6gb 13.5gb 2 1.7d 1.8.0_121
* MSTRNODE-02 5.1.2 mi 39.6gb 11.9gb 15.6gb 13.5gb 0 1.7d 1.8.0_121
- MSTRNODE-03 5.1.2 mi 39.6gb 11.9gb 15.6gb 13.5gb 0 1.7d 1.8.0_121
health status index
green open logstash-2017.01.23
green open logstash-2017.01.24
green open logstash-2017.01.25
green open logstash-2017.01.26
index shard prirep state ip node
logstash-2017.01.23 2 p STARTED x.x.x.x DATANODE-01
logstash-2017.01.23 1 p STARTED x.x.x.x DATANODE-02
logstash-2017.01.23 3 p STARTED x.x.x.x DATANODE-03
logstash-2017.01.23 0 p STARTED x.x.x.x DATANODE-04
logstash-2017.01.24 2 p STARTED x.x.x.x DATANODE-01
logstash-2017.01.24 1 p STARTED x.x.x.x DATANODE-02
logstash-2017.01.24 3 p STARTED x.x.x.x DATANODE-03
logstash-2017.01.24 0 p STARTED x.x.x.x DATANODE-04
logstash-2017.01.25 2 p STARTED x.x.x.x DATANODE-04
logstash-2017.01.25 1 p STARTED x.x.x.x DATANODE-03
logstash-2017.01.25 3 p STARTED x.x.x.x DATANODE-02
logstash-2017.01.25 0 p STARTED x.x.x.x DATANODE-01
logstash-2017.01.26 2 p STARTED x.x.x.x DATANODE-01
logstash-2017.01.26 1 p STARTED x.x.x.x DATANODE-02
logstash-2017.01.26 3 p STARTED x.x.x.x DATANODE-03
logstash-2017.01.26 0 p STARTED x.x.x.x DATANODE-04
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
10 225.7gb 328.9gb 1.6tb 1.9tb 16 x.x.x.x x.x.x.x DATANODE-01
10 225.6gb 328.4gb 1.6tb 1.9tb 16 x.x.x.x x.x.x.x DATANODE-02
11 227.1gb 330.3gb 1.6tb 1.9tb 16 x.x.x.x x.x.x.x DATANODE-03
10 227.7gb 330.5gb 1.6tb 1.9tb 16 x.x.x.x x.x.x.x DATANODE-04
master name indexing.index_total indexing.index_current indexing.index_failed indexing.delete_total
- DATANODE-01 58864840 0 0 0
- DATANODE-02 58791438 0 0 0
- DATANODE-03 58715893 0 38 0
- DATANODE-04 58784564 0 0 0
- KBNANODE-01 0 0 0 0
- MSTRNODE-01 0 0 0 0
* MSTRNODE-02 0 0 0 0
- MSTRNODE-03 0 0 0 0
You wouldn't put all that into a single DB table, would you? By the same logic it makes sense to split things out: it keeps the data hygienic, helps prevent mapping explosions, and allows custom retention per source.
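To make the per-source retention point concrete, here is a minimal sketch, assuming daily indexes named `<source>-YYYY.MM.DD` (the same shape as the existing `logstash-2017.01.26`). The retention values and the helper names `index_name` and `expired_indices` are hypothetical, chosen only for illustration; the sketch only computes names and does not talk to the cluster.

```python
from datetime import date, datetime, timedelta

# Hypothetical retention per source, in days. Illustrative values only.
RETENTION_DAYS = {"packetbeat": 7, "metricbeat": 14, "logstash": 20}


def index_name(source, day):
    """Build a daily index name such as 'packetbeat-2017.01.26'."""
    return f"{source}-{day:%Y.%m.%d}"


def expired_indices(existing, today):
    """Return the index names that are older than their source's retention."""
    expired = []
    for name in existing:
        source, _, stamp = name.rpartition("-")
        if source not in RETENTION_DAYS:
            continue  # unknown source: leave it alone
        try:
            day = datetime.strptime(stamp, "%Y.%m.%d").date()
        except ValueError:
            continue  # name does not end in a date: leave it alone
        cutoff = today - timedelta(days=RETENTION_DAYS[source])
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    today = date(2017, 1, 26)
    names = ["packetbeat-2017.01.10", "packetbeat-2017.01.25", "logstash-2017.01.10"]
    print(index_name("packetbeat", today))
    print(expired_indices(names, today))
```

With everything in one index per day, the only choice is to keep or drop the whole day; with one index per source, each source can age out on its own schedule.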
For the most part, 99% of the stored data is system logs, connection logs, event logs, etc. The reason I put them all in the same index was so I could punch in an IP address and see all the sources that referenced it.
With this pattern that includes all indexes of different types, when we filter on a type in Kibana, will only the indexes that have this type be queried, or will all indexes be queried?
Also, will Kibana complain about a data type conflict if the same field has different data types in different indexes? e.g.,
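Splitting by source does not have to cost the "punch in an IP and see everything" workflow, because a single search can target several indexes through comma-separated names and wildcards. A rough sketch, assuming hypothetical `packetbeat-*` indexes next to the existing `logstash-*` ones; `build_ip_search` is an illustrative helper that only builds the request path and body and sends nothing:

```python
import json


def build_ip_search(ip, index_patterns):
    """Build the path and body for one search across several index patterns."""
    # Comma-separated patterns in the path address all matching indexes at once.
    path = "/" + ",".join(index_patterns) + "/_search"
    # Quote the IP so it is searched as one phrase rather than split on dots.
    body = {"query": {"query_string": {"query": f'"{ip}"'}}}
    return path, body


if __name__ == "__main__":
    path, body = build_ip_search("10.1.2.3", ["logstash-*", "packetbeat-*"])
    print("GET", path)
    print(json.dumps(body, indent=2))
```

The same idea applies in Kibana by using an index pattern broad enough to match every source you want to search together.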
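One way to check for such conflicts ahead of time is to compare the field types of each index. The sketch below assumes the mappings have already been reduced to a plain `{index: {field: type}}` dictionary; the index names, field names, and types in the sample data are hypothetical and are not taken from this cluster.

```python
def find_type_conflicts(mappings):
    """Return fields that are mapped to more than one type across indexes.

    `mappings` maps index name -> {field name: type name}.
    The result maps field name -> {type name: [indexes using that type]}.
    """
    seen = {}
    for index, fields in mappings.items():
        for field, field_type in fields.items():
            seen.setdefault(field, {}).setdefault(field_type, []).append(index)
    # A field is in conflict only when it appears with two or more types.
    return {field: types for field, types in seen.items() if len(types) > 1}


if __name__ == "__main__":
    # Hypothetical mappings, for illustration only.
    sample = {
        "logstash-2017.01.26": {"port": "keyword", "message": "text"},
        "packetbeat-2017.01.26": {"port": "long", "message": "text"},
    }
    for field, types in find_type_conflicts(sample).items():
        print(field, types)
```

Fields that show up in the result are the ones to rename or remap before pointing one index pattern at all of the indexes.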