Pipeline Performance & jdbc_static filter


(Hermann) #1

Hello everyone,

I am new to ELK and have a number of issues that I would like to share with you.

My difficulties concern the following points:

  • Pipeline performance (Filebeat & JDBC --> Logstash --> Elasticsearch --> Kibana)
  • Setting up the nodes (master, data, ingest)
  • Using the jdbc_static filter plugin
  • The loading speed of Kibana visualizations when the index holds a lot of documents

I. Pipeline Performance (Filebeat & JDBC --> Logstash --> Elasticsearch --> Kibana)

Context

A. Pipeline Architecture

We currently have an ELK pipeline based on the following architecture: Filebeat & JDBC --> Logstash --> Elasticsearch --> Kibana

We have 4 JDBC inputs in Logstash that retrieve data from different tables and store it in different indexes.

B. System

Logstash, Elasticsearch and Kibana run on: OS: Ubuntu, RAM: 8 GB, CPU: dual core

Filebeat runs on: OS: Windows, RAM: 28 GB, CPU: 2 x 2.3 GHz

C. Volume

Of our current JDBC inputs, one alone collects about 100 million records on average in a single run.

We estimate an overall volume of about 300 million records per month on this pipeline.

D. Nodes

All this traffic is currently handled by a single node (default configuration).

E. Issue

The problem we are currently facing is performance. Loading 160 million records through one JDBC input took us 4 days (96 hours), which is far too long for us.

We would like to load this data in 24 hours at most.
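
To put the gap in numbers, here is a rough back-of-the-envelope calculation. It assumes the load is spread evenly over the whole period and that each item loaded counts as one record; both are simplifying assumptions.

```latex
\begin{align*}
\text{current rate} &= \frac{160 \times 10^{6}\ \text{records}}{96 \times 3600\ \text{s}} \approx 463\ \text{records/s} \\
\text{target rate}  &= \frac{160 \times 10^{6}\ \text{records}}{24 \times 3600\ \text{s}} \approx 1852\ \text{records/s}
\end{align*}
```

In other words, the pipeline would have to run roughly 4 times faster than it does today.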

F. My questions

What sizing (A, B, D) do we need to reach our goal of 160 million records in 24 hours?

II. Setting the nodes

  1. Is it possible to build a pipeline with several nodes (master, data, ingest) on a single virtual machine?
  2. Can you guide me in building a robust pipeline? :blush:

III. Using the jdbc_static plugin

So far we use one jdbc input plugin in the Logstash input block for each SQL table.

input {
    jdbc connector 1 for table 1
    type 1

    jdbc connector 2 for table 2
    type 2
}
output {
    if (type 1) {
        index 1 for table 1
    }

    if (type 2) {
        index 2 for table 2
    }
}
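
Written out as actual Logstash configuration, the layout above would look roughly like the sketch below. The option names are those of the jdbc input and elasticsearch output plugins; the connection string, driver, table names, index names and host are hypothetical placeholders, not our real values.

```
input {
  jdbc {
    # Hypothetical connection settings; replace with your own.
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/exampledb"
    jdbc_user => "logstash"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    statement => "SELECT * FROM table1"
    # Tag the events so the output block can route them.
    type => "table1"
  }
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/exampledb"
    jdbc_user => "logstash"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    statement => "SELECT * FROM table2"
    type => "table2"
  }
}

output {
  # Route each event to its index based on the type set in the input.
  if [type] == "table1" {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "index1"
    }
  } else if [type] == "table2" {
    elasticsearch {
      hosts => ["http://localhost:9200"]
      index => "index2"
    }
  }
}
```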

But according to the documentation, it is possible to use jdbc_static at the filter level to connect to several tables of the same database.

filter {
    jdbc_static {
        # Load data from the remote database
        loaders => [
            {
                query => "load data from remote table 1"
                local_table => "save to local table 1"
            },
            {
                query => "load data from remote table 2"
                local_table => "save to local table 2"
            }
        ]

        # Define the local tables that receive the loaded data
        local_db_objects => [
            {
                name => "set local table 1"
            },
            {
                name => "set local table 2"
            }
        ]

        # Define the lookups on the local tables
        local_lookups => [
            {
                query => "look up local table 1, put result in a field"
            },
            {
                query => "look up local table 2, put result in a field"
            }
        ]

        staging_directory => "****"
        loader_schedule => "* * * * *"
        jdbc_user => "logstash"
        jdbc_password => "example"
        jdbc_driver_class => "*****"
        jdbc_driver_library => "****"
        jdbc_connection_string => "****"
    }
}
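
For reference, with the option names from the jdbc_static documentation filled in, a single-table version might look roughly like this. It is only a sketch: the table name, columns, queries, the event field `[customer_id]`, the target field `table1_row` and all connection values are hypothetical.

```
filter {
  jdbc_static {
    # Copy rows from the remote table into a local in-memory table.
    loaders => [
      {
        id => "remote-table1"
        query => "SELECT id, name FROM table1"
        local_table => "local_table1"
      }
    ]
    # Describe the local table that receives the loaded rows.
    local_db_objects => [
      {
        name => "local_table1"
        index_columns => ["id"]
        columns => [
          ["id", "integer"],
          ["name", "varchar(255)"]
        ]
      }
    ]
    # Look up a row for each event and store the result in the target field.
    local_lookups => [
      {
        id => "lookup-table1"
        query => "SELECT name FROM local_table1 WHERE id = :id"
        parameters => { id => "[customer_id]" }
        target => "table1_row"
      }
    ]
    # Hypothetical paths and connection settings; replace with your own.
    staging_directory => "/tmp/logstash/jdbc_static/import_data"
    loader_schedule => "* * * * *"
    jdbc_user => "logstash"
    jdbc_password => "example"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_driver_library => "/path/to/jdbc-driver.jar"
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/exampledb"
  }
}
```

In this sketch the lookup result is added to each event that passes through the filter, under the `table1_row` field.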

My questions

  1. Can we store the data from local table 1 and local table 2 in a single index?
  2. How do we set the type keyword so that the data can be routed to an index in the output block?
  3. How do we send each local table to its own index?

output {
    if (type) {
        data from local table to one index
    }
}

IV. Slow loading of Kibana visualizations

When we have a lot of data in the index, Kibana takes too long to load the visualizations. It even looks as if the visualizations do not load all the data.
How can we solve this problem?


(Christian Dahlqvist) #2

What type of hardware is this running on? How many hosts? What does the load on the host look like, e.g. CPU and disk I/O and iowait?


(Hermann) #3

Let me check the exact information with the system administrator and come back to you.
I already gave some system information in my post (B. System). As for hosts, we have only one.


(Hermann) #4

Hello Christian Dahlqvist

Sorry for the late reply. Regarding your question, I have the following information.
Thanks for your help.

Size : Standard_D2s_v3

Virtual processors: 2

Memory: 8 GB

Temporary storage (SSD): 16 GiB

Max data disks: 4

Max cached and temporary storage throughput: IOPS / MBps (cache size in GiB): 4,000 / 32 (50)

Max uncached disk throughput: IOPS / MBps: 3,200 / 48

Max number of NICs / expected network bandwidth (MBps): 2 / 1,000

Premium disk type:    P4      | P10
Disk size:            32 GB   | 128 GB
IOPS per disk:        120     | 500
Throughput per disk:  25 MB/s | 100 MB/s
Disk usage:           OS      | Data


(Christian Dahlqvist) #5

What do CPU usage and disk performance look like? Do you have any monitoring installed?


(Hermann) #6

No, I do not have any monitoring installed.


(Christian Dahlqvist) #7

It is a very small host given you are running all the components on it, so I would suspect you may not have sufficient resources. Log into the host and use the top and iostat tools to give you an idea what is going on (assuming you are using Linux).