Importing data into Elasticsearch from multiple dynamically created databases

Hi,

I am currently evaluating the ELK stack (and have no experience with it beyond that so far).
I am wondering whether it is possible to import data from multiple dynamically created MariaDB databases into Elasticsearch.

Short background: we have a multi-tenant application where each customer's data is kept in a separate database, so whenever a new customer is added, a new database is created. The available customers are managed in a central database.

What I want to do now is fetch the available customers (which can number in the thousands) in the input, and then enrich each event with data from the customer's respective database. The customers' databases all have the same schema.

So here is what I tried to achieve this:

input {
    jdbc {
        jdbc_connection_string => "jdbc:mariadb://127.0.0.1:3306/eg_test?sessionVariables=sql_mode=ANSI_QUOTES"
        jdbc_user => "user"
        jdbc_password => "secret"
        jdbc_driver_library => "/dataservice/mariadb-java-client-2.7.4.jar"
        jdbc_driver_class => "org.mariadb.jdbc.Driver"
        jdbc_page_size => 200000
        jdbc_paging_enabled => true
        statement => "SELECT
            customer
            FROM eg_test.customers"
    }
}
filter {
    jdbc_streaming {
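        # intent: build the per-customer database name from the event's
        # "customer" field -- this is the part that fails (see the error below)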
        jdbc_connection_string => "jdbc:mariadb://127.0.0.1:3306/db_%{[customer]}?sessionVariables=sql_mode=ANSI_QUOTES"
        jdbc_user => "user"
        jdbc_password => "secret"
        jdbc_driver_library => "/dataservice/mariadb-java-client-2.7.4.jar"
        jdbc_driver_class => "org.mariadb.jdbc.Driver"
        statement => "SELECT
                data,
                COUNT(something) AS cnt
            FROM customer_table
            WHERE ctime >= UNIX_TIMESTAMP('${START_DATE}')
                AND ctime < UNIX_TIMESTAMP('${END_DATE}')
            GROUP BY data"
        parameters => { "customer" => "[customer]"}
        target => "data"
    }
}
output {
    # Add to the Elasticsearch index
    elasticsearch {
        hosts => ["127.0.0.1:9200"]
        index => "customer_data_test"
    }
}

However, this fails because the jdbc_streaming plugin is unable to resolve the field reference %{[customer]} in the jdbc_connection_string.

The error I get back is:

DatabaseConnectionError: Java::JavaSql::SQLSyntaxErrorException: Could not connect to address=(host=127.0.0.1)(port=3306)(type=master) : (conn=165) Unknown database 'db_%{[customer]}'>

So, is there any way to do this?
The only other approach I can think of is to generate the Logstash config dynamically with another script, adding one jdbc input per database. But would that scale to a few thousand customers?
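For illustration, a generated config of that shape could look something like this (the customer names acme and globex are made up; each jdbc block would be identical apart from the database name and the customer tag):

input {
    jdbc {
        jdbc_connection_string => "jdbc:mariadb://127.0.0.1:3306/db_acme?sessionVariables=sql_mode=ANSI_QUOTES"
        jdbc_user => "user"
        jdbc_password => "secret"
        jdbc_driver_library => "/dataservice/mariadb-java-client-2.7.4.jar"
        jdbc_driver_class => "org.mariadb.jdbc.Driver"
        statement => "SELECT data, COUNT(something) AS cnt FROM customer_table WHERE ctime >= UNIX_TIMESTAMP('${START_DATE}') AND ctime < UNIX_TIMESTAMP('${END_DATE}') GROUP BY data"
        # record which customer this event came from
        add_field => { "customer" => "acme" }
    }
    jdbc {
        jdbc_connection_string => "jdbc:mariadb://127.0.0.1:3306/db_globex?sessionVariables=sql_mode=ANSI_QUOTES"
        jdbc_user => "user"
        jdbc_password => "secret"
        jdbc_driver_library => "/dataservice/mariadb-java-client-2.7.4.jar"
        jdbc_driver_class => "org.mariadb.jdbc.Driver"
        statement => "SELECT data, COUNT(something) AS cnt FROM customer_table WHERE ctime >= UNIX_TIMESTAMP('${START_DATE}') AND ctime < UNIX_TIMESTAMP('${END_DATE}') GROUP BY data"
        add_field => { "customer" => "globex" }
    }
    # ... one jdbc block per row in the central customers table
}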

Thanks for any input,
Enrico

Correct. The filter (or rather, the mixin it uses) does not sprintf the connection string. The connection is established during plugin initialization, when there is no event from which fields could be referenced.
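By contrast, the statement is prepared per event, so field references passed via the parameters option do resolve there. A minimal sketch (assuming, hypothetically, that all customer data lived in a single database with a customer column):

filter {
    jdbc_streaming {
        jdbc_connection_string => "jdbc:mariadb://127.0.0.1:3306/eg_test?sessionVariables=sql_mode=ANSI_QUOTES"
        jdbc_user => "user"
        jdbc_password => "secret"
        jdbc_driver_library => "/dataservice/mariadb-java-client-2.7.4.jar"
        jdbc_driver_class => "org.mariadb.jdbc.Driver"
        # :cust is bound per event from the "customer" field;
        # substitution works in the statement, but never in the
        # connection string above
        statement => "SELECT data FROM customer_table WHERE customer = :cust"
        parameters => { "cust" => "customer" }
        target => "data"
    }
}

This is also why parameters alone cannot solve the per-database case: they never reach the connection string.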

Thank you for the response and for confirming this.
Well then, I will try creating several inputs and see how it performs 🙂
