Load multiple tables from PostgreSQL to Elasticsearch

I have a config file like this:

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        jdbc_driver_library => "/path/to/postgresql-9.4-1201.jdbc41.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * from <table>"
    }
}
output {
    stdout { codec => json_lines }
}

This copies data from a table in psql to an index in Elasticsearch. But I have thousands of tables in my DB, so this approach is not efficient. I came up with a Python program, but even that copies data row by row from each table, which is slow. Can anyone suggest the best solution for my problem?

Hi Rajesh03,

Have you already checked our PostgreSQL connector?

It is able to copy all tables from your deployment and tries to do it in an optimal way.

You can either run it in Elastic Cloud or host the connector yourself on your own hardware. It's also open code, so you can fork it and update the implementation if it does not suit you.


What is the problem you are trying to solve by indexing the data into Elasticsearch?

Copying data from a relational database to Elasticsearch table by table is often not the best approach, as Elasticsearch does not support joins. I would therefore recommend identifying what you want to search for and indexing documents that way. This often means joining multiple tables together in the relational database in order to create larger documents that you index into Elasticsearch, as in the sketch below. This reduces the number of statements you need to run, but is naturally not easy to automate.
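As a rough sketch of what that can look like (the table and column names here are made up for illustration), the jdbc statement could join several tables into one denormalized document per product:

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        jdbc_driver_library => "/path/to/postgresql-9.4-1201.jdbc41.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # one denormalized document per product instead of one index per table
        statement => "SELECT p.id, p.name, p.description, c.name AS category, pr.amount AS price FROM products p JOIN categories c ON c.id = p.category_id JOIN prices pr ON pr.product_id = p.id"
    }
}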


Thank you. I will check it out.

I have the data of an e-commerce website in psql. In order to fetch search results quickly, I am trying to index it in Elasticsearch. The data is spread across multiple tables and it will also be updated regularly, so this isn't a one-time migration. I need it to work like a trigger: whenever data is added in psql, it should be added in Elasticsearch as well.

The jdbc plugin can select data based on the timestamp it was last processed (see sql_last_value in the docs), so you can create a statement that is run periodically and only returns updated rows. This does however assume that you have an update timestamp on your tables and requires you to select based on this/these in your queries. As it is run periodically, there will always be a lag though. If you want to avoid this lag, I believe you will need to update Elasticsearch from your application at the same time you make changes to your database.
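A minimal sketch of that pattern, assuming a hypothetical products table with an updated_at column (the schedule, index name and id column are just examples):

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        jdbc_driver_library => "/path/to/postgresql-9.4-1201.jdbc41.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # run every minute and only pick up rows changed since the last run
        schedule => "* * * * *"
        use_column_value => true
        tracking_column => "updated_at"
        tracking_column_type => "timestamp"
        statement => "SELECT * FROM products WHERE updated_at > :sql_last_value"
    }
}
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "products"
        # reuse the primary key so an updated row overwrites its old document
        document_id => "%{id}"
    }
}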
