Load multiple tables from PostgreSQL to Elasticsearch

I have a config file like this:

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        jdbc_driver_library => "/path/to/postgresql-9.4-1201.jdbc41.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        statement => "SELECT * from <table>"
    }
}
output {
    stdout { codec => json_lines }
}

This copies data from a table in psql to an index in Elasticsearch. But I have thousands of tables in my DB, so this approach is not efficient. I came up with a Python program, but even that copies data row by row from each table, which is slow. Can anyone suggest the best solution for my problem?

Hi Rajesh03,

Have you already checked our PostgreSQL connector?

It is able to copy all tables from your deployment and tries to do it in an optimal way.

You can either run it in Elastic Cloud or host the connector yourself on your own hardware. It's also open code, so you can fork it and update the implementation if it does not suit you.


What is the problem you are trying to solve by indexing the data into Elasticsearch?

Copying data from a relational database to Elasticsearch table by table is often not the best approach, as Elasticsearch does not support joins. I would therefore recommend identifying what you want to search for and indexing documents that way. This often means joining multiple tables together in the relational database in order to create larger documents that you index into Elasticsearch, as in the sketch below. This reduces the number of statements you need to run, but is naturally not easy to automate.
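As a rough sketch of what that can look like (the table and column names here are made up for illustration), the jdbc statement could join several tables into one denormalized document per product:

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        jdbc_driver_library => "/path/to/postgresql-9.4-1201.jdbc41.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # one denormalized document per product instead of one index per table
        statement => "SELECT p.id, p.name, p.description, c.name AS category, pr.amount AS price FROM products p JOIN categories c ON c.id = p.category_id JOIN prices pr ON pr.product_id = p.id"
    }
}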


Thank you. I will check it out.

I have the data of an e-commerce website in psql. In order to fetch search results quickly, I am trying to index it in Elasticsearch. The data is spread across multiple tables and it will also be updated regularly, so this isn't a one-time migration. I need it to work like a trigger: whenever data is added in psql, it should be added in Elasticsearch as well.

The jdbc plugin can select data based on the timestamp it was last processed (see sql_last_value in the docs), so you can create a statement that is run periodically and only returns updated rows. This does however assume that you have an update timestamp on your tables and requires you to select based on this/these in your queries. As it is run periodically, there will always be a lag though. If you want to avoid this lag, I believe you will need to update Elasticsearch from your application at the same time you make changes to your database.
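A minimal sketch of that pattern, assuming a hypothetical products table with an updated_at column (the schedule, index name and id column are just examples):

input {
    jdbc {
        jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
        jdbc_user => "postgres"
        jdbc_driver_library => "/path/to/postgresql-9.4-1201.jdbc41.jar"
        jdbc_driver_class => "org.postgresql.Driver"
        # run every minute and only pick up rows changed since the last run
        schedule => "* * * * *"
        use_column_value => true
        tracking_column => "updated_at"
        tracking_column_type => "timestamp"
        statement => "SELECT * FROM products WHERE updated_at > :sql_last_value"
    }
}
output {
    elasticsearch {
        hosts => ["http://localhost:9200"]
        index => "products"
        # reuse the primary key so an updated row overwrites its old document
        document_id => "%{id}"
    }
}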
