Hi, I'm trying to migrate data from SQL Server to Elasticsearch using Logstash. My data is normalized: a central table is linked to several other tables by foreign keys, and most of the data I'm migrating is textual. What I'm trying to achieve is a single Elasticsearch type, in a specific JSON format, that regroups all the necessary fields (i.e. SQL columns) coming from multiple SQL tables.
Because of the multiple tables I use as a source, I'm doing this iteratively: I first created an Elasticsearch type on an index, whose document IDs are the central table's primary keys. Then, each time I "import" the data for a field, I use a SQL query to retrieve that column and, via Logstash, update the corresponding document in Elasticsearch by primary key, adding the field's information. When it involves only one field (string, number, boolean), it's an easy task, but it gets tougher with more complex information.
Here is my case: I have 3 tables, Alpha, Jobs and Tasks. Each Alpha has multiple jobs and each job has multiple tasks, linked to each other by foreign keys, so with a simple join query I can get a flat result set over the three tables, as sketched below:
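Roughly like this (the Tasks column names, jobid and taskname, are approximations, and the result rows are reconstructed from the target JSON further down):

SELECT a.alphaid, j.jobname, t.taskname
FROM Alpha a
LEFT JOIN Jobs j ON j.alphaid = a.alphaid
LEFT JOIN Tasks t ON t.jobid = j.jobid;

-- alphaid | jobname                       | taskname
-- id1     | identifier des tendances...   | Connaissances de base en stylisme
-- id1     | identifier des tendances...   | Chiffrage et calcul de coût
-- id1     | identifier des tendances...   | dessin artistique
-- id1     | Vêtements professionnels      | NULL
-- id1     | Coordonner la fabrication..   | Techniques d'animation d'une équipe
-- id1     | Elaborer une fiche ou une...  | Technique de calcul des composantes...
-- id1     | Elaborer une fiche ou une...  | Techniques rédactionnelles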
As an output, I'm trying to obtain this JSON format for the Elasticsearch type:
{
  "_id": "id1",
  "jobs": [
    {
      "jobname": "identifier des tendances...",
      "tasks": ["Connaissances de base en stylisme", "Chiffrage et calcul de coût", "dessin artistique"]
    },
    {
      "jobname": "Vêtements professionnels",
      "tasks": []
    },
    {
      "jobname": "Coordonner la fabrication..",
      "tasks": ["Techniques d'animation d'une équipe"]
    },
    {
      "jobname": "Elaborer une fiche ou une...",
      "tasks": ["Technique de calcul des composantes...", "Techniques rédactionnelles"]
    }
  ]
}
When it concerned only one nested field, i.e. regrouping the jobs under the same id, I managed it with a SQL query that merged the jobs sharing an id into a single comma-separated string. I then split that string into an array with the mutate filter and wrote it back through the elasticsearch output's upsert, obtaining a JSON document like this:
{
  "_id": "id1",
  "jobs": ["job1", "job2", "job3"]
}
I used this Logstash configuration file to do so:
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433"
    jdbc_driver_library => ""
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "xxx"
    # Merge all jobnames sharing an alphaid into one ', '-separated string
    statement =>
      "SELECT alphaid as id, jobs =
         STUFF((SELECT DISTINCT ', ' + jobname
                FROM Jobs b
                WHERE b.alphaid = a.alphaid
                FOR XML PATH('')), 1, 2, '')
       FROM Jobs a
       GROUP BY alphaid
       ORDER BY alphaid;"
  }
}
filter {
  mutate {
    # Split on ', ', the separator built by the SQL above, to get a clean array
    split => { "jobs" => ", " }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "testjobs"
    document_id => "%{id}"
    action => "update"
    # Document to create when the id does not exist yet
    upsert => '{"jobs":"%{jobs}"}'
  }
}
Any idea how to obtain the format above using Logstash?