How to filter data while migrating from SQL Server to Elasticsearch using Logstash to obtain a nested array of strings in a JSON object

Hi, I'm trying to migrate data from SQL Server to Elasticsearch using Logstash, into a certain format. My data is normalized: it essentially involves a central table linked to other tables by foreign keys, and most of what I'm migrating is textual. What I'm trying to achieve is a single Elasticsearch type in a specific JSON format, which regroups all the necessary fields (i.e. SQL columns) coming from multiple SQL tables.

For now, and because of the multiple tables I use as a source, I'm doing this iteratively: I first created an Elasticsearch type on an index, with the IDs mapped to the central table's primary keys. Then, each time I "import" the data for a field, I use a SQL query to retrieve the column and, via Logstash, update the corresponding document by its primary key, adding the field's information. When only one simple field is involved (string, number, boolean) this is an easy task, but it gets tougher with more complex information. Here is my case: I have 3 tables, Alpha, Jobs and Tasks. Each Alpha has multiple jobs and each job has multiple tasks, linked to each other by foreign keys, so a simple join query gives me this figure:
[img1: result of the join query — one row per (alpha id, jobname, task)]
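
In SQL, that join is along these lines (a sketch: the Tasks-to-Jobs foreign key and the task column name are assumptions, to be adjusted to the real schema):

-- One row per (alpha, job, task); jobid / taskname are assumed names
SELECT a.alphaid, j.jobname, t.taskname
FROM Alpha a
JOIN Jobs j ON j.alphaid = a.alphaid
LEFT JOIN Tasks t ON t.jobid = j.jobid  -- LEFT JOIN keeps jobs that have no tasks
ORDER BY a.alphaid, j.jobname;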

And as an output, I'm trying to have each document of the Elasticsearch type in this JSON format:

{
  "_id": "id1",
  "jobs": [
    {
      "jobname": "identifier des tendances...",
      "tasks": ["Connaissances de base en stylisme", "Chiffrage et calcul de coût", "dessin artistique"]
    },
    {
      "jobname": "Vêtements professionnels",
      "tasks": []
    },
    {
      "jobname": "Coordonner la fabrication..",
      "tasks": ["Techniques d'animation d'une équipe"]
    },
    {
      "jobname": "Elaborer une fiche ou une...",
      "tasks": ["Technique de calcul des composantes...", "Techniques rédactionnelles"]
    }
  ]
}

When it concerned only one nested field, i.e. regrouping the jobs under the same id, I managed to do it with a SQL query that merged the jobs sharing an id into a single string, separated by ", ". I then split it using the mutate filter and re-merged it in the output, to obtain a JSON format like this:

{
  "_id": "id1",
  "jobs": ["job1", "job2", "job3"]
}
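
For reference, that query returns one row per id, with all the job names already merged into a single string (values here are hypothetical):

id   | jobs
-----+------------------
id1  | job1, job2, job3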

I used this Logstash configuration file to do so:

input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433"
    jdbc_driver_library => ""
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "xxx"
    jdbc_password => "xxx"
    # Merge all the job names sharing an alphaid into one ", "-separated
    # string; STUFF(..., 1, 2, '') strips the leading separator.
    statement => "SELECT alphaid as id, jobs =
      STUFF((SELECT DISTINCT ', ' + jobname
             FROM Jobs b
             WHERE b.alphaid = a.alphaid
             FOR XML PATH('')), 1, 2, '')
      FROM Jobs a
      GROUP BY alphaid
      ORDER BY alphaid;"
  }
}

filter {
  mutate {
    # Split on ", ", the exact separator built in SQL, so the resulting
    # array items carry no leading space.
    split => { "jobs" => ", " }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "testjobs"
    document_id => "%{id}"
    # "update" merges the event's fields into the existing document;
    # "upsert" is the document to create when the id does not exist yet.
    action => "update"
    upsert => '{"jobs":"%{jobs}"}'
  }
}

Any idea on how to obtain the above format using Logstash?
