Hi, I'm trying to migrate data from SQL Server to Elasticsearch using Logstash. My data is normalized: a central table is linked to several other tables by foreign keys, and most of the data I'm migrating is textual. What I'm trying to achieve is a single Elasticsearch type, in a specific JSON format, that regroups all the necessary fields (i.e. SQL columns) coming from multiple SQL tables.
Because of the multiple tables I use as a source, I'm doing this iteratively: I first created an Elasticsearch type on an index, whose document IDs are the central table's primary keys. Then, each time I "import" the data for a field, I use a SQL query to retrieve that column and, via Logstash, update the corresponding document in Elasticsearch by primary key, adding the field's information. When it involves only one field (string, number, boolean), it's an easy task, but it gets tougher with more complex information.
Here is my case: I have 3 tables, Alpha, Jobs and Tasks. Each Alpha has multiple jobs and each job has multiple tasks, linked to each other by foreign keys, so with a simple join query I can get a flat result set over the three tables, as sketched below:
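Roughly like this (the Tasks column names, jobid and taskname, are approximations, and the result rows are reconstructed from the target JSON further down):

SELECT a.alphaid, j.jobname, t.taskname
FROM Alpha a
LEFT JOIN Jobs j ON j.alphaid = a.alphaid
LEFT JOIN Tasks t ON t.jobid = j.jobid;

-- alphaid | jobname                       | taskname
-- id1     | identifier des tendances...   | Connaissances de base en stylisme
-- id1     | identifier des tendances...   | Chiffrage et calcul de coût
-- id1     | identifier des tendances...   | dessin artistique
-- id1     | Vêtements professionnels      | NULL
-- id1     | Coordonner la fabrication..   | Techniques d'animation d'une équipe
-- id1     | Elaborer une fiche ou une...  | Technique de calcul des composantes...
-- id1     | Elaborer une fiche ou une...  | Techniques rédactionnelles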
As an output, I'm trying to obtain this JSON format for the Elasticsearch type:
{
  "_id": "id1",
  "jobs": [
    {
      "jobname": "identifier des tendances...",
      "tasks": ["Connaissances de base en stylisme", "Chiffrage et calcul de coût", "dessin artistique"]
    },
    {
      "jobname": "Vêtements professionnels",
      "tasks": []
    },
    {
      "jobname": "Coordonner la fabrication..",
      "tasks": ["Techniques d'animation d'une équipe"]
    },
    {
      "jobname": "Elaborer une fiche ou une...",
      "tasks": ["Technique de calcul des composantes...", "Techniques rédactionnelles"]
    }
  ]
}
When it concerned only one nested field, i.e. regrouping the jobs under the same id, I managed it with a SQL query that merged the jobs sharing an id into a single comma-separated string. I then split that string into an array with the mutate filter and wrote it back through the elasticsearch output's upsert, obtaining a JSON document like this:
{
  "_id": "id1",
  "jobs": ["job1", "job2", "job3"]
}
I used this Logstash configuration file to do so:
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://localhost:1433"
    jdbc_driver_library => ""
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_user => "xxx"
    # Merge all jobnames sharing an alphaid into one ', '-separated string
    statement =>
      "SELECT alphaid as id, jobs =
         STUFF((SELECT DISTINCT ', ' + jobname
                FROM Jobs b
                WHERE b.alphaid = a.alphaid
                FOR XML PATH('')), 1, 2, '')
       FROM Jobs a
       GROUP BY alphaid
       ORDER BY alphaid;"
  }
}
filter {
  mutate {
    # Split on ', ', the separator built by the SQL above, to get a clean array
    split => { "jobs" => ", " }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "testjobs"
    document_id => "%{id}"
    action => "update"
    # Document to create when the id does not exist yet
    upsert => '{"jobs":"%{jobs}"}'
  }
}
Any idea how to obtain the format above using Logstash?