Logstash: Index complex structure from jdbc input

Lucas_Dehandschutter · August 8, 2018, 7:54am

Hi,

I'd like to inject data from sql databases into Elasticsearch.

Let say I have 3 tables: Company, Department and Employee.
Each company has several department and each department has several employees.
This example has 2 level of nested informations.

I'd like to build and index a Company document like this:

   {
"comp_name": "World Company",
"comp_address": " 123 blablabla street...",
"departments": [
 {
     "dep_id": 1,
     "dep_name": "Human ressources",
     "employees": [
           {   "name": "John Smith", "age": "35", ...   },
           {   "name": "Peter Nash", "age": "46", ...   }
           ...
      ]  
 },
 
 {
     "dep_id": 2,
     "dep_name": "Accounting",
     "employees": [
           {   "name": "Lisa  Paulson", "age": "26", ...   },
           {   "name": "Mary Davids", "age": "32", ...   }
           ...
      ]  
 }
...
 ]
}

I cannot find a Logstash example explaining how to aggregate a TWO LEVEL nested structure.

Can anyone help me ?

Regards,

Lucas

Christian_Dahlqvist · August 8, 2018, 8:09am

Why are you looking to index a whole company as a single document? How do you want to search your data?

Would it not make sense to denormalise and store each employee as a separate document together with the relevant company and department information (which I would expect to rarely change)?

Lucas_Dehandschutter · August 8, 2018, 9:10am

Hi Christian,

I want to be able to:

Search in which company/dept is working an employee
Find all companies having an employee whose lastname is equal to xxx
Find all companies where a specific employee is working
Find all companies whose employees having less than 35 years represent more than 50% of the company's staff
Get the average age of company's/dept's employees
Display employees repartition in all depts of a company by gender, age, etc.
Display all informations from a company (Kibana or other)

Initially, i wanted to create a company document with parent-child relations for department and parent-child relation between dept and employees but that functionality was deprecated in last versions of Elastic.

Lucas

Christian_Dahlqvist · August 8, 2018, 9:19am

I would recommend looking into denormalising it the way I described earlier. As Elasticsearch is not a relational system, it is important to not try to model data in it based on a relational mindset and try using nested documents and/or parent-child as a replacement for foreign keys.

Lucas_Dehandschutter · August 8, 2018, 9:43am

Hi Christian,

It seems that the parent-child relation has been replaced by the join relations which is only used to build relations between documents of the same type which is clearly not my case.

As far as I understand the way nested documents works, it does not really change my problem as they are embedded in the whole document.

Christian_Dahlqvist · August 8, 2018, 9:48am

I do not necessarily think any of those approaches are required here and would recommend you to look into/consider flattening/denormalizing the model and index a document per employee something like this:

{
  "comp_name": "World Company",
  "comp_address": " 123 blablabla street...",
  "dep_id": 1,
  "dep_name": "Human ressources",
  "employee": {   
    "name": "John Smith",
    "age": "35"
  }
}

This will make querying as well as indexing a lot easier and most likely also more efficient.

Lucas_Dehandschutter · August 8, 2018, 1:09pm

Ok I understand what you mean but in this case, creating a document for each combination of employee-department-company isn't replicating too much the data?

As the employee is the grandchild of the structure, for each employee, the document will hold the entire company and department which will be repeated for each employee!

Is Elasticsearch suited for such data replication?

I guess that the id of my document must be autogenerated then?

Christian_Dahlqvist · August 8, 2018, 1:22pm

Denormalising data this is very common when modelling data in Elasticsearch, and Elasticsearch is able to handle this quite well. I would recommend trying it out to see how it performs.

Lucas_Dehandschutter · August 8, 2018, 2:20pm

Ok so i'm trying to reproduce your proposition but when i inject the data, logstash is not respecting the structure of the employee field.

Here is my aggregate function:

    filter{
  aggregate{
    task_id => "%{taskid}"
    code => "
    map[‘comp_id’] = event.get(‘comp_id’)
    map[‘comp_name’] = event.get(‘comp_name’)
    map[‘dept_id’] = event.get(‘dept_id’)
    map[‘dept_name’] = event.get(‘dept_name’)
    map[‘employee’] << {
      ‘emp_id’ => event.get(‘emp_id’),
      ‘emp_name’ => event.get(‘emp_name’),
      ‘emp_age’ => event.get(‘emp_age'),
    }
    event.cancel();
    "
    push_previous_map_as_event => true
    timeout => 60
  }
}

And here is a sample of what was indexed

{
  "comp_name": "World Company",
  "comp_address": " 123 blablabla street...",
  "dep_id": 1,
  "dep_name": "Human ressources",
  "emp_id »: "A454607 »
  "emp_name": "John Smith",
  "emp_age": "35 »,
  }

}

So the employee level is not there.

Any idea of what's wrong?

Christian_Dahlqvist · August 8, 2018, 2:23pm

Why are you using the aggregate filter? Why not use a perform a query joining all tables so you get the right structure directly from the database?

Lucas_Dehandschutter · August 8, 2018, 2:28pm

I just removed all the filter block but the hierarchy of my document is still not respected. Il is flatened and i don't have the employee level in de document??

Christian_Dahlqvist · August 8, 2018, 2:29pm

Can you show us the full config and what the data looks like?

Lucas_Dehandschutter · August 8, 2018, 2:38pm

The data are from a select where everything is like in the document meaning I have no employee reference.

Column are:

Here is logstash config

input {
  jdbc {
jdbc_connection_string => "jdbc:sqlserver://192.168.1.99:1433;database=companies"
jdbc_driver_library => "/usr/local/Cellar/logstash/6.2.4/bin/mssql-jdbc-7.0.0.jre8.jar"
jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
jdbc_user => "my_user"
jdbc_password => "my_password"

statement_filepath => "/usr/local/Cellar/logstash/6.2.4/data/emp.sql"

  }
}


output {
  stdout { codec => rubydebug }
  elasticsearch {
document_type => "employe_line"
hosts => ["localhost:9200"]
index => "employee"
codec => json
  }
}

And here is the mapping of the employee index

PUT /employe
{
  "mappings": {
"employe_line": {
  "properties": {
    "comp_id": {
      "type": "integer"
    },
    "comp_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "dept_id": {
      "type": "integer"
    },
    "dept_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "employee": {
      "properties": {
        "emp_id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "emp_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "emp_age": {
          "type": "integer"
        }
      }
    }
  }
}
  }
}

system · September 5, 2018, 2:39pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Logstash 2.3.x how to load nested objects in elasticsearch index using logstash-jdbc plugin Logstash	5	1959	June 11, 2019
Indexing double nested relation using logstash Logstash	1	446	March 20, 2020
How to configure logstash file to make nested object inside another nested filed from sql query? Logstash	3	845	May 4, 2021
Logstash jdbc plugin - nested data Logstash	9	1955	July 6, 2017
Logstash Pipeline for Nested data from sql Logstash	3	320	June 24, 2022

Logstash: Index complex structure from jdbc input

Related topics