Indexing double nested relation using logstash

Hello there,

I'm configuring to use logstash to index all the data I've got in the database, and I'm currently having problems trying to find a solution for what I'm doing.

I'm trying to join 3 tables together, to create one single document with all of the information that's nested and only update those that have changed since :sql_last_value.

My tables have a relationship something like this (ignore the fact this could all just be one table w/ different types):

Parent
id
name
age
updated_at
Child
id
parent_id
name
age
GrandChild
id
child_id
name
age

Ideally, I would like 1 document PER parent, and parents can have 0 to any children, and children can have 0 to many grandchildren. I've gotten this to work with 1 layer of Parent + Child both populating into one document, but I'm having problems trying to get GrandChild to be another nested field within Child

The following is what my mapping would look like. One document for each parent, with a list of children, and each of those children can have a list of grandchildren.

"mappings": {
  "properties": {
    "id": {"type": "keyword"},
    "name": {"type": "text"},
    "age": {"type": "integer"},
    "updated_at": {"type": date},
    "children": {
      "type": "nested",
      "properties": {
        "id": {"type": "keyword"},
        "name": {"type": "text"},
        "age": {"type": "integer"},
        "grandchildren": {
          "type": "nested",
          "properties": {
            "id": {"type": "keyword"},
            "name": {"type": "text"},
            "age": {"type": "integer"}
          }
        }
      }
    }
  }
}

This is currently what I have in my conf file:

input {
  jdbc {
    id => "jdbc-topic"
    jdbc_driver_class => "${...}"
    jdbc_connection_string => "${...}"
    jdbc_user => "${...}"
    jdbc_password => "${...}"
    schedule => "* * * * *"

    statement => "SELECT p.id AS id, p.name AS p_name, p.age AS p_age, 
                         c.id AS c_id, c.name AS c_name, c.age AS c_age, 
                         gc.id AS gc_id, gc.name AS gc_name, gc.age AS gc_age 
                  FROM parent AS p LEFT JOIN child AS c ON p.id = c.parent_id 
                              LEFT JOIN grandchild AS gc ON c.id = gc.child_id 
                  WHERE p.updated_at > :sql_last_value;"

    use_column_value => false
  }
}

filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['id']         = event.get('id')
      map['age']        = event.get('p_age')
      map['name']       = event.get('p_name')
      map['children'] ||= []
      map['children'] << {
        'id'     => event.get('c_id'),
        'age'    => event.get('c_age')
        'name'   => event.get('c_name')
      }
      map['children']['grandchildren'] ||= []
      map['children']['grandchildren'] << {
        'id'     => event.get('gc_id'),
        'age'    => event.get('gc_age')
        'name'   => event.get('gc_name')
      }
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 3
  }
}

output {
  elasticsearch {
    hosts => ["${...}"]
    action => "update"
    doc_as_upsert => true
    document_id => "%{id}"
    index => "test"
  }
  stdout { codec => json }
}

At this time of running something like the above, it is indexing X documents, where X is the number of parents that exist in the database, but I'm having difficulties getting the list of grandchildren added to the children of each of the parents.

So if my my database looks like the following:

Parent
id            |                 age                |                   name
----------------------------------------------------------------------------------------
1             |                 60                 |                   Joe
2             |                 63                 |                   Ben
3             |                 65                 |                   Carl
Child
id            |                 age                |                   name              |         parent_id
-----------------------------------------------------------------------------------------------------------------
1             |                 42                 |                   Will              |            1
2             |                 35                 |                   Lisa              |            1
3             |                 43                 |                   Ned              |            2
GrandChild
id            |                 age                |                   name              |         child_id
-----------------------------------------------------------------------------------------------------------------
1             |                 2                  |                   Fred              |            1
2             |                 5                  |                   Pam               |            1
3             |                 3                  |                   Jake              |            1
4             |                 3                  |                   Lori              |            3
5             |                 3                  |                   Max               |            3

So basically Joe, Ben, and Carl are parents.

Joe's document in ES would show...
Joe has 2 kids, Will and Lisa.
Will has 3 kids: Fred, Pam, and Jake.
Lisa has no kids - still trying to figure out what to do with her life.

Ben's document in ES would show...
Ben has 1 kid, Ned.
Ned has 2 kids: Lori and Max.

Carl's document in ES would show...
Carl has no kids, he's single and lonely.

Is there a way to get this relationship working with logstash to correctly push a document with the data joined from the 3 tables?

Thanks in advance!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.