Hi everybody,
I have multiple tables in my database which I would like to import into Elasticsearch using a single document type. One of the tables has a one-to-many relation to another table.
I will use the example as provided in the guide to explain the problem, because my problem is similar (https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html). So we have blogposts (in one table) and each blogpost can have multiple comments (in another table, linked with a one-to-many relation). Representing comments as Arrays of Inner Objects does not work for us, because we would loose the correlation between the different attributes of a comment as stated in the Guide. A solution is the use of Nested Objects, by mapping the comments field as type nested.
This would be a valid output to Elasticsearch:
{
 { 
   "comments.id":      [ 11 ],
   "comments.name":    [ john, smith ],
   "comments.comment": [ article, great ],
   "comments.age":     [ 28 ],
   "comments.stars":   [ 4 ],
   "comments.date":    [ 2014-09-01 ]
 }
 { 
   "comments.id":      [ 12 ],
   "comments.name":    [ alice, white ],
   "comments.comment": [ like, more, please, this ],
   "comments.age":     [ 31 ],
   "comments.stars":   [ 5 ],
   "comments.date":    [ 2014-10-22 ]
 }
 { 
   "id":               [ 1 ],
   "title":            [ eggs, nest ],
    "body":            [ making, money, work, your ],
   "tags":             [ cash, shares ]
 }
}
I used the following nested mapping: (mostly obtained from the guide)
PUT /my_index
{
  "mappings": {
    "blogpost": {
      "properties": {
        "comments": {
          "type": "nested", 
          "properties": {
	   "id":       { "type": "number"  },
            "name":    { "type": "string"  },
            "comment": { "type": "string"  },
            "age":     { "type": "short"   },
            "stars":   { "type": "short"   },
            "date":    { "type": "date"    }
          }
        }
      }
    }
  }
}
The statement below shows the relevant configuration to import data from jdbc using logstash:
input {
  jdbc {
    ...
     statement => "SELECT * FROM blogpost LEFT JOIN comments ON blogpost.id = comments.id ORDER BY blogpost.id"
    ...
  }
}
However, when looking what has been imported into ES I see that for each comment another event is generated instead of a single event per blogpost with comments as nested objects. I have not found how to implement this functionality, but here (https://github.com/jprante/elasticsearch-jdbc#structured-objects) this functionality is used. How should I proceed with importing to get a single event per blogpost with its comments as nested objects?
Thank you.
Daan