Index parent-child relation using Logstash (one-to-many): configuration issue


(Samta Malhotra) #1

I am using a parent-child relation between two entities. They are fetched from a database, where they come from two different tables, so I use an SQL join statement in the Logstash config file. Details of the config file:

input {
  jdbc {
    jdbc_connection_string => "url"
    jdbc_user => "user"
    jdbc_password => "pswd"
    jdbc_driver_library => "./ifxjdbc-3-50-JC7.jar"
    jdbc_driver_class => "com.informix.jdbc.IfxDriver"
    statement => "SELECT st1.name as s_name, st1.pnumber, st1.mnumber, st2.name as comp_name, zen.s_id, zen.comp_id, zen.conc_1, zen.conc_2 FROM sub_zen zen join sub st1 on st1.id = zen.s_id join sub st2 on st2.id = zen.comp_id"
  }
}

filter {
  mutate { remove_field => ["@timestamp"] }
}

output {
  stdout { codec => json_lines }
  elasticsearch {
    manage_template => false
    hosts => "url"
    index => "test-migrate"
    parent => "%{s_id}"
  }
}

So, I join table sub (aliased as st1 and st2) to table sub_zen on "id", the primary key of sub, first via "s_id" and then via "comp_id". An id from sub can therefore appear as either s_id or comp_id in sub_zen.

I define the mapping in ES with dynamic strict mapping and set manage_template to false in Logstash. Now, I would like to index the data into ES with two different types:

Sub with fields: s_id,s_name,mnumber and pnumber
Comp with fields: comp_id, comp_name, conc_1,conc_2

I have defined my mapping in ES already:

curl -XGET url:9200/test-migrate/_mapping?pretty

{
  "test-migrate" : {
    "mappings" : {
      "sub" : {
        "dynamic" : "strict",
        "properties" : {
          "mnumber" : { "type" : "long" },
          "pnumber" : { "type" : "long" },
          "s_id" : { "type" : "long" },
          "s_name" : {
            "type" : "text",
            "fields" : {
              "keyword" : { "type" : "keyword", "ignore_above" : 256 }
            }
          }
        }
      },
      "comp" : {
        "dynamic" : "strict",
        "_parent" : { "type" : "sub" },
        "_routing" : { "required" : true },
        "properties" : {
          "comp_id" : { "type" : "long" },
          "comp_name" : {
            "type" : "text",
            "fields" : {
              "keyword" : { "type" : "keyword", "ignore_above" : 256 }
            }
          },
          "conc_1" : { "type" : "float" },
          "conc_2" : { "type" : "float" }
        }
      }
    }
  }
}

The current Logstash config file does not index the data the way I would like. What should I change? How can I tell Logstash to index the fields (s_id, s_name, pnumber, mnumber) into sub and (comp_id, comp_name, conc_1, conc_2) into comp, while respecting the parent-child relation? Here sub is a parent which can have many comp children. Any pointers?


(Fabien Baligand) #2

Hi,

Have you tried the aggregate plugin, using this sample use case?
https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html#plugins-filters-aggregate-example4

Then, could you give an example of the target document (or documents) you would like in Elasticsearch, for one zen related to 1-n st1 and 1-m st2 rows?
Please format it as JSON or in rubydebug format.
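For reference, the grouping that example 4 performs (many JDBC rows that share a task id merged into one aggregated event) can be sketched outside Logstash with plain Ruby hashes. The rows and field names below are hypothetical stand-ins for this thread's query results, not the plugin's actual internals:

```ruby
# Hypothetical joined rows, as the jdbc input would emit them (one event per row).
rows = [
  { 's_id' => 22, 's_name' => 'Eb', 'comp_id' => 129, 'comp_name' => 'abc',  'conc_1' => 1.0,  'conc_2' => 2.0 },
  { 's_id' => 22, 's_name' => 'Eb', 'comp_id' => 11,  'comp_name' => 'abc1', 'conc_1' => 15.0, 'conc_2' => 1.0 }
]

docs = {}
rows.each do |row|
  # One aggregated document per parent id, like `task_id => "%{s_id}"` in the filter.
  doc = docs[row['s_id']] ||= {
    's_id' => row['s_id'], 's_name' => row['s_name'], 'child' => []
  }
  # Each joined row contributes one nested child entry with only the comp fields.
  doc['child'] << row.select { |k, _| %w[comp_id comp_name conc_1 conc_2].include?(k) }
end

docs.values  # one document per parent, children nested in an array
```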


(Samta Malhotra) #3

Hi,

I already solved my problem yesterday (I was short on time). But thanks for your help and suggestions.


(Fabien Baligand) #4

OK, nice to see you solved your problem!
For info, did you use the aggregate plugin?


(Samta Malhotra) #5

Yeah, I did. I used a nested structure with the Elasticsearch template provided by Logstash, and took hints from your replies to other posts. :smiley: The output works and gives me results, but I am still wondering if there is a better way to organize the mapping. Say I have sub as a parent with many comp children. Right now, I have a structure like this:

[
  {
    "parent": {
      "child": [
        {
          "conc_1": 1,
          "conc_2": 2,
          "comp_id": 129,
          "comp_name": "abc"
        },
        {
          "conc_2": 1,
          "conc_1": 15,
          "comp_id": 11,
          "comp_name": "abc1"
        }
      ]
    },
    "pnumber": 200,
    "@timestamp": "abc",
    "sub_name": "Eb",
    "@version": "1",
    "mnumber": 21,
    "s_id": 22
  },
  {
    "parent": {
      "child": [
        {
          "conc_2": 6,
          "conc_1": 5,
          "comp_id": 25,
          "comp_name": "Water"
        },
        {
          "conc_2": 1,
          "conc_1": 14,
          "comp_id": 129,
          "comp_name": "cdf"
        },
        {
          "conc_2": 10,
          "conc_1": 7,
          "comp_id": 1686,
          "comp_name": "Mag"
        }
      ]
    },
    "pnumber": 207,
    "@timestamp": "2017-07-19T08:33:30.890Z",
    "s_name": "KMW",
    "@version": "1",
    "mnumber": 21,
    "s_id": 22913
  }
]

And so on....

Now, I just did a quick search using Kibana. When I search for, say, all documents with comp_id:129, I get the whole record along with the other comp_ids too, as they are in the same array. Is there a better way to search, or to organize the mapping?


(Fabien Baligand) #6

If you use the nested type for your parent/child need, that is normal: Kibana queries Elasticsearch, which returns all documents that match your query.
But for each match, the whole Elasticsearch document is returned, so you also see the other array entries in the same document.

If you want only the children that match your query, you can use a parent/child Elasticsearch mapping instead.
This implies that parents and children are separate documents, and that in your ES response you get only the child info, not the parent info.
If you want both, you have to use a has_parent query with inner_hits:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html#parent-child-inner-hits
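As a sketch (the index and type names follow the mapping posted above; the comp_id value is hypothetical), such a request could filter the child type directly and pull the parent back through has_parent with inner_hits:

```json
GET test-migrate/comp/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "comp_id": 129 } },
        {
          "has_parent": {
            "parent_type": "sub",
            "query": { "match_all": {} },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
```

Each hit is then a single matching comp document, with its sub parent returned under the inner_hits section of the response.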


(Samta Malhotra) #7

Yeah, I guessed so. I just wanted to have a working backend before I jump into the complex ES queries. I guess Kibana (Discover) is not quite there yet for refined queries.

My goal is to have parent-child-grandchild relationship in the end. Thanks for your suggestions so far. Will keep you updated!


(Fabien Baligand) #8

In Kibana, you can do has_parent queries with inner_hits.
But to see the parent data in Kibana, you have to expand a result document and click the "JSON" view to see the parent inner hit.


(Samta Malhotra) #9

OK! I need to ask you something about the aggregate plugin. Can we build a parent-child-grandchild structure? I tried it, but it does not work yet. This is what I have tried so far:

map['parent'] ||= {}
map['parent']['child'] ||= []
map['parent']['child']['grandchild'] ||= []
map['parent']['child']['grandchild'] << {'conc_1' => event.get('conc_1'), 'conc_2' => event.get('conc_2')}

(Fabien Baligand) #10

Sure, you can do this with the aggregate plugin.

The problem in your code is this line:
map['parent']['child'] ||= []

If 'child' is set to an array, then on the next line you can't set a 'grandchild' field on it as if 'child' were a hash {}.

So either you set map['parent']['child'] ||= {},
or you insert 'grandchild' as an array entry: map['parent']['child'] << {'grandchild' => []}
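Put together, the second option could look like the following; a minimal sketch with plain Ruby hashes standing in for the map and event objects that the aggregate filter provides inside its `code` block:

```ruby
# Stand-ins for the aggregate filter's map and event (inside Logstash,
# `map` persists across events and `event.get(...)` reads a field).
map   = {}
event = { 'conc_1' => 1.0, 'conc_2' => 2.0 }

map['parent'] ||= {}
map['parent']['child'] ||= []

# Each child entry is a hash, so it can carry its own 'grandchild' array.
child = { 'grandchild' => [] }
child['grandchild'] << { 'conc_1' => event['conc_1'], 'conc_2' => event['conc_2'] }
map['parent']['child'] << child
```

The key point is that 'child' stays an array while each of its entries is a hash, which is what makes the extra 'grandchild' level possible.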


(Samta Malhotra) #11

Hi again,

So, I have now extensively tested the parent-child-grandchild relationship. Works great! I want to ask something additional, and I don't want to open a new issue for it.

When we map the attributes into the array, is it possible to do it for all attributes automatically, without writing each name individually? Here is what I mean:

map['parent']['child']['grandchild'] << {'conc_1' => event.get('conc_1'),'conc_2' => event.get('conc_2')}

Instead of writing the names conc_1 and conc_2 individually, can I map all the attributes automatically?


(Fabien Baligand) #12

This should do the job:

map['parent']['child']['grandchild'] << event.to_hash
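One caveat, sketched below with a plain hash standing in for the event: to_hash copies every field on the event, including @timestamp and @version, so if those are unwanted inside the nested entry they may need to be filtered out first.

```ruby
# Stand-in for event.to_hash; a real Logstash event also carries
# @timestamp and @version alongside the data fields.
event_hash = {
  'conc_1' => 5.0, 'conc_2' => 6.0,
  '@timestamp' => '2017-07-19T08:33:30.890Z', '@version' => '1'
}

grandchild = []
# Keep only the data fields, dropping the @-prefixed bookkeeping fields.
grandchild << event_hash.reject { |k, _| k.start_with?('@') }
```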


(Samta Malhotra) #13

Thanks again. I have another issue to resolve. I posted it here: https://stackoverflow.com/questions/45750086/change-null-value-from-database-for-one-field-using-logstash-and-insert-into-ela

If you can help...


(Fabien Baligand) #14

As it is a very different issue, I invite you to open a new topic for it and notify me with @fbaligand.


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.