Parent-Child query for 6.3.2

I have created a template file with mapping for a parent-child relationship:

### Template file
{
  "index_patterns" : [ "test-*" ],
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "number_of_replicas" : "0"
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "family": {
          "type": "join",
          "relations": {
             "customer": [ "file", "picture" ]
          }
        }
      }
    }
  }
}

I have two config files, one for customers and one for files:

### Customer config
input {
  stdin {
    type => "customer"
  }
}
filter {
    csv {
      columns => [ "customer_id", "first", "last", "contact_time" ]
      separator => ","
      remove_field => [ "message" ]
    }
    mutate {
      add_field => { "family" => "customer" }
    }
    date {
      match => [ "contact_time", "UNIX" ]
    }
}
output {
    elasticsearch {
        hosts => ["elastic:9200"]
        index => "test-%{+YYYY.MM}"
    }
}

### File Config
input {
  stdin {
    type => "file"
  }
}
filter {
    csv {
      columns => [ "customer_id", "filename", "contact_time" ]
      separator => ","
      remove_field => [ "message" ]
    }
    date {
      match => [ "contact_time", "UNIX" ]
    }
    mutate {
      add_field => {"[family][name]" => "file"}
      add_field => {"[family][parent]" => "%{customer_id}"}
    }
}
output {
    elasticsearch {
        hosts => ["elastic:9200"]
        index => "test-%{+YYYY.MM}"
        routing => "%{customer_id}"
    }
}

Test CSV data for customers is:

  • 5001,Joe,Bogel,1533339450
  • 5002,Jim,Bogel,1533339476
  • 5003,Jil,Bagel,1533339510

and for files:

  • 5001,login.txt,1533339451
  • 5001,logout.txt,1533339477
  • 5001,session.txt,1533339511
  • 5002,login.jpg,1533339452
  • 5002,logout.jpg,1533339478
  • 5002,session.jpg,1533339512
  • 5003,login.gif,1533339453
  • 5003,logout.gif,1533339479
  • 5003,session.gif,1533339513

The data loads and is indexed without errors, but the following query from Kibana does not return ANY documents:

{
  "query": {
    "has_child": {
      "query": {
        "term": {
          "filename": "session.jpg"
        }
      },
      "type": "file"
    }
  }
}

I would expect it to return the record for Jim Bogel with customer_id of 5002.

Here is the data for those records:

  {
    "_index" : "test-2018.08",
    "_type" : "doc",
    "_id" : "-aihAmUB5LAMSjzPZe2-",
    "_score" : 1.0,
    "_source" : {
      "last" : "Bogel",
      "contact_time" : "1533339476",
      "@version" : "1",
      "host" : "alpha",
      "first" : "Jim",
      "family" : "customer",
      "customer_id" : "5002",
      "type" : "customer",
      "@timestamp" : "2018-08-03T23:37:56.000Z"
    }
  },
  {
    "_index" : "test-2018.08",
    "_type" : "doc",
    "_id" : "A6ihAmUB5LAMSjzP3O77",
    "_score" : 1.0,
    "_routing" : "5002",
    "_source" : {
      "contact_time" : "1533339510",
      "@timestamp" : "2018-08-03T23:38:30.000Z",
      "customer_id" : "5002",
      "filename" : "session.jpg",
      "family" : {
        "parent" : "5002",
        "name" : "file"
      },
      "host" : "alpha",
      "@version" : "1",
      "type" : "file"
    }
  }

This used to work well in 5.6.10 using the _parent field, but 6.3.2 join has not been friendly to me at all. Is there something obvious that I'm missing here? Any help would be greatly appreciated.

I think you will get a better response if you move this to the Elasticsearch category.

It's worth a try. Thank you for the suggestion. I guess it might even be a Kibana issue since the apparent failure is manifested there.

I am becoming increasingly convinced that the problem is with the has_parent and has_child queries, and not with the parent-child mapping or the Logstash configurations, since the parent_id query below works fine.

{
  "query": {
    "parent_id": {
      "type": "file",
      "id": "5002"
    }
  }
}

Results:

  {
    "_index" : "test-2018.08",
    "_type" : "doc",
    "_id" : "V6hUBWUB5LAMSjzPn-7t",
    "_score" : 0.35667494,
    "_routing" : "5002",
    "_source" : {
      "customer_id" : "5002",
      "type" : "file",
      "filename" : "login.jpg",
      "host" : "alpha",
      "@version" : "1",
      "family" : {
        "name" : "file",
        "parent" : "5002"
      },
      "contact_time" : "1533339452",
      "@timestamp" : "2018-08-03T23:37:32.000Z"
    }
  },
  {
    "_index" : "test-2018.08",
    "_type" : "doc",
    "_id" : "WKhUBWUB5LAMSjzPn-7t",
    "_score" : 0.35667494,
    "_routing" : "5002",
    "_source" : {
      "customer_id" : "5002",
      "type" : "file",
      "filename" : "logout.jpg",
      "host" : "alpha",
      "@version" : "1",
      "family" : {
        "name" : "file",
        "parent" : "5002"
      },
      "contact_time" : "1533339478",
      "@timestamp" : "2018-08-03T23:37:58.000Z"
    }
  },
  {
    "_index" : "test-2018.08",
    "_type" : "doc",
    "_id" : "WahUBWUB5LAMSjzPn-7t",
    "_score" : 0.35667494,
    "_routing" : "5002",
    "_source" : {
      "customer_id" : "5002",
      "type" : "file",
      "filename" : "session.jpg",
      "host" : "alpha",
      "@version" : "1",
      "family" : {
        "name" : "file",
        "parent" : "5002"
      },
      "contact_time" : "1533339512",
      "@timestamp" : "2018-08-03T23:38:32.000Z"
    }
  }

Are the has_parent/child queries just broken now? Do I need to try to move on to another solution or will the has_parent and has_child queries be fixed for 6.3+?

I migrated some indexes from 5.6.10 that were created using the old-style _parent field, and the has_child/has_parent queries worked for that data using the 6.3.2 Kibana. That caused me to believe I had misconfigured something when making the new mapping and Logstash configurations.

You have got this family property in your example file document:

"family" : {
  "parent" : "5002",
  "name" : "file"
}

The parent is equal to 5002, which is the customer_id of your customer. However, the parent should be set to the _id of the parent document, which for the customer document in your first example is -aihAmUB5LAMSjzPZe2-.

That -aihAmUB5LAMSjzPZe2- is an autogenerated ID. What you may want to do is update your Logstash configuration so that it sets the document _id of the customer documents equal to the customer_id. You could do that by adding this to the elasticsearch output of the customer config:

document_id => "%{customer_id}"
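
In context, the customer output from your question would then look roughly like the sketch below. It assumes that customer_id is unique per customer and that the hosts and index settings stay as you posted them:

```
output {
    elasticsearch {
        hosts => ["elastic:9200"]
        index => "test-%{+YYYY.MM}"
        # Use customer_id as the document _id so that the
        # [family][parent] value on the file documents matches it.
        document_id => "%{customer_id}"
    }
}
```

With that in place the file config should not need any changes, since it already sets both [family][parent] and routing to %{customer_id}.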

You are spot on. Such an obvious oversight on my part. I actually added a fingerprint filter to my final config.

In the Customer filter config I added:

fingerprint {
  source => "customer_id"
  target => "[@metadata][fingerprint]"
  method => "MURMUR3"
}

and changed the output to:

elasticsearch {
    hosts => ["elastic:9200"]
    index => "test-%{+YYYY.MM}"
    document_id => "%{[@metadata][fingerprint]}"
}

For the file filter I added and changed:

fingerprint {
  source => "customer_id"
  target => "[@metadata][fingerprint]"
  method => "MURMUR3"
}
mutate {
  add_field => {"[family][name]" => "file"}
  add_field => {"[family][parent]" => "%{[@metadata][fingerprint]}" }
}

and changed the output section to:

elasticsearch {
    hosts => ["elastic:9200"]
    index => "test-%{+YYYY.MM}"
    routing => "%{[@metadata][fingerprint]}"
}

It works as advertised. You just saved my bacon. I really do enjoy and appreciate the work you guys are doing to create this terrific product and all the support you guys on this forum provide to keep us on track.

Thanks again.
