Updating a field in a document with values obtained from another dataset (a join-like feature)

I have an index with data such as:
{
  "_index" : "qstat-2019.03.29",
  "_type" : "doc",
  "_source" : {
    "Job_Number" : "150153525",
    "Job_State" : "pending",
    "hard_req_queue" : "all.q",
    "Submission_time" :
  }
}

As you can see, the "Submission_time" field is empty. Its value comes from another dataset, which I was loading into a different Elasticsearch index, and I was trying to join the two indices on the "Job_Number" field. Since joins are not supported, is there a way to stream in the second data source and update "Submission_time" in the index shown above by matching on the "Job_Number" field?

Both of my data sources are XML files, so I parse them in Logstash before writing the data to Elasticsearch. I would parse the second XML file in Logstash to extract "Job_Number" and "Submission_time", and write the latter to the document shown above based on the matching "Job_Number".

Any help on this would be appreciated.

I'd use Logstash to read from Elasticsearch, add an elasticsearch filter to do the lookups, and an elasticsearch output to write the result back to Elasticsearch.
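
A rough sketch of what such a pipeline could look like, as a starting point rather than a tested configuration. The host "localhost:9200" and the lookup index name "qsub-times" are assumptions made up for illustration; only "qstat-2019.03.29", "Job_Number" and "Submission_time" come from the question. The location of the document metadata written by `docinfo` depends on the plugin version, so check where `_index` and `_id` end up in your events.

```
input {
  elasticsearch {
    hosts   => ["localhost:9200"]        # assumed host
    index   => "qstat-2019.03.29"
    docinfo => true                      # keep _index and _id of each source document
  }
}

filter {
  # Look up the matching record in the second dataset by job number.
  elasticsearch {
    hosts  => ["localhost:9200"]         # assumed host
    index  => "qsub-times"               # hypothetical index holding the second dataset
    query  => "Job_Number:%{[Job_Number]}"
    # Copy Submission_time from the matching document into the current event.
    fields => { "Submission_time" => "Submission_time" }
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]    # assumed host
    # Older plugin versions store docinfo under [@metadata]; newer ones with
    # ECS compatibility enabled use [@metadata][input][elasticsearch] instead.
    index       => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
    action      => "update"              # update the existing document in place
  }
}
```

Writing back with the original `_index` and `_id` is what makes this an update of the existing document rather than the creation of a new one.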

Maybe you should think about maintaining an entity-centric index built from these events?

In your case the entity would be a "job". The example document looks like a state-change event, and several of these could be summarised in a single job entity. The advantages of this approach are:

  1. You can reduce the cost of joining fast-changing data (multiple events can be batched into a single update to the job entity)
  2. Custom attributes can be derived from multiple events e.g. "jobDuration" or "lastStatus".
  3. Your update scripts can see all job history and tag anomalies, e.g. where state changes don't follow expected sequences.
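
One simple way to approximate an entity-centric index with the tools already in use is to have both Logstash pipelines write to the same index, keyed by the job number, using partial updates with upsert. The sketch below is an assumption-laden illustration: the host and the index name "jobs" are invented, and it relies on "Job_Number" being unique per job.

```
output {
  elasticsearch {
    hosts         => ["localhost:9200"]   # assumed host
    index         => "jobs"               # hypothetical entity-centric index
    document_id   => "%{Job_Number}"      # one document per job
    action        => "update"             # merge this event's fields into the job document
    doc_as_upsert => true                 # create the job document if it does not exist yet
  }
}
```

With the same output in both pipelines, whichever source arrives first creates the job document and the other one fills in its fields, so "Submission_time" lands on the same document as "Job_State". Derived attributes such as "jobDuration", or checks on the sequence of state changes, are not covered by this sketch; those would need a scripted update instead of a plain partial update.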
