We have a use case where we need to query and join two database tables in a 1-to-many relationship, then upload and index the resulting documents (records) to Elasticsearch every 3 minutes for use by a Kibana dashboard.
The total number of documents (records) that need to be indexed is ~240,000 every three minutes.
My question is:
Where is the best place to do this join pre-processing? In Logstash using the JDBC input plugin, or in a C# application that runs the query, performs the join, converts the result set to JSON, and then uses NEST to bulk upload the JSON to Elasticsearch?
Hi John,
If the documents share a common field, e.g. "customer_id", one way to do an efficient join purely in Elasticsearch is to use a sorted scroll across the two indices, sorting on that shared field so that related documents from both indices arrive next to each other in the stream.
A bespoke client script would process this stream, fusing docs that share a common customer_id into a single document, which can then be added using a bulk index operation.
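A minimal sketch of that flow, assuming the official Python client (elasticsearch-py), two existing indices named "customers" and "orders", a sortable (keyword or numeric) shared field "customer_id", and a target index "customer_orders" — all of these names are placeholders, not anything from your setup:

```python
from collections import defaultdict
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder cluster address

# Sorted scroll across both indices: sorting on the shared field means a
# customer and its orders arrive next to each other in the stream.
stream = helpers.scan(
    es,
    index="customers,orders",           # placeholder index names
    query={"sort": ["customer_id"]},    # assumed sortable field
    preserve_order=True,                # keep this sort instead of _doc order
)

# Fuse docs that share a customer_id into one combined document.
fused = defaultdict(lambda: {"orders": []})
for hit in stream:
    src = hit["_source"]
    key = src["customer_id"]
    if hit["_index"] == "customers":
        fused[key].update(src)
    else:
        fused[key]["orders"].append(src)

# Bulk index the fused documents into the index Kibana will read.
helpers.bulk(
    es,
    (
        {"_index": "customer_orders", "_id": cid, "_source": doc}
        for cid, doc in fused.items()
    ),
)
```

The fused documents are what the Kibana dashboard would then query, rather than the two raw indices.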
Hi Mark, Thank you for your response. I have to admit I am new to Elasticsearch, so I apologize in advance for any silly statements/questions.
Yes, I have a common field.
Using your example, what I am trying to do is combine 10,000 customers who have 25 orders each, for a total of about 250,000 records (documents). The data currently exists in two database tables, which are queried every three minutes. I need to refresh (re-index) all 250,000 records every 3 minutes (or less if possible), because the data in the orders is changing.
I am looking for the best way to bulk upload this data to Elasticsearch.
It will probably involve writing some code using your favourite choice of language. We have client side libraries for a wide variety of languages that will call the APIs I mentioned.
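For instance, here is a hedged sketch with the Python client (the same idea applies to NEST from C#). It assumes your SQL query has already produced one joined record per customer; the helper used (helpers.parallel_bulk), the chunk size, the thread count, and the "customer_orders" index name are illustrative choices, not a prescription:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder cluster address

def to_actions(joined_records):
    # joined_records: an iterable of dicts, one per customer, produced by
    # your SQL join, with that customer's orders nested under "orders".
    for record in joined_records:
        yield {
            "_index": "customer_orders",   # placeholder target index
            "_id": record["customer_id"],
            "_source": record,
        }

def refresh(joined_records):
    # parallel_bulk streams the ~250,000 documents in chunks over several
    # threads, which helps the whole refresh fit inside the 3-minute window.
    for ok, item in helpers.parallel_bulk(
        es, to_actions(joined_records), chunk_size=1000, thread_count=4
    ):
        if not ok:
            print("failed to index:", item)
```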