How to update denormalized data in Elasticsearch?


(Lovro Mažgon) #1

Hi logstash community! Actually I have asked this question on StackOverflow, but noone seems to know the answer. Maybe you can help me.


Use case explanation

We have a relational database with data about our day-to-day operations. The goal is to allow users to search the important data with a full-text search engine. The data is normalized and thus not in the best form to make full-text queries, so the idea was to denormalize a subset of the data and copy it in real-time to Elasticsearch, which allows us to create a fast and accurate search application.

We already have a system in place that enables Event Sourcing of our database operations (inserts, updates, deletes). The events only contains the changed columns and primary keys (on an update we don't get the whole row). Logstash already gets notified for each event so this part is already handled.


Actual problem

Now we are getting to our problem. Since the plan is to denormalize our data we will have to make sure updates on parent objects are propagated to the denormalized child objects in Elasticsearch. How can we configure logstash to do this?

Example

Lets say we maintain a list of Employees in Elasticsearch. Each Employee is assigned to a Company. Since the data is denormalized (for the purpose of faster search), each Employee also carries the name and address of the Company. An update changes the name of a Company - how can we configure logstash to update the company name in all Employees, assigned to the Company?

Practical example: We have 3 employees in Elasticsearch:

{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}

Then an update happens in the source DB.

UPDATE company SET name = 'Company NEW' WHERE cmp_id = 1;

We get an event in logstash, where it says something like this:

{type:'company',cmp_id:'1',old.name:'Company A',new.name:'Company NEW'}

This should then be propagated to Elasticsearch, so that the resulting employees are:

{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}

Notice that the field company.name changed.

Possible solution

I see that elasticsearch supports Update by query, which would be very useful in our case. We could create a query to search for all employees with cmp.id:'1' and change the name. The problem is just that the elasticsearch-output plugin doesn't support this operation (at least not yet, but there is an issue).

So, does anybody know how we could create an update by query with existing filter/output plugins?

Thanks!


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.