Sync Elasticsearch with MySQL Database

Hello!

As per our application requirements, we need to sync Elasticsearch with MySQL. We will apply the most suitable approach based on your advice.

Below is the DB EER diagram, just for reference.

Now we need to implement the following:

  1. Elasticsearch will contain an index in denormalised form, something like below:
{
  "shops": [
    {
      "shop_id": 1001,
      "shop_name": "XYZ Pharmacy",
      "shop_type": "Pharmacy Store",
      "latlong": {
        "lat": 12.345678,
        "lon": 12.345678
      },
      "timing": "8:00 to 8:00",
      "working_days": "Mon - Sun",
      "products": [
        {
          "product_id": 201,
          "product_name": "AAAAA 10 mg",
          "product_tag": [
            "Drug",
            "Strip",
            "XXXX"
          ]
        },
        {
          "product_id": 202,
          "product_name": "BBBBB 20 mg",
          "product_tag": [
            "Syrup",
            "Cough",
            "XXXX"
          ]
        }
      ]
    },
    {
      "shop_id": 1002,
      "shop_name": "ABC Fastfood",
      "shop_type": "Fastfood",
      "latlong": {
        "lat": 12.345678,
        "lon": 12.345678
      },
      "timing": "8:00 to 8:00",
      "working_days": "Mon - Sun",
      "products": [
        {
          "product_id": 302,
          "product_name": "CCCC",
          "product_tag": [
            "Sandwich",
            "Wrap",
            "XXXX"
          ]
        }
      ]
    }
  ]
}
  2. If there is any change (insert, update) in either the Shop or Product table of MySQL, it should sync to Elasticsearch.
  3. We will use a timestamp along with is_updated and is_deleted fields for syncing.

But the problems are:

  • How to denormalize the index?
  • How to detect any changes in the Shop, Product_Category, Category, or Product tables in MySQL and sync them with ES?

Use cases

  1. User can search by Product Name
  2. Product Category
  3. Shop Name
  4. Shop Type

We highly appreciate your best advice on this.

Thanks in Advance!

There is a good blog post about that here, and for denormalizing you can use a JOIN between those tables in the statement option of the JDBC input plugin.
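As a sketch, a minimal pipeline for that approach could look like the following. The connection details, table names, and column names here are assumptions based on your example — adjust them to your actual schema:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/shopdb"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "* * * * *"
    # Denormalize by joining shop and product in the statement itself
    statement => "SELECT s.shop_id, s.shop_name, s.shop_type, p.product_id, p.product_name FROM shop s JOIN product p ON p.shop_id = s.shop_id"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "shops"
  }
}
```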

Dear Rahul Kumar,

Thank you very much for your prompt response. Sorry, I was still editing my earlier post when your response came in.

Could you please take another look at my edited question?

How you structure this will depend on how you expect to search the data and how you want the results back. If you are primarily searching for products and want to filter based on shop details and categories, I would recommend fully flattening when you denormalize. Your example would then be transformed into a number of distinct documents like the one below, each consisting of data from all tables:

{
	"shop_id": 1001,
	"shop_name": "XYZ Pharmacy",
	"shop_type": "Pharmacy Store",
	"latlong": {
		"lat": 12.345678,
		"lon": 12.345678
	},
	"timing": "8:00 to 8:00",
	"working_days": "Mon - Sun",
	"product_id": 201,
	"product_name": "AAAAA 10 mg",
	"product_tag": [
		"Drug",
		"Strip",
		"XXXX"
	]
}

You can store this with a document ID created from shop_id, product_id, and potentially also category_id. If you add a timestamp field to this query that is the maximum of the modified_on fields from the tables, you will be able to identify which documents have changed and update just those.

This means that some information is duplicated across documents, but it will give you simpler querying and maintenance, so it is often a worthwhile tradeoff.

This is an example of how you often need to step away from relational thinking when working with Elasticsearch.

Hello @Christian_Dahlqvist,
Thanks for the reply.
How is it possible to have one-to-many without relating the tables to each other? If there is any way, please suggest it.

I managed to do one-to-one by storing the value in the table itself rather than referencing it.

If I use a join query and there is a change in a joined table, I think the JDBC plugin will not pick it up. I think the JDBC plugin can only watch a single field for changes?

Perform a full join across all tables and store each record as a single document in Elasticsearch. If you include a single field that is the maximum of the modified timestamps of all 4 included tables, the JDBC query should identify any document affected by an update and update Elasticsearch accordingly. The Logstash JDBC plugin records the last time it ran and, on every run, will find records with a newer timestamp and update those.
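A sketch of how that tracking looks in the jdbc input: by default, :sql_last_value holds the timestamp of the previous run, so each run only fetches rows modified since then. Table and column names here are assumptions:

```
input {
  jdbc {
    # ... connection settings ...
    schedule => "*/5 * * * *"
    # :sql_last_value defaults to the time of the last successful run,
    # so only rows modified since then are fetched and re-indexed
    statement => "SELECT s.shop_id, s.shop_name, p.product_id, p.product_name, GREATEST(s.modified_on, p.modified_on) AS last_modified_on FROM shop s JOIN product p ON p.shop_id = s.shop_id WHERE GREATEST(s.modified_on, p.modified_on) > :sql_last_value"
  }
}
```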


@Christian_Dahlqvist I was also thinking of a similar solution as you suggested.

But I hope this will not create performance issues, since there will be a repetition of fields.

Yes, there will be a repetition of fields, but Elasticsearch is quite good at compressing that. It makes handling updates quite easy and is ideally suited to efficiently searching for products. The query would be structured something like this, although you may want to add the categories as a list:

SELECT t.type_name, s.shop_id, s.shop_name, p.product_id, p.name,
       GREATEST(t.modified_on, s.modified_on, p.modified_on) AS last_modified_on
FROM shop_type t, shop s, product p
WHERE s.shop_id = p.shop_id AND s.shop_id = t.shop_id;

Updates get a bit more expensive as many documents need to be updated if details of shops and categories change, but I would assume this is a reasonably rare thing so it will just add load at times. These updates are likely to be done in bulk anyway which is quite fast and efficient.
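For reference, a fuller sketch of such a query with the category names collapsed into a list via GROUP_CONCAT. All table and column names are assumptions based on the diagram, and the grouping may need adjusting under ONLY_FULL_GROUP_BY:

```
SELECT s.shop_id, s.shop_name, t.type_name,
       p.product_id, p.product_name,
       GROUP_CONCAT(c.category_name) AS product_tag,
       GREATEST(s.modified_on, t.modified_on, p.modified_on,
                MAX(c.modified_on)) AS last_modified_on
FROM shop s
JOIN shop_type t         ON t.type_id = s.type_id
JOIN product p           ON p.shop_id = s.shop_id
JOIN product_category pc ON pc.product_id = p.product_id
JOIN category c          ON c.category_id = pc.category_id
GROUP BY s.shop_id, s.shop_name, t.type_name, t.modified_on,
         s.modified_on, p.product_id, p.product_name, p.modified_on;
```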

Thanks again, you are super quick. Let me give a try and will update you on how it works.

I stripped it down to the bare minimum, as I have not written a lot of SQL lately. Hopefully it is a useful starting point, though.

Now, let me try with Logstash; Logstash is completely new to me, though.
If I need any help, I will bother you again :sunglasses:.

@Christian_Dahlqvist ,
I implemented the logic exactly as we discussed.
But the problem is, every time it finds an updated last_modified_on value, it creates a new document instead of updating the existing one.
Because of this, if a product's name is changed or a product is deleted, both the old and new versions of the product still come back in the JSON result list.

Would you please guide me on this?

Select fields that uniquely identify a document and use them as the document_id in your Elasticsearch output.
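For example, in the elasticsearch output a composite document_id can be built from the key fields. The field names here are assumptions taken from the example documents:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "shops"
    # Composite key: the same shop/product row always maps to the same
    # document, so re-indexing updates it instead of creating a duplicate
    document_id => "%{shop_id}-%{product_id}"
    action => "update"
    doc_as_upsert => true
  }
}
```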

Then I think I need to maintain a composite key, as the relation is one-to-many.
Let me check.
Thanks for the immediate, kind response.

Hello @Christian_Dahlqvist,
The MySQL database has two DOUBLE fields, latitude and longitude, for the location.
But how can I map them to a single Elasticsearch field named location with datatype geo_point using the JDBC plugin?
To achieve something like the below:

    "location" : {
          "lat" : 23.026313,
          "lon" : 72.553062
        }

Thanks in Advance!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.