Sync Elasticsearch with MySQL Database

Hello!

As per our application requirements, we need to sync Elasticsearch with MySQL. We will apply the most suitable approach based on your advice.

Below is the DB EER diagram, just for reference.

Now we need to implement the following:

  1. Elasticsearch will contain an index in denormalised form, something like below:
{
  "shops": [
    {
      "shop_id": 1001,
      "shop_name": "XYZ Pharmacy",
      "shop_type": "Pharmacy Store",
      "latlong": {
        "lat": 12.345678,
        "lon": 12.345678
      },
      "timing": "8:00 to 8:00",
      "working_days": "Mon - Sun",
      "products": [
        {
          "product_id": 201,
          "product_name": "AAAAA 10 mg",
          "product_tag": [
            "Drug",
            "Strip",
            "XXXX"
          ]
        },
        {
          "product_id": 202,
          "product_name": "BBBBB 20 mg",
          "product_tag": [
            "Syrup",
            "Cough",
            "XXXX"
          ]
        }
      ]
    },
    {
      "shop_id": 1002,
      "shop_name": "ABC Fastfood",
      "shop_type": "Fastfood",
      "latlong": {
        "lat": 12.345678,
        "lon": 12.345678
      },
      "timing": "8:00 to 8:00",
      "working_days": "Mon - Sun",
      "products": [
        {
          "product_id": 302,
          "product_name": "CCCC",
          "product_tag": [
            "Sandwich",
            "Wrap",
            "XXXX"
          ]
        }
      ]
    }
  ]
}
  2. If there is any change (insert, update) in either the Shop or Product table of MySQL, it should sync to Elasticsearch.
  3. We will use a timestamp along with is_updated and is_deleted fields for syncing.

But the problems are:

  • How to denormalize the index?
  • How to detect any changes in the Shop, Product_Category, Category, or Product tables in MySQL and sync them with ES?

Use cases

  1. User can search by Product Name
  2. Product Category
  3. Shop Name
  4. Shop Type

We highly appreciate your best advice on this.

Thanks in Advance!

There is a good blog post about that here, and for denormalizing you can use a JOIN between those tables in the statement option of the JDBC input plugin.
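As a sketch, a minimal pipeline for that approach could look like the following. The connection details, table names, and column names here are assumptions based on your example — adjust them to your actual schema:

```
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/shopdb"
    jdbc_user => "user"
    jdbc_password => "password"
    schedule => "* * * * *"
    # Denormalize by joining shop and product in the statement itself
    statement => "SELECT s.shop_id, s.shop_name, s.shop_type, p.product_id, p.product_name FROM shop s JOIN product p ON p.shop_id = s.shop_id"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "shops"
  }
}
```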

Dear Rahul Kumar,

Thank you very much for your prompt response. Sorry, I was still editing my earlier post when your response came in.

Could you please take another look at my edited question?

How you structure this will depend on how you expect to search the data and how you want the results back. If you are primarily searching for products and want to filter based on shop details and categories, I would recommend fully flattening when you denormalize. Your example would then be transformed into a number of distinct documents like the one below, each consisting of data from all tables:

{
	"shop_id": 1001,
	"shop_name": "XYZ Pharmacy",
	"shop_type": "Pharmacy Store",
	"latlong": {
		"lat": 12.345678,
		"lon": 12.345678
	},
	"timing": "8:00 to 8:00",
	"working_days": "Mon - Sun",
	"product_id": 201,
	"product_name": "AAAAA 10 mg",
	"product_tag": [
		"Drug",
		"Strip",
		"XXXX"
	]
}

You can store this with a document ID created from shop_id, product_id, and potentially also category_id. If you add a timestamp field to this query that is the maximum of the modified_on fields from the tables, you will be able to identify which documents have changed and update just those.

This means that some information is duplicated across documents, but it will give you simpler querying and maintenance, so it is often a worthwhile tradeoff.

This is an example of how you often need to step away from relational thinking when working with Elasticsearch.

Hello @Christian_Dahlqvist,
Thanks for the reply.
How is it possible to have one-to-many without relating the tables to each other? If there is any way, please suggest it.

I managed to do one-to-one by storing the value in the table itself rather than referencing it.

If I use a join query and there is a change in a joined table, I think the JDBC plugin will not pick it up. I think the JDBC plugin can only watch a single field for changes?

Perform a full join across all tables and store each record as a single document in Elasticsearch. If you include a single field that is the maximum of the modified timestamps of all 4 included tables, the JDBC query should identify any document affected by an update and update Elasticsearch accordingly. The Logstash JDBC plugin records the last time it ran and, on every run, will find records with a newer timestamp and update those.
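A sketch of how that tracking looks in the jdbc input: by default, :sql_last_value holds the timestamp of the previous run, so each run only fetches rows modified since then. Table and column names here are assumptions:

```
input {
  jdbc {
    # ... connection settings ...
    schedule => "*/5 * * * *"
    # :sql_last_value defaults to the time of the last successful run,
    # so only rows modified since then are fetched and re-indexed
    statement => "SELECT s.shop_id, s.shop_name, p.product_id, p.product_name, GREATEST(s.modified_on, p.modified_on) AS last_modified_on FROM shop s JOIN product p ON p.shop_id = s.shop_id WHERE GREATEST(s.modified_on, p.modified_on) > :sql_last_value"
  }
}
```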


@Christian_Dahlqvist I was also thinking of a similar solution as you suggested.

But I hope this will not create performance issues, since there will be a repetition of fields.

Yes, there will be a repetition of fields, but Elasticsearch is quite good at compressing that. It makes handling updates quite easy and is ideally suited to efficiently searching for products. The query would be structured something like this, although you may want to add the categories as a list:

SELECT t.type_name, s.shop_id, s.shop_name, p.product_id, p.name,
       GREATEST(t.modified_on, s.modified_on, p.modified_on) AS last_modified_on
FROM shop_type t, shop s, product p
WHERE s.shop_id = p.shop_id AND s.shop_id = t.shop_id;

Updates get a bit more expensive as many documents need to be updated if details of shops and categories change, but I would assume this is a reasonably rare thing so it will just add load at times. These updates are likely to be done in bulk anyway which is quite fast and efficient.
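For reference, a fuller sketch of such a query with the category names collapsed into a list via GROUP_CONCAT. All table and column names are assumptions based on the diagram, and the grouping may need adjusting under ONLY_FULL_GROUP_BY:

```
SELECT s.shop_id, s.shop_name, t.type_name,
       p.product_id, p.product_name,
       GROUP_CONCAT(c.category_name) AS product_tag,
       GREATEST(s.modified_on, t.modified_on, p.modified_on,
                MAX(c.modified_on)) AS last_modified_on
FROM shop s
JOIN shop_type t         ON t.type_id = s.type_id
JOIN product p           ON p.shop_id = s.shop_id
JOIN product_category pc ON pc.product_id = p.product_id
JOIN category c          ON c.category_id = pc.category_id
GROUP BY s.shop_id, s.shop_name, t.type_name, t.modified_on,
         s.modified_on, p.product_id, p.product_name, p.modified_on;
```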

Thanks again, you are super quick. Let me give a try and will update you on how it works.

I stripped it down to the bare minimum, as I have not written a lot of SQL lately. Hopefully it is a useful starting point, though.

Now, let me try with Logstash; Logstash is completely new to me, though.
If I need any help, I will bother you again :sunglasses:.

@Christian_Dahlqvist ,
I implemented the logic exactly as we discussed.
But the problem is, every time it finds an updated last_modified_on value, it creates a new document instead of updating the existing one.
Because of this, if a product's name is changed or a product is deleted, both the old and new versions of the product still come back in the JSON result list.

Would you please guide me on this?

Select fields that uniquely identify a document and use them as the document_id in your Elasticsearch output.
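For example, in the elasticsearch output a composite document_id can be built from the key fields. The field names here are assumptions taken from the example documents:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "shops"
    # Composite key: the same shop/product row always maps to the same
    # document, so re-indexing updates it instead of creating a duplicate
    document_id => "%{shop_id}-%{product_id}"
    action => "update"
    doc_as_upsert => true
  }
}
```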

Then I think I need to maintain a composite key, as the relation is one-to-many.
Let me check.
Thanks for the immediate, kind response.

Hello @Christian_Dahlqvist,
The MySQL database has two DOUBLE fields, latitude and longitude, for the location.
But how can I map them to a single Elasticsearch field named location with datatype geo_point using the JDBC plugin?
To achieve something like the below:

    "location" : {
          "lat" : 23.026313,
          "lon" : 72.553062
        }

Thanks in Advance!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.