There is a good blog post about that here and for denormalizing you can use a JOIN between those two tables in your statement option of the JDBC input plugin.
How you structure this will depend on how you expect to search the data and how you want the results back. If you primarily are searching for products and want to filter based on shop details and categories I would recommend looking into fully flattening when you denormalize. Your example would then be transformed into a number of distinct documents like this that consist of data from all tables:
You can store this with a document ID created from shop_id, product_id and potentially also category_id. If you add a timestamp field to this query that is the maximum of the modified_on fields from the tables you will be able to identify which documents that have changed and update just these.
This means that some information is duplicated across documents, but that will provide you will simpler querying and maintenance, so is often a worthwhile tradeoff.
This is an example of how you often need to step away from a relational thinking when working with Elasticsearch.
Hello @Christian_Dahlqvist,
Thanks for the reply.
How is it possible to have one-many without relating to each other? If theres any way pls suggest me.
I can managed to do One-One by putting the value of itself in the table rather then referencing it.
If I use, join query and if there is any change in join table, I think JDBC plugin will not listen to it. I think JDBC plugin can listen to only field for any changes?
Perform a full join across all tables and store each record as a single document in Elasticsearch. If you include a single field that is the maximum of the modified timestamp for all 4 included tables the JDBC query should identify any document affected by an update and update Elasticsearch accordingly. The Logstash JDBC plugin records the last time it ran and will find documents having a newer timestamp and update these every time it runs.
Yes, there will be a repetition of fields, but Elasticsearch is quite good at compressing that. It does make handling updates quite easy and is ideally suited for efficiently searching for products. The query would be structured something like this, although you may want to add the categories as a list:
SELECT t.type_name, s.shop_id, s.shop_name, p.product_id, p.name, GREATEST(t.modified_on, s.modified_on, p.mofified_on) as last_modified_on
WHERE s.shop_id = p.shop_id AND s.shop_id = t.shop_id;
Updates get a bit more expensive as many documents need to be updated if details of shops and categories change, but I would assume this is a reasonably rare thing so it will just add load at times. These updates are likely to be done in bulk anyway which is quite fast and efficient.
@Christian_Dahlqvist ,
I implemented the logic exactly as we discussed.
But the problem is, it is creating a new Document everytime it finds an updated value of last_modified_on instead of updating the document.
Due to this, ie, even if the product name is changed or say product is deleted; even though it's deleted or updated the product with new name and old name or deleted it will return in the JSON list.
Hello @Christian_Dahlqvist,
MySQL database has two double type fields latitude and longitude for location.
But how to map it to elasticsearch in one field as a location with datatype geo_point using JDBC Plugin.
To achieve similar to below,
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.