How to model a Elasticsearch database for ecommerce

(Alberto) #1

I'm developing a ecommerce application and i want to use MySQL database as primary and Elasticsearch for frontend and some features of the admin panel.

After spend a lot of time, trying to find the best schema for Elasticsearch database i can`t be able to find a great solution.

The problem comes with modeling products and categories. since a product can belong to multiple categories and a category obviously has multiple products. I have some posible scenarios, but any one solves my problematic.

  1. Inside the shop index, the product type has collection as nested objects. With this the problem comes when the shop makes some changes in the category, for example change name or description, this forces to make changes in all the products belong the category, and with a large catalog of products could spend lot of time to index all the changes.

  2. The second scenario, it's to make a parent/child relation between collections and products, something like this;
    "id":{ "type": "integer" },
    "name":{ "type": "string" },
    "price":{ "type": "double" }
    "id":{ "type": "integer" },
    "name":{ "type": "string" },
    "description":{ "type": "string" },

With this a product only could belong to one category so no use to me.

  1. The last scenario is, Application-side Joins, but also here i don`t know how to handle many-to-may relationships.

Sorry for my poor english.
Thanks in advance for any help.

(David Pilato) #2

Do you have like 10 millions of products or more?

I mean that without reading any database, I can index 1 million of documents in some minutes on my laptop. Of course, reading a "slow" database would slowdown dramatically your process but I think that you should really try to see how much slow it is before taking any design decision.

IMHO having one single document type is much better.

Elasticsearch does not support many-to-many relationships. It's not a relational system but document oriented system although parent/child feature introduces some relational features but with a cost (memory wise, complexity of requests...)

My advice: keep it simple first. Try to optimize the reads from your database so you can have a better indexing throughput (parallel reads...).

I believe that reindexing a category does not mean reindexing the full catalog. So how many documents at most you'll have to reindex? How long does it take for real?

(Alberto) #3

Thanks for your response David.

I'll try to test a single document type with nested object as yo recommend. Some days ago i try this, but i can be abled to import my mysql database into elastic search. I used JDBC plugin for this, but when i tried to import nested object the imports process fails. In the sql query for the import i use JSON_OBJECTS. I have opened a issue on Github but i don`t have response, this is the link for the issue:

I would be very grateful if you can guide me a bit, where may be the problem with the import.

Thanks again for your response.

(David Pilato) #4

If you can change the application which is creating the data, modify it and send your JSON docs within the same "transaction".

I wrote something here:

(system) #5