I am probably like many people who have more of a SQL background and are attempting to branch out into Elasticsearch. I have a design choice to make and am hoping to get some input on it. We have a product database of around 50K SKUs (with a need to scale). We are a B2B wholesaler, so pricing varies by customer and by branch. From what I understand, nested documents have potential performance issues, so I'm wondering whether it would be better to flatten the document and use the key to make each price unique. For example, I can do the following structure:
Would that likely be better than using nested documents to maintain the unique pricing for each customer/branch? It will always be a 1 to 1 mapping. Pricing doesn't change a ton, but it would need a periodic update as price changes come in.
Depending on how many customer IDs and branches you have, this may end up as a huge mapping, because every customer ID/branch combination becomes its own field.
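To put a rough number on that, here is a small sketch. The customer and branch counts and the `price_c<customer>_b<branch>` field naming are made up for illustration; the only Elasticsearch-specific value is the default of `index.mapping.total_fields.limit`, which is 1000.

```python
# Illustrative numbers only (assumptions, not measurements).
CUSTOMERS = 300
BRANCHES_PER_CUSTOMER = 20

# Elasticsearch default for index.mapping.total_fields.limit.
DEFAULT_TOTAL_FIELDS_LIMIT = 1000


def flattened_field_names(customers, branches):
    """One mapped field per customer/branch pair (hypothetical naming scheme)."""
    return [
        f"price_c{c}_b{b}"
        for c in range(customers)
        for b in range(branches)
    ]


fields = flattened_field_names(CUSTOMERS, BRANCHES_PER_CUSTOMER)
over_limit = len(fields) > DEFAULT_TOTAL_FIELDS_LIMIT

print(f"{len(fields)} price fields, over default limit: {over_limit}")
```

The limit can be raised, but every one of those fields still has to live in the mapping, which is the concern raised above.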
Using an array-based approach with the nested datatype, like this, might be better.
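The example from the original reply is not reproduced here. As a stand-in, this is a minimal sketch of what an array-based nested layout could look like, written as Python dicts that would be serialized to JSON. The field names (`sku`, `prices`, `customer_id`, `branch_id`, `price`) are assumptions, not the names used in the thread.

```python
import json

# Hypothetical index mapping: one "prices" array of nested objects per product.
mapping = {
    "mappings": {
        "properties": {
            "sku": {"type": "keyword"},
            "prices": {
                "type": "nested",
                "properties": {
                    "customer_id": {"type": "keyword"},
                    "branch_id": {"type": "keyword"},
                    # scaled_float stores the price as an integer number of cents.
                    "price": {"type": "scaled_float", "scaling_factor": 100},
                },
            },
        }
    }
}

# Hypothetical product document: each customer/branch price is one array entry.
product = {
    "sku": "ABC-123",
    "prices": [
        {"customer_id": "C1", "branch_id": "B1", "price": 10.00},
        {"customer_id": "C2", "branch_id": "B2", "price": 12.50},
    ],
}

mapping_json = json.dumps(mapping, indent=2)
product_json = json.dumps(product, indent=2)
print(mapping_json)
print(product_json)
```

With this shape the mapping stays at a handful of fields no matter how many customers or branches are added; only the length of the `prices` array grows.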
You may want to read about the nested datatype first: searching arrays of 'nested' documents works a bit differently from searching plain object arrays, and that difference is what prevents unexpected results.
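The "unexpected results" come from how a plain (non-nested) array of objects is indexed: the values of each key are collected into one multi-valued field, so the link between a given customer and a given branch is lost. The sketch below imitates that behavior in plain Python and then shows the shape of a `nested` query body. The field names are hypothetical, and the two helper functions only model the matching semantics; they are not Elasticsearch code.

```python
prices = [
    {"customer_id": "C1", "branch_id": "B1", "price": 10.0},
    {"customer_id": "C2", "branch_id": "B2", "price": 12.5},
]


def flatten(objs):
    """Mimic a plain object array: one multi-valued field per key."""
    flat = {}
    for obj in objs:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def flat_match(objs, **wanted):
    """Match against the flattened form; conditions may hit different objects."""
    flat = flatten(objs)
    return all(value in flat.get(key, []) for key, value in wanted.items())


def nested_match(objs, **wanted):
    """Match the way a nested query does: all conditions on the same object."""
    return any(
        all(obj.get(key) == value for key, value in wanted.items())
        for obj in objs
    )


# Customer C1 has no price at branch B2, yet the flattened form still matches.
cross_flat = flat_match(prices, customer_id="C1", branch_id="B2")
cross_nested = nested_match(prices, customer_id="C1", branch_id="B2")
print(cross_flat, cross_nested)

# Request body for a nested query against a hypothetical "prices" nested field.
nested_query = {
    "query": {
        "nested": {
            "path": "prices",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"prices.customer_id": "C1"}},
                        {"term": {"prices.branch_id": "B1"}},
                    ]
                }
            },
            # inner_hits returns the matching price entry, not just the product.
            "inner_hits": {},
        }
    }
}
```

Without `inner_hits`, a nested query only tells you which products matched; the matching customer/branch price still has to be picked out of the `prices` array.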
Thank you for your feedback. I could see the potential issue with having a large mapping. Some of the more common products could easily have hundreds of customers with 20 or more branches (with potential growth for both). With those types of numbers, would nested documents also present potential issues, especially with re-indexing?
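As a back-of-the-envelope check on those numbers: each nested object is stored as its own hidden Lucene document next to the parent, and changing any single price means the whole product document, with all of its nested entries, is indexed again. Elasticsearch also caps the number of nested objects per document through `index.mapping.nested_objects.limit`, which defaults to 10,000. The customer and branch counts below are assumptions picked to match "hundreds of customers with 20 or more branches".

```python
# Elasticsearch default for index.mapping.nested_objects.limit.
DEFAULT_NESTED_OBJECTS_LIMIT = 10_000


def nested_objects(customers, branches_per_customer):
    """Nested price entries in one product document."""
    return customers * branches_per_customer


def lucene_docs_rewritten_per_update(customers, branches_per_customer):
    """Documents re-indexed when a single price changes: all nested entries plus the parent."""
    return nested_objects(customers, branches_per_customer) + 1


# Assumed sizes for a popular product today and after some growth.
today = nested_objects(300, 20)
after_growth = nested_objects(600, 20)

print(today, today > DEFAULT_NESTED_OBJECTS_LIMIT)
print(after_growth, after_growth > DEFAULT_NESTED_OBJECTS_LIMIT)
print(lucene_docs_rewritten_per_update(300, 20))
```

So at these assumed sizes the nested approach fits under the default limit today, but a popular product could cross it as customers are added, and each price change for such a product rewrites thousands of hidden documents.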