Design Decision: Serve Search Results from ES or Cached DB


We are currently developing a new service that is essentially a product catalogue for online marketing purposes.

The catalogue consists of 4 million highly structured products, each specified individually by a customer.
The catalogue can be searched via Elasticsearch using about 50 search filters, including aggregations and histograms.

From a design point of view, we see two options right now:

  1. Serving the search results completely from the ES index (incl. caching) OR
  2. Only retrieving the IDs of the documents from the ES index and serving the documents that make up the search results from our cache-backed database (Hazelcast => Hibernate => Oracle).

Right now we would prefer (1.), with (2.) as a future option.
Is (1.) a valid option with regard to scalability and throughput?
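
To make the difference between the two options concrete, here is a minimal sketch of the request bodies involved. It only builds the query DSL as plain dictionaries and sends nothing. The field names (`category`, `price`, `brand`) and the helper names are assumptions for illustration, not part of the actual catalogue mapping; the DSL keys (`bool`, `filter`, `term`, `aggs`, `terms`, `histogram`, `_source`) are standard Elasticsearch.

```python
# Hypothetical field names ("category", "price"); only the DSL keys are Elasticsearch's.
def build_search_body(filters, ids_only, size=20):
    """Build a search body for option 1 (full documents) or option 2 (IDs only)."""
    body = {
        "size": size,
        "query": {
            "bool": {
                # Exact-match filters run in filter context, so no scoring is done.
                "filter": [{"term": {field: value}} for field, value in filters.items()]
            }
        },
        "aggs": {
            "by_category": {"terms": {"field": "category"}},
            "price_histogram": {"histogram": {"field": "price", "interval": 50}},
        },
    }
    if ids_only:
        # Option 2: do not return the stored document; each hit still carries its _id.
        body["_source"] = False
    return body


def extract_ids(response):
    """Pull document IDs out of a search response, preserving hit order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]
```

With `ids_only=False` the hits carry the full `_source` and can be rendered directly (option 1). With `ids_only=True` the response shrinks to IDs plus aggregations, and the documents have to be fetched elsewhere (option 2).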


I prefer option 1, but if you want to get back managed entities, option 2 is better. Good news: Hibernate Search now supports Elasticsearch, so option 2 should be easy to implement.

Still, if you just want to display results to the user, I'd go with option 1.

Keep it simple.

It's not a design decision but a cost-benefit decision. You have to make it for yourself.

To find out the ratio, run a performance benchmark on options 1 and 2 and measure how much throughput you can achieve. Take the future growth of your requirements into account (you want scalability but do not give any estimated target).
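
A benchmark along these lines can be sketched as a small harness, shown below under stated assumptions: `search_fn` is a hypothetical callable wrapping either option 1 or option 2, `fake_search` is only a stand-in so the sketch runs on its own, and the worker count is arbitrary. A dedicated load-testing tool would give more trustworthy numbers; this only shows what to measure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(search_fn, queries, workers=8):
    """Run search_fn over all queries concurrently; report throughput and latency."""
    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start

    # Index of the 95th-percentile latency in the sorted list.
    p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
    return {
        "requests": len(latencies),
        "throughput_rps": len(latencies) / wall,
        "p95_latency_s": latencies[p95_index],
    }


def fake_search(query):
    # Stand-in for a real call to option 1 or option 2; replace with your client.
    time.sleep(0.001)
```

Running the same query set against both options while raising `workers` step by step shows where throughput stops growing and latency starts to climb, which is the point at which the option "breaks".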

Then add up the people and effort (licenses, staff, machines) you need to maintain option 1 (ES) versus option 2 (ES/Hazelcast/Hibernate/Oracle stack) and write down the costs. Here, too, you have to take the future growth of costs into account.

If the benefit of performance is so high that it justifies the cost, go for it.

We already know how option (2.) performs: since it uses a local in-memory and filesystem cache, it is fast and will scale independently of ES. We will need that implementation in any case for displaying single products.
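
For readers following along, the lookup step of option (2.) can be sketched as a cache-first read. This is an illustration, not our actual implementation: a plain dictionary stands in for the Hazelcast cache, and `load_from_db` is a hypothetical batched loader standing in for the Hibernate/Oracle read.

```python
def load_products(ids, cache, load_from_db):
    """Return products in the order of ids, reading the cache first (cache-aside)."""
    missing = [pid for pid in ids if pid not in cache]
    if missing:
        # One batched database read for all misses instead of one query per ID.
        for pid, product in load_from_db(missing).items():
            cache[pid] = product
    # IDs found in neither cache nor database (e.g. a stale index entry) are skipped.
    return [cache[pid] for pid in ids if pid in cache]
```

The result keeps the order of the ID list, so the relevance ranking computed by ES survives the round trip through the cache.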

As far as I understand, option (1.) "should" work.

@jprante: Benchmarking option (1.) and seeing where it breaks is indeed a very good idea.
