Advice on how to best store my data

I am having a problem with how I should best store my data in Elasticsearch. I want it to be able to store the data in the exact format that it will be returned to the user in. I also don't want to do any processing after getting results from Elasticsearch. I will be aggregating on the query made to get a count of data based on the possible filters.

I have an API endpoint which allows users to search for companies by name and returns results in this format:

{
	"company": {
		"name": "KFC",
		"status": "running",
		"Cuisine": "Kentucky Fried Chicken"
	},
	"mainRestaurant": {
		"location": {
			"road": "main road",
			"city": "New York City",
			"state": "New York",
			"country": "USA"
		},
		"status": "running",
		"type": "Flagship restaurant"
	}
}

This data is pushed together from a company database and a restaurant database. The mainRestaurant is selected from a list of restaurants based on the following criteria:

  • If there is a Flagship restaurant, use that.
  • If there isn't, then use the first restaurant found in the database where there is a certain flag called important set to true.
  • If there is none of them, choose the first one you find in the database.

All companies will have a single Flagship restaurant so the above logic isn't a problem when there is no location filters applied because the mainRestaurant will always be the Flagship restaurant of the company. However, if a location is passed in then it is likely that we will move onto later parts of the logic.

What I want to be able to do is take a single result from Elasticsearch for each company per search and my plan was to add a single result per company per location. So the index would only hold the main restaurant for KFC (and each other company) in each location. So for New York it would store the above result and then for London, it might have a result that looks like the following:

{
	"company": {
		"name": "KFC",
		"status": "running",
		"Cuisine": "Kentucky Fried Chicken"
	},
	"mainRestaurant": {
		"location": {
			"road": "main road",
			"city": "London",
			"state": "England",
			"country": "UK"
		},
		"status": "running",
		"type": "important restaurant"
	}
}

The problem here is that there is now multiple results in Elasticsearch for each company with no easy way to grab the best one in a single query. My solution for this was to add a wrapper around the object with some meta data. So for the first result (KFC Flagship):

{
	"bestRestaurantIn": ["*", "USA", "New York", "New York City"],
	"result": {
		... data ...
	}
}

And for the London important restaurant, it would look like this:

{
	"bestRestaurantIn": ["UK", "England", "London"],
	"result": {
		... data ...
	}
}

And for the best restaurant in Manchester, it would be like this:

{
	"bestRestaurantIn": ["Manchester"],
	"result": {
		... data ...
	}
}

So if you don't pass a location, I could add a term clause where there was a * so that for the company "KFC", only the one with the* in the bestRestaurantIn field would be returned when a search was made. If you passed "Manchester" as the location filter, it would get only the KFC in Manchester because there should only be one result in Manchester for KFC.

Here are some examples when searching for "kf" in the name. It should give:

  • When location filter is empty -> the Flagship restaurant in New York should be returned as it is the overall main restaurant of KFC
  • When location filter is New York -> the Flagship restaurant in New York as it is the main restaurant in New York For KFC.
  • When location filter is Manchester -> the KFC in Manchester should be returned.

So, if there was another restaurant called "KFB" (assuming there was no other results in Elasticsearch where the name matched the search "KF") then there should only ever a maximum of 2 results returned per location and in some cases 1 or 0 if the companies didn't have a restaurant in a particular location.

This above solution works correctly. However, it becomes a problem when more filters are added. Each restaurant in the restaurant database has a status. Which can either be "running" or "shutdown". This means that we would need to add a "running" restaurant and a "shutdown" restaurant (where there is one) for each company in each location. We also need to add a flag to the wrapper which is something like "isBestWithNoFilters" which would probably be the "running" restaurant.

My issue is that this does no scale well at all. If I need to add multiple on the restaurant level then this would become incredibly complicated. Is there a pattern that I should be following for this?

Sorry for the long post. I hope I have made it clear!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.