Some help for a beginner to structuring his data for elasticsearch

Stefan_Kruse_2 · October 7, 2016, 7:36am

Hi,
i need to move from a mysql db to elasticsearch.
I have the follow situation:
We have these tables: companies, categories and listings.
Companies includes datas from the company like name, address etc.
Subcategories includes a name like "Festmacher","Zollagenturen".
Listings includes the mapping from companies to subcategories like "Company"=>1 and "Subcategory"=>8 and so on. This table ist very big because every relation from a company to a subcategory was saved here.

Ok, now i need to structure this to es. In the endresult i need the following options.

I have to search for subcategories with the result of how many times the founded subcategories are listed in listings. Like "Festmacher => 60" or "Zollagenturen => 101".
The next thing is a overview of all comapnies for a subcategory id. I have a get param like ?scat=1. Now i need all companies listet for this subcategogry.

How can i make this the best way with es? How was a good way to index the datas with an eye of performance?

Thanks

dadoonet · October 7, 2016, 7:55am

If I understand correctly, listings is only a technical table. It does not really mean anything in term of user perspective, right?

So, I'd index companies with all their categories.

If you want to get the number of companies which have a given category, it's then super easy. And having a flat model in elasticsearch is better in term of performance IMO.

Stefan_Kruse_2 · October 7, 2016, 7:59am

Hi, yes you are right, listings is only technical, only for the relations.But i found the solution with saving the categories by a company is not so good. What if a subcategory is changing? Then i have to reindex the companies with the new cateogryname, right? Or do i only index the id by the company?

dadoonet · October 7, 2016, 8:18am

Yes. Reindex.

What is the cost of reindexing?
Let's say you have 1 million companies. How long does it take to reindex 1 million docs in elasticsearch? 1 or 2 minutes? (I'm not speaking about reading data from the source which can be slow but only on the elasticsearch side).

As it depends on many factors you have to test that but as an example, I'm indexing locally on my laptop around 10k to 15k documents per second.

So, basically don't try to solve issue you actually don't have by over engineering your model.
You can use parent / child feature to create relations in elasticsearch but for the use case you presented here, I won't use that. It comes with a cost.

Stefan_Kruse_2 · October 7, 2016, 9:16am

Ok, i will give it a try. And you mean i should index the categoryname directly by the company. Ok, but how can i now search for a subcategory with the count of companies where the subcategory are stored?

dadoonet · October 7, 2016, 9:39am

This?

GET companies/_search
{
  "query": {
    "match": {
       "subcategory":  "mysubcategory"
    }
  }
}

Can you describe with a sample JSON document how your model would be like?
Something like:

{
  "name": "xyz",
  "category": "abc",
  "subcategory": "def"
}

?

Stefan_Kruse_2 · October 7, 2016, 9:44am

This looks good, but subcategories would be an array. But i need all subcategories for a given searchword with the counter of the related companies. Something like:

Scat 1 = 10 Companies
Scat 2 = 6 Companies

dadoonet · October 7, 2016, 10:02am

I think that a Terms aggregation on subcategory field will answer to that.

Stefan_Kruse_2 · October 7, 2016, 10:07am

yeah, this could work, ok thanks a lot for helping me out. i will give this a try. MAybe i come back later thanks

Stefan_Kruse_2 · October 7, 2016, 11:18am

Hi, okay this works fine but one thing i dont understand. I need the aggregation only for the query match result, not for all subcategory fields. You understand?

dadoonet · October 7, 2016, 11:38am

Aggregations are computed on the resultset.

Stefan_Kruse_2 · October 7, 2016, 11:50am

ok, but if i stored the subcategories as array like

{
"subcategories":["Scat One","Scat Two","Scat Three"]
}

then i could search for example Scat One but if i made aggreagtion to the field subcategories he took all Subcategories in the founded docs. But i need only the founded one. In my case "Scat One".

Maybe i am wrong

dadoonet · October 7, 2016, 12:04pm

Unsure but may be using nested type could help?

I'm sorry I can't really help more. May be create a full example as a script like here would help?

Topic		Replies	Views
Need some help for data structuring Elasticsearch	4	359	July 5, 2017
Need help architecting an index/schema (migrating data from MySQL) Elasticsearch	7	424	July 6, 2017
Need some help for creating my model Elasticsearch	8	388	July 6, 2017
How should I structure my shop data? Elasticsearch	3	629	June 20, 2018
Advice on how to best structure my mappings Elasticsearch	1	370	December 28, 2018

Some help for a beginner to structuring his data for elasticsearch

Related topics