Search keyword classification problem

Dmitry_Uglatch · May 28, 2019, 7:05am

Dear all, I have an index of ~10M products with a bunch of fields (mpn, gtin, family, series, brandName, categoryName, summary).

I'm trying to answer the following question: what is the best way to parse and classify a user's search keyword? I don't want to search for "Dolce & Gabana" in summary; it must be matched against brandName.

For example, how do I find brand="My Intelligent Dogs" in the keyword "My Intelligent Dogs hut"? Or brand="HP", category="Notebook" in the keyword "HP Green Notebook"?
It looks like a classic ML task to me.
Please share your ideas: what's the best solution to start with?

Mark_Harwood · May 28, 2019, 9:13am

Hi Dmitry,
I've been thinking about these sorts of problems lately.
Generally the approach is "facet snapping" - the automated application of structured filters given an unstructured piece of query text.
Facet snapping would be a query pre-processing pipeline armed with a set of rules - "if you see X in the incoming query, apply filter Y and remove X from the query string". These rules can be generated from your content but you would have to review them.
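A minimal sketch of such a pre-processing step, in Python. The rule table, the snap_facets name and the tokenisation are all assumptions made for illustration; in practice the rules would be generated from the catalog and reviewed. It tries the longest phrase first so that multi-word brands win over single words.

```python
import re

# Hypothetical rules: lowercased phrase -> (field, value to filter on).
RULES = {
    "my intelligent dogs": ("brandName", "My Intelligent Dogs"),
    "hp": ("brandName", "HP"),
    "notebook": ("categoryName", "Notebook"),
}

def snap_facets(query, rules=RULES):
    """Return (filters, remaining_text) for a raw query string."""
    tokens = re.findall(r"[\w&]+", query.lower())
    max_len = max((len(p.split()) for p in rules), default=1)
    filters = {}
    remaining = []
    i = 0
    while i < len(tokens):
        matched = False
        # Longest candidate phrase first, shrinking down to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in rules:
                field, value = rules[phrase]
                # Only snap one value per field; later hits stay in the text.
                if field not in filters:
                    filters[field] = value
                    i += n
                    matched = True
                    break
        if not matched:
            remaining.append(tokens[i])
            i += 1
    return filters, " ".join(remaining)

print(snap_facets("My Intelligent Dogs hut"))
print(snap_facets("HP Green Notebook"))
```

The leftover text comes back lowercased, which is harmless if the free-text field is analysed with a lowercasing analyzer.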
Let's take your brand name example - this script can examine which structured brand names are also mentioned in the unstructured text of product names in your product catalog. We can see there are issues with some brand names because they are used in product names where the brand is something different. This can happen for the following reasons:

The brand name is ambiguous, e.g. Jigsaw is a clothing brand and many things in the toy department, "MAC" is makeup and a computer.
The brand name is licensed in many different products made by other brands, e.g. Apple iPhone cases or Disney lunchboxes.

You could automatically generate facet snapping rules where my script shows brand names are reliably used (only ever appear in product names/descriptions that are also tagged with that brand). The more ambiguous brand names will need to be human-reviewed for effectiveness in facet-snapping.
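To make the "reliably used" test concrete, here is a rough in-memory sketch in Python. It is not the script mentioned above, just an illustration of the same idea: split_brands and its min_mentions parameter are hypothetical, and the products are assumed to be plain dicts with brandName and summary keys. It checks every brand against every product, so for a ~10M index the counting would need to be pushed down into the search engine instead.

```python
import re
from collections import Counter

def _norm(text):
    """Lowercase, tokenise and pad with spaces so whole-phrase checks are easy."""
    return " " + " ".join(re.findall(r"[\w&]+", text.lower())) + " "

def split_brands(products, min_mentions=1):
    """Return (reliable, needs_review) lists of brand names.

    A brand is reliable if every summary that mentions it belongs to a
    product tagged with that brand; it needs review if at least one
    mention is in a product tagged with a different brand.
    """
    brands = {p["brandName"] for p in products}
    needles = {b: _norm(b) for b in brands}
    total = Counter()  # summaries mentioning the brand
    own = Counter()    # ...of which the product is tagged with that brand
    for p in products:
        text = _norm(p["summary"])
        for brand, needle in needles.items():
            if needle in text:
                total[brand] += 1
                if p["brandName"] == brand:
                    own[brand] += 1
    reliable = sorted(b for b in brands
                      if total[b] >= min_mentions and own[b] == total[b])
    needs_review = sorted(b for b in brands if total[b] > own[b])
    return reliable, needs_review
```

Brands that are never mentioned in any summary end up in neither list, since there is no evidence either way.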

Dmitry_Uglatch · May 29, 2019, 12:40pm

Thank you very much Mark,

Fantastic script, simple and obvious. It has to be adapted a bit for my nested supplierLocalNames. Anyway, it put some ideas into my head.
I think I got your point. I did some investigation, and it looks like brands such as "ME", "WE" and others really do need human review.
Earlier, trying to auto-filter on such brands caused problems.

Mark_Harwood · May 29, 2019, 2:03pm

FYI - I made a simple update to the script to export a set of brand names that are reliable phrases (i.e. are not too ambiguous) so that they could be used as query-rewriting rules.
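Once a reliable phrase has been recognised and removed from the query text, the rewrite itself could look roughly like this in Python. The build_query helper is hypothetical, and it assumes brandName and categoryName are mapped as keyword fields (so an exact term filter works) and that summary is the free-text field to match the leftover words against.

```python
def build_query(filters, text, text_field="summary"):
    """Build an Elasticsearch bool query body.

    filters: dict of field -> exact value recognised in the query string.
    text:    whatever is left of the query after the phrases were removed.
    """
    bool_query = {
        # Exact, non-scoring filters for the recognised structured values.
        "filter": [{"term": {field: value}} for field, value in filters.items()]
    }
    if text:
        # Remaining words are still searched as ordinary full text.
        bool_query["must"] = [{"match": {text_field: text}}]
    return {"query": {"bool": bool_query}}

body = build_query({"brandName": "My Intelligent Dogs"}, "hut")
```

The resulting dict can be sent as the body of a search request; using filter rather than must for the brand keeps it out of relevance scoring.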

