Complex similarity search with "control group"

MikeT1234 · April 8, 2023, 10:47am

I'm trying to understand how I should implement the following search feature.

A user can add companies to a control group (for example, 100 companies) and search for similar companies (max 300) based on
criteria groups like location (region, municipality), industry, revenue and some other fields.
Location documents contain an official code (string) + name.
Financial fields like revenue are numeric.

The result should contain scores from the different criteria groups so that the data can be used in the application's UI. The user should see how well location/industry/revenue matches the control group, plus a total score.

Could somebody point me in the right direction on how this could be done?

MikeT1234 · April 11, 2023, 9:42pm

I would appreciate your comments about this.

MikeT1234 · April 14, 2023, 11:33am

Any ideas?

Kathleen_DeRusso · April 18, 2023, 6:59pm

Hi @MikeT1234 - Thanks for your question!

Assuming that the control group and other companies are in the same index (or different indexes with the same alias), you could index flags like is_control_group as a field in your document alongside your criteria fields like location, industry, etc. This way you could easily exclude your control documents from your search results.

From the sounds of it, you may be interested in experimenting with the More Like This query. You can use the More Like This query to specify specific fields (your criteria) that you want to pull similar results for, and even compare them with specific documents in your index.
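As a rough illustration of the two suggestions above, here is a minimal sketch of a request body that combines a More Like This query with an exclusion on the is_control_group flag. The index name, the field names and the helper build_mlt_query are assumptions made for the example, not something taken from the original data model. Note that More Like This only works on text or keyword fields, so numeric criteria such as revenue would need a different clause.

```python
from typing import Any


def build_mlt_query(
    index: str,
    control_ids: list[str],
    fields: list[str],
    size: int = 300,
) -> dict[str, Any]:
    """Build a search body that looks for documents similar to the control group.

    `index`, `control_ids` and `fields` are hypothetical inputs for this sketch.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "more_like_this": {
                        # Only text/keyword fields are supported here.
                        "fields": fields,
                        # Each entry points at an already indexed control document.
                        "like": [
                            {"_index": index, "_id": doc_id}
                            for doc_id in control_ids
                        ],
                        # Low thresholds, because short code/name fields
                        # rarely repeat a term within one document.
                        "min_term_freq": 1,
                        "min_doc_freq": 1,
                    }
                },
                # Keep the control group itself out of the results.
                "must_not": {"term": {"is_control_group": True}},
            }
        },
    }


if __name__ == "__main__":
    body = build_mlt_query("companies", ["c1", "c2"], ["industry_name", "region_name"])
    print(body)
```

The resulting dictionary could be sent as the body of a _search request with whichever client you use.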

I'm not quite sure how feasible it would be to get a lot of different scores in a single API response call though - at least not in any way that would be remotely performant. It might be easier to return the search results with their criteria values, and let your UI use rules to highlight the criteria values that matched in a way that makes sense.

Good luck, I'd be interested to know if this strategy works for you!

MikeT1234 · April 20, 2023, 10:44am

Hi Kathleen. Thanks for your input

Using the more_like_this query was one of my first ideas for solving the problem.
From the documentation I got the impression that it works mainly with string fields and not with numeric fields (+ range conditions), so I rejected this idea.

Another idea was to run one search per category (location, industry, revenue etc.). Since the number of categories is not that high (max 4), it sounded feasible.
The problem here was that I couldn't figure out how to combine these results, i.e. how to get the union (max 300 best results), so I rejected this idea.

So far my best idea is to do one query/search using minimum_should_match. It should return the best results, and the category scoring would then be done on the back-end side, without Elasticsearch.
It sounds like you ended up with a similar solution.
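A minimal sketch of what this single-query idea might look like, under some assumptions: one should clause per category, each wrapped in constant_score and tagged with _name so that every hit reports which categories matched in its matched_queries list. The field names (region_code, industry_code, revenue), the category list and both helper functions are hypothetical. The back-end scoring here is deliberately binary (matched or not); a real implementation would probably want something finer.

```python
from typing import Any

# Hypothetical category names; they double as the _name tags in the query.
CATEGORIES = ("location", "industry", "revenue")


def build_similarity_query(
    region_codes: list[str],
    industry_codes: list[str],
    revenue_min: float,
    revenue_max: float,
    min_match: int = 2,
    size: int = 300,
) -> dict[str, Any]:
    """One bool query with a named should clause per criteria group."""

    def named(name: str, clause: dict[str, Any]) -> dict[str, Any]:
        # constant_score gives every category the same weight and
        # carries the _name that later shows up in matched_queries.
        return {"constant_score": {"filter": clause, "boost": 1.0, "_name": name}}

    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    named("location", {"terms": {"region_code": region_codes}}),
                    named("industry", {"terms": {"industry_code": industry_codes}}),
                    named("revenue", {"range": {"revenue": {"gte": revenue_min, "lte": revenue_max}}}),
                ],
                "minimum_should_match": min_match,
            }
        },
    }


def category_scores(hit: dict[str, Any]) -> dict[str, float]:
    """Back-end side scoring: 1.0 per matched category, plus a total."""
    matched = set(hit.get("matched_queries", []))
    scores = {name: (1.0 if name in matched else 0.0) for name in CATEGORIES}
    scores["total"] = sum(scores.values())
    return scores
```

The codes and the revenue range would be derived from the control group before the query is built, for example the distinct region codes of the 100 control companies and the min/max of their revenue.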

I really appreciate your comment. Elasticsearch's documentation is quite vast and I wasn't sure if I had missed something important.

Kathleen_DeRusso · April 20, 2023, 9:18pm

minimum_should_match should work to return only documents that meet your minimum allowed set of matched fields - and you're definitely on the right track that combining results of different searches is a very hard problem as the scores are very different.

The other thing you could potentially find useful is using function_score to boost, say, revenue closest to the average of the control group (but that would probably have to be known at query time).
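To make the function_score idea a bit more concrete, here is a small sketch that wraps an existing query in a gauss decay function centred on the control group's average revenue. The average is computed in the application before the query is sent, which matches the "known at query time" caveat. The field name revenue, the scale_fraction parameter and the helper name are assumptions for illustration only.

```python
from statistics import mean
from typing import Any


def build_revenue_boost_query(
    base_query: dict[str, Any],
    control_revenues: list[float],
    scale_fraction: float = 0.5,
) -> dict[str, Any]:
    """Wrap `base_query` so that revenue near the control-group average scores higher."""
    origin = mean(control_revenues)
    # scale must be positive; guard against a zero or negative average.
    scale = max(abs(origin) * scale_fraction, 1.0)
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": [
                    # Score factor is 1.0 at `origin` and decays as revenue
                    # moves away from it (0.5 at a distance of `scale` by default).
                    {"gauss": {"revenue": {"origin": origin, "scale": scale}}}
                ],
                # Multiply the decay factor into the base query's score.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(build_revenue_boost_query({"match_all": {}}, [100, 200, 300]))
```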

Good luck!

system · May 18, 2023, 9:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
