Query and score similar documents based on hierarchical data

Borwick_5576 · July 9, 2019, 6:22pm

I'm working on creating and simplifying probabilistic topic models for large corpora of data. Every document I index will contain a field with the following format:

...
"topics" : {
    "level0" : ["keywords"],
    "level1" : ["keywords"],
    "level2" : ["keywords"]
}
...

I would like to write a query that, given a document id (let us call this doc D), returns the documents (let us call all possible hits H) which are similar based on this field. For a hit to match, at least one keyword from any level of D has to be present in any of the levels of the hit. The score of each hit should then be higher if the keyword they share is at a lower level in D. It should also get higher if the keyword they share is at a lower level in H.
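To make the intended ranking concrete, here is a minimal sketch of the scoring rule described above, written outside Elasticsearch. It is only an illustration: the weights in `LEVEL_WEIGHTS`, the function name `similarity_score`, and the reading that `level0` is the "lowest" level (and so counts most) are assumptions, not something stated in the post. Flip the weights if the hierarchy runs the other way.

```python
from typing import Dict, List

# Hypothetical weights: level0 is assumed to count most, level2 least.
LEVEL_WEIGHTS: Dict[str, float] = {"level0": 3.0, "level1": 2.0, "level2": 1.0}


def similarity_score(
    d_topics: Dict[str, List[str]],
    h_topics: Dict[str, List[str]],
    weights: Dict[str, float] = LEVEL_WEIGHTS,
) -> float:
    """Score hit H against doc D; 0.0 means no shared keyword (no match)."""
    score = 0.0
    for d_level, d_keywords in d_topics.items():
        for h_level, h_keywords in h_topics.items():
            shared = set(d_keywords) & set(h_keywords)
            # Each shared keyword is weighted by its level in D and in H.
            score += len(shared) * weights[d_level] * weights[h_level]
    return score
```

Elasticsearch will not compute exactly this number, since its relevance score also depends on term statistics, but the sketch can serve as a reference when checking whether a query orders hits as expected.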

I'm currently using the following query:

"query": {
    "bool": {
        "should": [
            {
                "multi_match": {
                    "boost": 3,
                    "query": "Keyword_A",
                    "fields": ["topics.level0", "topics.level1", "topics.level2"]
                }
            },
            {
                "multi_match": {
                    "boost": 2,
                    "query": "Keyword_B",
                    "fields": ["topics.level0", "topics.level1", "topics.level2"]
                }
            },
            {
                "multi_match": {
                    "query": "Keyword_C",
                    "fields": ["topics.level0", "topics.level1", "topics.level2"]
                }
            }
        ]
    }
}

For now, Keyword_A, Keyword_B, and Keyword_C are taken from D manually, as I don't know how to fetch these fields directly into a query.
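One way to avoid copying the keywords by hand is to read D's `topics` field first (for example with a plain get-by-id request) and generate the `should` clauses from it. The sketch below only builds the request body as a Python dict and does not talk to a cluster. The function name `build_similarity_query` and the boost tables are hypothetical, and the field names follow the `level0`/`level1`/`level2` layout shown at the top of the post. It uses the `field^boost` syntax of `multi_match` to weight the level on the H side, the clause `boost` to weight the level on the D side, and an `ids` query under `must_not` to keep D itself out of the hits.

```python
from typing import Any, Dict, List, Optional

# Hypothetical boosts; level0 is assumed to be the most important level.
D_LEVEL_BOOST: Dict[str, float] = {"level0": 3.0, "level1": 2.0, "level2": 1.0}
H_LEVEL_BOOST: Dict[str, float] = {"level0": 3.0, "level1": 2.0, "level2": 1.0}


def build_similarity_query(
    d_topics: Dict[str, List[str]], doc_id: Optional[str] = None
) -> Dict[str, Any]:
    """Build a bool/should query from the topics field of doc D."""
    # Per-field boosts reward hits that hold the keyword at a lower level.
    fields = [f"topics.{level}^{boost:g}" for level, boost in H_LEVEL_BOOST.items()]
    should: List[Dict[str, Any]] = []
    for level, keywords in d_topics.items():
        for keyword in keywords:
            should.append(
                {
                    "multi_match": {
                        "query": keyword,
                        "fields": fields,
                        # Clause boost rewards keywords at a lower level in D.
                        "boost": D_LEVEL_BOOST[level],
                    }
                }
            )
    bool_query: Dict[str, Any] = {"should": should, "minimum_should_match": 1}
    if doc_id is not None:
        # Exclude D itself from the results.
        bool_query["must_not"] = [{"ids": {"values": [doc_id]}}]
    return {"query": {"bool": bool_query}}
```

The returned dict can be sent as the body of a search request. The boosts bias the ranking rather than fix it, because the underlying relevance score still depends on term frequencies in each field.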

This is combined with an index-time boost on each of the fields.

Is there a better way for me to define this query or the score?

Thanks in advance

system · August 6, 2019, 6:22pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
