Elasticsearch nested phrase search within a certain distance

Sumit_Jha1 · August 24, 2022, 4:44am

Sample ES Document

{
  // other properties

  "transcript" : [
    {
      "id" : 0,
      "user_type" : "A",
      "phrase" : "hi good afternoon"
    },
    {
      "id" : 1,
      "user_type" : "B",
      "phrase" : "hey"
    },
    {
      "id" : 2,
      "user_type" : "A",
      "phrase" : "hi "
    },
    {
      "id" : 3,
      "user_type" : "B",
      "phrase" : "my name is john"
    }
  ]
}

transcript is a nested field whose mapping looks like this:

{
   "type":"nested",
   "properties": {
      "id":{
         "type":"integer"
      },
      "phrase": {
         "type":"text",
         "analyzer":"standard"
      },
      "user_type": {
         "type":"keyword"
      }
   }
}

I need to search for two phrases inside transcript that are at most a given distance d apart, where distance is the difference between the positions of the nested objects that contain them.

For example:

If the phrases are hi and name and d is 1, the above document matches, because hi is present in the third nested object and name is present in the fourth. (Note: hi in the first nested object and name in the fourth does NOT count, as they are more than d=1 apart.)
If the phrases are good and name and d is 1, the above document does not match, because good and name are 3 positions apart.
If both phrases are present in the same sentence, the distance is considered to be 0.
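To make the matching rule concrete, here is a minimal Python sketch of the check an application would have to run per document. It assumes that a nested object's id equals its position in the transcript, and it only approximates the standard analyzer by lowercasing and splitting on non-word characters; all function names are hypothetical.

```python
import re
from typing import Dict, List, Sequence

# Copy of the sample document's transcript, for illustration.
SAMPLE_TRANSCRIPT = [
    {"id": 0, "user_type": "A", "phrase": "hi good afternoon"},
    {"id": 1, "user_type": "B", "phrase": "hey"},
    {"id": 2, "user_type": "A", "phrase": "hi "},
    {"id": 3, "user_type": "B", "phrase": "my name is john"},
]


def tokenize(text: str) -> List[str]:
    # Rough stand-in for the "standard" analyzer (assumption):
    # lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    # True if the phrase's tokens appear consecutively in the text.
    hay, needle = tokenize(text), tokenize(phrase)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


def hit_positions(transcript: Sequence[Dict], phrase: str) -> List[int]:
    # Assumes "id" is the entry's position in the transcript.
    return [e["id"] for e in transcript if contains_phrase(e.get("phrase", ""), phrase)]


def matches_within_distance(transcript: Sequence[Dict], first: str, second: str, d: int) -> bool:
    # Match if some hit of `first` and some hit of `second` are at most d
    # entries apart; hits in the same entry count as distance 0.
    a = hit_positions(transcript, first)
    b = hit_positions(transcript, second)
    return any(abs(i - j) <= d for i in a for j in b)


print(matches_within_distance(SAMPLE_TRANSCRIPT, "hi", "name", 1))    # True
print(matches_within_distance(SAMPLE_TRANSCRIPT, "good", "name", 1))  # False
```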

First possible solution:

I can fetch all documents in which both phrases are present and, on the application side, discard the documents where the phrases are more than the threshold d apart. The problem is that I then cannot get the number of matching documents up front to show in the UI (e.g. "found in 100 documents out of 1900"): without application-side processing I can't be sure whether a document is really a match, and it isn't feasible to process every document in the index.
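A minimal sketch of the candidate query for this approach, written as a Python dict in Elasticsearch query DSL (the helper names are hypothetical). Because each nested clause matches independently, the hit count it returns is only an upper bound on the number of real matches, which is the counting problem described above.

```python
import json
from typing import Any, Dict


def nested_phrase(phrase: str) -> Dict[str, Any]:
    # Matches documents with at least one transcript entry containing `phrase`.
    return {
        "nested": {
            "path": "transcript",
            "query": {"match_phrase": {"transcript.phrase": phrase}},
        }
    }


def candidate_query(first: str, second: str) -> Dict[str, Any]:
    # Both phrases must occur somewhere in the transcript. The two nested
    # clauses are evaluated independently, so nothing here relates their
    # positions; the distance check still has to happen in the application.
    return {
        "track_total_hits": True,  # exact candidate count, not a capped estimate
        "query": {
            "bool": {
                "filter": [nested_phrase(first), nested_phrase(second)]
            }
        },
    }


print(json.dumps(candidate_query("hi", "name"), indent=2))
```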

Second possible solution:

{
	"query": {
		"bool": {

			// suppose d = 2

			// if first phrase occurs at 0th offset, second phrase can occur at 
			// ... 0th, 1st or 2nd offset

			// if first phrase occurs at 1st offset, second phrase can occur at 
			// ... 1st, 2nd or 3rd offset

			// any one of above permutation should exist

			"should": [
				{
					// search for 1st permutation
				},
				{
					// search for 2nd permutation
				},
				...
			]
		}
	}
}

This clearly does not scale: the number of clauses grows with both d and the transcript length, so for a large d or a long transcript the query becomes very large.
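One way to shrink the query somewhat, sketched in Python under stated assumptions: instead of enumerating every offset pair, anchor the first phrase at each position i with a term on transcript.id, and use a range of [i - d, i + d] for the second phrase, which covers both orders and the same-sentence case. This is a variation on the approach above, not a fix for its scaling problem: it still needs one clause per possible transcript position, so max_len (a hypothetical upper bound on transcript length) must be known, and long transcripts can still hit Elasticsearch's limit on the number of boolean clauses. All helper names are hypothetical.

```python
import json
from typing import Any, Dict


def nested_filter(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    # Wraps clauses that must all hold for the SAME transcript entry.
    return {"nested": {"path": "transcript", "query": {"bool": {"filter": list(clauses)}}}}


def window_clause(first: str, second: str, i: int, d: int) -> Dict[str, Any]:
    # `first` at position i, `second` anywhere in [i - d, i + d]
    # (either order; position i itself covers the same-sentence case).
    return {
        "bool": {
            "filter": [
                nested_filter(
                    {"term": {"transcript.id": i}},
                    {"match_phrase": {"transcript.phrase": first}},
                ),
                nested_filter(
                    {"range": {"transcript.id": {"gte": i - d, "lte": i + d}}},
                    {"match_phrase": {"transcript.phrase": second}},
                ),
            ]
        }
    }


def proximity_query(first: str, second: str, d: int, max_len: int) -> Dict[str, Any]:
    # One should-clause per possible position of `first`. The clause count
    # grows linearly with max_len, so this still does not scale to long transcripts.
    should = [window_clause(first, second, i, d) for i in range(max_len)]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(json.dumps(proximity_query("hi", "name", 1, 4)["query"]["bool"]["should"][2], indent=2))
```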

Can anyone suggest a better approach?

system · September 21, 2022, 4:45am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
