Query to ignore fields with empty content

I have the following requirements:
a) Show suggestions if a query has hits in both query_terms and trigger_terms.
b) Show suggestions if a query has hits in query_terms and trigger_terms is an empty array.

Here is my mapping

{
    "mappings": {
        "concepts": {
            "properties": {
                "query_terms": {
                    "type": "keyword"
                },
                "trigger_terms": {
                    "type": "keyword"
                },
                "suggestions": {
                    "type": "keyword"
                }
            }
        }
    }
}

Here is the sample data

Document 1:
_source
{
    "query_terms": [
        "audit",
        "audits",
        "audited"
    ],
    "trigger_terms": [
        "audit",
        "return"
    ],
    "suggestions": [
        "examination",
        "inspection",
        "investigation"
    ]
}

Document 2:
_source
{
    "query_terms": [
        "next",
        "test",
        "audited"
    ],
    "trigger_terms": [
        "next",
        "return",
        "test"
    ],
    "suggestions": [
        "same offense",
        "same evidence",
        "investigation"
    ]
}

Document 3:
_source
{
    "query_terms": [
        "best",
        "fit",
        "perfect",
        "audit"
    ],
    "trigger_terms": [],
    "suggestions": [
        "double jeopardy",
        "same elements",
        "same evidence"
    ]
}

And my query

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "query_terms": ["audit"]
                    }
                },
                {
                    "terms": {
                        "trigger_terms": ["audit"]
                    }
                }
            ]
        }
    }
}

When I execute the query against Elasticsearch 5.2, I expect to see Document 1 (since "audit" has hits in both query_terms and trigger_terms) and Document 3 (since "audit" has hits in query_terms and trigger_terms is an empty array), but I only see Document 1.

I modified the query as follows, but that doesn't work either:

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "query_terms": ["audit"]
                    }
                },
                {
                    "terms": {
                        "trigger_terms": ["audit"]
                    }
                },
                {
                    "exists": {
                        "field": "trigger_terms"
                    }
                }
            ]
        }
    }
}

Could someone please help?

Is it fair to say that you want a result if there is a hit in query_terms, but it would be nicer if there is also a hit in trigger_terms? If so, you should put query_terms in must and trigger_terms in should. This way, all your hits will definitely have a match in query_terms, and hits that also match in trigger_terms will be scored higher.
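As a rough, untested sketch of that suggestion, reusing the field names and the "audit" term from the question, the query could look like this:

```json
{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "query_terms": "audit"
                    }
                }
            ],
            "should": [
                {
                    "term": {
                        "trigger_terms": "audit"
                    }
                }
            ]
        }
    }
}
```

With this shape, a document whose trigger_terms is non-empty but does not contain "audit" is still returned, only with a lower score, because the should clause is optional when a must clause is present.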

Thanks for replying Jaspreet! A hit in both query_terms and trigger_terms is a must in order to return suggestions and the only exception to that is when trigger_terms is an empty array.

OK, let's try this.
By the way, I added a 4th document that has "audit" in trigger_terms but not in query_terms. This is another negative case, just to ensure we cover all scenarios.
We need something like: query_terms="audit" AND (trigger_terms="audit" OR trigger_terms is empty)
So here is the query that worked for me:

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "query_terms": "audit"
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "bool": {
                                    "must_not": [
                                        {
                                            "exists": {
                                                "field": "trigger_terms"
                                            }
                                        }
                                    ]
                                }
                            },
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "term": {
                                                "trigger_terms": "audit"
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

I see the following correct results ...

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.287682,
    "hits": [
      {
        "_index": "test1",
        "_type": "concepts",
        "_id": "3",
        "_score": 1.287682,
        "_source": {
          "query_terms": [
            "best",
            "fit",
            "perfect",
            "audit"
          ],
          "trigger_terms": [],
          "suggestions": [
            "double jeopardy",
            "same elements",
            "same evidence"
          ]
        }
      },
      {
        "_index": "test1",
        "_type": "concepts",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "query_terms": [
            "audit",
            "audits",
            "audited"
          ],
          "trigger_terms": [
            "audit",
            "return"
          ],
          "suggestions": [
            "examination",
            "inspection",
            "investigation"
          ]
        }
      }
    ]
  }
}

Thanks a lot Jaspreet! This query is working as expected.

Hi Jaspreet,

There seems to be an edge case where this is not working as expected.

Let's say we have the following document

{
    "query_terms": [
        "force",
        "audited"
    ],
    "trigger_terms": [
        "audit"
    ],
    "suggestions": [
        "examination",
        "inspection",
        "investigation"
    ]
}

If my query terms are ["force", "audit"], I don't expect the above document in the response (since "force" is present in query_terms but not in trigger_terms, and "audit" is present in trigger_terms but not in query_terms), but it does show up in the response.

Here is the query

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "query_terms": ["force", "audit"]
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "bool": {
                                    "must_not": [
                                        {
                                            "exists": {
                                                "field": "trigger_terms"
                                            }
                                        }
                                    ]
                                }
                            },
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "terms": {
                                                "trigger_terms": ["force", "audit"]
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Could you please help?

Hi Jaspreet,

Did you get a chance to look at this?

Thanks,
Pradeep
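One possible way to handle the multi-term case is sketched below. It is untested, and it assumes the intended rule is that the same term must appear in both query_terms and trigger_terms, unless trigger_terms is empty. Under that assumption, each query term gets its own clause pairing the two fields, and a final clause covers documents that have no trigger_terms at all:

```json
{
    "query": {
        "bool": {
            "should": [
                {
                    "bool": {
                        "must": [
                            { "term": { "query_terms": "force" } },
                            { "term": { "trigger_terms": "force" } }
                        ]
                    }
                },
                {
                    "bool": {
                        "must": [
                            { "term": { "query_terms": "audit" } },
                            { "term": { "trigger_terms": "audit" } }
                        ]
                    }
                },
                {
                    "bool": {
                        "must": [
                            { "terms": { "query_terms": ["force", "audit"] } }
                        ],
                        "must_not": [
                            { "exists": { "field": "trigger_terms" } }
                        ]
                    }
                }
            ],
            "minimum_should_match": 1
        }
    }
}
```

The document above should not match any of the three clauses: "force" is missing from its trigger_terms, "audit" is missing from its query_terms, and its trigger_terms is not empty. The per-term clauses grow with the number of query terms, so they would normally be generated by the client rather than written by hand.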
