Query to ignore fields with empty content


#1

I have the following requirements:
a) Show suggestions if a query has hits in both query_terms and trigger_terms.
b) Show suggestions if a query has hits in query_terms and trigger_terms is an empty array.

Here is my mapping

{
    "mappings": {
        "concepts": {
            "properties": {
                "query_terms": {
                    "type": "keyword"
                },
                "trigger_terms": {
                    "type": "keyword"
                },
                "suggestions": {
                    "type": "keyword"
                }
            }
        }
    }
}

Here is the sample data

Document 1:
_source
{
    "query_terms": [
        "audit",
        "audits",
        "audited"
    ],
    "trigger_terms": [
        "audit",
        "return"
    ],
    "suggestions": [
        "examination",
        "inspection",
        "investigation"
    ]
}

Document 2:
_source
{
    "query_terms": [
        "next",
        "test",
        "audited"
    ],
    "trigger_terms": [
        "next",
        "return",
        "test"
    ],
    "suggestions": [
        "same offense",
        "same evidence",
        "investigation"
    ]
}

Document 3:
_source
{
    "query_terms": [
        "best",
        "fit",
        "perfect",
        "audit"
    ],
    "trigger_terms": [],
    "suggestions": [
        "double jeopardy",
        "same elements",
        "same evidence"
    ]
}

And my query

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "query_terms": ["audit"]
                    }
                },
                {
                    "terms": {
                        "trigger_terms": ["audit"]
                    }
                }
            ]
        }
    }
}

When I execute the query against Elasticsearch 5.2, I expect to see Document 1 (since audit has hits in both query_terms and trigger_terms) and Document 3 (since audit has a hit in query_terms and trigger_terms is an empty array), but I only see Document 1.

I modified the query, but that doesn't work either:

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "query_terms": ["audit"]
                    }
                },
                {
                    "terms": {
                        "trigger_terms": ["audit"]
                    }
                },
                {
                    "exists": {
                        "field": "trigger_terms"
                    }
                }
            ]
        }
    }
}

Could someone please help?


(Jaspreet Singh) #2

Is it fair to say that you want a result if there is a hit in query_terms, but it would be nicer if there is also a hit in trigger_terms? If so, you should put query_terms in must and trigger_terms in should. This way all your hits will definitely have a match in query_terms, and hits that also match in trigger_terms will be scored higher.


#3

Thanks for replying, Jaspreet! A hit in both query_terms and trigger_terms is required in order to return suggestions. The only exception is when trigger_terms is an empty array.


(Jaspreet Singh) #4

OK, let's try this.
By the way, I added a 4th document that has "audit" in trigger_terms but not in query_terms. This is another negative case, just to ensure we cover all scenarios.
We need something like ... (query_terms="audit" AND (trigger_terms="audit" OR trigger_terms=""))
So here is the query that worked for me ...

{
    "query": {
        "bool": {
            "must": [
                {
                    "term": {
                        "query_terms": "audit"
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "bool": {
                                    "must_not": [
                                        {
                                            "exists": {
                                                "field": "trigger_terms"
                                            }
                                        }
                                    ]
                                }
                            },
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "term": {
                                                "trigger_terms": "audit"
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

I see the following correct results ...

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.287682,
    "hits": [
      {
        "_index": "test1",
        "_type": "concepts",
        "_id": "3",
        "_score": 1.287682,
        "_source": {
          "query_terms": [
            "best",
            "fit",
            "perfect",
            "audit"
          ],
          "trigger_terms": [],
          "suggestions": [
            "double jeopardy",
            "same elements",
            "same evidence"
          ]
        }
      },
      {
        "_index": "test1",
        "_type": "concepts",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "query_terms": [
            "audit",
            "audits",
            "audited"
          ],
          "trigger_terms": [
            "audit",
            "return"
          ],
          "suggestions": [
            "examination",
            "inspection",
            "investigation"
          ]
        }
      }
    ]
  }
}
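
The logic of the nested bool query in this post can be sketched in plain Python to check expectations without a cluster. This is only a model of the predicate, not of Elasticsearch scoring. Documents 1 to 3 are the sample data from the first post. Document 4 is a hypothetical stand-in for the added negative case, since its exact content is not shown in the thread. The function name should_suggest is made up for illustration.

```python
# Sample documents from the thread (suggestions omitted for brevity).
docs = {
    1: {"query_terms": ["audit", "audits", "audited"], "trigger_terms": ["audit", "return"]},
    2: {"query_terms": ["next", "test", "audited"], "trigger_terms": ["next", "return", "test"]},
    3: {"query_terms": ["best", "fit", "perfect", "audit"], "trigger_terms": []},
    # Hypothetical 4th document: "audit" only in trigger_terms (assumed content).
    4: {"query_terms": ["review"], "trigger_terms": ["audit"]},
}


def should_suggest(doc, term):
    """Model of: query_terms has term AND (trigger_terms is empty OR has term)."""
    has_query_hit = term in doc["query_terms"]
    # An empty array models the must_not + exists branch of the query.
    no_triggers = len(doc["trigger_terms"]) == 0
    has_trigger_hit = term in doc["trigger_terms"]
    return has_query_hit and (no_triggers or has_trigger_hit)


matching_ids = sorted(i for i, d in docs.items() if should_suggest(d, "audit"))
print(matching_ids)  # prints [1, 3]
```

The two matching ids agree with the hits shown in the response above.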

#5

Thanks a lot Jaspreet! This query is working as expected.


#6

Hi Jaspreet,

There seems to be an edge case where this is not working as expected.

Let's say we have the following document

{
    "query_terms": [
        "force",
        "audited"
    ],
    "trigger_terms": [
        "audit"
    ],
    "suggestions": [
        "examination",
        "inspection",
        "investigation"
    ]
}

and my query terms are ["force", "audit"]. I don't expect the above document in the response (since force is present in query_terms but not in trigger_terms, and audit is present in trigger_terms but not in query_terms), but it does show up in the response.

Here is the query

{
    "query": {
        "bool": {
            "must": [
                {
                    "terms": {
                        "query_terms": ["force", "audit"]
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "bool": {
                                    "must_not": [
                                        {
                                            "exists": {
                                                "field": "trigger_terms"
                                            }
                                        }
                                    ]
                                }
                            },
                            {
                                "bool": {
                                    "must": [
                                        {
                                            "terms": {
                                                "trigger_terms": ["force", "audit"]
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Could you please help?


#7

Hi Jaspreet,

Did you get a chance to look at this?

Thanks,
Pradeep
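
A possible reading of the edge case in #6, not confirmed anywhere in this thread: a multi-value lookup on a field matches when any one of the listed values is present, and each field is checked independently, so "force" can satisfy query_terms while "audit" satisfies trigger_terms. If the requirement is that the same term must hit both fields (or hit query_terms while trigger_terms is empty), one option is to build a separate clause per input term and OR them together. The sketch below assumes that interpretation. The helper names build_query and matches are hypothetical, and the generated query has not been run against Elasticsearch 5.2 here.

```python
import json


def build_query(terms):
    """Build one (query_terms AND (no trigger_terms OR trigger_terms)) clause per term."""
    no_triggers = {"bool": {"must_not": [{"exists": {"field": "trigger_terms"}}]}}
    per_term = []
    for t in terms:
        per_term.append({
            "bool": {
                "must": [
                    {"term": {"query_terms": t}},
                    {
                        "bool": {
                            "should": [no_triggers, {"term": {"trigger_terms": t}}],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        })
    # Any single per-term clause matching is enough.
    return {"query": {"bool": {"should": per_term, "minimum_should_match": 1}}}


def matches(doc, terms):
    """Local model of the same predicate, for checking expectations offline."""
    q, trig = doc.get("query_terms", []), doc.get("trigger_terms", [])
    return any(t in q and (not trig or t in trig) for t in terms)


# The edge-case document from post #6 (suggestions omitted).
edge_doc = {"query_terms": ["force", "audited"], "trigger_terms": ["audit"]}
edge_result = matches(edge_doc, ["force", "audit"])
print(edge_result)
print(json.dumps(build_query(["force", "audit"]), indent=2))
```

Under this model the edge-case document is rejected, because neither "force" nor "audit" appears in both fields. The trade-off is that the query grows by one clause per input term.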