How to return only the matched text in Elasticsearch aggregations and in _source


(Jay) #1

My query:

POST /testqueryidx/testQuery/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "sales*",
            "fields": ["skills"]
          }
        },
        {
          "query_string": {
            "query": "jay12",
            "fields": ["idNum"]
          }
        }
      ]
    }
  },
  "aggregations": {
    "aggs": {
      "terms": {
        "field": "skills_sort",
        "size": 0,
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}

Query results:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.9734945,
    "hits": [
      {
        "_index": "testqueryidx",
        "_type": "testQuery",
        "_id": "56909fbdaecb813e8c64e1e8",
        "_score": 0.9734945,
        "_source": {
          "skills": [
            "Account Management",
            "Sales force",
            "Adobe Creative Suite"
          ],
          "_id": "56909fbdaecb813e8c64e1e8",
          "idNum": "jay12"
        }
      }
    ]
  },
  "aggregations": {
    "aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Account Management",
          "doc_count": 1
        },
        {
          "key": "Adobe Creative Suite",
          "doc_count": 1
        },
        {
          "key": "Sales force",
          "doc_count": 1
        }
      ]
    }
  }
}

I searched for the keyword Sales in the skills field and got the matching documents. Here is the skills array of one matching document:

"skills": [
  "Account Management",
  "Sales force",
  "Adobe Creative Suite"
],

But I don't want "Account Management" and "Adobe Creative Suite" in the query results, nor in the aggregations. These are the aggregation results:

"buckets": [
  {
    "key": "Account Management",
    "doc_count": 1
  },
  {
    "key": "Adobe Creative Suite",
    "doc_count": 1
  },
  {
    "key": "Sales force",
    "doc_count": 1
  }
]

Likewise, I don't want the "Account Management" and "Adobe Creative Suite" keys in the aggregation results, because I searched only for Sales.

These extra values come back because the skills field in my document contains all three skills, but I am interested only in the ones that match the search keyword. Please help if anyone has a solution for this.


(Ashish Goel) #2

For the source:
You want only "Sales" to be returned. You searched for "Sales", so if a document is returned by the query, you already know for sure that "Sales" exists inside its "skills" set. So you can drop the "skills" set from the source altogether and rely on the knowledge that "Sales" is present in every document the query returns.
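
A minimal sketch of that idea, written as a Python dict that serializes to the request body. It reuses the query from the first post and asks for idNum only, so the skills array is never returned. The `matching_skills` helper is hypothetical and not part of Elasticsearch; it only shows how the matching entries could be picked out on the client if you do keep skills in the source.

```python
import json

# Same query as in the first post, but _source is limited to idNum,
# so the non-matching skills never reach the client.
search_body = {
    "size": 10,
    "_source": ["idNum"],
    "query": {
        "bool": {
            "must": [
                {"multi_match": {"query": "sales*", "fields": ["skills"]}},
                {"query_string": {"query": "jay12", "fields": ["idNum"]}},
            ]
        }
    },
}


def matching_skills(skills, prefix):
    """Hypothetical client-side helper: keep the entries that contain
    a word starting with `prefix`, ignoring case."""
    prefix = prefix.lower()
    return [
        skill
        for skill in skills
        if any(word.lower().startswith(prefix) for word in skill.split())
    ]


if __name__ == "__main__":
    # Body to send to POST /testqueryidx/testQuery/_search
    print(json.dumps(search_body, indent=2))
    print(matching_skills(
        ["Account Management", "Sales force", "Adobe Creative Suite"], "sales"
    ))  # ['Sales force']
```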

For the aggregation:
If the values in "skills" are not analysed (in Elasticsearch 1.x/2.x string fields are analysed by default, so this needs "index": "not_analyzed" in the mapping), then you can use a filter aggregation with a term filter. It will be faster than a terms aggregation and gives you exactly the count you need.
If they are analysed (you may have done so to provide text search on this field), then you can use the include parameter in your terms aggregation. Use the entire term text (e.g. "sales"); no regular expression is needed in your case. You may have to take the case of the indexed terms into account (e.g. Sales vs sales, or Account Management vs account management).
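
Both options can be sketched as Python dicts that serialize to the aggregation part of the request. This assumes skills_sort (the field used in the original query) holds the not-analysed skill values and that the exact value of interest is "Sales force"; the aggregation names and helper functions are made up for illustration.

```python
import json


def filter_agg(field, value):
    """Filter aggregation with a term filter: yields a single doc_count."""
    return {"filter": {"term": {field: value}}}


def terms_agg_with_include(field, include):
    """Terms aggregation restricted to terms matching `include`."""
    return {"terms": {"field": field, "include": include}}


# Assumption: skills_sort stores whole values such as "Sales force".
aggs_body = {
    "aggregations": {
        "sales_force_count": filter_agg("skills_sort", "Sales force"),
        "sales_force_bucket": terms_agg_with_include("skills_sort", "Sales force"),
    }
}

if __name__ == "__main__":
    print(json.dumps(aggs_body, indent=2))
```

The filter aggregation returns only a count for the one value, while the terms aggregation still returns a buckets list, just limited to the included term.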


(Jay) #3

Thanks @Ashish_Goel. But my actual requirement is to get all texts that match "sales*", where the star (*) is meant as a wildcard. Then I want the distinct texts that matched the searched keyword (sales in this case). That's why I used a multi_match query in Elasticsearch.

{
  "multi_match": {
    "query": "sales*",
    "fields": ["skills"]
  }
}

I don't understand what you meant by "get rid of the "skills" set in the source altogether". Could you please elaborate on how to achieve that?

My search should be case insensitive: it should match both lower case and upper case. Can you please provide the query? Your help is greatly appreciated.


(Ashish Goel) #4

Ok, I guess I misunderstood your case earlier.
I am not sure you can achieve what you need with multi_match. But if you want only the matching entry of the skills array returned, you can use a nested query with skills as the path, a match query inside it, and an inner_hits clause in the nested query block.
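
A rough sketch of the nested approach as Python dicts. It assumes the index is remapped so that skills is a nested field whose entries are objects with a name sub-field (for example {"name": "Sales force"}); that document shape and the name sub-field are hypothetical, and existing documents would have to be reindexed.

```python
import json

# Assumed mapping: each skill becomes its own nested object.
# "string" is the 1.x/2.x type name; on 5.x and later it would be "text".
mapping = {
    "properties": {
        "skills": {
            "type": "nested",
            "properties": {"name": {"type": "string"}},
        }
    }
}

# Nested query: match against individual skill entries and ask for the
# matching entries back through inner_hits.
nested_search_body = {
    "query": {
        "nested": {
            "path": "skills",
            "query": {"match": {"skills.name": "sales"}},
            "inner_hits": {},
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(nested_search_body, indent=2))
```

With this shape, each hit should carry an inner_hits section listing only the skill entries that matched, next to the full _source.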
For the case-insensitive query, the terms aggregation can take an include clause, where you provide your regex pattern and some flags. E.g.:
{ "terms": { "field": "profile.field_val", "include": { "pattern": profile_fields[i].field_val.toLowerCase(), "flags": "LITERAL" } } }
In my case, I needed special characters like * to be treated as plain characters, so I used the LITERAL flag. When I was reading about this, there was definitely a flag for case-insensitive matching, but I do not remember its exact name.
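
A hedged sketch of a case-insensitive include for the "sales*" case. The first form spells out both cases of every letter in the regex itself, so it does not depend on any flag. The second form uses the pattern/flags object; as far as I recall the flag names in 1.x/2.x follow Java's Pattern constants, which would make the one in question CASE_INSENSITIVE, but please check that against the documentation for your version. The helper name is made up, and skills_sort is assumed to hold the not-analysed values.

```python
import json


def case_insensitive_prefix(prefix):
    """Build a regex that matches any value starting with `prefix`,
    ignoring case, e.g. "sa" -> "[sS][aA].*"."""
    parts = []
    for ch in prefix:
        if ch.isalpha():
            parts.append("[" + ch.lower() + ch.upper() + "]")
        elif ch.isalnum():
            parts.append(ch)
        else:
            # Escape anything that could be a regex operator.
            parts.append("\\" + ch)
    return "".join(parts) + ".*"


pattern = case_insensitive_prefix("sales")

# Form 1: plain regex string, no flags needed.
agg_with_regex = {"terms": {"field": "skills_sort", "include": pattern}}

# Form 2 (assumption: 1.x/2.x pattern/flags syntax with a Java-style flag name).
agg_with_flags = {
    "terms": {
        "field": "skills_sort",
        "include": {"pattern": "sales.*", "flags": "CASE_INSENSITIVE"},
    }
}

if __name__ == "__main__":
    print(pattern)  # [sS][aA][lL][eE][sS].*
    print(json.dumps({"aggregations": {"aggs": agg_with_regex}}, indent=2))
```

Either aggregation should leave only the "Sales force" bucket for the sample document, since the other two skills do not start with "sales".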


(Jay) #5

Thanks @Ashish_Goel

