Recommendations returning the matched item

I send a request for suggestions based on 'product A'.
The results include the very item I am trying to get recommendations for.

Here is the command:

POST order_history/ORDERHISTORY/_search
{
  "query": {
    "match": {
      "products.keyword": "2013WS7 CASE BASED APPROACH TO DIAGNOSING AND TREATING PATIENTS WITH NON-TUBERCULOUS MYCOBACTERIAL LUNG INFECTIONS"
    }
  },
  "aggregations": {
    "products_like_atspbr": {
      "significant_terms": {
        "field": "products.keyword",
        "min_doc_count": 1
      }
    }
  }
}

Here are the results, with the matched item as the first result record:

"aggregations": {
    "products_like_atspbr": {
      "doc_count": 6,
      "bg_count": 714,
      "buckets": [
        {
          "key": "2013WS7 CASE BASED APPROACH TO DIAGNOSING AND TREATING PATIENTS WITH NON-TUBERCULOUS MYCOBACTERIAL LUNG INFECTIONS",
          "doc_count": 6,
          "score": 118,
          "bg_count": 6
        },
        {
          "key": "2013WS5 DIAGNOSIS AND MANAGEMENT OF PNEUMONIA IN THE IMMUNOCOMPROMISED HOST",
          "doc_count": 3,
          "score": 44.125,
          "bg_count": 4
        },
        {
          "key": "2013PG20 RESPIRATORY PHYSIOLOGY MASTER CLASS",
          "doc_count": 4,
          "score": 25.77777777777777,
          "bg_count": 12
        },
        {
          "key": "2013PG14 PLEURAL DISORDERS",
          "doc_count": 3,
          "score": 21.8125,
          "bg_count": 8
        },
        {
          "key": "2013PG15 LUNG CANCER: STATE OF THE ART IN 2013",
          "doc_count": 3,
          "score": 21.8125,
          "bg_count": 8
        },
        {
          "key": "2013WS3 LUNG CANCER TUMOR BOARD: HOW DO EXPERTS MANAGE DIFFICULT CASES?",
          "doc_count": 1,
          "score": 19.666666666666664,
          "bg_count": 1
        },
        {
          "key": "2016A9 CONTROVERSIES IN SLEEP MEDICINE: DAVIDS, GOLIATHS, AND SOME BLOOD ON THE FLOOR!",
          "doc_count": 1,
          "score": 19.666666666666664,
          "bg_count": 1
        },
        {
          "key": "2013A12 SEVERE ASTHMA:  GRADING THE CURRENT EVIDENCE AND PLANNING FOR THE FUTURE",
          "doc_count": 1,
          "score": 19.666666666666664,
          "bg_count": 1
        },
        {
          "key": "2013PG8 UNDER PRESSURE: THE RIGHT VENTRICLE IN HEALTH, EXERCISE, AND DISEASE",
          "doc_count": 1,
          "score": 19.666666666666664,
          "bg_count": 1
        },
        {
          "key": "2013C83 THE NEW ENGLAND JOURNAL OF MEDICINE AND THE JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION.  DISCUSSION ON THE EDGE: REPORTS OF RECENTLY PUBLISHED PULMONARY RESEARCH",
          "doc_count": 1,
          "score": 19.666666666666664,
          "bg_count": 1
        }
      ]
    }
  }
}

Why is the item showing in the results?

The aggregation doesn't "know" about the query whose results it is summarizing. The query could, for example, have been a complex regex or Boolean query, so the aggregation doesn't attempt to parse the query logic to work out which terms to exclude. You can typically use the exclude setting of the significant_terms aggregation to provide an array of terms you don't want to see in the results.

thanks!

works like a charm!

POST order_history/ORDERHISTORY/_search
{
  "query": {
    "match": {
      "products.keyword": "2013WS7 CASE BASED APPROACH TO DIAGNOSING AND TREATING PATIENTS WITH NON-TUBERCULOUS MYCOBACTERIAL LUNG INFECTIONS"
    }
  },
  "aggregations": {
    "products_like_atspbr": {
      "significant_terms": {
        "field": "products.keyword",
        "min_doc_count": 1,
        "exclude": "2013WS7 CASE BASED APPROACH TO DIAGNOSING AND TREATING PATIENTS WITH NON-TUBERCULOUS MYCOBACTERIAL LUNG INFECTIONS"
      }
    }
  }
}

It works for that string, but it may be slow and may not work for other strings.
This is because a single-string exclude value is interpreted as a regular expression, while an array is presumed to contain a list of exact-match strings. A product name containing regex operators such as ? or . would therefore not be matched literally. For this reason you should put your single string in an array.
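
As a minimal sketch of the array form, here is the request body from earlier in the thread with only the exclude value changed. The product name is wrapped in square brackets so it should be treated as an exact-match term rather than as a regular expression. It is meant to be sent to the same order_history/ORDERHISTORY/_search endpoint and has not been run against the original index.

```json
{
  "query": {
    "match": {
      "products.keyword": "2013WS7 CASE BASED APPROACH TO DIAGNOSING AND TREATING PATIENTS WITH NON-TUBERCULOUS MYCOBACTERIAL LUNG INFECTIONS"
    }
  },
  "aggregations": {
    "products_like_atspbr": {
      "significant_terms": {
        "field": "products.keyword",
        "min_doc_count": 1,
        "exclude": [
          "2013WS7 CASE BASED APPROACH TO DIAGNOSING AND TREATING PATIENTS WITH NON-TUBERCULOUS MYCOBACTERIAL LUNG INFECTIONS"
        ]
      }
    }
  }
}
```

Further product names can be added to the same array if more than one item should be left out of the suggestions.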


Point taken!
Thanks
