Problem in match query

Hello, I am a beginner in Elasticsearch.

I'm having trouble searching for specific words within a text field in my Elasticsearch index.

Currently, I'm attempting to search for certain words within the field "toxicity". For instance, when I search for "sleep", I can see it is in the toxicity field of Fluvoxamine.

GET /drugbank/_search
{
  "query": {
    "match_phrase": {
      "toxicity": "sleep"
    }
  }
}
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 8,
      "relation": "eq"
    },
    "max_score": 6.042759,
    "hits": [
      {
        "_index": "drugbank",
        "_id": "DB00176",
        "_score": 6.042759,
        "_source": {
          "id": "DB00176",
          "name": "Fluvoxamine",
          "indication": "For management of depression and for Obsessive Compulsive Disorder (OCD). Has also been used in the management of bulimia nervosa.",
          "toxicity": "Side effects include anorexia, constipation, dry mouth, headache, nausea, nervousness, skin rash, sleep problems, somnolence, liver toxicity, mania, increase urination, seizures, sweating increase, tremors, or Tourette's syndrome.",
          "atc_code": "N06AB08"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB09014",
        "_score": 5.2875047,
        "_source": {
          "id": "DB09014",
          "name": "Captodiame",
          "indication": "Captodiame is indicated for the treatment of anxiety. ",
          "toxicity": """-TDLo oral 17mg/kg (human) BEHAVIORAL: ALTERED SLEEP TIME (INCLUDING CHANGE IN RIGHTING REFLEX). PMID: 13535337.
-LD50 intraperitoneal 116mg/kg (mouse) BEHAVIORAL: CONVULSIONS OR EFFECT ON SEIZURE THRESHOLD. PMID: 13062090.
-LD50 intravenous 72mg/kg (mouse) BEHAVIORAL: ATAXIA. PMID: 14109651.""",
          "atc_code": "N05BB02"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB09034",
        "_score": 5.1327043,
        "_source": {
          "id": "DB09034",
          "name": "Suvorexant",
          "indication": "Suvorexant is indicated for the treatment of insomnia characterized by difficulties with sleep onset and/or sleep maintenance.",
          "toxicity": "Dose-related somnolence and CNS depression are the most common adverse effects associated with the use of suvorexant. It has also been shown to impair driving skills and may increase the risk of falling asleep while driving. Next-day impairments are found to be highest if suvorexant is taken with less than a full night of sleep remaining, with higher doses, or if co-administered with other CNS depressants or CYP3A inhibitors. Complex behaviours such as sleep driving, preparing and eating food, and making phone calls have been reported in association with the use of hypnotics such as suvorexant. A dose-dependant increase in suicidal ideation has been observed, especially in patients with a previous diagnosis of depression. Sleep paralysis, hypnagogic/hypnopompic hallucinations including vivid and disturbing perceptions, and mild cataplexy have also been reported. There are no adequate studies in pregnant women to ensure its safety during pregnancy or breast feeding. ",
          "atc_code": ""
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB01234",
        "_score": 4.8343396,
        "_source": {
          "id": "DB01234",
          "name": "Dexamethasone",
          "indication": """<B>Injection:</B> for the treatment of endocrine disorders, rheumatic D=disorders, collagen diseases, dermatologic diseases, allergic statesc, ophthalmic diseases, gastrointestinal diseases, respiratory diseases, hematologic disorders, neoplastic diseases, edematous states, cerebral edema.
<br><B>Ophthalmic ointment and solution:</B> for the treatment of steroid responsive inflammatory conditions of the palpebral and bulbar conjunctiva, cornea, and anterior segment of the globe.
<br><B>Ophthalmic solution only:</B> for the treatment of steroid responsive inflammatory conditions of the external auditory meatus
<br><B>Topic cream:</B> for relief of the inflammatory and pruritic manifestations of corticosteroid-responsive dermatoses
<br><B>Oral aerosol:</B> for the treatment of bronchial asthma and related corticosteroid responsive bronchospastic states intractable to adequate trial of conventional therapy
<br><B>Intranasal aerosol:</B> for the treatment of allergic ot inflammatory nasal conditions, and nasal polyps""",
          "toxicity": "Oral, rat LD<sub>50</sub>: >3 gm/kg. Signs of overdose include retinal toxicity, glaucoma, subcapsular cataract, gastrointestinal bleeding, pancreatitis, aseptic bone necrosis, osteoporosis, myopathies, obesity, edemas, hypertension, proteinuria, diabetes, sleep disturbances, psychiatric syndromes, delayed wound healing, atrophy and fragility of the skin, ecchymosis, and pseudotumor cerebri.",
          "atc_code": "R01AD03"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB09185",
        "_score": 4.8343396,
        "_source": {
          "id": "DB09185",
          "name": "Viloxazine",
          "indication": "Indicated for the treatment of clinical depression.",
          "toxicity": "Common adverse effects are nausea and vomiting. Other side effects are dry mouth, dizziness, headache, drowsiness, sleep disturbances, bad taste, anorexia, heartburn and indigestion, constipation, diarrhoea, ataxia, tremor, dyskinesia, paraesthesia, confusion, restlessness, irritability, hypomania and mania, sweating, palpitation, tachycardia, increased and decreased blood pressure, pruritus and skin rashes [A19786].",
          "atc_code": "N06AX09"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB09118",
        "_score": 4.49212,
        "_source": {
          "id": "DB09118",
          "name": "Stiripentol",
          "indication": "Indicated for use in conjunction with clobazam and valproate as adjunctive therapy of refractory generalized tonic-clonic seizures in patients with severe myoclonic epilepsy in infancy (SMEI, Dravet’s syndrome) whose seizures are not adequately controlled with clobazam and valproate.",
          "toxicity": "Most common adverse effects include anorexia, loss of appetite, nausea, vomiting, weight loss, reversible neutropenia, insomnia, drowsiness, ataxia, dystonia, hyperkinesia, hypotonia. Aggression, irritability, behaviour disorders, opposing behaviour, hyper excitability and sleep disorders may also be observed. Stiripentol does not demonstrate teratogenic, mutagenic, clastogenic, or carcinogenic potential [L880]. Oral LD50 in rats is >3 g/kg [MSDS].",
          "atc_code": "N03AX17"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB06700",
        "_score": 3.9350076,
        "_source": {
          "id": "DB06700",
          "name": "Desvenlafaxine",
          "indication": "Desvenlafaxine is indicated for the treatment of major depressive disorder in adults.",
          "toxicity": "The safety and tolerability of desvenlafaxine is similar to other SNRIs. Common side effects upon initiation or dose increase include increased blood pressure and heart rate, agitation, tremor, sweating, nausea, headache, and sleep disturbances. May cause sexual dysfunction and weight loss in some patients. May cause increases in fasting serum total cholesterol, LDL cholesterol, and triglycerides. Withdrawal effects may occur and thus, the dose of desvenlafaxine should be titrated down prior to discontinuation. ",
          "atc_code": "N06AX23"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB01048",
        "_score": 3.3177977,
        "_source": {
          "id": "DB01048",
          "name": "Abacavir",
          "indication": "For the treatment of HIV-1 infection, in combination with other antiretroviral agents.",
          "toxicity": "Some myocardial degeneration has been noticed in rats and mice. The most commonly reported adverse reactions of at least moderate intensity (incidence ≥10%) in adult HIV-1 clinical trials were nausea, headache, malaise and fatigue, nausea and vomiting, and dreams/sleep disorders. Serious hypersensitivity reactions have been associated with abacavir which has been strongly linked to the presence of the HLA-B*57:01 allele. This reaction manifests itself in patients within the first 6 weeks of treatment. Patients should be tested for the presence of this allele as recommended by the U.S Food and Drug Administration (FDA). ",
          "atc_code": "J05AR13"
        }
      }
    ]
  }
}

However, when I search for "nausea", I'm unable to retrieve Fluvoxamine:

GET /drugbank/_search
{
  "query": {
    "match_phrase": {
      "toxicity": "nausea"
    }
  }
}
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 385,
      "relation": "eq"
    },
    "max_score": 2.326837,
    "hits": [
      {
        "_index": "drugbank",
        "_id": "DB00286",
        "_score": 2.326837,
        "_source": {
          "id": "DB00286",
          "name": "Conjugated estrogens",
          "indication": "Conjugated Equine Estrogens (CEEs) are indicated for the following conditions: treatment of moderate to severe vasomotor symptoms and vulvovaginal atrophy associated with menopause; hypoestrogenism due to hypogonadism, castration or primary ovarian failure; palliation of metastatic breast cancer; palliation of advanced androgen-dependent carcinoma of the prostate; and for prevention of postmenopausal osteoporosis.",
          "toxicity": "Nausea and vomiting",
          "atc_code": "G03CC07"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB01213",
        "_score": 2.326837,
        "_source": {
          "id": "DB01213",
          "name": "Fomepizole",
          "indication": "Antizol is indicated as an antidote for ethylene glycol (such as antifreeze) or methanol poisoning, or for use in suspected ethylene glycol or methanol ingestion, either alone or in combination with hemodialysis",
          "toxicity": "Headache, nausea, dizziness",
          "atc_code": "V03AB34"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB01140",
        "_score": 2.181436,
        "_source": {
          "id": "DB01140",
          "name": "Cefadroxil",
          "indication": "For the treatment of the following infections (skin, UTI, ENT) caused by; <i>S. pneumoniae, H. influenzae, staphylococci, S. pyogenes</i> (group A beta-hemolytic streptococci), <i>E. coli, P. mirabilis, Klebsiella</i> sp, coagulase-negative staphylococci and <i>Streptococcus pyogenes</i>",
          "toxicity": "Nausea, vomiting, diarrhoea, allergic rashes may occur",
          "atc_code": "J01DB05"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB01200",
        "_score": 2.1671572,
        "_source": {
          "id": "DB01200",
          "name": "Bromocriptine",
          "indication": "For the treatment of galactorrhea due to hyperprolactinemia, prolactin-dependent menstrual disorders and infertility, prolactin-secreting adenomas, prolactin-dependent male hypogonadism, as adjunct therapy to surgery or radiotherapy for acromegaly or as monotherapy is special cases, as monotherapy in early Parksinsonian Syndrome or as an adjunct with levodopa in advanced cases with motor complications. Bromocriptine has also been used off-label to treat restless legs syndrome and neuroleptic malignant syndrome.",
          "toxicity": "Symptoms of overdosage include nausea, vomiting, and severe hypotension. The most common adverse effects include nausea, headache, vertigo, constipation, light-headedness, abdominal cramps, nasal congestion, diarrhea, and hypotension. ",
          "atc_code": "G02CB01"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB06654",
        "_score": 2.1504695,
        "_source": {
          "id": "DB06654",
          "name": "Safinamide",
          "indication": "Safinamide is indicated as an add-on treatment to levodopa with or without other medicines for Parkinson’s disease",
          "toxicity": """uncontrolled involuntary movement, falls, nausea, and trouble sleeping or falling asleep (insomnia)
Expected overdose effects are hypertension (high blood pressure), orthostatic hypotension, hallucinations, psychomotor agitation, nausea, vomiting, and dyskinesia. """,
          "atc_code": ""
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB00304",
        "_score": 2.1478813,
        "_source": {
          "id": "DB00304",
          "name": "Desogestrel",
          "indication": "For the prevention of pregnancy in women who elect to use this product as a method of contraception.",
          "toxicity": "Symptoms of overdose include nausea and vaginal bleeding.",
          "atc_code": "G03AC09"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB00520",
        "_score": 2.1478813,
        "_source": {
          "id": "DB00520",
          "name": "Caspofungin",
          "indication": "For the treatment of esophageal candidiasis and invasive aspergillosis in patients who are refractory to or intolerant of other therapies.",
          "toxicity": "Side effects include rash, swelling, and nausea (rare)",
          "atc_code": "J02AX04"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB00609",
        "_score": 2.1478813,
        "_source": {
          "id": "DB00609",
          "name": "Ethionamide",
          "indication": "For use in the treatment of pulmonary and extrapulmonary tuberculosis when other antitubercular drugs have failed.",
          "toxicity": "Symptoms of overdose include convulsions, nausea, and vomiting.",
          "atc_code": "J04AD03"
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB13163",
        "_score": 2.1478813,
        "_source": {
          "id": "DB13163",
          "name": "Terpin hydrate",
          "indication": "Terpin hydrate is an expectorant, used in the treatment of acute and chronic bronchitis, pneumonia, bronchiectasis, chronic obstructive pulmonary disease, infectious and inflammatory diseases of the upper respiratory tract. It is typically formulated with an antitussive (e.g., codeine) as a combined preparation.",
          "toxicity": "Overdose can cause nausea, vomiting and abdominal pain. ",
          "atc_code": ""
        }
      },
      {
        "_index": "drugbank",
        "_id": "DB00448",
        "_score": 2.115343,
        "_source": {
          "id": "DB00448",
          "name": "Lansoprazole",
          "indication": "For the treatment of acid-reflux disorders (GERD), peptic ulcer disease, H. pylori eradication, and prevention of gastroinetestinal bleeds with NSAID use.",
          "toxicity": "Symptoms of overdose include abdominal pain, nausea and diarrhea.",
          "atc_code": "A02BC53"
        }
      }
    ]
  }
}

Upon investigating further, I looked into the tokens associated with the document and found that "nausea" is indeed present:

GET /drugbank/_termvectors/DB00176?fields=toxicity
{
"fields" : ["text"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
  "_index": "drugbank",
  "_id": "DB00176",
  "_version": 1,
  "found": true,
  "took": 2,
  "term_vectors": {
    "toxicity": {
      "field_statistics": {
        "sum_doc_freq": 50525,
        "doc_count": 1638,
        "sum_ttf": 64651
      },
      "terms": {
        "anorexia": {
          "doc_freq": 32,
          "ttf": 34,
          "term_freq": 1,
          "tokens": [
            {
              "position": 3,
              "start_offset": 21,
              "end_offset": 29
            }
          ]
        },
        "constipation": {
          "doc_freq": 67,
          "ttf": 68,
          "term_freq": 1,
          "tokens": [
            {
              "position": 4,
              "start_offset": 31,
              "end_offset": 43
            }
          ]
        },
        "dry": {
          "doc_freq": 78,
          "ttf": 83,
          "term_freq": 1,
          "tokens": [
            {
              "position": 5,
              "start_offset": 45,
              "end_offset": 48
            }
          ]
        },
        "effects": {
          "doc_freq": 386,
          "ttf": 503,
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 5,
              "end_offset": 12
            }
          ]
        },
        "headache": {
          "doc_freq": 205,
          "ttf": 213,
          "term_freq": 1,
          "tokens": [
            {
              "position": 7,
              "start_offset": 56,
              "end_offset": 64
            }
          ]
        },
        "include": {
          "doc_freq": 588,
          "ttf": 658,
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 13,
              "end_offset": 20
            }
          ]
        },
        "increase": {
          "doc_freq": 51,
          "ttf": 56,
          "term_freq": 2,
          "tokens": [
            {
              "position": 18,
              "start_offset": 149,
              "end_offset": 157
            },
            {
              "position": 22,
              "start_offset": 188,
              "end_offset": 196
            }
          ]
        },
        "liver": {
          "doc_freq": 50,
          "ttf": 61,
          "term_freq": 1,
          "tokens": [
            {
              "position": 15,
              "start_offset": 126,
              "end_offset": 131
            }
          ]
        },
        "mania": {
          "doc_freq": 4,
          "ttf": 4,
          "term_freq": 1,
          "tokens": [
            {
              "position": 17,
              "start_offset": 142,
              "end_offset": 147
            }
          ]
        },
        "mouth": {
          "doc_freq": 81,
          "ttf": 82,
          "term_freq": 1,
          "tokens": [
            {
              "position": 6,
              "start_offset": 49,
              "end_offset": 54
            }
          ]
        },
        "nausea": {
          "doc_freq": 385,
          "ttf": 406,
          "term_freq": 1,
          "tokens": [
            {
              "position": 8,
              "start_offset": 66,
              "end_offset": 72
            }
          ]
        },
        "nervousness": {
          "doc_freq": 27,
          "ttf": 27,
          "term_freq": 1,
          "tokens": [
            {
              "position": 9,
              "start_offset": 74,
              "end_offset": 85
            }
          ]
        },
        "or": {
          "doc_freq": 420,
          "ttf": 716,
          "term_freq": 1,
          "tokens": [
            {
              "position": 24,
              "start_offset": 207,
              "end_offset": 209
            }
          ]
        },
        "problems": {
          "doc_freq": 18,
          "ttf": 20,
          "term_freq": 1,
          "tokens": [
            {
              "position": 13,
              "start_offset": 104,
              "end_offset": 112
            }
          ]
        },
        "rash": {
          "doc_freq": 92,
          "ttf": 101,
          "term_freq": 1,
          "tokens": [
            {
              "position": 11,
              "start_offset": 92,
              "end_offset": 96
            }
          ]
        },
        "seizures": {
          "doc_freq": 65,
          "ttf": 69,
          "term_freq": 1,
          "tokens": [
            {
              "position": 20,
              "start_offset": 169,
              "end_offset": 177
            }
          ]
        },
        "side": {
          "doc_freq": 150,
          "ttf": 164,
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4
            }
          ]
        },
        "skin": {
          "doc_freq": 145,
          "ttf": 168,
          "term_freq": 1,
          "tokens": [
            {
              "position": 10,
              "start_offset": 87,
              "end_offset": 91
            }
          ]
        },
        "sleep": {
          "doc_freq": 8,
          "ttf": 10,
          "term_freq": 1,
          "tokens": [
            {
              "position": 12,
              "start_offset": 98,
              "end_offset": 103
            }
          ]
        },
        "somnolence": {
          "doc_freq": 55,
          "ttf": 57,
          "term_freq": 1,
          "tokens": [
            {
              "position": 14,
              "start_offset": 114,
              "end_offset": 124
            }
          ]
        },
        "sweating": {
          "doc_freq": 40,
          "ttf": 41,
          "term_freq": 1,
          "tokens": [
            {
              "position": 21,
              "start_offset": 179,
              "end_offset": 187
            }
          ]
        },
        "syndrome": {
          "doc_freq": 49,
          "ttf": 55,
          "term_freq": 1,
          "tokens": [
            {
              "position": 26,
              "start_offset": 221,
              "end_offset": 229
            }
          ]
        },
        "tourette's": {
          "doc_freq": 1,
          "ttf": 1,
          "term_freq": 1,
          "tokens": [
            {
              "position": 25,
              "start_offset": 210,
              "end_offset": 220
            }
          ]
        },
        "toxicity": {
          "doc_freq": 230,
          "ttf": 283,
          "term_freq": 1,
          "tokens": [
            {
              "position": 16,
              "start_offset": 132,
              "end_offset": 140
            }
          ]
        },
        "tremors": {
          "doc_freq": 25,
          "ttf": 26,
          "term_freq": 1,
          "tokens": [
            {
              "position": 23,
              "start_offset": 198,
              "end_offset": 205
            }
          ]
        },
        "urination": {
          "doc_freq": 12,
          "ttf": 12,
          "term_freq": 1,
          "tokens": [
            {
              "position": 19,
              "start_offset": 158,
              "end_offset": 167
            }
          ]
        }
      }
    }
  }
}

Here is the Python code I used to create the index:

from elasticsearch.helpers import bulk

# `es` (the Elasticsearch client) and `df` (the source DataFrame) are defined elsewhere.
mappings = {
    "properties": {
        "id": {"type": "keyword", "index": False},
        "name": {"type": "text"},
        "indication": {"type": "text"},
        "toxicity": {"type": "text"},
        "atc_code": {"type": "keyword", "index": False}
    }
}

es.indices.create(index='drugbank', mappings=mappings)

# Insert data into the database: one bulk action per row, using the row id as _id
bulk_data = []
for index, row in df.iterrows():
    bulk_data.append({
        "_index": "drugbank",
        "_id": row.id,
        "_source": row.to_dict()
    })
bulk(es, bulk_data)

# Make the newly indexed documents visible to search
es.indices.refresh(index="drugbank")

I'd appreciate any insights into why I'm encountering this issue and how I might resolve it.
Thank you in advance for your help!

Hello Paul,

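Actually, the document is most likely in the result set, but by default Elasticsearch only returns the first 10 hits. Your "nausea" query matched 385 documents ("hits.total.value": 385), and only the 10 highest-scoring ones are shown.
You could raise the page size, for example to 500, with: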

GET /drugbank/_search
{
  "size": 500,
  "query": {
    "match_phrase": {
      "toxicity": "nausea"
    }
  }
}

You should then see Fluvoxamine (DB00176) somewhere in the results.
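
If scrolling through several hundred hits is inconvenient, another way to confirm the match is to restrict the phrase query to the one document you care about. The following is only a minimal sketch, assuming an elasticsearch-py 8.x client passed in as `es`; the helper names (`phrase_query_for_doc`, `returned_ids`, `doc_matches`) are made up for illustration and are not part of any library.

```python
def phrase_query_for_doc(field, phrase, doc_id):
    """Build a query matching `phrase` in `field`, restricted to a single _id."""
    return {
        "bool": {
            "must": {"match_phrase": {field: phrase}},
            # The ids query filters on the _id metadata field, so it does not
            # depend on the "id" source field, which is mapped with index=False.
            "filter": {"ids": {"values": [doc_id]}},
        }
    }


def returned_ids(response):
    """Collect the _id of every hit in a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def doc_matches(es, index, field, phrase, doc_id):
    """Return True if the document with `doc_id` matches the phrase query.

    `es` is assumed to be an elasticsearch-py 8.x client (hypothetical setup).
    """
    response = es.search(
        index=index,
        query=phrase_query_for_doc(field, phrase, doc_id),
        size=1,  # at most one document can pass the ids filter
    )
    return doc_id in returned_ids(response)
```

With a connected client, `doc_matches(es, "drugbank", "toxicity", "nausea", "DB00176")` should tell you whether Fluvoxamine matches, regardless of where it ranks among the other hits.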
