How do I ensure 'stop' words are not picked up in my searches

bilpor · November 21, 2017, 8:42am

Hi all,

I have created the following index:

    put newsindex
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2,
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "news": {
      "properties": {
        "newsid": {
          "type": "integer"
        },
        "newstype": {
          "type": "text"
        },
        "bodytext": {
          "type": "text"
        },
        "caption": {
          "type": "text"
        },
        "headline": {
          "type": "text"
        },
        "approved": {
          "type": "text"
        },
        "author": {
          "type": "text"
        },
        "contact": {
          "type": "text"
        },
        "datecreated": {
          "type": "date",
          "format": "date_time"
        },
        "datesubmitted": {
          "type": "date",
          "format": "date_time"
        },
        "lastmodifieddate": {
          "type": "date",
          "format": "date_time"
        }
      }
    }
  }
}

Now when I perform a query, if I just use stop words such as

'is', 'it', 'the'

on their own in the search, nothing is returned, as expected. However, if I combine a stop word with a non-stop word, documents containing only the stop word are returned along with those containing the non-stop word. So if I query for 'is finished', I get back documents containing 'is finished', 'finished', or just 'is'. How do I stop the documents that contain only 'is' from being returned?

dadoonet · November 21, 2017, 8:58am

Could you provide a full recreation script as described in

It will help to better understand what you are doing.
Please, try to keep the example as simple as possible.

bilpor · November 21, 2017, 9:16am

I've amended my question to show how the index has been created. Using Kibana, it's in fact worse than I thought. In my app I am building up a wildcard search, but for a simple test in Kibana I ran the following, and thousands of hits were returned when I expected zero.

get newsindex/_search
{
  "query": {
    "match": {
      "bodytext": "and"
    }
  }
}

dadoonet · November 21, 2017, 12:27pm

Could you try with GET and not get?

A full example would help

bilpor · November 21, 2017, 1:36pm

Hi dadoonet,

I tried with GET and it made no difference. I have also posted this on Stack Overflow. I've tried introducing search analyzers, with some strange results, while trying to get to the root of this problem.
question in stackoverflow

jpountz · November 21, 2017, 1:41pm

The bodytext field does not specify an analyzer, so it is using the default analyzer which does not have stop words. You need to set an analyzer that removes stop words on the bodytext field.
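
For illustration, here is a minimal sketch of what that could look like, assuming the same single-type mapping as the index above. The index name newsindex_v2 and the analyzer name my_stop_analyzer are made up for this example. The analyzer of an existing field cannot be changed in place, so this assumes a new index that the documents are then reindexed into. The my_stop filter on its own does nothing until an analyzer references it and a field references that analyzer:

```
PUT newsindex_v2
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  },
  "mappings": {
    "news": {
      "properties": {
        "bodytext": {
          "type": "text",
          "analyzer": "my_stop_analyzer"
        }
      }
    }
  }
}

GET newsindex_v2/_analyze
{
  "field": "bodytext",
  "text": "is finished"
}
```

If the analyzer is wired up correctly, the _analyze call should return only the token finished. Because the same analyzer is applied at search time unless a separate search_analyzer is set, a match query for 'is finished' on bodytext would then search for finished alone. Without stop word removal, a match query combines its terms with OR by default, which would explain why documents containing only 'is' were coming back.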

bilpor · November 21, 2017, 2:35pm

Hi jpountz, can you please take a look at my link to Stack Overflow? My question there is more comprehensive. I have assigned the analyzer in the mapping of the property. When I do that, even fewer documents are returned than expected.

system · December 19, 2017, 2:35pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
