Stop words are not working

Hello everyone,

I just started with Elasticsearch. It's quite interesting and a joy to learn.

However, I have been stuck implementing stop words.

Here's my index:

curl -XPUT 'http://localhost:9200/my_indice' -d '
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [ "&=> and " ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [ "the", "a", "from", "is" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip", "&_to_and" ],
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "my_stopwords" ]
        }
      }
    }
  }
}'

And here are my mappings:

curl -XPUT 'http://localhost:9200/my_indice/_mapping/test' -d '
{
  "properties": {
    "tags": {
      "type": "string",
      "analyzer": "my_analyzer"
    },
    "displayName": {
      "type": "string"
    }
  }'

I have cross-checked whether my custom analyzer is working; here's the result:

curl -XGET 'http://localhost:9200/my_indice/_analyze?analyzer=my_analyzer' -d '
The quick & brown is a fox'

"tokens": [{
"token": "quick",
"start_offset": 5,
"end_offset": 10,
"type": "word",
"position": 2
}, {
"token": "and",
"start_offset": 11,
"end_offset": 12,
"type": "word",
"position": 3
}, {
"token": "brown",
"start_offset": 13,
"end_offset": 18,
"type": "word",
"position": 4
}, {
"token": "fox",
"start_offset": 24,
"end_offset": 27,
"type": "word",
"position": 7
}]
}

So it clearly shows that the custom analyzer is working.

But when I search for documents, it retrieves documents containing "the".

Here's my query:

curl -XPUT 'http://localhost:9200/my_indice/test' -d '
{
  "query": {
    "match": {
      "allTags": "the"
    }
  }
}'

I want to know whether I skipped anything, or whether there is a mistake in either my mappings or my query.

Thanks,
Ram.

Hey,

Looks OK from a top-level view. Can you include the indexing operation for the document as well? Also, the allTags field in the query example should be tags, I guess?
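Just to illustrate, a query against the field from your mapping would look something like this (only a sketch, assuming you search via the _search endpoint and keep the index and type names from your post):

curl -XGET 'http://localhost:9200/my_indice/test/_search' -d '
{
  "query": {
    "match": {
      "tags": "the"
    }
  }
}'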

--Alex

Hi spinscale,

That was my mistake while posting in the forum.

Here is the query:

curl -XPUT 'http://localhost:9200/my_indice/test' -d '
{
  "query": {
    "match": {
      "tags": "the"
    }
  }
}'

Here's the result I have been getting when running the query:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.1473356,
    "hits": [{
      "_index": "my_indice",
      "_type": "test",
      "_id": "1",
      "_score": 1.1473356,
      "_source": {
        "displayName": "Hyderabad Dum Biryani",
        "allTags": [
          "hyderabad dum biryani",
          "this is the best place to enjoy ur spicy biryani",
          "hyderabad"
        ]
      }
    }]
  }
}

Now your returned hits contain an allTags field. Can you please provide a full reproducible example for others to debug? The exact calls to create the index, to index the documents, and to query, all together.

Hi @spinscale,

Thanks for your patience, man.

Here's my complete work on stop words.

Custom Analyzer Creation for Stop Words:

curl -XPUT 'http://localhost:9200/myIndice' -d '
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&toand": {
          "type": "mapping",
          "mappings": [ "&=> and " ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [ "the", "a", "from", "is" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip", "&_to_and" ],
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "my_stopwords" ]
        }
      }
    }
  }
}'

Mappings:

curl -XPUT 'http://localhost:9200/myIndice/_mapping/test' -d '
{
  "properties": {
    "allTags": {
      "type": "string",
      "analyzer": "my_analyzer"
    },
    "displayName": {
      "type": "string"
    }
  }'

Analyzer Testing:

curl -XGET 'http://localhost:9200/myIndice/_analyze?analyzer=my_analyzer' -d '
The quick & brown is a fox'

"tokens": [{
"token": "quick",
"start_offset": 5,
"end_offset": 10,
"type": "word",
"position": 2
}, {
"token": "and",
"start_offset": 11,
"end_offset": 12,
"type": "word",
"position": 3
}, {
"token": "brown",
"start_offset": 13,
"end_offset": 18,
"type": "word",
"position": 4
}, {
"token": "fox",
"start_offset": 24,
"end_offset": 27,
"type": "word",
"position": 7
}]
}

Posting Data:

curl -XPOST 'http://localhost:9200/myIndice/test/1' -d '
{
  "displayName": "Hyderabad Dum Biryani",
  "allTags": [ "hyderabad dum biryani", "this is the best place to enjoy ur spicy biryani", "hyderabad" ]
}'

Query:

curl -XGET 'http://localhost:9200/myIndice/test' -d '
{
  "query": {
    "match": {
      "allTags": "the"
    }
  }
}'

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.1473356,
    "hits": [{
      "_index": "myIndice",
      "_type": "test",
      "_id": "1",
      "_score": 1.1473356,
      "_source": {
        "displayName": "Hyderabad Dum Biryani",
        "allTags": [
          "hyderabad dum biryani",
          "this is the best place to enjoy ur spicy biryani",
          "hyderabad"
        ]
      }
    }]
  }
}

Do I need to explicitly add a filter for allTags while creating the mappings?

Regards,
Ram

Hey,

Did you really try this example out? The mapping is broken, the char filter name is wrong as it is missing underscores, the index name contains upper-case characters and won't be accepted in 2.x releases, and your 'search' operation is a PUT to a type.

Please ensure your example is actually testable.
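For illustration only, here is a rough sketch of what the corrected calls could look like (untested, just applying the points above: a lower-case index name, the char filter named and referenced consistently as &_to_and, a properly closed mapping body, and the search sent as a GET to the _search endpoint):

curl -XPUT 'http://localhost:9200/my_indice' -d '
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": { "type": "mapping", "mappings": [ "&=> and " ] }
      },
      "filter": {
        "my_stopwords": { "type": "stop", "stopwords": [ "the", "a", "from", "is" ] }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip", "&_to_and" ],
          "tokenizer": "whitespace",
          "filter": [ "lowercase", "my_stopwords" ]
        }
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/my_indice/_mapping/test' -d '
{
  "properties": {
    "allTags": { "type": "string", "analyzer": "my_analyzer" },
    "displayName": { "type": "string" }
  }
}'

curl -XPOST 'http://localhost:9200/my_indice/test/1' -d '
{
  "displayName": "Hyderabad Dum Biryani",
  "allTags": [ "hyderabad dum biryani", "this is the best place to enjoy ur spicy biryani", "hyderabad" ]
}'

curl -XGET 'http://localhost:9200/my_indice/test/_search' -d '
{
  "query": {
    "match": {
      "allTags": "the"
    }
  }
}'

With a setup along those lines, a match query for a stop word like "the" should no longer return the document, since both the indexed text and the query text go through my_analyzer, which removes it.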

--Alex

Hi spinscale,

I am using AWS Elasticsearch; its version is 1.5.2.

I don't have an option to upgrade Elasticsearch to the current 2.2 on AWS.

Regarding the example: I have implemented and tested it, except that I gave the GET method instead of PUT in my example.

Does the implementation change depending on the version?

Regards,
Ram.