ES doesn't remove Persian stopwords

Ahmad_Ahmadi · July 13, 2017, 7:30am

I made a custom analyzer based on the built-in Persian analyzer so that I could use my own stopword list. The problem is that ES removes the English stopwords but not the Persian one.

The analyzer:
"persian-without-stopwords-analyzer": {
  "type": "persian",
  "stopwords": [
    "something",
    "دبیرستان",
    "another"
  ]
}

I tested the analyzer with the following request:
GET driq/_analyze
{
  "analyzer": "persian-without-stopwords-analyzer",
  "text": "something دبیرستان another"
}

The result is:
{
  "tokens": [
    {
      "token": "دبيرستان",
      "start_offset": 10,
      "end_offset": 18,
      "type": "",
      "position": 1
    }
  ]
}

Why does it remove 'something' and 'another' but not 'دبیرستان'?

Ahmad_Ahmadi · July 13, 2017, 8:30am

I found the solution. The analyzer's normalization filters replace the Persian 'ye' character with the Arabic 'ye' before stopwords are checked, but I had written the stopword with the Persian 'ye', so the two never matched. After I rewrote the stopword with the Arabic 'ye', ES removed it from the text as expected.
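
For anyone who would rather not retype stopwords by hand, here is a minimal Python sketch of the same fix: it rewrites each stopword the way the normalization filters are assumed to rewrite tokens, then builds the analyzer definition from the result. The helper name `normalize_stopwords` and the mapping table are hypothetical. Only the 'ye' replacement is confirmed by this thread; the 'kaf' replacement is an assumption about the normalizer and should be checked against your ES version.

```python
# Hypothetical mapping that mimics part of the Persian analyzer's normalization.
NORMALIZATION_MAP = str.maketrans({
    "\u06cc": "\u064a",  # Persian ye -> Arabic ye (the case found in this thread)
    "\u06a9": "\u0643",  # Persian kaf -> Arabic kaf (assumed, not confirmed here)
})


def normalize_stopwords(words):
    """Lowercase each stopword and apply the assumed character mapping."""
    return [word.lower().translate(NORMALIZATION_MAP) for word in words]


# The Persian stopword from the question, spelled with Persian ye (U+06CC).
persian_word = "\u062f\u0628\u06cc\u0631\u0633\u062a\u0627\u0646"

stopwords = normalize_stopwords(["something", persian_word, "another"])

# Analyzer definition to place under settings.analysis.analyzer of the index.
analyzer_settings = {
    "persian-without-stopwords-analyzer": {
        "type": "persian",
        "stopwords": stopwords,
    }
}
```

Running the `_analyze` request from the question against an index created with these settings is the easiest way to confirm that the normalized stopword now matches.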

