ES doesn't remove Persian stopwords

I made a custom analyzer based on the Persian analyzer so I could use my own stopwords. The problem is that ES doesn't remove the Persian stopword from the text.

The analyzer:
"persian-without-stopwords-analyzer": {
  "type": "persian",
  "stopwords": ["something", "another", "دبیرستان"]
}

Then I tested the analyzer with the following request:
GET driq/_analyze
{
  "analyzer": "persian-without-stopwords-analyzer",
  "text": "something دبیرستان another"
}

The result is:
{
  "tokens": [
    {
      "token": "دبيرستان",
      "start_offset": 10,
      "end_offset": 18,
      "type": "",
      "position": 1
    }
  ]
}

Why does it remove 'something' and 'another' but not 'دبیرستان'?


I found the solution. The Persian analyzer's normalization filters (persian_normalization in particular) replace the Persian 'ye' character (U+06CC) with the Arabic 'ye' (U+064A) before the stop filter runs, but I had written the stopword with the Persian 'ye', so the two didn't match. After rewriting the stopword with the Arabic 'ye', ES removes it from the index.
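The mismatch can be reproduced outside Elasticsearch. This is a minimal sketch with a hypothetical helper, not anything ES provides: `PERSIAN_TO_ARABIC`, `normalize_stopword` and `analyzer_settings` are names made up for illustration. The mapping table is only a partial imitation of what, as far as I know, `persian_normalization` does to letters (Farsi yeh and yeh barree to Arabic yeh, keheh to kaf); the real filter handles more cases. It then builds a settings body for creating the `driq` index with the stopwords already normalized.

```python
import json

# Hypothetical, partial imitation of the letter folding done by the
# persian_normalization filter. Only a few mappings are included.
PERSIAN_TO_ARABIC = str.maketrans({
    "\u06cc": "\u064a",  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
    "\u06d2": "\u064a",  # ARABIC LETTER YEH BARREE -> ARABIC LETTER YEH
    "\u06a9": "\u0643",  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
})


def normalize_stopword(word: str) -> str:
    """Fold Persian letter variants to the Arabic forms seen in analyzed tokens."""
    return word.translate(PERSIAN_TO_ARABIC)


def analyzer_settings(stopwords: list[str]) -> dict:
    """Build a hypothetical `PUT driq` body whose stopwords are pre-normalized."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "persian-without-stopwords-analyzer": {
                        "type": "persian",
                        "stopwords": [normalize_stopword(w) for w in stopwords],
                    }
                }
            }
        }
    }


# The word as typed on a Persian keyboard (FARSI YEH, U+06CC)
persian = "\u062f\u0628\u06cc\u0631\u0633\u062a\u0627\u0646"
# The token as returned by _analyze (ARABIC YEH, U+064A)
token = "\u062f\u0628\u064a\u0631\u0633\u062a\u0627\u0646"

print(persian == token)                      # False: the stopword never matches
print(normalize_stopword(persian) == token)  # True once the ye is folded
print(json.dumps(analyzer_settings(["something", "another", persian]),
                 ensure_ascii=False, indent=2))
```

The two strings differ only in the third character, which is why the stop filter never fires. Normalizing the list on the client lets you keep typing stopwords on a Persian keyboard while still matching what the analyzer emits; the printed JSON is meant as a starting point for the index settings, not a verified drop-in.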

