ES doesn't remove Persian stopwords

I made a custom analyzer based on the Persian analyzer so that I could use my own stopwords. The problem is that ES doesn't remove the Persian stopwords from the text.

The analyzer:
"persian-without-stopwords-analyzer": {
  "type": "persian",
  "stopwords": [
    "something",
    "دبیرستان",
    "another"
  ]
}

I tested the analyzer with the following request:
GET driq/_analyze
{
  "analyzer": "persian-without-stopwords-analyzer",
  "text": "something دبیرستان another"
}

The result is:
{
  "tokens": [
    {
      "token": "دبيرستان",
      "start_offset": 10,
      "end_offset": 18,
      "type": "",
      "position": 1
    }
  ]
}

Why does it remove 'something' and 'another' but not 'دبیرستان'?


I found the solution. The analyzer's normalization filters replace the Persian 'ye' character with the Arabic 'ye' before stopwords are removed, but I had written the stopword with the Persian 'ye', so they didn't match. After I rewrote the stopword using the Arabic 'ye', ES removed it as expected.
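
In case it helps anyone else, here is a minimal Python sketch of how the stopword list could be pre-normalized before it goes into the analyzer settings. It only folds the one character discussed in this thread (Persian 'ye', U+06CC, to Arabic 'ye', U+064A). The helper name `normalize_stopword` and the `FOLD` table are my own, not part of any Elasticsearch client, and the analyzer may fold other characters as well, so it is worth checking the result with `_analyze`.

```python
# Assumption: only the Persian 'ye' -> Arabic 'ye' folding from this thread
# is covered here; the real normalization filters may change more characters.
FOLD = str.maketrans({
    "\u06cc": "\u064a",  # Persian ye (ی) -> Arabic ye (ي)
})


def normalize_stopword(word: str) -> str:
    """Return the stopword in the form the token has after normalization."""
    return word.translate(FOLD)


# The stopwords from the question; the second one is written with Persian 'ye'.
stopwords = [
    "something",
    "\u062f\u0628\u06cc\u0631\u0633\u062a\u0627\u0646",
    "another",
]

# List to put in the analyzer's "stopwords" setting instead of the original.
normalized = [normalize_stopword(w) for w in stopwords]

# Show the code points so the otherwise invisible difference is easy to see.
for before, after in zip(stopwords, normalized):
    print([f"U+{ord(c):04X}" for c in before], "->",
          [f"U+{ord(c):04X}" for c in after])
```

Printing the code points is also a quick way to diagnose this kind of mismatch: the two spellings look almost identical on screen but differ in the third character.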

