We are running Elasticsearch 1.7 (we plan to upgrade very soon), and I am trying to use the Analyze API to understand what the different analyzers do, but the result Elasticsearch returns is not what I expect.
If I run the following query against our Elasticsearch instance:
POST _analyze
{
  "analyzer": "stop",
  "text": "Extremely good food! We had the happiest waiter and the crowd's always flowing!"
}
I get this result:
{
  "tokens": [
    {
      "token": "analyzer",
      "start_offset": 6,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "stop",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "text",
      "start_offset": 30,
      "end_offset": 34,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "extremely",
      "start_offset": 38,
      "end_offset": 47,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "good",
      "start_offset": 48,
      "end_offset": 52,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "food",
      "start_offset": 53,
      "end_offset": 57,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "we",
      "start_offset": 59,
      "end_offset": 61,
      "type": "<ALPHANUM>",
      "position": 7
    },
    {
      "token": "had",
      "start_offset": 62,
      "end_offset": 65,
      "type": "<ALPHANUM>",
      "position": 8
    },
    {
      "token": "the",
      "start_offset": 66,
      "end_offset": 69,
      "type": "<ALPHANUM>",
      "position": 9
    },
    {
      "token": "happiest",
      "start_offset": 70,
      "end_offset": 78,
      "type": "<ALPHANUM>",
      "position": 10
    },
    {
      "token": "waiter",
      "start_offset": 79,
      "end_offset": 85,
      "type": "<ALPHANUM>",
      "position": 11
    },
    {
      "token": "and",
      "start_offset": 86,
      "end_offset": 89,
      "type": "<ALPHANUM>",
      "position": 12
    },
    {
      "token": "the",
      "start_offset": 90,
      "end_offset": 93,
      "type": "<ALPHANUM>",
      "position": 13
    },
    {
      "token": "crowd's",
      "start_offset": 94,
      "end_offset": 101,
      "type": "<ALPHANUM>",
      "position": 14
    },
    {
      "token": "always",
      "start_offset": 102,
      "end_offset": 108,
      "type": "<ALPHANUM>",
      "position": 15
    },
    {
      "token": "flowing",
      "start_offset": 109,
      "end_offset": 116,
      "type": "<ALPHANUM>",
      "position": 16
    }
  ]
}
This does not make sense to me. I am using the stop analyzer, so why are the words "and" and "the" in the result? I have also tried replacing the stop analyzer with both whitespace and standard, but I get exactly the same result as above; there is no difference between them.
However, if I run the exact same query against an instance of Elasticsearch 5.x, the result no longer contains "and" and "the", and it looks much more like what I expect.
Is this because we are using 1.7, or is something in our Elasticsearch setup causing this issue?
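One detail in the 1.7 output may be relevant: the first three tokens ("analyzer", "stop", "text") are the keys and values of the request JSON itself, not words from the sentence. The sketch below is only a rough check of that observation. It tokenizes the whole request body with a simple regex, which is a hypothetical stand-in for the real analyzer (the `rough_tokens` helper is made up for illustration and does not call Elasticsearch), and compares the outcome with the token list above.

```python
import json
import re

# The request body exactly as sent in the question (the indentation is an assumption).
body = json.dumps(
    {
        "analyzer": "stop",
        "text": "Extremely good food! We had the happiest waiter "
                "and the crowd's always flowing!",
    },
    indent=2,
)


def rough_tokens(raw):
    """Lowercase the input and split it into runs of letters, digits and apostrophes.

    This is only a crude approximation of a standard-style tokenizer,
    used here to illustrate the observation; it is not Elasticsearch code.
    """
    return re.findall(r"[a-z0-9']+", raw.lower())


tokens = rough_tokens(body)
print(tokens[:4])   # ['analyzer', 'stop', 'text', 'extremely']
print(len(tokens))  # 16
```

The 16 tokens produced this way are the same 16 tokens, in the same order, as in the 1.7 response, stop words included. That would be consistent with 1.7 treating the entire body as the text to analyze and ignoring the "analyzer" field inside it, though I have not confirmed that this is what is happening.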