Different results with fingerprint analyzer

The first test is missing the "e" in "sante": the URI request returns "sant" instead.

GET /_analyze?analyzer=fingerprint&text="Santé the in a Monica"
=> "token": "a in monica sant the"

GET /_analyze
{
  "analyzer": "fingerprint",
  "text": "Santé the in a Monica"
}
=> "token": "a in monica sante the"

Hm, I think it's something to do with passing the text as a URI parameter. If you run it in the body (like your second example), it works as expected:

POST _analyze
{
  "analyzer": "fingerprint",
  "text": "Santé the in a Monica"
}

# POST _analyze
{
  "tokens": [
    {
      "token": "a in monica sante the",
      "start_offset": 0,
      "end_offset": 21,
      "type": "fingerprint",
      "position": 0
    }
  ]
}
GET /_analyze?analyzer=fingerprint&text="Santé the in a Monica"
# GET /_analyze?analyzer=fingerprint&text="Santé the in a Monica"
{
  "tokens": [
    {
      "token": "a in monica sant the",
      "start_offset": 0,
      "end_offset": 23,
      "type": "fingerprint",
      "position": 0
    }
  ]
}

Thanks. I'd like to be able to run it either way and get the same results. I'll use the "body" format for now.
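If you do still want the URI form, percent-encoding the text yourself before building the URL should keep the "é" intact as UTF-8 bytes. A minimal sketch in Python; the `localhost:9200` host is an assumption, adjust for your cluster:

```python
from urllib.parse import quote

# Percent-encode the text so non-ASCII characters like "é"
# survive the URI round-trip as UTF-8 bytes.
text = "Santé the in a Monica"
encoded = quote(text, safe="")

# Hypothetical local cluster address.
url = f"http://localhost:9200/_analyze?analyzer=fingerprint&text={encoded}"
print(url)
```

Sending that encoded URL (e.g. with curl or `requests.get`) should give the same "sante" token as the body form.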

Yeah. It was probably caused by a non-UTF-8 encoding of the URI parameter, IMO.
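To illustrate the mismatch: "é" is two bytes in UTF-8 but one byte in Latin-1, so if the client and server disagree on the charset of the raw URI bytes, the character gets mangled. A quick Python check of the two encodings:

```python
from urllib.parse import quote

# The same character percent-encodes differently per charset:
# UTF-8 produces two escaped bytes, Latin-1 only one.
utf8_form = quote("Santé", encoding="utf-8")
latin1_form = quote("Santé", encoding="latin-1")

print(utf8_form)    # Sant%C3%A9
print(latin1_form)  # Sant%E9
```

A server decoding `Sant%E9` as UTF-8 hits an invalid byte sequence, which is consistent with the character being dropped in the URI-parameter result.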

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.