I am setting up a filter within my index:
"min-length-token": {
  "type": "length",
  "min": 2
},
Then I add this filter to my analyzer like this:
"extend-standard": {
  "filter": [
    "lowercase", "min-length-token"
  ],
  "tokenizer": "standard"
},
But when I call the analyze API with
{
  "analyzer": "extend-standard",
  "text": "a&b"
}
it still returns "a" and "b", each of which is only 1 character long.
spinscale
(Alexander Reelsen)
December 18, 2019, 12:24pm
2
Can you share a fully reproducible example? I tried this in the analyze API and it works.
GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "length",
      "min": 2
    }
  ],
  "text": "a&b"
}
Also, please share your Elasticsearch version.
Thanks!
Thanks for your reply, here is a full sample:
POST t_index/_close
PUT t_index/_settings
{
  "analysis": {
    "filter": {
      "min-length-token": {
        "type": "length",
        "min": 2,
        "max": 512
      }
    },
    "analyzer": {
      "t-standard": {
        "filter": ["min-length-token"],
        "type": "standard"
      }
    }
  }
}
POST t_index/_open
GET t_index/_analyze
{
  "analyzer": "t-standard",
  "text": "a&b"
}
The response:
{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "b",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
spinscale
(Alexander Reelsen)
December 19, 2019, 9:56am
4
Try this:
PUT t_index
{
  "settings": {
    "analysis": {
      "filter": {
        "min-length-token": {
          "type": "length",
          "min": 2,
          "max": 512
        }
      },
      "analyzer": {
        "t-standard": {
          "filter": [
            "min-length-token"
          ],
          "tokenizer": "standard",
          "type": "custom"
        }
      }
    }
  }
}
GET t_index/_analyze
{
  "analyzer": "t-standard",
  "text": "a&b"
}
Using "standard" as the analyzer type seems to ignore all the other settings. I will take a deeper look and potentially open an issue.
Thanks!
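For comparison, here is a minimal sketch of what an analyzer with "type": "standard" does accept, going by the parameters documented for the standard analyzer (max_token_length and stopwords). A "filter" list is not among them, which would be consistent with it being ignored above. The index name t_index_std and the analyzer name t-standard-configured are made up for this illustration.

```console
# Hypothetical index and analyzer names, for illustration only.
PUT t_index_std
{
  "settings": {
    "analysis": {
      "analyzer": {
        "t-standard-configured": {
          "type": "standard",
          "max_token_length": 255,
          "stopwords": "_english_"
        }
      }
    }
  }
}
```

Anything beyond these parameters, such as extra token filters, would need a "custom" analyzer like the one in the example above.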
spinscale
(Alexander Reelsen)
December 19, 2019, 10:13am
5
I opened https://github.com/elastic/elasticsearch/issues/50356 for further discussion if you are interested.
Thanks for the workaround. I agree that it is an issue. I got it working with the standard tokenizer, but I can't get it to work with the english analyzer type:
"t-standard": {
  "filter": ["min-length-token"],
  "type": "english",
  "tokenizer": "standard"
}
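The same limitation probably applies here: "english" is a prebuilt analyzer type, so the "filter" and "tokenizer" entries are likely ignored just as they were with "standard". One possible workaround, sketched below, is to rebuild the english analyzer as a "custom" analyzer using the filter chain the Elasticsearch documentation gives for it, and append the length filter. The index name t_index_en and analyzer name t-english are hypothetical, and the optional keyword_marker filter from the documented chain is left out.

```console
# Hypothetical index and analyzer names; a sketch, not a tested configuration.
PUT t_index_en
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "min-length-token": {
          "type": "length",
          "min": 2,
          "max": 512
        }
      },
      "analyzer": {
        "t-english": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer",
            "min-length-token"
          ]
        }
      }
    }
  }
}
GET t_index_en/_analyze
{
  "analyzer": "t-english",
  "text": "a&b"
}
```

Placing the length filter last means it sees the stemmed tokens; moving it earlier in the chain would filter on the original token length instead.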