Hi,
We have a sourceUrl compound field with a custom analyzer on the sourceUrl.pathName sub-field. The idea is that a query for a path returns every URL under it, e.g. example.com/path/here brings back everything matching example.com/path/here.*.
This all worked in ES 1.7, but I can't get it working in 2.4. In 2.4 the query appears to match on only one or two of the analyzed tokens rather than all of them, which is what I'd expect minimum_should_match: "100%" to require.
So even if I query for something very specific like "http://www.animalfactguide.com/category/animal-news/page/6/", I also get all the other pages, such as "http://www.animalfactguide.com/category/animal-news/page/10/".
Query:
{
  "query": {
    "bool": {
      "filter": {
        "match": {
          "sourceUrl.pathName": {
            "query": "http://www.animalfactguide.com/category/animal-news/page/6/",
            "type": "boolean",
            "minimum_should_match": "100%"
          }
        }
      }
    }
  }
}
Mappings and analyzers:
{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "drop_trailing_slash": {
            "pattern": "/$",
            "type": "pattern_replace",
            "replacement": ""
          },
          "path": {
            "type": "pattern_replace",
            "pattern": "^(.*://)?([^/]*)((/[^?]*?)?(/([^/]*.html?)?)?)(\\?.*)?$",
            "replacement": "$3"
          },
          "drop_leading_slash": {
            "type": "pattern_replace",
            "pattern": "^/",
            "replacement": ""
          }
        },
        "analyzer": {
          "pathName": {
            "filter": "lowercase",
            "char_filter": [
              "path",
              "drop_leading_slash",
              "drop_trailing_slash"
            ],
            "type": "custom",
            "tokenizer": "pathName"
          }
        },
        "tokenizer": {
          "pathName": {
            "type": "path_hierarchy",
            "reverse": "false",
            "delimiter": "/"
          }
        }
      }
    }
  },
  "mappings": {
    "page": {
      "properties": {
        "sourceUrl": {
          "index": "no",
          "type": "string",
          "fields": {
            "pathName": {
              "analyzer": "pathName",
              "type": "string"
            },
            "raw": {
              "index": "not_analyzed",
              "type": "string"
            }
          }
        }
      }
    }
  }
}
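To make the token-level expectation concrete, here is a rough Python sketch of the token stream the pathName analyzer above should produce for the two URLs. It is only an approximation and does not call Elasticsearch: it assumes Python's `re` behaves like the Java regex engine for these patterns, and the helper names (`apply_char_filters`, `path_hierarchy_tokens`, `analyze`) are made up for illustration.

```python
import re

# Pattern copied from the "path" char_filter; group 3 is the path portion.
PATH_RE = re.compile(r"^(.*://)?([^/]*)((/[^?]*?)?(/([^/]*.html?)?)?)(\?.*)?$")


def apply_char_filters(url):
    """Approximate the path, drop_leading_slash and drop_trailing_slash filters."""
    text = PATH_RE.sub(lambda m: m.group(3), url)  # keep only group 3 ("$3")
    text = re.sub(r"^/", "", text)                 # drop_leading_slash
    text = re.sub(r"/$", "", text)                 # drop_trailing_slash
    return text


def path_hierarchy_tokens(text, delimiter="/"):
    """Emit every prefix of the path (path_hierarchy style), lowercased."""
    if not text:
        return []
    parts = text.split(delimiter)
    return [delimiter.join(parts[: i + 1]).lower() for i in range(len(parts))]


def analyze(url):
    """Hypothetical stand-in for the whole pathName analyzer chain."""
    return path_hierarchy_tokens(apply_char_filters(url))


query_tokens = analyze("http://www.animalfactguide.com/category/animal-news/page/6/")
other_tokens = analyze("http://www.animalfactguide.com/category/animal-news/page/10/")

# Tokens of the page/6 query that also exist on the page/10 document.
shared = [t for t in query_tokens if t in other_tokens]

print(query_tokens)
print(shared)
```

Under these assumptions the page/6 query yields four tokens and the page/10 document shares only the first three, so requiring all query tokens should exclude page/10, while requiring only some of them would let it through. Running the real `_analyze` API against the index would be the way to confirm the actual token stream.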
Any help or pointers would be appreciated. I'm running out of options.