Hi there!
Pretty new to ES so please bear with me!
So, I have an index "dirs" which contains a bunch of documents, each representing a directory in some filesystem. The index settings and mappings look like this:
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true
      }
    },
    "dir": {
      "_all": {
        "enabled": true
      },
      "properties": {
        "Path": { "type": "text", "index": "not_analyzed" },
        "Depth": { "type": "integer" },
        "Fingerprint": { "type": "text" }
      }
    }
  }
}
Now, I want to search this index using a regular expression, say "/Users.*", to get every directory under /Users.
$ curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "Path": "/Users"
    }
  }
}
'
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
Is this ES saying there's nothing in the index with a path starting with /Users? If so, that's wrong! Let's prove it with a more liberal regex...
$ curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "Path": ".*"
    }
  }
}
'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "hits" : {
    "total" : 31321,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "dirs",
        "_type" : "dir",
        "_id" : "/Users/marmel01/ether/etherpad-lite/src/node_modules/wd/node_modules/caseless",
        "_score" : 1.0,
        "_source" : {
          "Path" : "/Users/marmel01/ether/etherpad-lite/src/node_modules/wd/node_modules/caseless",
          "Depth" : 9,
          "Fingerprint" : ""
        }
      }
    ]
  }
}
See, EVERYTHING starts with /Users. Something's wrong here!
Here's another bit of weirdness: instead of matching "/Users", let's try "users" (lowercase, no leading slash).
$ curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "Path": "users"
    }
  }
}
'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "hits" : {
    "total" : 31321,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "dirs",
        "_type" : "dir",
        "_id" : "/Users/marmel01/ether/etherpad-lite/src/node_modules/wd/node_modules/caseless",
        "_score" : 1.0,
        "_source" : {
          "Path" : "/Users/marmel01/ether/etherpad-lite/src/node_modules/wd/node_modules/caseless",
          "Depth" : 9,
          "Fingerprint" : ""
        }
      }
    ]
  }
}
So, is "users" matching "Users" here? Is there some case sensitivity weirdness going on here?
What's more, we lose our match again if we put the slash back in ("/users"):
$ curl -XGET 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "Path": "/users"
    }
  }
}
'
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
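To make the pattern in these four results easier to see, here is a rough shell sketch of what I suspect might be going on. It assumes (and this is only a guess on my part) that Path is actually being analyzed into lowercase terms despite the "not_analyzed" setting, and that a regexp has to match one whole term. The terms and matches helpers are made up for illustration, and the tr pipeline is only a crude stand-in for whatever analyzer ES is really using.

```shell
#!/bin/sh
# Crude stand-in for an analyzer (an assumption, not what ES actually runs):
# lowercase the input, then split it into terms on any non-alphanumeric character.
terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' | sed '/^$/d'
}

# Succeeds if the regex matches at least one whole term
# (grep -x anchors the pattern at both ends of the line).
matches() {
  terms "$2" | grep -Eqx -- "$1"
}

# The one document shown in the responses above.
path='/Users/marmel01/ether/etherpad-lite/src/node_modules/wd/node_modules/caseless'

for pattern in '/Users' '.*' 'users' '/users'; do
  if matches "$pattern" "$path"; then
    echo "$pattern: hit"
  else
    echo "$pattern: no hit"
  fi
done
```

Under those assumptions the sketch reports a hit for ".*" and "users" and no hit for "/Users" and "/users", which is the same pattern as the four searches above. That makes me think the field is not being stored as a single un-analyzed value, but I can't confirm it.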
Is this expected behaviour? Is there a better way of doing these kinds of matches? I'm going the regex route because eventually I'd like to do more complex matches, for example "/Users/[^/]+" to get all directories directly under /Users.
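To be concrete about the kind of match I'm after, here is a small sketch that uses grep -Ex as a stand-in for a regexp that must match the entire Path value (my understanding is that Lucene regexps are anchored at both ends, but treat that as an assumption). The sample paths and the direct_children helper are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Hypothetical sample paths, for illustration only.
paths='/Users
/Users/marmel01
/Users/marmel01/ether
/tmp/Users/marmel01'

# grep -x requires the whole line to match, standing in for a regexp that
# must match the entire un-analyzed Path value.
direct_children() {
  printf '%s\n' "$paths" | grep -Ex -- '/Users/[^/]+'
}

direct_children
```

With these sample paths only /Users/marmel01 is printed: /Users itself has nothing after the slash, and the deeper or differently rooted paths fail the whole-value match. That is the behaviour I'd like from the regexp query on Path.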
Thanks!
Mark