Indexing and searching documents whose field values are file paths

Hello,

We have a field indexed in ES called fullpath, with a value such as “\zl_allen-p_000.pst\Top of Personal Folders\allen-p\All documents\-- no subject --.msg\ID00000173.ppt”. According to the ES documentation, the following reserved characters must be escaped with “\”:

    + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

The string above contains both “\” and “-”, but when we search for the exact string using the escape sequence we get no hits. Below is the query I am using; I have tried it both with and without double quotes around the string.

{ "query": { "query_string": { "query": "fullpath:("\zl_allen-p_000.pst\Top of Personal Folders\allen-p\All documents\-- no subject --.msg\ID00000173.ppt")" } }}

The mapping used is as follows:

{"properties":{"fullpath":{"type":"string","index":"not_analyzed"}}}

I indexed the document using each of the following JSON bodies:

{"fullpath":"\\zl_allen-p_000.pst\\Top of Personal Folders\\allen-p\\All documents\\-- no subject --.msg\\ID00000173.ppt"}

and

{"fullpath":"\zl_allen-p_000.pst\Top of Personal Folders\allen-p\All documents\-- no subject --.msg\ID00000173.ppt"}

The explain API returns the following:

{
  "_index": "newnest",
  "_type": "newnesttype",
  "_id": "1",
  "matched": false,
  "explanation": {
    "value": 0.0,
    "description": "Failure to meet condition(s) of required/prohibited clause(s)",
    "details": [{
      "value": 0.0,
      "description": "no match on required clause (fullpath.na:\"zl_allen p_000 psttop of personal foldersallen pall documents no subject msgid00000173 ppt\")",
      "details": [{
        "value": 0.0,
        "description": "no matching term",
        "details": []
      }]
    },
    {
      "value": 0.0,
      "description": "match on required clause, product of:",
      "details": [{
        "value": 0.0,
        "description": "# clause",
        "details": []
      },
      {
        "value": 0.083333336,
        "description": "_type:newnesttype, product of:",
        "details": [{
          "value": 1.0,
          "description": "boost",
          "details": []
        },
        {
          "value": 0.083333336,
          "description": "queryNorm",
          "details": []
        }]
      }]
    }]
  }
}

I would like to know whether I am missing something specific to handling file paths.

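You also need to escape the special characters in the query itself. In query_string syntax, every backslash, hyphen, and space in the path gets a backslash in front of it, and each of those backslashes is then doubled again because the query is a JSON string. The following should return a result: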

GET /paths/_search
{
  "query": {
    "query_string": {
      "query": "fullpath:\\\\zl_allen\\-p_000.pst\\\\Top\\ of\\ Personal\\ Folders\\\\allen\\-p\\\\All\\ documents\\\\\\-\\-\\ no\\ subject\\ \\-\\-.msg\\\\ID00000173.ppt"
    }
  }
}
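
If you build such queries in code rather than by hand, the escaping can be automated. Here is a rough Python sketch, not an official client API: the helper names escape_query_string and build_query are made up for illustration, and the character set is taken from the reserved-characters list quoted in the question, plus whitespace. It assumes that “<” and “>” cannot be escaped (as the Elasticsearch documentation notes), so it leaves them alone.

```python
import json
import re

# Reserved query_string characters plus whitespace, so the whole path stays a
# single term. "<" and ">" are deliberately left out (assumption: they cannot
# be escaped in query_string syntax).
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/\s])')


def escape_query_string(value: str) -> str:
    """Prefix every reserved character (and whitespace) with a backslash."""
    return _RESERVED.sub(r"\\\1", value)


def build_query(field: str, value: str) -> str:
    """Return a JSON request body for an exact query_string lookup.

    json.dumps performs the second, JSON-level doubling of backslashes.
    """
    query = f"{field}:{escape_query_string(value)}"
    return json.dumps({"query": {"query_string": {"query": query}}}, indent=2)


# The path from the question, written as a raw string (single backslashes).
path = (
    r"\zl_allen-p_000.pst\Top of Personal Folders\allen-p"
    r"\All documents\-- no subject --.msg\ID00000173.ppt"
)
print(build_query("fullpath", path))
```

For the path in the question, this should produce the same "query" string as the hand-escaped request above.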

If you use a match query instead of a query_string query, you do not need to escape these characters; only the usual JSON doubling of backslashes remains. The following returns the same result:

GET /paths/_search
{
  "query": {
    "match": {
      "fullpath": "\\zl_allen-p_000.pst\\Top of Personal Folders\\allen-p\\All documents\\-- no subject --.msg\\ID00000173.ppt"
    }
  }
}
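
The same match request can be generated from code, in which case the JSON serializer takes care of the backslash doubling. A minimal Python sketch, assuming only the standard json module (build_match_query is a hypothetical helper name, not part of any Elasticsearch client):

```python
import json


def build_match_query(field: str, value: str) -> str:
    """Return a JSON request body for a match query on a single field."""
    return json.dumps({"query": {"match": {field: value}}}, indent=2)


# The path as it is stored, written as a raw string (single backslashes).
path = (
    r"\zl_allen-p_000.pst\Top of Personal Folders\allen-p"
    r"\All documents\-- no subject --.msg\ID00000173.ppt"
)
body = build_match_query("fullpath", path)
print(body)
```

Nothing is escaped by hand here; json.dumps writes each backslash in the path as a doubled backslash, as in the request above.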

That worked.

Thanks for the response.