How can I get immediate subdirectories using the path hierarchy tokenizer?

I'm using the path hierarchy tokenizer. Working from the examples, I would like to give /User/alice/photos/2017/05 to Elasticsearch and have it list the immediate subdirectories (e.g. 15 and 16).

How can I do this?

I do not think you can get this directly through a query, because only path prefixes from the start are indexed, not individual path components. You should be able to extract it from the returned documents, though.
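To illustrate what "only paths from the start" means, here is a rough plain-Python emulation of the tokens the path hierarchy tokenizer emits with its default `/` delimiter. It does not call Elasticsearch; the function name `path_hierarchy_tokens` and the file name `img.jpg` are made up for the example.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Approximate the path_hierarchy tokenizer: one token per path prefix."""
    parts = [p for p in path.split(delimiter) if p]
    # Keep the leading delimiter on every token if the input had one.
    prefix = delimiter if path.startswith(delimiter) else ""
    tokens = []
    current = ""
    for i, part in enumerate(parts):
        current = (prefix + part) if i == 0 else current + delimiter + part
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    # Hypothetical file under the directory from the question.
    for token in path_hierarchy_tokens("/User/alice/photos/2017/05/15/img.jpg"):
        print(token)
```

Every token is a prefix starting at the root, so there is no standalone `15` or `16` term to match or aggregate on.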

The problem is we have billions of documents, and are looking for a way to efficiently recreate directory traversal using ES data.

Would you say we just need to store a parent_dir field on each doc or something?

I don't think I'd need each path component. Some more context:

  • We're only indexing files, not dirs.
  • To get the parent dir, I could ask it for the second last path in the array, since the last path would be the full path including the filename, right?
  • I'd just need to ask ES to do a kind of terms agg on the second last item in the paths array, and that would group files by their parent dirs, wouldn't it?

Storing a document for each file with parent directory as a field would likely work well and be efficient. Sounds like a good idea. I do not think you can select terms when using the path tokeniser.
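A minimal sketch of the parent directory approach, in Python. The field names `path`, `path.tree` and `parent_dir`, and all function names, are assumptions for illustration: `parent_dir` is assumed to be mapped as a `keyword` field, and `path.tree` is assumed to be a sub-field analyzed with the path hierarchy tokenizer. The request body is only built as a dict here and is not sent anywhere.

```python
import posixpath


def make_doc(file_path):
    """Build a document for one file, adding its parent directory as a field."""
    return {"path": file_path, "parent_dir": posixpath.dirname(file_path)}


def subdir_agg_body(directory, max_buckets=1000):
    """Search body: match files under `directory`, group them by parent_dir.

    Assumes `path.tree` holds path hierarchy tokens, so a term query on a
    directory matches every file below it.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"term": {"path.tree": directory}},
        "aggs": {
            "by_parent_dir": {
                "terms": {"field": "parent_dir", "size": max_buckets}
            }
        },
    }


def immediate_subdirs(directory, parent_dirs):
    """Reduce parent_dir bucket keys to the names directly below `directory`."""
    base = directory.rstrip("/") + "/"
    found = set()
    for parent in parent_dirs:
        if parent.startswith(base):
            # First path component after the base directory.
            found.add(parent[len(base):].split("/", 1)[0])
    return sorted(found)


if __name__ == "__main__":
    print(make_doc("/User/alice/photos/2017/05/15/a.jpg"))
    print(subdir_agg_body("/User/alice/photos/2017/05"))
```

The bucket keys returned by the terms aggregation would be passed to `immediate_subdirs` on the client side. With very deep or wide trees the number of buckets could be large, so the `max_buckets` value here is only a placeholder.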

