How to filter search results using a lookup from another index

Hi Community,

I'm planning to create a Document Management System (DMS) using Elasticsearch and FSCrawler on top of a Big Data platform.

One of the key features of a DMS is role-based file permissions, so that different users have different levels of access. I'm planning to create two indices. The first (docIndex) will be populated by FSCrawler and will contain the content of every document. The second (permIndex) will hold the permissions: one entry for every uploaded file, along with the permission set on that file (e.g. onlyme, groupView). Here is a sample JSON structure for the two indices.

"_index": "docIndex",
"_type": "doc",
"_id": "b9c3d84d2f467b3fd4b5a0911d5e2e87",
"_score": 1,
"_source": {
	"content": """
	sample pdf contents
	""",
	"meta": {
		"raw": {
			"X-Parsed-By": "org.apache.tika.parser.DefaultParser",
			"Content-Encoding": "windows-1252",
			"resourceName": "sample.pdf",
			"Content-Type": "text/plain; charset=windows-1252"
		}
	},
	"file": {
		"extension": "pdf",
		"content_type": "text/plain; charset=windows-1252",
		"last_modified": "2018-07-12T06:57:32.000+0000",
		"indexing_date": "2018-07-12T06:58:15.617+0000",
		"filesize": 159,
		"filename": "sample.pdf",
		"url": "file:///tmp/es/sample.pdf",
		"indexed_chars": 159
	},

Second Index

"_index": "permIndex",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
    "filename": "sample.pdf",
    "permission": 0,
    "author": "someone"
 }

An example scenario: when someone searches for documents through the web app, I first want to restrict the searchable documents to those that his/her permissions allow. By default, a user can view all of his/her own documents, plus any other documents that have the "view all" permission.

The first step is to query permIndex using this search API:

GET /permIndex/_search
{
    "query": {
        "query_string": {
            "query": "(author:someone OR permission:0)"
        }
    }
}
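
For what it's worth, the same permission lookup could also be written as a structured bool query instead of a query_string, so that a user name is never parsed as query syntax. Below is a minimal Python sketch that only builds the request body. `build_permission_query` is a hypothetical helper name, and the sketch assumes that `author` is mapped as a keyword field and that permission 0 means "view all", as in the query above.

```python
import json


def build_permission_query(author, public_permission=0):
    """Build a search body matching files owned by `author` OR marked public.

    Assumptions (not confirmed by the mapping shown in this thread):
    - `author` is a keyword field, so an exact `term` match is appropriate
    - `public_permission` (0) is the value that means "view all"
    """
    return {
        "query": {
            "bool": {
                # "should" + minimum_should_match=1 behaves like a logical OR
                "should": [
                    {"term": {"author": author}},
                    {"term": {"permission": public_permission}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = build_permission_query("someone")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the body of the same `_search` request shown above.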

Sample Output

"hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "fileperm",
        "_type": "_doc",
        "_id": "3",
        "_score": 1,
        "_source": {
          "filename": "klaytodendrix_file",
          "permission": 0,
          "author": "klay"
        }
      },
      {
        "_index": "fileperm",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "filename": "sample.pdf",
          "permission": 1,
          "author": "someone"
        }
      }
    ]
  }
}

I want to use this output as a filter for the next search API call (if that's possible?), which searches docIndex and returns only the content a specific user is allowed to view.
It's similar to a relational query in an RDBMS, where a primary key (filename) relates to a second table holding its information. I'm aware that in Elasticsearch I'm expected to denormalize everything and let the JSON document structure handle this kind of thing, but that is impractical together with FSCrawler, since it has its own process for indexing a file and doesn't give us a chance to attach file permissions. We have more or less ruled out the options that use a single index for everything, because that approach would mean working solely with FSCrawler integrated into the web app we are developing. For us, having a second index that handles the permissions would be much easier and more reliable than firing a new FSCrawler job for every user.
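
To make the idea concrete, here is a rough Python sketch of how the permission hits could be turned into a filter for the second search. It only builds the request body and does not talk to a cluster. The helper names `filenames_from_hits` and `build_doc_query` are hypothetical, and the sketch assumes that `file.filename` in the FSCrawler index is an exact-match (keyword) field and that the user's search text is matched against `content`.

```python
def filenames_from_hits(perm_response):
    """Collect the `filename` of every hit returned by the permission search."""
    return [hit["_source"]["filename"] for hit in perm_response["hits"]["hits"]]


def build_doc_query(text, filenames):
    """Build a docIndex search body restricted to the allowed filenames.

    Assumption: `file.filename` is a keyword field, so `terms` matches
    whole filenames exactly.
    """
    return {
        "query": {
            "bool": {
                # scored full-text part of the search
                "must": [{"match": {"content": text}}],
                # non-scoring restriction to files the user may see
                "filter": [{"terms": {"file.filename": filenames}}],
            }
        }
    }


# Trimmed copy of the sample output above (only the fields that are used).
perm_response = {
    "hits": {
        "hits": [
            {"_source": {"filename": "klaytodendrix_file", "permission": 0, "author": "klay"}},
            {"_source": {"filename": "sample.pdf", "permission": 1, "author": "someone"}},
        ]
    }
}

allowed = filenames_from_hits(perm_response)
doc_query = build_doc_query("sample", allowed)
```

One thing to keep in mind with this design: by default a terms query accepts at most 65,536 values (the `index.max_terms_count` setting), so the approach fits best when each user can see a bounded number of files.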

I hope I have explained clearly what we are trying to build.

We are very open to redesigning our architecture if this is not feasible. Every input is appreciated :slightly_smiling_face:.

Thanks!

I think this needs to be solved at index time.
FSCrawler should provide such a mechanism.

There are 2 things that are coming to my mind:

There's nothing yet ready OOTB in FSCrawler but that's something I'm thinking about as I believe this is a real need.

This is really helpful :slight_smile:

Thanks for the response @dadoonet. For now we have created two indices (one for the file permissions, the second for the actual documents). We first query the file permission index to find which files are viewable by a given user, and then use that result as a filter for the next query against the second index.
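
In case it helps anyone reading later, the two-step flow could be sketched roughly like this in Python. This is not our actual code: `search_visible_docs` and the `search(index, body)` callable are hypothetical stand-ins for whatever client wrapper the web app uses, `max_files` is an illustrative parameter, and `file.filename` is assumed to be an exact-match field. The index names are written as in this thread; a real cluster requires lowercase index names.

```python
def search_visible_docs(search, user, text, max_files=1000):
    """Two-step search: look up permissions first, then filter the documents.

    `search(index, body)` is a hypothetical callable returning the parsed
    JSON response of a _search request.
    """
    perm_body = {
        # The default page size is 10, so ask explicitly for more hits.
        "size": max_files,
        "_source": ["filename"],
        "query": {
            "bool": {
                "should": [
                    {"term": {"author": user}},
                    {"term": {"permission": 0}},  # assumed "view all" value
                ],
                "minimum_should_match": 1,
            }
        },
    }
    perm = search("permIndex", perm_body)
    filenames = [hit["_source"]["filename"] for hit in perm["hits"]["hits"]]
    if not filenames:
        return []  # nothing is visible, so skip the second query

    doc_body = {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "filter": [{"terms": {"file.filename": filenames}}],
            }
        }
    }
    return search("docIndex", doc_body)["hits"]["hits"]


# Stand-in for a real cluster so the sketch can run on its own.
calls = []


def fake_search(index, body):
    calls.append((index, body))
    if index == "permIndex":
        return {"hits": {"hits": [{"_source": {"filename": "sample.pdf"}}]}}
    return {"hits": {"hits": [{"_source": {"file": {"filename": "sample.pdf"}}}]}}


results = search_visible_docs(fake_search, "someone", "sample pdf")
```

The first query has to return every filename the user may see, not just the first page of hits. Beyond the 10,000 hits that `from` + `size` allows by default (`index.max_result_window`), it would need the scroll API or `search_after`.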
