Match array of terms in filtered query


(Richard Livsey) #1

Each user in our app can see a subset of the documents based on a
number of conditions (permissions, etc.) and I'm wondering what the
best approach to building a restricted query would be.

Andrei asked a similar question about restricting the query with
boolean vs filters in which filters came up as the most efficient
method:
http://elasticsearch-users.115913.n3.nabble.com/Boolean-query-vs-filters-and-more-td1177230.html;cid=1282670994600-452#a1182865

However in our data model we don't have a single field which we can
filter by, it's more a case of getting a list of all the IDs of
documents the user can access and then searching them.

E.g., modelled as a boolean query, which works but isn't very elegant,
and I can see it running into performance issues as the query grows:

    {
      :query => {
        :bool => {
          :must => [
            {
              :query_string => {
                :query => params[:q]
              }
            },
            {
              :query_string => {
                :default_field => "id",
                :query => accessible_ids.join(" OR ")
              }
            }
          ]
        }
      }
    }

Is there a way of performing a term query against an array of terms so
that I could convert this to a filter?
Or is there a better way of doing this that someone knows of?

Thanks in advance.

--
Richard Livsey
Minutebase - Online Meeting Minutes
http://minutebase.com
http://livsey.org


(Shay Banon) #2

There is the terms filter, which should fit better in this case. Can you
explain more about why you get all the ids and then do a search? How do you
get all the ids? Can't those be "logical" groups that are added to each doc,
which you then filter by? That would be much more efficient.

-shay.banon
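
A minimal sketch of what the terms filter might look like for this case, written in the same Ruby hash style as the original question. The method name `restricted_query` and the default field name `document_id` are assumptions for illustration; the ID restriction is placed in the filter part of a `filtered` query so that it does not take part in scoring.

```ruby
require "json"

# Build a filtered query: full-text search on the user's input,
# restricted to the documents whose IDs the user may access.
# The method name and the default id_field are assumptions.
def restricted_query(user_query, accessible_ids, id_field = "document_id")
  {
    "query" => {
      "filtered" => {
        # Scored part: the user's own search terms.
        "query"  => { "query_string" => { "query" => user_query } },
        # Unscored part: one terms filter holding the whole array of IDs.
        "filter" => { "terms" => { id_field => accessible_ids } }
      }
    }
  }
end

body = restricted_query("test", ["4c31d38acf02a31548000009", "4c31d38acf02a31548000008"])
puts JSON.pretty_generate(body)
```

The terms filter matches indexed terms as-is, without analysis, so the IDs passed in need to match what is actually in the index. On current Elasticsearch versions the `filtered` query no longer exists; the same idea is written as a `bool` query with a `filter` clause.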

On Tue, Aug 24, 2010 at 8:43 PM, Richard Livsey livsey@gmail.com wrote:



(ppearcy) #3

Hi Richard,
I have a similar situation where for any user we construct a query
string that enforces what they are entitled to. This query string can
be arbitrarily complex. Just taking that query string and wrapping it
in a filter seems to work very well. So, my final query typically ends
up looking like:

    'filtered' : {
        'query' : {
            'query_string' : {'query': mainquery}
        },
        'constant_score' : {
            'filter' : {
                'query' : {
                    'query_string' : {'query': userentitlements}
                }
            }
        }
    }

So you can keep doing something similar to what you are doing now, but this
query structure yields good caching of the user entitlements part.

Not sure if this is optimal, but it is working quite well in my
performance testing thus far.

Regards,
Paul

On Aug 24, 11:43 am, Richard Livsey liv...@gmail.com wrote:



(Richard Livsey-2) #4

I wish it was simpler, but there's no real grouping of documents on a
per-user basis.
A user can see a document if any of the following are true (and a few
other conditions):

  • they are an admin
  • they created the document
  • they are a member of the project the document belongs to
  • the document is "shared" with them
  • the document is public
  • etc...

The query to generate this list of documents can be fairly hairy, so
we cache a list of document IDs on a per-user basis.
So when doing a full-text search I have the IDs to restrict against,
but unfortunately they could number in the hundreds.

Thanks.
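
The access rules and per-user ID cache described above could be sketched roughly as follows. Everything here is hypothetical: the `User` and `Doc` structs, their fields, and the `AccessCache` class are stand-ins for whatever the real data model looks like, and the "etc..." conditions from the list are deliberately left out.

```ruby
# Hypothetical stand-ins for the app's real models.
User = Struct.new(:id, :admin, :project_ids)
Doc  = Struct.new(:id, :creator_id, :project_id, :shared_with, :public)

# True if any one of the listed conditions holds for this user and document.
def accessible?(user, doc)
  user.admin ||
    doc.creator_id == user.id ||
    user.project_ids.include?(doc.project_id) ||
    doc.shared_with.include?(user.id) ||
    doc.public
end

# Caches the list of accessible document IDs per user, so the
# expensive check only runs once until the entry is invalidated.
class AccessCache
  def initialize(docs)
    @docs = docs
    @ids_by_user = {}
  end

  # Returns the cached ID list, computing it on first use.
  def ids_for(user)
    @ids_by_user[user.id] ||= @docs.select { |d| accessible?(user, d) }.map(&:id)
  end

  # Drops the cached list, e.g. after permissions change.
  def invalidate(user)
    @ids_by_user.delete(user.id)
  end
end
```

The array returned by `ids_for` is what would then be passed to the search as the list of IDs to restrict against.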

On Tue, Aug 24, 2010 at 6:53 PM, Shay Banon
shay.banon@elasticsearch.com wrote:



(Shay Banon) #5

The good news is that if you use a filter (either constant_score wrapping a
query, or the terms filter), you get the same kind of effectiveness as with
your caching of doc ids, so you should get good performance, even with a
large list.

Regarding your list of conditions, you can index that information at the
document level. For example, add created_by, shared_with, status (public or
not), and so on. Then, you can use an "or" filter on them. That's possible
assuming that those "..." don't hide many more complex restrictions ;)

-shay.banon
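
As a rough sketch of the "or" filter idea, in Ruby hash form: the fields `created_by`, `shared_with` and `status` come from the suggestion above, while `project_id`, the `"public"` status value, and both method names are assumptions made up for the example.

```ruby
# Entitlement filter built from fields indexed on each document.
# A document passes if any one of the clauses matches.
def entitlement_filter(user_id, project_ids)
  clauses = [
    { "term" => { "created_by"  => user_id } },
    { "term" => { "shared_with" => user_id } },
    { "term" => { "status"      => "public" } }
  ]
  # Only add the project clause when the user belongs to some project
  # (project_id is a hypothetical field).
  clauses << { "terms" => { "project_id" => project_ids } } unless project_ids.empty?
  { "or" => clauses }
end

# Wrap the user's search in a filtered query using that filter.
def search_body(user_query, user_id, project_ids)
  {
    "query" => {
      "filtered" => {
        "query"  => { "query_string" => { "query" => user_query } },
        "filter" => entitlement_filter(user_id, project_ids)
      }
    }
  }
end
```

For an admin the filter would presumably be skipped altogether, since they can see everything.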

On Tue, Aug 24, 2010 at 8:59 PM, Richard Livsey richard@livsey.org wrote:



(Richard Livsey-2) #6

Ah, that looks along the lines of what I'm trying to do, thanks.

I had tried something similar previously but was confused by the note
in the docs saying:
"The filter object can hold only filter elements, not queries", so had
given up trying to get it working as part of the filter!
What does that note actually mean?

The docs also mention that filters don't perform any scoring, so is
there a benefit of nesting the query in a constant_score in your
example?

I've managed to get the following working now:

    {
      "query" => {
        "filtered" => {
          "query" => {
            "query_string" => { "query" => "test" }
          },
          "filter" => {
            "query" => {
              "query_string" => {
                "default_field" => "document_id",
                "query" => "4c31d38acf02a31548000009 OR 4c31d38acf02a31548000008"
              }
            }
          }
        }
      }
    }

Thanks for the help!
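
A small sketch of how the filter part of the query above might be generated from the cached array of IDs rather than written by hand. The method name `id_query_filter` is made up for the example, and returning `nil` for an empty list is an assumption about how the caller would want to handle a user with no accessible documents.

```ruby
# Build the "query" filter from an array of document IDs.
# Returns nil for an empty list (assumption: the caller then skips the
# search), because joining an empty array would give an empty query string.
def id_query_filter(accessible_ids, field = "document_id")
  return nil if accessible_ids.empty?
  {
    "query" => {
      "query_string" => {
        "default_field" => field,
        "query" => accessible_ids.join(" OR ")
      }
    }
  }
end
```

Each ID becomes one clause of a boolean query here, and Lucene's default limit is 1024 clauses, so a list in the hundreds should fit but a much larger one may not.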

On Tue, Aug 24, 2010 at 6:56 PM, Paul ppearcy@gmail.com wrote:



(Richard Livsey-2) #7

Fantastic, thanks for the help.

The actual case is a bit more complicated: the documents themselves
aren't the items which have the associated created_by/shared_with data
on them. They are children of those items, so every time one of the
parents changed I'd have to update the same data on all the children.
As one parent object can have many hundreds of child objects, that
would add up to a huge number of updates.

It sounds like passing in the list of IDs isn't too crazy, so I'll
carry on down these lines for now and do some performance testing
against a decent-sized dataset.

Cheers again for the help!

On Tue, Aug 24, 2010 at 7:05 PM, Shay Banon
shay.banon@elasticsearch.com wrote:


