Unable to filter out results based on 'missing' fields


(aash dhariya) #1

I am using tire-0.4.2 to interact with Elasticsearch in my Rails
application (it uses a MongoDB database, with Mongoid for interacting with
MongoDB). I have a Post model which has a Spam document embedded in it.

post.rb

include Mongoid::Document
include Mongoid::Timestamps
..
embeds_one :spam, as: :spammable
...

spam.rb
include Mongoid::Document
include Mongoid::Timestamps
embedded_in :spammable, polymorphic: true
field :needs_approval, type: Boolean, default: false
field :is_spam, type: Time

has_and_belongs_to_many :request_spam_by, :class_name => "User"

field :request_spam, type: Boolean, default: false

I want to get all the posts which have no spam document. Here is the Tire
query:

Post.tire.search(:load => :true, page: self.page, per_page: Post::PER_PAGE) do |pf|
  pf.query { |query| query.string self.search_text } unless search_text.blank?
  pf.filter(:missing, :field => 'spam')
  pf.filter(:term, :college_id => self.college.id)
  pf.filter(:term, :user_id => self.user.id)
  pf.filter(:missing, :field => 'spam')
  pf.filter(:terms, :user_type => self.user_type) unless self.user_type.blank?
  pf.filter(:range, :created_at => {:gte => self.from_time}) unless self.from_time.blank?
  pf.filter(:range, :created_at => {:lte => self.to_time}) unless self.to_time.blank?
  pf.sort { |s| s.by :updated_at, self.sort_order }
end

The resulting Elasticsearch query:

curl -X GET "http://localhost:9200/development_posts/post/_search?from=0&load=true&page=1&per_page=10&size=10&pretty=true" -d '{
  "sort": [{"updated_at": "desc"}],
  "filter": {
    "and": [
      {"missing": {"field": "spam"}},
      {"term": {"college_id": "4fb424a5addf32296f00013a"}},
      {"missing": {"field": "spam"}},
      {"range": {"created_at": {"gte": "2012-06-05T00:00:00+05:30"}}},
      {"range": {"created_at": {"lte": "2012-06-05T23:59:59+05:30"}}}
    ]
  },
  "size": 10,
  "from": 0
}'

The results of the query include documents in which spam exists, even
though I am searching only for documents in which the spam document is
missing. I don't know what mistake I am making. Can anybody point me in the
right direction?


(Shay Banon) #2

Your search seems fine. Can you gist a sample that curls some docs and
shows the behavior? (It's simpler to understand what's going on.) Also, for
performance reasons, it's better not to use the "filter" search element
unless you use facets; instead, use a constant_score query wrapping that
filter, or a filtered query (under the "query" element).
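
As a rough illustration of the constant_score suggestion, here is a minimal sketch that builds the request body as a plain Ruby hash and prints it as JSON. The helper name constant_score_body is hypothetical (it is not part of Tire or Elasticsearch), and the filters are copied from the query in the first post. It only shows the shape of the body and does not talk to a server.

```ruby
require 'json'

# Hypothetical helper (not a Tire API): wraps the filters in a
# constant_score query, so they sit under the "query" element instead of
# the top-level "filter" element.
def constant_score_body(filters, size: 10, from: 0)
  {
    query: { constant_score: { filter: { and: filters } } },
    sort:  [{ updated_at: 'desc' }],
    size:  size,
    from:  from
  }
end

# Filters taken from the query in the original post.
filters = [
  { missing: { field: 'spam' } },
  { term: { college_id: '4fb424a5addf32296f00013a' } }
]

body = constant_score_body(filters)
puts JSON.pretty_generate(body)
```

The printed JSON could be passed to curl with -d in place of the body shown in the original post.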


(Wing) #3

What do you mean by "performance reasons"?

To my understanding, a filter does not affect scoring and can be easily
cached, so I use a filter instead of a query.


(Clinton Gormley) #4

On Sat, 2012-06-09 at 15:37 +0800, Yiu Wing TSANG wrote:

what do you mean about the performance reason?

to my understanding, filter does not affect scoring and can be easily
cached, therefore i use filter instead of query

You can filter a query, or you can filter results AFTER the query has
been run and the facets have been calculated on the original results.

You are using the second type of filter but you're not using facets, so
you are running your query, then applying a filter:
{
  "query": { "query_string": { "query": "string" } },
  "filter": { ...your filter... }
}

Much better in this case to use a filtered query, or a constant score
query, for instance:

{
  "query": {
    "filtered": {
      "query": { "query_string": { "query": "string" } },
      "filter": { ...your filter... }
    }
  }
}
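
For anyone building the body by hand, a minimal sketch of the filtered-query shape in plain Ruby might look like the following. The helper name filtered_body is hypothetical, and falling back to match_all when there is no search text is an assumption of this sketch rather than something stated in the thread. It reflects the query DSL as discussed here and does not contact a server.

```ruby
require 'json'

# Hypothetical helper (not a Tire API): nests the text query and the
# filters inside a "filtered" query, so the filter narrows the documents
# the query runs against rather than trimming the hits afterwards.
def filtered_body(search_text, filters)
  text_query =
    if search_text.to_s.strip.empty?
      { match_all: {} }                        # assumption: no text means match everything
    else
      { query_string: { query: search_text } }
    end

  {
    query: {
      filtered: {
        query:  text_query,
        filter: { and: filters }
      }
    }
  }
end

example = filtered_body('string', [{ missing: { field: 'spam' } }])
puts JSON.pretty_generate(example)
```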

clint

