An issue with must_not clause


(Ján Paľko) #1

Hi,

We are using Elasticsearch for sports events, which require geoblocking due
to rights. Below is the final filter produced by the Java API. I removed the
parts that are not relevant to this issue.

{
  "filter" : {
    "and" : {
      "filters" : [ {
        "query" : {
          "bool" : {
            "should" : [ {
              "bool" : {
                "must" : [ {
                  "term" : {
                    "blockByDefault" : true
                  }
                }, {
                  "term" : {
                    "allowedCountryCodes" : "it"
                  }
                } ]
              }
            }, {
              "bool" : {
                "must" : {
                  "term" : {
                    "blockByDefault" : false
                  }
                },
                "must_not" : {
                  "term" : {
                    "blockedCountryCodes" : "it"
                  }
                }
              }
            } ],
            "minimum_number_should_match" : 1
          }
        }
      } ]
    }
  }
}

When blockByDefault is set to true, the filter should behave as a whitelist
(allow only the countries listed in allowedCountryCodes); otherwise it should
behave as a blacklist (block only the countries listed in blockedCountryCodes).

Here is the mapping:

{
  "event" : {
    "properties" : {
      "blockByDefault" : {"type" : "boolean", "store" : "yes", "index" : "not_analyzed"},
      "allowedCountryCodes" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
      "blockedCountryCodes" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"}
    }
  }
}

Here is test data:

{"blockByDefault" : true, "allowedCountryCodes" : ["gb", "sk"], "blockedCountryCodes" : ["it", "sk"]}
{"blockByDefault" : true, "allowedCountryCodes" : ["gb", "sk"], "blockedCountryCodes" : ["it", "sk"]}
{"blockByDefault" : false, "allowedCountryCodes" : ["gb", "sk"], "blockedCountryCodes" : ["it", "sk"]}
{"blockByDefault" : false, "allowedCountryCodes" : ["gb", "sk"], "blockedCountryCodes" : ["it", "sk"]}
{"blockByDefault" : false, "allowedCountryCodes" : ["gb", "sk"], "blockedCountryCodes" : ["it", "sk"]}

The expected number of results for each country is as follows:

  • gb 5
  • de 3
  • sk 2
  • it 0 - this is the problem: it returns 3 results (the ones with
    blockByDefault set to false) instead of 0.

Everything works except the last case: all events should be blocked
from Italy, but they are not.
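
The whitelist/blacklist rule and the expected counts in this post can be cross-checked outside Elasticsearch with a plain predicate. This is only a sketch of the intended logic: the helper names `is_visible` and `count_visible` are made up for illustration, and the events are the five test documents listed in this post.

```python
from typing import Dict, List

# The five test documents from this post.
EVENTS: List[Dict] = (
    [{"blockByDefault": True,
      "allowedCountryCodes": ["gb", "sk"],
      "blockedCountryCodes": ["it", "sk"]}] * 2
    + [{"blockByDefault": False,
        "allowedCountryCodes": ["gb", "sk"],
        "blockedCountryCodes": ["it", "sk"]}] * 3
)


def is_visible(event: Dict, country: str) -> bool:
    """Return True if the event may be shown in the given country."""
    if event["blockByDefault"]:
        # Whitelist mode: only explicitly allowed countries see the event.
        return country in event["allowedCountryCodes"]
    # Blacklist mode: every country sees the event except the blocked ones.
    return country not in event["blockedCountryCodes"]


def count_visible(events: List[Dict], country: str) -> int:
    """Count the events visible from the given country."""
    return sum(1 for event in events if is_visible(event, country))


for code in ("gb", "de", "sk", "it"):
    print(code, count_visible(EVENTS, code))
# prints: gb 5, de 3, sk 2, it 0 (one pair per line)
```

Since the plain predicate yields 0 for "it", the rule itself is consistent with the expected counts, which suggests looking at how the terms are indexed rather than at the boolean logic.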

I've tried rewriting it in other ways, but I still get the same results. I've
also tried filtering only the events that have Italy in blockedCountryCodes,
and that worked, but when I added the blockByDefault "switch", it returned
wrong results. What am I doing wrong?

Thanks
Jan


(David Pilato) #2

You are using the default analyzer.
"it" is a common English word (a stop word), so it is ignored.

Applying a keyword mapping (with a lowercase filter) should solve your issue.
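
To illustrate why a dropped stop word produces exactly the symptom reported, here is a toy model in Python. It is not Elasticsearch code: the stop-word set is a small illustrative subset, and `analyze_with_stopwords`, `analyze_as_keyword`, `index_terms` and `passes_must_not` are hypothetical helpers that only mimic the idea of index-time analysis.

```python
from typing import Callable, Iterable, List, Set

# Small illustrative subset of English stop words (an assumption;
# a real analyzer's list is longer and may differ).
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}


def analyze_with_stopwords(value: str) -> List[str]:
    """Mimic an analyzer that lowercases, splits, and drops stop words."""
    return [tok for tok in value.lower().split() if tok not in STOP_WORDS]


def analyze_as_keyword(value: str) -> List[str]:
    """Mimic a keyword tokenizer plus lowercase filter: one token, kept whole."""
    return [value.lower()]


def index_terms(values: Iterable[str],
                analyzer: Callable[[str], List[str]]) -> Set[str]:
    """Collect the terms that would end up in the index for a field."""
    terms: Set[str] = set()
    for value in values:
        terms.update(analyzer(value))
    return terms


def passes_must_not(terms: Set[str], country: str) -> bool:
    """A must_not term clause lets the document through if the term is absent."""
    return country not in terms


blocked = ["it", "sk"]
analyzed = index_terms(blocked, analyze_with_stopwords)  # "it" is dropped
keyword = index_terms(blocked, analyze_as_keyword)       # "it" is kept

print(sorted(analyzed), passes_must_not(analyzed, "it"))  # ['sk'] True
print(sorted(keyword), passes_must_not(keyword, "it"))    # ['it', 'sk'] False
```

In the first case the term "it" never reaches the index, so the must_not clause has nothing to exclude and the documents with blockByDefault set to false are returned for Italy.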

HTH
David ;)
Twitter : @dadoonet / @elasticsearchfr

On 19 March 2012 at 15:26, Ján Paľko jan.palko@gmail.com wrote:



(Ján Paľko) #3

Many thanks :) It works. But it's strange: I had already tried setting index to
not_analyzed and it didn't work, and now it does.
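
For readers who want to try the keyword-plus-lowercase approach suggested earlier in the thread, the following sketch builds index settings and a mapping as Python dictionaries and prints them as JSON. It assumes the string field type used in this thread; the analyzer name `lowercase_keyword` and the helper `country_field` are made up for illustration, and the exact request used to create the index is left out.

```python
import json

ANALYZER_NAME = "lowercase_keyword"  # hypothetical name, choose your own

# Custom analyzer: the keyword tokenizer keeps the whole value as one token,
# and the lowercase filter normalizes its case. No stop-word filter is applied.
index_settings = {
    "analysis": {
        "analyzer": {
            ANALYZER_NAME: {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        }
    }
}


def country_field() -> dict:
    """Mapping for a country-code field that uses the custom analyzer."""
    return {"type": "string", "store": "yes", "analyzer": ANALYZER_NAME}


event_mapping = {
    "event": {
        "properties": {
            "blockByDefault": {"type": "boolean", "store": "yes"},
            "allowedCountryCodes": country_field(),
            "blockedCountryCodes": country_field(),
        }
    }
}

print(json.dumps({"settings": index_settings, "mappings": event_mapping}, indent=2))
```

A mapping change of this kind generally only affects documents indexed after it is applied, so existing documents would need to be reindexed before the filter behaves differently.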

Jan

On Monday, March 19, 2012 6:29:33 PM UTC+1, David Pilato wrote:



(Shay Banon) #4

One more thing: you are using a filter and then wrapping a query inside it,
which is a shame. You should use filters all the way in this case (a bool
filter with term filters, for example).
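
A filter-only version of the request might look roughly like the sketch below, which builds the body as a Python dictionary. It assumes the bool filter accepts the same must / should / must_not clauses as the bool query; the helper names `term` and `geo_filter` are made up for illustration.

```python
import json
from typing import Any, Dict


def term(field: str, value: Any) -> Dict:
    """Build a single term filter clause."""
    return {"term": {field: value}}


def geo_filter(country: str) -> Dict:
    """Build a filter-only geoblocking request body for one country code."""
    # Whitelist branch: blocked by default, so the country must be allowed.
    whitelist = {
        "bool": {
            "must": [
                term("blockByDefault", True),
                term("allowedCountryCodes", country),
            ]
        }
    }
    # Blacklist branch: open by default, so the country must not be blocked.
    blacklist = {
        "bool": {
            "must": [term("blockByDefault", False)],
            "must_not": [term("blockedCountryCodes", country)],
        }
    }
    # A document passes if either branch matches.
    return {"filter": {"bool": {"should": [whitelist, blacklist]}}}


print(json.dumps(geo_filter("it"), indent=2))
```

Compared with the original request, the "and" wrapper and the nested "query" are gone, so every clause is a filter.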

2012/3/20 Ján Paľko jan.palko@gmail.com



(Ján Paľko) #5

Is there any impact on performance? When I use just a query, it's
slower because it does scoring. Is it the same when I use it inside a filter?

Thanks

On Tue, Mar 20, 2012 at 11:58, Shay Banon kimchy@gmail.com wrote:



(Shay Banon) #6

If you rewrite the query inside the filter so that it is built from filters,
you will gain the benefits of filter caching. Scoring will not be
computed in either case.
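
The caching benefit can be pictured with a toy model. The class below is not how Elasticsearch implements its filter cache; it is a hypothetical `TermFilterCache` that remembers the set of matching document ids per (field, value) pair and combines those sets with set operations, using the five test documents from the first post.

```python
from typing import Any, Dict, FrozenSet, List, Tuple


class TermFilterCache:
    """Toy per-term cache: one set of matching doc ids per (field, value)."""

    def __init__(self, docs: List[Dict]) -> None:
        self.docs = docs
        self.cache: Dict[Tuple[str, Any], FrozenSet[int]] = {}
        self.misses = 0  # how many times the documents had to be scanned

    def matching_ids(self, field: str, value: Any) -> FrozenSet[int]:
        key = (field, value)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = frozenset(
                i for i, doc in enumerate(self.docs)
                if self._matches(doc[field], value)
            )
        return self.cache[key]

    @staticmethod
    def _matches(stored: Any, value: Any) -> bool:
        # Multi-valued fields match if any of their values equals the term.
        return value in stored if isinstance(stored, list) else stored == value


def visible_ids(cache: TermFilterCache, country: str) -> FrozenSet[int]:
    """Combine cached term results into the whitelist/blacklist rule."""
    whitelist = (cache.matching_ids("blockByDefault", True)
                 & cache.matching_ids("allowedCountryCodes", country))
    blacklist = (cache.matching_ids("blockByDefault", False)
                 - cache.matching_ids("blockedCountryCodes", country))
    return whitelist | blacklist


docs = (
    [{"blockByDefault": True, "allowedCountryCodes": ["gb", "sk"],
      "blockedCountryCodes": ["it", "sk"]}] * 2
    + [{"blockByDefault": False, "allowedCountryCodes": ["gb", "sk"],
        "blockedCountryCodes": ["it", "sk"]}] * 3
)
cache = TermFilterCache(docs)
print(len(visible_ids(cache, "gb")), cache.misses)  # 5 4
print(len(visible_ids(cache, "it")), cache.misses)  # 0 6
```

The second lookup scans the documents only for the two country-specific terms; the two blockByDefault results are reused, which is the kind of saving a per-filter cache is meant to provide.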

2012/3/20 Ján Paľko jan.palko@gmail.com


