Query to return entries not matching a given value


(Vincent Massol) #1

Hi,

I'm trying to perform an ES search to display entries that do not match a
given value.

Specifically, I have a String field that is not_analyzed and that represents
a version. I need a Lucene query that returns all versions that don't
contain 'SNAPSHOT'.

Examples of versions I have:

  • 5.2
  • 5.3-SNAPSHOT
  • 5.2-milestone-2

I need a query that'll return 5.2 and 5.2-milestone-2 but NOT
5.3-SNAPSHOT...

It can be tested here: http://activeinstalls.xwiki.org/

So far I've tried:

distributionVersion:* AND -distributionVersion:SNAPSHOT

And

distributionVersion:* AND -distributionVersion:/.*SNAPSHOT/

And

distributionVersion:* AND -distributionVersion:/[0-9].*SNAPSHOT/

But none worked...

I believe this is because Lucene doesn't support "*" as the first char. Is
there a solution that doesn't involve adding a new field that says whether
the version is a snapshot or not?

Thanks a lot!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nik Everett) #2

If allow_leading_wildcard is set you should be able to start with a
wildcard.
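
For reference, here is a minimal sketch of where that option would go in a query_string request body. The class and method names are hypothetical, and the JSON is assembled by hand only to keep the example dependency-free; it shows the placement of the flag, not a complete fix for the question above.

```java
// Hypothetical helper, not part of any Elasticsearch client library.
class LeadingWildcardQuery {

    // Escapes backslashes and double quotes so the query can sit inside a JSON string.
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a query_string body with allow_leading_wildcard set explicitly.
    static String build(String query, boolean allowLeadingWildcard) {
        return "{\"query\":{\"query_string\":{"
            + "\"query\":\"" + escapeJson(query) + "\","
            + "\"allow_leading_wildcard\":" + allowLeadingWildcard
            + "}}}";
    }

    public static void main(String[] args) {
        // Prints the body for a query whose term starts with a wildcard.
        System.out.println(build("distributionVersion:*SNAPSHOT", true));
    }
}
```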


(Matt Weber) #3

I actually think it is the way the query string is being parsed. It might be
a bug, or it might need some escaping. Your regex version should work; this
does:

curl -XPOST 'http://extensions.xwiki.org/activeinstalls/installs/_search?pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must_not": [{"regexp": {"distributionVersion": ".*SNAPSHOT"}}]
        }
      }
    }
  }
}'
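
To see locally what this filter keeps, here is a rough Java sketch. It assumes that java.util.regex can stand in for Lucene's regexp engine for this simple pattern (the two syntaxes differ in general), and it relies on the fact that both match the pattern against the whole term. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical local illustration; java.util.regex stands in for Lucene's regexp engine.
class MustNotRegexpSketch {

    // Keeps only the terms that do NOT match the pattern as a whole,
    // mirroring a must_not regexp clause on a not_analyzed field.
    static List<String> mustNotMatch(List<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!pattern.matcher(term).matches()) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> versions = Arrays.asList("5.2", "5.3-SNAPSHOT", "5.2-milestone-2");
        System.out.println(mustNotMatch(versions, ".*SNAPSHOT")); // [5.2, 5.2-milestone-2]
    }
}
```

Because the match covers the whole term, a bare SNAPSHOT pattern excludes nothing here, which is consistent with the first attempt in the question returning every version.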

Thanks,
Matt Weber


(Matt Weber) #4

OK, this is due to the lowercase_expanded_terms option, which defaults to
true on the Query String Query that Kibana uses. This means that
"/.*SNAPSHOT/" is turned into "/.*snapshot/", which doesn't match your terms
because they are not analyzed and are stored in uppercase. I feel that
lowercase_expanded_terms should not be applied to a regex query and will open
a bug report. In the meantime, if you want to use the query
"-distributionVersion:/.*SNAPSHOT/", you will need to disable
lowercase_expanded_terms in the Kibana source file querySrv.js, line 106:

return ejs.QueryStringQuery(q.query || '*');

change it to this:

return ejs.QueryStringQuery(q.query || '*').lowercaseExpandedTerms(false);
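
The effect of the option can be reproduced with a small Java sketch. This is a simplified model and not the actual parser code: it assumes the parser lowercases the whole pattern when the option is on, and it uses java.util.regex in place of Lucene's regexp engine. All names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical model of lowercase_expanded_terms, for illustration only.
class LowercaseExpandedTermsSketch {

    // Returns the pattern as it would look after the (assumed) lowercasing step.
    static String expand(String regex, boolean lowercaseExpandedTerms) {
        return lowercaseExpandedTerms ? regex.toLowerCase(Locale.ROOT) : regex;
    }

    // Checks whether the expanded pattern matches the indexed term as a whole.
    static boolean matchesTerm(String term, String regex, boolean lowercaseExpandedTerms) {
        return Pattern.matches(expand(regex, lowercaseExpandedTerms), term);
    }

    public static void main(String[] args) {
        String term = "5.3-SNAPSHOT"; // indexed verbatim because the field is not_analyzed
        System.out.println(matchesTerm(term, ".*SNAPSHOT", true));  // false
        System.out.println(matchesTerm(term, ".*SNAPSHOT", false)); // true
    }
}
```

With the option on, the negated clause never matches the snapshot term, so nothing is excluded from the results.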

Hope this helps,
Matt Weber


(Vincent Massol) #5

Thanks so much for your fast and great help guys!

I was able to fix my Kibana instance and also to fix my Java code to
support what I needed :)

FTR, here's my Java code:
@Component
@Singleton
public class DefaultDataManager implements DataManager
{
    @Inject
    private JestClientManager jestClientManager;

    @Override
    public long getInstallCount(String query) throws Exception
    {
        Map<String, Object> queryMap = new HashMap<String, Object>();
        queryMap.put("query", query);

        // This allows writing queries such as: -distributionVersion:*SNAPSHOT
        queryMap.put("lowercase_expanded_terms", false);

        // Wrap the options in a query_string query.
        Map<String, Object> jsonMap = new HashMap<String, Object>();
        jsonMap.put("query_string", JSONObject.fromObject(queryMap));

        return executeCount(JSONObject.fromObject(jsonMap).toString());
    }

    private long executeCount(String query) throws Exception
    {
        // Count the documents of type "install" in the "installs" index that match the query.
        Count count = new Count.Builder(query)
            .addIndex("installs")
            .addType("install")
            .build();
        JestResult result = this.jestClientManager.getClient().execute(count);
        return ((Double) result.getValue("count")).longValue();
    }
}

Great support,
Thanks
-Vincent Massol

