Bug in DisMaxQuery/ConstantScoreQuery?!


(derrickburns) #1

I may have found a bug in the handling of the DisMaxQuery.

Here is my code:

    DateHistogramFacetBuilder afb = new DateHistogramFacetBuilder("histo1")
            .field("created_at")
            .interval(interval)
            .valueScript("doc.score");

    Map<String, Float> map = wm.getMap();
    DisMaxQueryBuilder disMaxQueryBuilder = new DisMaxQueryBuilder();
    for (Map.Entry<String, Float> e : map.entrySet()) {
        ConstantScoreQueryBuilder c =
                new ConstantScoreQueryBuilder(get(e.getKey())).boost(e.getValue());
        disMaxQueryBuilder.add(c);
    }
    return client.prepareSearch()
            .addFacet(afb)
            .addField("text")
            .setQuery(disMaxQueryBuilder)
            .setIndices(index);

where get() is defined:

    public FilterBuilder get(String word) {
        return FilterBuilders.andFilter(
                FilterBuilders.termFilter("text", "feel"),
                FilterBuilders.termFilter("text", word));
    }

When passed a map that has only a single entry, say ("stupid" ->
-0.72), I get facet values that are all whole numbers, equal to
count * -1.0 rather than count * -0.72.

When passed a map that has two values, say ("stupid" -> -0.72,
"awordthatdoesnotexist" -> 1.0), I get facet values equal to count *
-0.72, as expected.

Is this an ES bug? I cannot imagine that it is not.
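
For reference, the expectation above can be written out as a small sketch. It assumes the documented DisMax rule (the score is the maximum sub-query score plus the tie breaker times the sum of the remaining sub-query scores) and assumes each ConstantScoreQuery scores exactly its boost. The class and method names are made up for illustration and are not part of the ES API.

```java
import java.util.List;

class DisMaxExpectation {
    // DisMax rule: max sub-score + tieBreaker * (sum of the other sub-scores).
    // Assumes at least one sub-query matched the document.
    static double disMaxScore(List<Double> matchingSubScores, double tieBreaker) {
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : matchingSubScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    // Expected facet total when every hit in a bucket gets the same score
    // (assumption: the facet sums doc.score over the hits in the bucket).
    static double expectedFacetTotal(long count, double perDocScore) {
        return count * perDocScore;
    }

    public static void main(String[] args) {
        // Single clause "stupid" with boost -0.72, tie breaker 0, 70 hits in the bucket.
        double score = disMaxScore(List.of(-0.72), 0.0);
        System.out.println("expected facet total: " + expectedFacetTotal(70, score));
    }
}
```

Under this reading, a bucket of 70 hits matched only by the "stupid" clause should total roughly 70 * -0.72, whether or not a second, non-matching clause is present.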


(derrickburns) #2

DEFINITE bug in ES code.

I ran the query with the sets of subquery values shown below, varying
the number of subqueries between 1 and 2 and varying the values of the
boost factors (positive and negative values, mostly with absolute
value <= 1.0). The results I got are inconsistent with my
interpretation of the DisMaxQuery.

I suspect that there is a mistake in the use of the filter results
cache or in the results cache itself, but I have not looked at the ES
code.

Here are the results:

key: calm value: 0.934
[JGAP][14:17:16] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: 70.0
(wrong, value should be 70 * 0.934)

key: calm value: 0.936
key: kadfkak value: -1.0
[JGAP][14:19:30] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: 65.51999926567078
(correct)

key: calm value: -0.86
[JGAP][14:18:03] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: -70.0
(wrong, value should be 70 * -0.86)

key: calm value: -0.4
key: kadfkak value: -1.0
[JGAP][14:21:00] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: -28.000000417232513
(correct)

key: calm value: -0.4
[JGAP][14:21:50] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: -70.0
(wrong, value should be 70 * -0.4)

key: calm value: 2.0
key: jkadlkfjhalskdfjhlkdjf value: 1.0
[JGAP][14:31:16] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: 70.0
(wrong, value should be 70 * 2.0)

key: calm value: -2.0
key: jkadlkfjhalskdfjhlkdjf value: 1.0
[JGAP][14:32:24] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: -70.0
(wrong, value should be 70 * -2.0)

key: calm value: -0.35
key: jkadlkfjhalskdfjhlkdjf value: 1.0
[JGAP][14:33:14] INFO EsSeriesGenerator - time: Thu Jun 11 17:00:00 PDT 2009 count: 70 value: -24.499999582767487
(correct)


(Karussell) #3

Hmm, it is a bit unclear what your aim is and what you expect. I
think it isn't necessarily a bug; it is more
"inconsistent with your interpretation of the DisMaxQuery." :wink:

Be sure you understand the implication of a negative boost:
http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg08849.html
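
One reading that would fit the reported numbers is that each constant score ends up divided by the largest absolute boost among the DisMax clauses, similar in spirit to Lucene's query normalization. This is only a guess from the figures, not something checked against the ES or Lucene source. A minimal sketch to test that guess against the eight results (the class and method names are hypothetical):

```java
class NormalizedBoostCheck {
    // Hypothesis only: score = boost of the matching clause / largest |boost| in the query.
    static double normalizedScore(float matchingBoost, float... allBoosts) {
        float maxAbs = 0f;
        for (float b : allBoosts) {
            maxAbs = Math.max(maxAbs, Math.abs(b));
        }
        // Float division, then widened to double, as a facet summing float scores would do.
        return matchingBoost / maxAbs;
    }

    // Facet total for a bucket where every hit matches only the "calm" clause.
    static double facetTotal(long count, float matchingBoost, float... allBoosts) {
        return count * normalizedScore(matchingBoost, allBoosts);
    }

    public static void main(String[] args) {
        // Each line: boost of "calm" first, then all clause boosts in the query.
        System.out.println(facetTotal(70, 0.934f, 0.934f));       // reported: 70.0
        System.out.println(facetTotal(70, 0.936f, 0.936f, -1f));  // reported: 65.51999926567078
        System.out.println(facetTotal(70, -0.86f, -0.86f));       // reported: -70.0
        System.out.println(facetTotal(70, -0.4f, -0.4f, -1f));    // reported: -28.000000417232513
        System.out.println(facetTotal(70, -0.4f, -0.4f));         // reported: -70.0
        System.out.println(facetTotal(70, 2.0f, 2.0f, 1.0f));     // reported: 70.0
        System.out.println(facetTotal(70, -2.0f, -2.0f, 1.0f));   // reported: -70.0
        System.out.println(facetTotal(70, -0.35f, -0.35f, 1.0f)); // reported: -24.499999582767487
    }
}
```

If the printed totals line up with the reported ones, the single-clause results would follow from normalization (a lone boost divided by its own magnitude is always +1 or -1) rather than from a caching problem.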

Peter.

