Facet returns different total at different times with the latest 0.90 release

Hello,

I have a strange problem: the same query returns different results at
different times.
The query:
curl -XPOST 'localhost:9200/content/bb/_search?pretty=1' -d '
{"query":{"match_all":{}},"facets":{"facet":{"terms":{"field":"country","size":10000,"order":"term"}}}}'

on the first go returns:

...
...
"facets" : {
  "facet" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 7256,
    "other" : 0,
    "terms" : [ {
      "term" : "albania",
      "count" : 1
    }, {
      "term" : "argentina",
      "count" : 4

...
...
On the second go, the same query returns:
...
...
"facets" : {
  "facet" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 6948,
    "other" : 0,
    "terms" : [ {
      "term" : "albania",
      "count" : 1
    }, {
      "term" : "argentina",
      "count" : 4

...
...

Please note the difference in "total" (7256 vs. 6948). Further down in the
results, the counts for individual countries differ as well. I am not sure
why. My index is defined as:
Analyzer -
{
    "index": {
        "analysis": {
            "analyzer": {
                "string_lowercase": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    }
}

Mapping -

{
    "bb": {
        "properties": {
            "content": {
                "type": "object",
                "properties": {
                    "centroid": {
                        "type": "geo_point"
                    },
                    "categoryID": {
                        "type": "string",
                        "index": "not_analyzed"
                    },
                    "address": {
                        "type": "object",
                        "properties": {
                            "country": {
                                "type": "string",
                                "analyzer": "string_lowercase"
                            },
                            "city": {
                                "type": "string",
                                "analyzer": "string_lowercase"
                            },
                            "postalCode": {
                                "type": "string",
                                "analyzer": "string_lowercase"
                            },
                            "houseNumber": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        }
    }
}

Any ideas why I get different results for the same query at different times?
The Elasticsearch cluster has two nodes running on Amazon Web Services.
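
To pin down which countries differ between two runs, the two responses could be compared term by term. The following is only a rough Python sketch: the helper names (`facet_counts`, `diff_facets`) and the sample responses are made up for illustration, shaped like the facet output shown above, and are not real data from the index.

```python
def facet_counts(response, facet_name="facet"):
    """Map each term to its count for one terms facet in a search response."""
    terms = response["facets"][facet_name]["terms"]
    return {entry["term"]: entry["count"] for entry in terms}


def diff_facets(first, second, facet_name="facet"):
    """Return {term: (count_in_first, count_in_second)} for terms whose counts differ."""
    a = facet_counts(first, facet_name)
    b = facet_counts(second, facet_name)
    return {
        term: (a.get(term, 0), b.get(term, 0))
        for term in sorted(set(a) | set(b))
        if a.get(term, 0) != b.get(term, 0)
    }


# Made-up sample responses shaped like the output above (not real data).
run_1 = {"facets": {"facet": {"total": 7, "terms": [
    {"term": "albania", "count": 1},
    {"term": "argentina", "count": 4},
    {"term": "austria", "count": 2},
]}}}
run_2 = {"facets": {"facet": {"total": 6, "terms": [
    {"term": "albania", "count": 1},
    {"term": "argentina", "count": 4},
    {"term": "austria", "count": 1},
]}}}

# Only the terms whose counts changed between the two runs are reported.
print(diff_facets(run_1, run_2))
```

In practice the two dictionaries would come from parsing the JSON returned by two consecutive runs of the query.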

Thanks a lot for your help.
Best,
Manish

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Manish,

On 24 May 2013, at 13:20, Manish Singh singhmanishp@gmail.com wrote:

Any ideas why I get different results for the same query at different times?
The Elasticsearch cluster has two nodes running on Amazon Web Services.

Could it be possible that you're running into:

https://github.com/elasticsearch/elasticsearch/issues/1305

How many shards are in your index?
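
For context, the kind of miscount that issue describes can be reproduced with a toy model. This is a simplified sketch in Python, not Elasticsearch's actual code: it assumes each shard reports only its own top `size` terms and that the per-shard lists are then summed. The function names and the per-shard counts are invented for illustration.

```python
from collections import Counter


def shard_top_terms(shard_counts, size):
    """Each shard reports only its `size` most frequent terms."""
    return dict(Counter(shard_counts).most_common(size))


def merged_facet(shards, size):
    """Sum the truncated per-shard lists and keep the overall top `size` terms."""
    merged = Counter()
    for shard_counts in shards:
        merged.update(shard_top_terms(shard_counts, size))
    return dict(merged.most_common(size))


# Made-up per-shard term counts (illustrative only).
shards = [
    {"germany": 5, "france": 4, "spain": 3},
    {"spain": 5, "italy": 4, "germany": 3},
]

# Exact counts, computed without any per-shard truncation.
exact = Counter()
for shard_counts in shards:
    exact.update(shard_counts)

print(dict(exact))              # germany and spain each occur 8 times in total
print(merged_facet(shards, 2))  # with size 2 per shard, both are reported as 5
```

With `size` at least as large as the number of distinct terms on every shard, the truncation step drops nothing and the merged counts match the exact ones.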

Cheers,
Dan

--
Dan Fairs | dan.fairs@gmail.com | @danfairs | secondsync.com


Hi Dan,

I have 5 shards in my index. I did not experience the problem with the
older Elasticsearch version, 0.20.5.

Is there any workaround other than recreating the index with 1 shard?

Cheers,
Manish

On Fri, May 24, 2013 at 9:10 PM, Dan Fairs dan.fairs@gmail.com wrote:

Could it be possible that you're running into:

https://github.com/elasticsearch/elasticsearch/issues/1305 (terms facet gives wrong count with n_shards > 1)

How many shards are in your index?

Cheers,
Dan


Hi Manish,

The workaround suggested by Shay (I can't remember whether it's in that issue or in a related discussion on the mailing list) was to increase the size parameter in your facet: basically, multiply it by the number of shards. So, in your example, you'd ask for a size of 50,000 rather than 10,000, assuming 5 shards.
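
As a rough illustration of that size workaround, the request body could be built like this. It is a minimal Python sketch: the helper name `terms_facet_request` and its parameters are hypothetical, and the body simply mirrors the query from the original post with the facet size scaled by the shard count.

```python
import json


def terms_facet_request(field, size, shard_count):
    """Build a search body whose terms facet asks for size * shard_count entries."""
    return {
        "query": {"match_all": {}},
        "facets": {
            "facet": {
                "terms": {
                    "field": field,
                    "size": size * shard_count,  # the workaround: scale by shard count
                    "order": "term",
                }
            }
        },
    }


# 10,000 wanted entries and 5 shards, as in this thread.
body = terms_facet_request("country", size=10000, shard_count=5)
print(json.dumps(body))
```

The printed JSON can be passed to curl with -d in place of the original body. The response may then contain more entries than originally wanted, so the caller would trim the list on the client side.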

We've been using the 1-shard workaround to date. It's worked OK for us, but we do have time-based data (so a new index every week) and a relatively small cluster. If you aren't regularly creating new indices, or your cluster has relatively few nodes (so you'll end up with hotspots for indexing/search) then you'll probably want to try the facet size workaround.

If those don't work for you, then maybe you're not experiencing the problem described in the issue! (Especially as it used to work OK... we're still on 0.19.8).

Cheers,
Dan
