"aggregations" do not work any more (index corrupt ?) - resolved


(A.Brod) #1

Hi all,

I want to get all server items stored within the database. Now I have a problem with the following aggregation request:

POST /index/item/_search
{
    "size": 0,
    "aggregations": {
        "agg": {
            "terms": {
                "field": "server",
                "size": 0
            }
        }
    }
}

It worked correctly until last week and returned the correct servers.

In the meantime we got a "data too large" error. Following "Limiting Memory Usage" in the Elasticsearch guide, we increased the memory ... and now it works again.

But (unfortunately) the buckets are now empty:

{
    "took": 296,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 15586945,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "agg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
        }
    }
}
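
The symptom above (documents match, but the terms aggregation comes back with no buckets) can be checked for mechanically. Here is a minimal Python sketch, assuming the response has already been parsed into a dict and that `hits.total` is a plain integer, as in the responses shown in this thread. The function name `has_empty_buckets` is made up for illustration and is not part of any Elasticsearch client.

```python
def has_empty_buckets(response, agg_name):
    """Return True when documents matched but the terms aggregation is empty."""
    total = response["hits"]["total"]  # plain integer in the responses shown here
    buckets = response["aggregations"][agg_name]["buckets"]
    return total > 0 and len(buckets) == 0


# Trimmed copy of the response above.
response = {
    "hits": {"total": 15586945, "max_score": 0, "hits": []},
    "aggregations": {
        "agg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [],
        }
    },
}

print(has_empty_buckets(response, "agg"))
```

A check like this only flags the symptom; it says nothing about the cause.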

The mapping for this field is:

{
    "index": {
        "mappings": {
            "item": {
                "properties": {
                    ...,
                    "server": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }
}

Does anyone have an idea what may be wrong?

Regards,
Andreas


(Mark Walkom) #2

Elasticsearch is not a database :wink:

Does the data still exist? Check the _cat APIs to make sure your indices are still there. They should be, as restarting isn't destructive.


(A.Brod) #3

Hi Mark,

Thanks for your reply. You are right, Elasticsearch is not a database ... it is a document store :smirk: .
It is much faster than any database I ever used :sunglasses:.

I checked the index

GET /_cat/indices/?v
health status index   pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   .kibana   1   1          2            0      9.7kb          9.7kb 
yellow open   arctic    5   1   17393295      3706667      313gb          313gb 

so the index seems to be ok.

The fields are also listed with

GET /_cat/fielddata/?v
id                     host   ip        node            total server ... clientServerName requests ...
CvQnMDmJTNyB0BMsUH68NQ ...... 11.1.2.68 Thunderstrike 314.7mb     0b ...               0b   20.2mb ...

However, the size for "server" is 0b ... but so is the size for clientServerName (so I think this may be OK?)
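
With many columns, the 0b entries in this output are easy to miss. Below is a small Python sketch that parses the whitespace-separated table and lists the fields reported as 0b per node. It assumes node names contain no spaces; the `id` and `host` values in the sample are placeholders because the real ones are elided above, and the helper names are made up for illustration. As far as I understand, a 0b entry only means no fielddata is currently loaded for that field on that node, which is not necessarily a problem.

```python
# Columns of the _cat/fielddata table that describe the node, not a field.
NON_FIELD_COLUMNS = {"id", "host", "ip", "node", "total"}


def parse_cat_table(text):
    """Parse a whitespace-separated _cat table (with header) into row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Assumption: no cell contains spaces (e.g. single-word node names).
    return [dict(zip(header, line.split())) for line in lines[1:]]


def fields_without_fielddata(row):
    """Return the field names whose fielddata size is reported as 0b."""
    return sorted(
        name
        for name, size in row.items()
        if name not in NON_FIELD_COLUMNS and size == "0b"
    )


# Hypothetical sample modelled on the output above (id and host are placeholders).
sample = """\
id      host   ip        node          total   server clientServerName requests
node-1  host-1 11.1.2.68 Thunderstrike 314.7mb 0b     0b               20.2mb
"""

rows = parse_cat_table(sample)
for row in rows:
    print(row["node"], fields_without_fielddata(row))
```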

The data of the server field is available. A 'normal' search request returns:

GET /arctic/item/_search?fields=server,clientServerName,requests
{
    "took": 114,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 16831091,
        "max_score": 1,
        "hits": [
            {
                "_index": "arctic",
                "_type": "item",
                "_id": "2015052614322253007",
                "_score": 1,
                "fields": {
                    "server": ["server3"],
                    "clientServerName": ["WORLD"],
                    "requests": ["SendMail"]
                }
            },
            {
                "_index": "arctic",
                "_type": "item",
                "_id": "2015052417083254002",
                "_score": 1,
                "fields": {
                    "server": ["server4"],
                    "clientServerName": ["INFO"],
                    "requests": ["Display"]
                }
            },...
        ]
    }
}

The aggregation request for, e.g., clientServerName

{
    "size": 0,
    "aggregations": {
        "agg": {
            "terms": {
                "field": "clientServerName",
                "size": 0
            }
        }
    }
}

returns results

{
    "took": 212,
    "timed_out": false,
    "_shards": {
        "total": 5, "successful": 5, "failed": 0
    },
    "hits": {
        "total": 16832876, "max_score": 0, "hits": []
    },
    "aggregations": {
        "agg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "WORLD","doc_count": 5237311
                },
                {
                    "key": "INFO","doc_count": 3978026
                },...
            ]
        }
    }
}

All fields (e.g. server, clientServerName and requests) are defined as:

{
    "type" : "string",
    "index" : "not_analyzed"
}

So all fields return results except the field "server"!
... this is strange :confused:, because the data is available. Only the aggregation request returns no data.

Do you have an idea what else we can check?

Regards,
Andreas

(P.S. I think one solution would be to "reindex" as described in https://www.elastic.co/guide/en/elasticsearch/guide/current/reindex.html ... but I'd like to avoid this (with already >300gb of data). Perhaps this would be the last resort.)


(David Pilato) #4

What I often say is that Elasticsearch is a search engine with storage capabilities, but it's a search engine first.
Conversely, databases are IMHO storage engines with search capabilities.

:smile:


(Duncan Pratt) #5

I too am experiencing the same behaviour.

Our mappings are slightly different in that my documents have a group object with id and avatar properties. But essentially, terms aggregations that were working on 1.4.3 are no longer returning results. (I recently migrated to 1.6.0 from 1.4.3 and the index was initially created in v1.0.2 - I think)

{
   "index" : {
      "mappings" : {
         "item" : {
            "properties" : {
               ...,
               "group" : {
                  "type" : "object",
                  "properties" : {
                     "id" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                     },
                     "avatar" : {
                        "type" : "string",
                        "index" : "not_analyzed"
                     }
                  }
               }
            }
         }
      }
   }
}

The aggregation request for group.id doesn't return any buckets:

{
    "size": 0,
    "aggregations": {
        "agg": {
            "terms": {
                "field": "group.id",
                "size": 0
            }
        }
    }
}

i.e.,

{
    "took" : 76,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
   "hits" : {
        "total" : 19106,
        "max_score" : 0,
        "hits" : [
        ]
   },
   "aggregations" : {
        "groups" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
            ]
        }
    }
}

Reindexing the data didn't rectify this for me either.

As a workaround, I'm using a script to generate the terms:

{
    "size": 0,
    "aggs": {
        "groups": {
            "terms": {
                "script": "doc['group.id'].value",
                "size": 0
            }
        }
    }
}

This works and returns the aggregation results just fine.

As an aside, I tried setting up a test index (on v1.6.0) using a combination of @abrod's mapping and my own and can't reproduce this behaviour - Terms aggregations for fields server or group.id both returned the correct results.
I also tried setting up a test index on 1.0.2 and migrated it via 1.4.3 to 1.6.0 but again I can't reproduce the issue.

Using a script to generate terms is working fine for me

"script": "doc['group.id'].value"

but it would be good to know if there is an issue with my index.

Regards
Duncan


(Mark Walkom) #6

It's better if you can start your own thread :slight_smile:


(A.Brod) #7

So,

It seems that there is a bug :beetle: in elasticsearch (or is this a feature ?)

I could identify (reproduce) the problem:

  1. create a mapping for app1

    POST /test/_mapping/app1
    {
        "app1": {
            "properties": {
                "server": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": true
                }
            }
        }
    }

and for app2 (but this time without "doc_values": true)

    POST /test/_mapping/app2
    {
        "app2": {
            "properties": {
                "server": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }

  2. add data

    POST /test/app1
    {
        "server": "10.1.1.180"
    }

and

    POST /test/app2
    {
        "server": "10.1.1.181"
    }

  3. after this the search

    POST /test/app1/_search
    {
        "size": 0,
        "aggregations": {
            "agg": {
                "terms": {
                    "field": "server",
                    "size": 0
                }
            }
        }
    }

returns no data !

{
    "took": 16,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "agg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
        }
    }
}

I added "doc_values": true within the mapping for app2 and everything works fine.
So the problem is resolved. :white_check_mark:
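
The reproduction above hinges on one field name ("server") being mapped differently in two types of the same index. Whether that difference is really the cause is not confirmed here, but such differences are easy to look for. The Python sketch below assumes the mapping has been fetched (e.g. with GET /test/_mapping) and parsed into a dict with the same index / "mappings" / type / "properties" layout as the mapping output earlier in this thread. It only compares top-level properties, and the function name `conflicting_fields` is made up for illustration.

```python
import json


def conflicting_fields(mapping_response, index):
    """Find top-level fields whose settings differ between types of one index."""
    types = mapping_response[index]["mappings"]

    # field name -> {type name: field settings}
    per_field = {}
    for type_name, type_mapping in types.items():
        for field, settings in type_mapping.get("properties", {}).items():
            per_field.setdefault(field, {})[type_name] = settings

    conflicts = {}
    for field, by_type in per_field.items():
        # Serialise with sorted keys so key order does not count as a difference.
        distinct = {json.dumps(s, sort_keys=True) for s in by_type.values()}
        if len(distinct) > 1:
            conflicts[field] = by_type
    return conflicts


# The two mappings from the reproduction above.
mapping_response = {
    "test": {
        "mappings": {
            "app1": {
                "properties": {
                    "server": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    }
                }
            },
            "app2": {
                "properties": {
                    "server": {"type": "string", "index": "not_analyzed"}
                }
            },
        }
    }
}

for field, by_type in conflicting_fields(mapping_response, "test").items():
    print(field, by_type)
```

Note that a missing "doc_values" key and an explicit one are treated as different here, which is exactly the situation in the reproduction.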

Do you think this is a bug? If yes, where can it be reported?

regards,
Andreas


(Mark Walkom) #8

I just ran the same thing and got:

{
   "took": 67,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "agg": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "10.1.1.180",
               "doc_count": 1
            }
         ]
      }
   }
}

(A.Brod) #9

It seems this problem is fixed.

Now I also get a correct result ...

The one thing I changed: I updated to the newest version of Elasticsearch, 1.7.1. The problem was (probably) specific to version 1.6.

Thanks for your support,
Andreas


(Mark Walkom) #10

It's unlikely to be a bug.

