Let's say I set my time range to 2013-01-01 00:00:00.000 through 2013-01-31 23:59:59.999 (the entire month of January 2013). Then I create a line graph visualization and keep the default y-axis setup, which is a Count of all documents in my index in that time range. It produces 1,565 hits (just a dot at 1,565, since I haven't defined an x-axis).
One of the fields in my documents is user_id. If I then change the y-axis to do a Count of user_id in that same time period, I get 1,565 hits. So far, nothing strange - every document has a user_id, as I expect.
Then I change the y-axis to a Unique Count of user_id instead, and I get 488 hits. So far, so good - 488 unique users this month.
If I then add a bucket (it doesn't matter what kind; it can be x-axis or split lines) on the scripted field term 'year', which is defined as doc['@timestamp'].getYear(), my hits go up to 520.
I don't understand this - where are the extra 32 unique user_ids coming from? How can I have 488 unique user_ids in my time period Jan 2013, but then have 520 unique user_ids in the same time period when I add year on the x-axis? "Unique user_ids by year" should not differ from "unique user_ids" when the time period is just Jan 2013, because every document falls into the single 2013 bucket!
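To make my expectation concrete, here is a minimal Python sketch that uses exact, set-based counting on made-up data. The documents and the names `unique_users` and `unique_users_by_year` are hypothetical and only for illustration. This is not a claim about how Kibana or Elasticsearch computes Unique Count, just the arithmetic I assumed it would match.

```python
from collections import defaultdict

# Hypothetical sample documents; the user_ids and timestamps are made up,
# but all of them fall inside January 2013 like my real time range.
docs = [
    {"user_id": "u1", "timestamp": "2013-01-03T10:00:00"},
    {"user_id": "u2", "timestamp": "2013-01-03T11:30:00"},
    {"user_id": "u1", "timestamp": "2013-01-15T08:45:00"},
    {"user_id": "u3", "timestamp": "2013-01-31T23:59:59"},
]


def unique_users(documents):
    """Exact number of distinct user_ids across all documents."""
    return len({d["user_id"] for d in documents})


def unique_users_by_year(documents):
    """Exact number of distinct user_ids inside each year bucket."""
    buckets = defaultdict(set)
    for d in documents:
        year = int(d["timestamp"][:4])  # first four characters are the year
        buckets[year].add(d["user_id"])
    return {year: len(ids) for year, ids in buckets.items()}


total = unique_users(docs)
by_year = unique_users_by_year(docs)

print(total)    # 3
print(by_year)  # {2013: 3}
```

Because every document lands in the one 2013 bucket, the per-year unique count equals the overall unique count. That is the behavior I expected from the visualization, instead of 488 turning into 520.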
This only seems to affect Unique Count. If I do the original plain count (1,565) and then add the 'year' sub-bucket, I still get 1,565, which is what I expect.
So what am I not understanding about the interplay between Unique Count and buckets?