I'm planning a system that will index a large number (>100M/day) of telephone records. A new date-based index will be created each day. The important fields in each document are the called and calling phone numbers. These fields are strings and would be "not_analyzed". As a worst case, I'm assuming every call has a distinct called number, so there would be close to 100M unique values in the called number field each day.
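To make the setup concrete, here is a rough sketch of the daily index naming and mapping I have in mind. It assumes Elasticsearch with the older `string` / `not_analyzed` mapping syntax; the `calls` prefix, the `call` type name, and the field names `called_number` and `calling_number` are placeholders I made up for illustration, not a settled design.

```python
from datetime import date


def daily_index_name(day: date, prefix: str = "calls") -> str:
    """Build the name of the index for one day, e.g. calls-2015.03.09.

    The "calls" prefix is a hypothetical naming choice.
    """
    return f"{prefix}-{day:%Y.%m.%d}"


def call_record_mapping() -> dict:
    """Return an index-creation body with both phone numbers stored as exact strings.

    Field and type names are assumptions; "not_analyzed" keeps each number
    as a single term so it can be matched exactly.
    """
    not_analyzed = {"type": "string", "index": "not_analyzed"}
    return {
        "mappings": {
            "call": {
                "properties": {
                    "called_number": dict(not_analyzed),
                    "calling_number": dict(not_analyzed),
                }
            }
        }
    }


if __name__ == "__main__":
    print(daily_index_name(date(2015, 3, 9)))
    print(call_record_mapping())
```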
My question is: can an exact-match phone number search across a week of these daily indices, at that level of cardinality, be expected to perform reasonably well (say < 5 seconds) on a 3 to 5 node cluster with spinning disks? And would a cluster of that size cope with that many incoming documents?
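For reference, this is roughly the search and load I mean, sketched under the same assumptions as above (a hypothetical `calls-YYYY.MM.DD` index name and a `called_number` field). It only builds the list of indices and a term query body and works out the average ingest rate; it does not talk to a cluster.

```python
from datetime import date, timedelta

DOCS_PER_DAY = 100_000_000  # lower bound taken from the ">100M/day" figure
SECONDS_PER_DAY = 24 * 60 * 60


def week_of_indices(end_day: date, days: int = 7, prefix: str = "calls") -> list:
    """List the daily index names covering `days` days ending at `end_day`."""
    names = []
    for offset in range(days):
        day = end_day - timedelta(days=offset)
        names.append(f"{prefix}-{day:%Y.%m.%d}")
    return names


def called_number_query(number: str) -> dict:
    """Exact-match term query on the (hypothetical) called_number field."""
    return {"query": {"term": {"called_number": number}}}


def average_ingest_rate(docs_per_day: int = DOCS_PER_DAY) -> float:
    """Average documents per second if the load were spread evenly over the day."""
    return docs_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    indices = week_of_indices(date(2015, 3, 9))
    print(",".join(indices))                      # indices the search would target
    print(called_number_query("15551234567"))     # made-up example number
    print(round(average_ingest_rate()))           # about 1157 docs/sec on average
    print(DOCS_PER_DAY * len(indices))            # documents covered by a one-week search
```

So a one-week search would touch on the order of 700M documents, and the cluster would need to sustain over a thousand documents per second on average, with real peaks presumably higher since call traffic is not evenly spread.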
I realize there's no straight answer here, but any guidance would be appreciated.