I am writing a service that will create and manage user records, 100+
million of them. For each new user, the service will generate a unique
user id and write it to the database. The database is sharded on the
generated unique user id.
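
To make the sharding setup concrete, here is a minimal sketch of how a generated user id could be mapped to a shard. The shard count, the use of a random UUID as the id, and the function names are all assumptions for illustration; the actual service may generate ids and route them differently.

```python
import hashlib
import uuid

NUM_SHARDS = 16  # hypothetical shard count, not taken from the real setup


def new_user_id() -> str:
    """Generate a unique user id (assumed here to be a random UUID)."""
    return uuid.uuid4().hex


def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user id to a shard with a stable hash of the id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % num_shards


user_id = new_user_id()
print(user_id, "-> shard", shard_for(user_id))
```

With routing like this, a lookup by user id touches exactly one shard, while a lookup by any other field has no way of knowing which shard holds the record.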
Each user record has several fields. One of the requirements is that the
service be able to check whether a user exists with a matching field
value, so those fields are declared as indexes in the database schema.
However, since the database is sharded on the primary key (the unique user
id), I will need to search all shards to find a user record that matches a
given field value. To make that lookup fast, one thing I am thinking of
doing is setting up an ElasticSearch cluster. The service will write to the
ES cluster every time it creates a new user record, and the ES cluster will
index the user record on the relevant fields.
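
The difference between the two lookup paths can be sketched roughly as below. This is only an in-memory model, not ElasticSearch code: the `index` dictionary stands in for the ES cluster, the per-shard scan stands in for querying every database shard, and the field names (`email`, `phone`), the shard count, and the class and method names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count
INDEXED_FIELDS = ("email", "phone")  # hypothetical searchable fields


def shard_for(user_id: str) -> int:
    """Route a user id to a shard using a stable checksum."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


class UserStore:
    def __init__(self):
        # One dict per shard: user_id -> record.
        self.shards = [{} for _ in range(NUM_SHARDS)]
        # Stand-in for the ES cluster: (field, value) -> set of user ids.
        self.index = defaultdict(set)

    def create_user(self, user_id, record):
        """Write to the owning shard, then to the secondary index."""
        self.shards[shard_for(user_id)][user_id] = record
        for field in INDEXED_FIELDS:
            if field in record:
                self.index[(field, record[field])].add(user_id)

    def find_scatter_gather(self, field, value):
        """Ask every shard for a match (the path being avoided)."""
        hits = []
        for shard in self.shards:
            for user_id, record in shard.items():
                if record.get(field) == value:
                    hits.append(user_id)
        return sorted(hits)

    def find_via_index(self, field, value):
        """Single lookup in the secondary index, no shard fan-out."""
        return sorted(self.index.get((field, value), ()))


store = UserStore()
store.create_user("u1", {"email": "a@example.com", "phone": "111"})
store.create_user("u2", {"email": "b@example.com", "phone": "222"})
print(store.find_via_index("email", "b@example.com"))
print(store.find_scatter_gather("email", "b@example.com"))
```

Both paths return the same user ids; the index path just avoids touching every shard. Note that the two writes in `create_user` are not atomic, so in a real deployment the index can briefly lag behind or diverge from the database.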
My questions are:
-- What kind of performance can I expect from ES here? Assume I have
100+ million user records, where 5 columns of each user record need to be
indexed. I know it depends on the hardware configuration as well, but
please assume well-tuned hardware.
-- Here I am trying to use ES as a memcache alternative, since ES gives me
a multi-key-value store. So I want the whole dataset to be in memory, and
it does not need to be durable. Is ES the right tool for that?
Any comments or recommendations based on experience with ElasticSearch for
large datasets are very much appreciated.