I'm working on a new web app dedicated to searching documents on
We intend to use Elasticsearch as the search engine and are very keen on it
(@Kimchy: fantastic job! ;) ).
- The documents we want to index are fairly complex (several levels of data),
  so we use nested fields in our mapping.
- The source data (3 CouchDB databases) grows by 900 GB per year.
- These 3 CouchDB databases are indexed into ES on a single cluster; all the
  ES indexes together take about 2.1 TB per year.
- We have one index per month per source database (12 indexes per year per
  database).
- Each index has 2 types, 5 shards and 1 replica.
- The total number of indexed documents is 15 million.
- The indexed data is stored on a disk array (RAID 5).
- About 150 fields are searchable.
- We intend to use facets in queries (this will be a new feature and could
  increase both the number of queries and the number of logged-in users).
- Each query is limited to a search period of at most 1 year.
- 1,000 different users log in every day and run 2 queries each on average.
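For reference, the figures above can be turned into a rough per-year sizing. This is only a sketch: it assumes the 2.1 TB figure counts primary data only (the post does not say whether the replica is included), and it uses a hypothetical `<source>-YYYY.MM` index naming scheme with placeholder source names `db1`..`db3`.

```python
from datetime import date

# Figures taken from the list above.
SOURCES = ["db1", "db2", "db3"]   # hypothetical names for the 3 CouchDB sources
PRIMARY_SHARDS = 5
ES_BYTES_PER_YEAR = 2.1e12        # assumed to be primary data only (no replica)
USERS_PER_DAY = 1000
QUERIES_PER_USER = 2


def monthly_indices(source: str, start: date, months: int) -> list[str]:
    """Hypothetical `<source>-YYYY.MM` names for `months` consecutive months."""
    names = []
    year, month = start.year, start.month
    for _ in range(months):
        names.append(f"{source}-{year:04d}.{month:02d}")
        month += 1
        if month > 12:            # roll over into the next year
            year, month = year + 1, 1
    return names


indices_per_year = len(SOURCES) * 12
primary_shard_gb = ES_BYTES_PER_YEAR / (indices_per_year * PRIMARY_SHARDS) / 1e9

# A one-year query across all sources touches 12 monthly indexes per source;
# each shard is searched on one copy (primary or replica), so the fan-out
# is indexes x primary shards.
window = [n for s in SOURCES for n in monthly_indices(s, date(2012, 7, 1), 12)]
shards_per_query = len(window) * PRIMARY_SHARDS

queries_per_day = USERS_PER_DAY * QUERIES_PER_USER
avg_qps = queries_per_day / 86_400

print(f"indexes per year:        {indices_per_year}")
print(f"average primary shard:   {primary_shard_gb:.1f} GB")
print(f"shards per 1-year query: {shards_per_query}")
print(f"average query rate:      {avg_qps:.3f} queries/s")
```

Under these assumptions each primary shard holds roughly 11.7 GB, and a one-year search over all sources fans out to about 180 shards, while the daily average is only around 0.02 queries per second. This suggests that per-query fan-out, facet cost and peak concurrency may weigh more on sizing than the daily query volume.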
Our problem is sizing the initial architecture (number of servers, number of
CPUs, CPU power, RAM, etc.).
We need a scalable solution, because we will have to index 4 years of data
without degrading performance.
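Projecting the same figures over the 4 years mentioned above gives a first idea of how much data and how many shards the cluster would hold. This is a sketch under stated assumptions: linear growth of 2.1 TB of primary data per year, one replica, and a hypothetical usable disk budget per node (`USABLE_DISK_PER_NODE_TB`) that has to be adjusted to the actual hardware. It only bounds the node count by disk; RAM and CPU needs would still have to be measured with realistic queries and facets.

```python
import math

YEARS = 4
PRIMARY_TB_PER_YEAR = 2.1        # from the post; assumed to exclude the replica
REPLICAS = 1
INDICES_PER_YEAR = 3 * 12        # 3 sources x 12 monthly indexes
PRIMARY_SHARDS = 5
USABLE_DISK_PER_NODE_TB = 2.0    # hypothetical per-node budget, not from the post

# Total on-disk size, counting every replica copy.
total_tb = YEARS * PRIMARY_TB_PER_YEAR * (1 + REPLICAS)
# Every primary shard plus its replica copies.
total_shard_copies = YEARS * INDICES_PER_YEAR * PRIMARY_SHARDS * (1 + REPLICAS)
# Lower bound on node count from disk alone.
nodes_for_disk = math.ceil(total_tb / USABLE_DISK_PER_NODE_TB)

print(f"data on disk after {YEARS} years: {total_tb:.1f} TB")
print(f"shard copies in the cluster:  {total_shard_copies}")
print(f"nodes needed for disk alone:  {nodes_for_disk}")
```

With these inputs the cluster would hold about 16.8 TB in 1,440 shard copies, i.e. at least 9 nodes for disk alone. Because the data is already split into monthly indexes, each new year adds new indexes rather than growing existing ones, so capacity can in principle be extended by adding nodes and letting Elasticsearch rebalance shards onto them.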
Does anyone have an idea of the best approach?
Any help would be very useful and much appreciated.