Hi all,
I need some general advice on how to structure the cluster, nodes, shards
and indexes.
I have a small database of about 8 million articles that take up about
320 GB in JSON format, including all denormalized sub-fields (objects).
Articles are added at a rate of about 10k per day. The majority of queries
(about 90%) target only the last 3 months (about 900k articles).
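To put those figures side by side, here is a back-of-envelope sketch. It assumes "320G" means 320 decimal gigabytes of raw JSON and approximates "3 months" as 90 days; the actual Elasticsearch index size on disk will differ (mappings, stored source, replicas), so treat these as rough raw-data numbers only.

```python
# Back-of-envelope sizing of the "hot" 3-month window from the figures above.
# Assumptions: 320G = 320 * 10**9 bytes of raw JSON; 3 months ~= 90 days.
# Real on-disk index size in Elasticsearch will differ from raw JSON size.
TOTAL_ARTICLES = 8_000_000
TOTAL_JSON_BYTES = 320 * 10**9
ARTICLES_PER_DAY = 10_000
HOT_WINDOW_DAYS = 90


def hot_window_estimate(total_articles, total_bytes, per_day, window_days):
    """Return (average bytes per article, articles in window, bytes in window)."""
    avg_bytes = total_bytes / total_articles
    hot_articles = per_day * window_days
    hot_bytes = hot_articles * avg_bytes
    return avg_bytes, hot_articles, hot_bytes


avg, hot_n, hot_b = hot_window_estimate(
    TOTAL_ARTICLES, TOTAL_JSON_BYTES, ARTICLES_PER_DAY, HOT_WINDOW_DAYS
)
yearly_growth = ARTICLES_PER_DAY * 365 * avg

print(f"average article: {avg / 1000:.0f} kB")         # 40 kB
print(f"hot articles: {hot_n:,}")                       # 900,000
print(f"hot raw JSON: {hot_b / 10**9:.0f} GB")          # 36 GB
print(f"growth per year: {yearly_growth / 10**9:.0f} GB")  # 146 GB
```

Under these assumptions the frequently queried slice is roughly a tenth of the whole dataset, which is what makes a separate "hot" tier on the front-end nodes plausible.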
I currently use a modified (hacked) Apache Solr, with a patched stats
module, for "histogram/facet" style analysis on article fields.
For the articles from the last 3 months, query responses should be fast
(sub-second), but queries over older articles or longer intervals can take
longer.
Articles are mostly queried and analysed by various tags and by a
date-inserted field.
I've read a book on Elasticsearch and it seems very promising (though I
still haven't got my head around all of its features).
I would like to get as many suggestions as possible on how to build a
cluster to replace the current Apache Solr + MongoDB installation, mostly
to reduce sysadmin and development/maintenance complexity.
I would like to keep the most frequently used data as close as possible to
the front-end nodes (limited disk space and RAM), while still being able to
run occasional searches on remote nodes holding the whole dataset (lots of
disk space, but still limited RAM).
To summarize:
How would one build a cluster with a lightweight index containing only the
latest articles and a heavyweight index containing all articles?
Or is it better to just forget this concept and use date-based sharding by
the book?
Is it possible to place replicas of selected shards closer to the front-end
nodes?
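For comparison, a by-the-book date-based layout usually means one index per time period, so a query restricted to the last 3 months only has to touch a few small indices. A minimal sketch, assuming hypothetical monthly indices named articles-YYYY.MM (the naming scheme and the helper monthly_indices are illustrative, not an existing API), of how a client could pick the indices a date range needs:

```python
from datetime import date


def monthly_indices(start: date, end: date, prefix: str = "articles-") -> list[str]:
    """Return names of hypothetical monthly indices covering [start, end].

    Assumes one index per calendar month, named <prefix>YYYY.MM.
    """
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    # Walk month by month from start's month up to and including end's month.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}.{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


print(monthly_indices(date(2013, 7, 15), date(2013, 10, 14)))
# ['articles-2013.07', 'articles-2013.08', 'articles-2013.09', 'articles-2013.10']
```

With this layout, the common 3-month query touches only three or four indices, while a rare query over the whole history simply names all of them.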
A substantial part of each article's JSON object consists of child or
parent objects (mostly small, repeated sets of related tags, authors,
publishers, etc., under 100k in total). Is it OK to use the built-in
parent-child functionality for these article fields, given that most of the
analysis is done by aggregating over them?
Thanks in advance for any suggestions!
Nikola
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.