Questions related to ES cluster architecture


(havetobe unknown) #1

Hi everyone,

I work as a sysadmin and I discovered Elasticsearch about two months ago.
As I'm new to this technology, many questions are running through my mind,
especially regarding cluster architecture.

We have a 3-node cluster running ES 0.90.3. Our customer's application is
using ES (with the mapper-attachment plugin) as a search engine for a wide
variety of documents (PDF, OpenXML documents, images, etc.). Each user is
then able to search their own set of documents. Our customer, who is
also new to this technology, decided to design their architecture as
follows: 1 user -> 1 index containing their documents. Each index is split
into 6 shards + 1 replica (i.e. 12 shards per index).

As I'm gathering more and more pieces of information each day, I found
myself watching this very interesting video: Big Data, Search and Analytics
(http://www.elasticsearch.org/videos/big-data-search-and-analytics/).
After watching this video, I think it's time to reconsider our cluster
design.

Indeed, as data is growing, we are currently in the following situation:

  • 1477 indices
  • about 17000 shards o_O
  • about 1700000 documents
  • about 25 GB of data
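
For reference, the shard count follows directly from the layout described
above. Here is a small back-of-the-envelope sketch using only the figures
from this post (the per-node and per-shard averages assume an even spread,
which is an assumption and not something measured on the cluster):

```python
# Figures taken from the post: 1477 indices, 6 primary shards each,
# 1 replica per primary, 3 nodes, about 1,700,000 documents.
indices = 1477
primaries_per_index = 6
replicas_per_primary = 1
nodes = 3
documents = 1700000

# Every primary shard has one replica copy, so each index holds 12 shards.
shards_per_index = primaries_per_index * (1 + replicas_per_primary)
total_shards = indices * shards_per_index

# Averages, assuming shards and documents are spread evenly (an assumption).
shards_per_node = total_shards / nodes
docs_per_primary = documents / (indices * primaries_per_index)

print("shards per index:", shards_per_index)
print("total shards:", total_shards)
print("shards per node (avg):", round(shards_per_node))
print("docs per primary shard (avg):", round(docs_per_primary))
```

So each primary shard holds only a couple of hundred documents on average,
while each node has to manage several thousand shards.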

Therefore, I think it's WAY TOO BIG, especially regarding shards. Plus, the
number of documents is not that high. One of our nodes eventually crashed
and it took hours to fully recover. I think the current design is clearly
the main culprit. That's why I'd like to propose a better approach, such as
suggesting that our customer use ES "routing" capabilities, with fewer
indices and fewer shards per index. Maybe one index with 3 or 4 shards (to
match the number of nodes) and routing based on user ID, as the video
suggests.
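
To make the idea concrete, here is a rough sketch of what the requests
could look like with a single shared index and the user ID as routing
value. The index name "documents", the type "doc" and the field "user_id"
are made up for illustration; only the "routing" URL parameter and the
"filtered" query are actual ES features. The code only builds the paths
and bodies, it does not talk to a cluster.

```python
from urllib.parse import quote, urlencode

INDEX = "documents"   # hypothetical shared index name
DOC_TYPE = "doc"      # hypothetical mapping type
USER_FIELD = "user_id"  # hypothetical field storing the owner of a document


def index_path(user_id, doc_id):
    """Path for indexing one document, routed by its owner's user ID."""
    query = urlencode({"routing": user_id})
    return "/%s/%s/%s?%s" % (INDEX, DOC_TYPE, quote(str(doc_id), safe=""), query)


def search_request(user_id, query_text):
    """Path and body for a search limited to one user's documents.

    Routing only selects the shard to query. Several users can share a
    shard, so the term filter on the user field is still required.
    """
    path = "/%s/_search?%s" % (INDEX, urlencode({"routing": user_id}))
    body = {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": query_text}},
                "filter": {"term": {USER_FIELD: user_id}},
            }
        }
    }
    return path, body


print(index_path("42", "a1"))
print(search_request("42", "invoice"))
```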

So, I'd like to know if this is the correct approach, as our customer's
data flow seems to match the "users data flow" definition mentioned in the
video, or am I heading the wrong way (again ;) ? What would be the "best
practices" in this situation?

Thanks for your time

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Yes, using ES routing with fewer indices would also be my approach; I share
your analysis.

With index aliases
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-aliases.html)
your customer will not see much of a difference in the API compared to the
concrete index approach. In the JSON result docs, some adjustment is
needed: _index will show the concrete index, and the user id is now a term
in the doc.
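
As a minimal sketch of such an alias: the _aliases API accepts "add"
actions with an optional "filter" and "routing", so each user can get an
alias that looks like their old index. The index name "documents", the
alias naming scheme "user_<id>" and the field "user_id" are assumptions
for illustration. The code only builds the request body that would be
POSTed to /_aliases.

```python
import json


def user_alias_action(user_id, index="documents"):
    """Build one 'add' action for a per-user filtered alias.

    The index name, alias prefix and 'user_id' field are hypothetical.
    """
    return {
        "add": {
            "index": index,
            "alias": "user_%s" % user_id,
            # Only documents owned by this user are visible through the alias.
            "filter": {"term": {"user_id": user_id}},
            # Index and search requests through the alias use this routing value.
            "routing": user_id,
        }
    }


body = {"actions": [user_alias_action("42")]}
print(json.dumps(body, indent=2))
```

The application would then keep addressing "user_42" as if it were an
index, while all documents live in the shared one.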

You should take care with deletions. If users often delete many docs (which
does not seem to be the case), deleting a concrete index could be a plus
for performance. Rotating indices could help in that situation.

Jörg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(havetobe unknown) #3

Thanks for your advice Jörg. I'm gonna look into that alias feature and see
what's possible when combined with routing.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

