I'm new to Elasticsearch (ES) and would like some advice on the following, please...
When I asked questions about using ES for web applications, people suggested having one index for things like user profiles, another index for data, and so on, plus several more for logs.
With all of these on one cluster shared by several web applications, it seems like things could get messy or disorganized.
In that case, are people using one cluster per application? I am a bit confused because the articles I have read about indexing logs seem to store the data in multiple indices rather than in types within a single index.
Secondly, why not have one index per app, with types for logs, user profiles, data, etc.?
Is there some benefit to using multiple indices rather than many types within one index for a web application? If so, what kinds of naming conventions are typical?
To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results in order to select the overall top 10.
Now imagine that we ask for page 1,001 (results 10,001 to 10,010). Everything works in the same way, except that each shard now has to produce its top 10,010 results. The requesting node then sorts through all 50,050 results and discards 50,040 of them!
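The scatter/gather step described above can be simulated in a few lines of Python. This is a minimal sketch under stated assumptions, not Elasticsearch's implementation: each shard is modelled as a plain list of random relevance scores, and deep_page, shard_top_hits and the from_ parameter (standing in for the from/size search parameters) are hypothetical names chosen for illustration.

```python
import heapq
import random


def shard_top_hits(scores, k):
    """Return one shard's k highest scores, best first (hypothetical helper)."""
    return heapq.nlargest(k, scores)


def deep_page(shards, from_, size=10):
    """Simulate fetching `size` hits starting at offset `from_` across shards.

    Returns (page_hits, number_of_candidates_sorted, number_discarded).
    """
    k = from_ + size  # every shard must return its own top from + size hits
    # Query phase: each shard independently produces its local top k.
    candidates = []
    for scores in shards:
        candidates.extend(shard_top_hits(scores, k))
    # Requesting node: sort all candidates, keep one page, drop the rest.
    candidates.sort(reverse=True)
    page_hits = candidates[from_:from_ + size]
    return page_hits, len(candidates), len(candidates) - len(page_hits)


random.seed(0)
# Five hypothetical primary shards with 20,000 scored documents each.
shards = [[random.random() for _ in range(20_000)] for _ in range(5)]

_, n_sorted, n_discarded = deep_page(shards, from_=0)
print(n_sorted, n_discarded)       # 50 40

_, n_sorted, n_discarded = deep_page(shards, from_=10_000)
print(n_sorted, n_discarded)       # 50050 50040
```

Sorting every candidate keeps the sketch simple; an optimized merge might use a bounded priority queue instead, but it would still have to examine every hit that every shard sends back, which is where the cost comes from.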
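You can see that, in a distributed system, the cost of sorting results rises steeply the deeper we page: every shard must return all the hits up to and including the requested page, so the number of results the requesting node has to sort grows in proportion to both the page depth and the number of shards. There is a good reason that web search engines don’t return more than 1,000 results for any query.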
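The figures in the example follow from simple arithmetic. Assuming S primary shards and a request for size hits starting at offset from (named after Elasticsearch's paging parameters), the requesting node receives S(from + size) candidates and keeps only size of them:

```latex
\begin{aligned}
N_{\text{sorted}}    &= S\,(\mathit{from} + \mathit{size}) \\
N_{\text{discarded}} &= N_{\text{sorted}} - \mathit{size} \\[4pt]
S = 5,\ \mathit{from} = 0,\ \mathit{size} = 10 &:\quad
  N_{\text{sorted}} = 5 \times 10 = 50,\quad N_{\text{discarded}} = 40 \\
S = 5,\ \mathit{from} = 10000,\ \mathit{size} = 10 &:\quad
  N_{\text{sorted}} = 5 \times 10010 = 50050,\quad N_{\text{discarded}} = 50040
\end{aligned}
```

Because N_sorted is linear in from, doubling the page depth roughly doubles the number of candidates to transfer and sort, and adding shards multiplies it further.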