Hello friends.
I have a question today, so I'm writing it down in detail.
I ask for your understanding of my poor English skills.
I have a question about building a log server using Elasticsearch.
The conditions are as follows.
I will store at least 10,000 docs.
All docs must be searchable.
I am going to add 1,000 docs per second.
In this case, I have a question.
When I look up documents, I apply pagination (10 docs per page).
I know that search_after should be used when querying beyond 10,000 documents.
With search_after, it is impossible to jump directly from page 1 to page 5. (Is that right?)
So other sites that use Elasticsearch limit the number of pages, show additional data as the user scrolls down, or ask for more specific search terms. (Is that right?)
In that case, is it technically impossible to paginate an index containing more than 10,000 documents? (The number of documents may be 100,000, 1 million, or more.)
Add 1,000 docs per second
Documents are being added through the shell. At the moment, we are adding docs one by one with POST requests.
Is it too much to add 1,000 documents per second this way?
If so, is it possible to add 1,000 documents per second using the bulk API?
Thank you for reading my question again today.
I would appreciate any answers.
I hope you have a good day today.
No, it's possible. 10,000 is the default limit on the sum of the from and size parameters (the index.max_result_window setting). search_after has no limit on how many times it can be repeated. The only concern is inconsistent results caused by refreshes between requests, which can be handled with a point in time (PIT) that preserves the current index state.
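As a rough illustration of paging with search_after and a PIT, here is a minimal sketch that only builds the request bodies. It assumes a PIT id was already obtained by opening a point in time on the index, and that documents have an `@timestamp` field; the function names and the `@timestamp` field are hypothetical, not something from your setup.

```python
import json


def build_page_request(pit_id, page_size=10, search_after=None):
    """Build the JSON body for one page of a PIT + search_after search.

    pit_id is assumed to come from a previously opened point in time.
    The "@timestamp" sort field is an assumption about the documents.
    """
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        # With a PIT, the search request targets the PIT instead of an index name.
        "pit": {"id": pit_id, "keep_alive": "1m"},
        # A deterministic sort is required; _shard_doc acts as a tiebreaker with PIT.
        "sort": [{"@timestamp": "desc"}, {"_shard_doc": "asc"}],
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


def next_search_after(hits):
    """Return the sort values of the last hit, or None when the page is empty."""
    return hits[-1]["sort"] if hits else None


if __name__ == "__main__":
    first_page = build_page_request("example-pit-id")
    print(json.dumps(first_page, indent=2))
```

Each response's last hit supplies the `search_after` value for the next request, so reaching page 5 means walking through pages 1 to 4 first. That is why many sites offer "next page" or infinite scroll rather than arbitrary page jumps for deep results.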
It depends entirely on the cluster and index settings, the document profile, and so on, but I don't think adding that many documents one by one is a good idea. You may need to run performance experiments for your specific case. As for the optimal number of actions in one bulk request, there is no single correct number; the official documentation recommends experimenting.
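To make the bulk idea concrete, here is a minimal sketch that groups documents into batches and builds the newline-delimited JSON body that the `_bulk` endpoint expects. The index name `logs`, the batch size of 500, and the helper names are assumptions for illustration only; the right batch size has to come from your own experiments.

```python
import json


def chunked(docs, batch_size):
    """Yield consecutive slices of docs, each with at most batch_size items."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


def build_bulk_body(index_name, docs):
    """Build an NDJSON body: one action line followed by one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical log documents standing in for one second of traffic.
    docs = [{"message": f"log line {i}"} for i in range(1000)]
    for batch in chunked(docs, 500):
        body = build_bulk_body("logs", batch)
        # Each body would be sent in a single POST to the _bulk endpoint
        # with the Content-Type header set to application/x-ndjson.
        print(len(batch), "docs,", len(body), "bytes")
```

With this shape, 1,000 documents per second becomes two requests per second instead of 1,000, which removes most of the per-request overhead.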
It depends. If the amount of data varies greatly from day to day, splitting indices by size might be better; if you have strict data retention rules, splitting by date might be better. Either can be configured through ILM settings. The fact that both options exist shows that neither has a clear advantage over the other.
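As a sketch of what such ILM settings could look like, the snippet below builds a policy body that rolls over on whichever comes first, age or primary shard size, and deletes indices after a retention period. All the values (`1d`, `50gb`, `30d`) and the function name are placeholders I made up, not recommendations.

```python
import json


def build_ilm_policy(max_age="1d", max_primary_shard_size="50gb", delete_after="30d"):
    """Build an ILM policy body with a rollover action and a delete phase.

    All default values are illustrative placeholders.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Rollover triggers when either condition is met.
                        "rollover": {
                            "max_age": max_age,
                            "max_primary_shard_size": max_primary_shard_size,
                        }
                    }
                },
                "delete": {
                    # Age is measured from rollover for rolled-over indices.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
```

Dropping `max_age` gives a purely size-based split, and dropping `max_primary_shard_size` gives a purely date-based one.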