Elasticsearch pagination for a log server

Hello friends.
I have a question today, so I'm writing it down in detail.
Please bear with my poor English.

I have a question about building a log server using Elasticsearch.

The conditions are as follows.

  1. I will store at least 10,000 docs.
  2. All docs must be available for querying.
  3. I am going to add 1,000 docs per second.

In this case, I have a question.

  1. When I look up documents, I apply pagination. (Show 10 docs on one page.)
  • I know that search_after should be applied when querying past 10,000 documents.

  • With search_after, it is impossible to jump from page 1 to page 5 at once. (Is that right?)

  • So, on other sites where Elasticsearch is applied, they limit the number of pages, show additional data on scroll-down, or ask for more specific search terms. (Is that right?)

  • In that case, is it technically impossible to paginate an index containing more than 10,000 documents? (The number of documents may be 100,000, 1 million, or more.)

  2. Adding 1,000 docs per second
  • Documents are being added through a shell script; we POST them one by one, roughly as sketched below.
  • Is adding 1,000 documents per second too much this way?
  • If so, is it possible to add 1,000 documents per second through the bulk API?
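To make the current approach concrete, it looks roughly like this (a sketch only; the index name logs and the field names are placeholders):

```
# One HTTP request per document: 1,000 docs/sec means 1,000 round trips per second.
curl -s -X POST "localhost:9200/logs/_doc" \
  -H 'Content-Type: application/json' \
  -d '{ "@timestamp": "2024-01-01T00:00:00Z", "message": "a single log line" }'
```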

Thank you for reading my question.
I would appreciate it if many people could share their answers.
I hope you have a good day :smiley:

Hi,
I'll answer to the best of my understanding.

Yes. You can't jump to distant pages. Of course, it is possible to fetch 50 docs and just discard 40 of them.
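To illustrate (a sketch only; the index name logs, the @timestamp field, and the event_id tiebreaker field are assumptions): each page is fetched using the sort values of the last hit of the previous page, so reaching page 5 means walking through pages 1 to 4 first, or fetching 50 hits in one request and discarding 40 client-side.

```
# Page 1: a plain sorted search, 10 hits per page.
curl -s "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "size": 10,
  "sort": [ { "@timestamp": "asc" }, { "event_id": "asc" } ]
}'

# Page 2: the same query plus search_after, filled with the "sort" values
# returned with the last hit of page 1. There is no way to compute these
# values for page 5 without fetching the pages in between.
curl -s "localhost:9200/logs/_search" -H 'Content-Type: application/json' -d'
{
  "size": 10,
  "sort": [ { "@timestamp": "asc" }, { "event_id": "asc" } ],
  "search_after": [ "2024-01-01T00:00:09Z", "evt-0010" ]
}'
```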

Yes. It is said:

You should never give your users access to all the pages of their search request. If your PM is not happy about that, tell him that even Google is only showing ~50 pages (500 hits).
Elasticsearch Pagination Techniques: SearchAfter, Scroll, Pagination & PIT

This post may help you think through pagination.

No, it is possible. 10,000 is just the default limit for the from and size parameters. search_after has no limit on how many times you can repeat it. The only concern is inconsistent results caused by refreshes, and that can be handled with a point in time (PIT), which preserves the current index state.
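Roughly like this (a sketch; <PIT_ID> stands for the id returned by the first call, and the index name logs is assumed):

```
# 1) Open a point in time; it preserves a consistent view of the index.
curl -s -X POST "localhost:9200/logs/_pit?keep_alive=1m"
# => { "id": "<PIT_ID>", ... }

# 2) Search against the PIT (note: no index name in the URL) and keep paging
#    with search_after. With a PIT, Elasticsearch adds an implicit _shard_doc
#    tiebreaker, whose value appears as the last element of each hit's "sort".
curl -s "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "size": 10,
  "pit": { "id": "<PIT_ID>", "keep_alive": "1m" },
  "sort": [ { "@timestamp": "asc" } ],
  "search_after": [ "2024-01-01T00:00:09Z", 42 ]
}'
```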

It completely depends on the cluster and index settings, the document profile, etc., but I suppose it is not a good idea to add that many documents one by one. You may need performance experiments for your specific case. As for the optimal number of actions in one bulk request, there is no single correct number either; the official documentation recommends experimenting.
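For comparison, one _bulk request carries many documents in a single round trip. A sketch (index name logs assumed; the body is NDJSON, one action line plus one source line per document, and must end with a newline, which the heredoc provides):

```
curl -s -X POST "localhost:9200/logs/_bulk" \
  -H 'Content-Type: application/x-ndjson' --data-binary @- <<'EOF'
{ "index": {} }
{ "@timestamp": "2024-01-01T00:00:00Z", "message": "log line 1" }
{ "index": {} }
{ "@timestamp": "2024-01-01T00:00:01Z", "message": "log line 2" }
EOF
```

Start with a few hundred to a few thousand documents per request and measure; the best batch size depends on your document size and cluster resources.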


Thank you. It was very helpful!

I have a few more questions. For large servers, should I split the index by date or by size?

And is it impossible to give both keyword and text types to a single field?
(Maybe I can only specify text for the field, right?)

It depends. If the amount of data varies greatly from day to day, splitting by size might be better; if you have strict data-retention rules, splitting by date might be better. You can get both with ILM settings, as sketched below. The fact that you can choose either shows there is no clear winner between the two.
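For example, a single rollover action can combine both criteria (a sketch; the policy name and thresholds are made up, so adjust them to your retention rules):

```
# Roll over when the index passes 50 GB OR turns one day old, whichever
# comes first; delete each rolled-over index 30 days later.
curl -s -X PUT "localhost:9200/_ilm/policy/logs-policy" \
  -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "1d" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'
```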

See multi-fields.
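For example (a sketch; the field name message and the sub-field name raw are placeholders), one field can be indexed both ways:

```
curl -s -X PUT "localhost:9200/logs" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}'
# Query "message" for full-text search and "message.raw" for exact matches
# and aggregations.
```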


Thank you. I didn't know there was an ILM setting.

I think I can work comfortably now.

I was thinking about implementing something like ILM myself in C++, so I'm glad I don't have to.

Thank you for helping me again today :smile:

(If you know any good resources about ILM settings, may I ask you to share them?)

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.