Well, I took a subset of the data and loaded it into a test Elasticsearch instance. Once indexing was done, I let the index finish its merging/optimization (from experience, just pushing the data in and checking disk size right away isn't reliable). From that measurement I extrapolated to get a rough estimate of the size with the full dataset.
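The extrapolation step is just a linear scale-up from the subset measurement; a minimal sketch (the byte and document counts below are made-up illustration values, not from my actual test):

```python
def estimate_full_size(subset_bytes: int, subset_docs: int, total_docs: int) -> int:
    """Linearly extrapolate on-disk index size from a measured subset.

    Assumes size grows roughly proportionally with document count,
    which is only a rough guess once merging has settled.
    """
    return round(subset_bytes * total_docs / subset_docs)

# Hypothetical: a 1M-doc subset measuring 2.5 GB, extrapolated to 40M docs
print(estimate_full_size(2_500_000_000, 1_000_000, 40_000_000))
```

In practice the real number tends to come in a bit under the linear estimate, since term dictionaries are shared across documents, so I'd treat this as an upper-bound guess.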
That test already uses the n-gram filter: 2-20 for the category and 3-25 for the filename, each applied twice (in_ and out_ variants). I still have to run the test without the _all field.
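For reference, the n-gram part of the analysis settings looks roughly like this (a sketch with hypothetical filter names, not my exact config; note that recent Elasticsearch versions also require `index.max_ngram_diff` to be raised to allow a min/max spread this wide):

```json
{
  "settings": {
    "index": { "max_ngram_diff": 22 },
    "analysis": {
      "filter": {
        "category_ngram": { "type": "ngram", "min_gram": 2, "max_gram": 20 },
        "filename_ngram": { "type": "ngram", "min_gram": 3, "max_gram": 25 }
      }
    }
  }
}
```

N-grams with spreads like these blow up the token count per document considerably, which is exactly why the disk-size estimate had to be measured rather than guessed from the raw data size.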