Index strategy for non-retention data and a huge dataset

sweb · January 9, 2020, 6:37pm

I have many content sites that produce about 1,000 content items per day, and many of them now have about 3 million records.

Sample document:

{
  "website_id": 10,
  "title": "Sample title",
  "description": "Sample descriptions",
  "body": "this is body of content",
  "date": "2020-01-09T18:27:57.738Z",
  "tags": ["sample", "tag"],
  "users_visits": 123045,
  "users_comments": 120,
  "users_rate": 4.3
}
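
For illustration, the sample document could be mapped to field types roughly as follows. This is only a sketch under assumptions: the type choices (text for title, description and body, keyword for tags, numeric types for the counters) are one plausible reading of the requirements, and the helper name build_mapping is hypothetical. It only builds the JSON body and does not talk to a cluster.

```python
import json


def build_mapping():
    """Return a hypothetical Elasticsearch mapping body for the sample document.

    The field types are assumptions, not a confirmed design.
    """
    return {
        "mappings": {
            "properties": {
                "website_id": {"type": "integer"},      # exact-match filter
                "title": {"type": "text"},              # full-text search
                "description": {"type": "text"},        # full-text search
                "body": {"type": "text"},               # full-text search
                "date": {"type": "date"},               # default sort key
                "tags": {"type": "keyword"},            # exact-match filter
                "users_visits": {"type": "long"},       # optional sort key
                "users_comments": {"type": "integer"},  # optional sort key
                "users_rate": {"type": "float"},        # optional sort key
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into an index-creation request.
    print(json.dumps(build_mapping(), indent=2))
```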

I need to know how to design my indexes for full-text search. Requirements:

There is no retention period or historical cutoff: all data must always be searchable, so I cannot move older content into archive indexes. Even old content may be updated and must remain searchable.
I want full-text search on the text fields.
I want to filter by tags.
The default filter is always website_id, but it could be removed or applied to several sites together.
The default order is date descending, combined with the filters.
Ordering could also be by numeric fields such as users_visits, users_comments, and users_rate.
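
The requirements above can be sketched as a single search body. This is a minimal illustration, not a recommended design: the function name build_search and its parameters are hypothetical, it assumes tags and website_id are indexed as exact-match fields, and it assumes that several tags should all match (AND), which the requirements do not state.

```python
import json


def build_search(text, website_ids=None, tags=None, sort_field="date"):
    """Build a hypothetical Elasticsearch search body from the requirements.

    text        -- full-text query over title, description and body
    website_ids -- optional list of sites; None or empty means all sites
    tags        -- optional list of tags, each of which must match (assumption)
    sort_field  -- "date" by default, or a numeric field such as users_visits
    """
    filters = []
    if website_ids:
        # One or several sites together.
        filters.append({"terms": {"website_id": list(website_ids)}})
    for tag in tags or []:
        filters.append({"term": {"tags": tag}})

    bool_query = {
        "must": [
            {
                "multi_match": {
                    "query": text,
                    "fields": ["title", "description", "body"],
                }
            }
        ]
    }
    if filters:
        bool_query["filter"] = filters

    return {
        "query": {"bool": bool_query},
        "sort": [{sort_field: {"order": "desc"}}],
    }


if __name__ == "__main__":
    body = build_search("sample", website_ids=[10], tags=["sample", "tag"])
    print(json.dumps(body, indent=2))
```

Passing sort_field="users_rate" (or users_visits, users_comments) switches the ordering, and leaving website_ids empty searches across all sites.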

Questions:

Index strategy?
Cluster requirements: how many nodes?
Hardware suggestions: storage, CPU, and RAM?

system · February 6, 2020, 6:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
