Index strategy for non-retention data and a huge dataset

I have many content sites. They produce about 1,000 new pieces of content per day, and many of them now hold about 3 million records.
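Because nothing is ever deleted, the document count only grows. Here is a rough projection for one site. It assumes that the 1,000-per-day figure applies to a single site and starts from the approximate 3 million records above. Both numbers are estimates and should be replaced with real per-site values.

```python
# Rough document-count projection for one site (no retention, so growth is linear).
# Assumption: ~1,000 new documents per day on top of ~3 million existing records.
DOCS_PER_DAY = 1_000
CURRENT_DOCS = 3_000_000


def projected_docs(days: int, current: int = CURRENT_DOCS, per_day: int = DOCS_PER_DAY) -> int:
    """Return the expected document count after `days` more days of growth."""
    return current + per_day * days


if __name__ == "__main__":
    for years in (1, 3, 5):
        print(f"after {years} year(s): {projected_docs(365 * years):,} documents")
```

Sizing (shards, storage) would then be based on the projected count for the planning horizon rather than on today's count.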

Sample document:

{
  "website_id": 10,
  "title": "Sample title",
  "description": "Sample descriptions",
  "body": "this is body of content",
  "date": "2020-01-09T18:27:57.738Z",
  "tags": ["sample", "tag"],
  "users_visits": 123045,
  "users_comments": 120,
  "users_rate": 4.3
}
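For reference, here is a minimal sketch of how the sample document might be mapped. It assumes the data lives in Elasticsearch, and the field types are my own choices. The full-text fields use `text`. `tags` and `website_id` use `keyword` because they are only used for exact filtering. `date` is a date field for the default sort, and the counters use numeric types for sorting. The dict could serve as the request body when creating the index.

```python
import json

# The sample document from the post.
SAMPLE_DOC = {
    "website_id": 10,
    "title": "Sample title",
    "description": "Sample descriptions",
    "body": "this is body of content",
    "date": "2020-01-09T18:27:57.738Z",
    "tags": ["sample", "tag"],
    "users_visits": 123045,
    "users_comments": 120,
    "users_rate": 4.3,
}

# Hypothetical mapping (assumes Elasticsearch; field types are assumptions).
MAPPING = {
    "mappings": {
        "properties": {
            "website_id": {"type": "keyword"},   # identifier used only for exact filtering
            "title": {"type": "text"},           # analyzed for full-text search
            "description": {"type": "text"},
            "body": {"type": "text"},
            "date": {"type": "date"},            # default sort field
            "tags": {"type": "keyword"},         # exact tag filtering
            "users_visits": {"type": "long"},    # long leaves headroom for large counters
            "users_comments": {"type": "integer"},
            "users_rate": {"type": "float"},
        }
    }
}


def unmapped_fields(doc: dict, mapping: dict) -> list:
    """Return the document fields that have no explicit mapping."""
    props = mapping["mappings"]["properties"]
    return sorted(k for k in doc if k not in props)


if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
    print("unmapped:", unmapped_fields(SAMPLE_DOC, MAPPING))
```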

I need to know how to design my indexes for full-text search:

  • There is no retention period and no historical tier. All data must always be searchable, so I cannot move older content into archive indices. Even old content may be updated later and must remain searchable.
  • I want full-text search on the text fields (title, description, body).
  • I want to filter by tags.
  • By default, results are always filtered by website_id, but this filter can be removed or can cover several sites together.
  • The default sort order is date descending, combined with the filters above.
  • Results can also be sorted by the numeric metrics users_visits, users_comments, and users_rate.
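The requirements above map fairly directly onto a single search request. Below is a sketch of a hypothetical helper, `build_search_body`, that assembles an Elasticsearch-style query body (assuming Elasticsearch). It runs a `multi_match` over the text fields and applies optional `terms` filters on website_id (one site, several sites, or none) and on tags. It sorts by date descending by default, or by one of the numeric metrics.

```python
from typing import Optional, Sequence

# Hypothetical helper; field names come from the sample document.
TEXT_FIELDS = ["title", "description", "body"]
SORTABLE_FIELDS = {"date", "users_visits", "users_comments", "users_rate"}


def build_search_body(
    text: str,
    website_ids: Optional[Sequence[int]] = None,
    tags: Optional[Sequence[str]] = None,
    sort_field: str = "date",
    size: int = 20,
) -> dict:
    """Build a search request body: full-text match, optional filters, descending sort."""
    if sort_field not in SORTABLE_FIELDS:
        raise ValueError(f"unsupported sort field: {sort_field}")

    filters = []
    if website_ids:  # None or empty -> search across all sites
        filters.append({"terms": {"website_id": list(website_ids)}})
    if tags:  # matches documents having ANY of the given tags
        filters.append({"terms": {"tags": list(tags)}})

    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": text, "fields": TEXT_FIELDS}}],
                "filter": filters,
            }
        },
        "sort": [{sort_field: {"order": "desc"}}],
    }
```

Putting website_id and tags in the `filter` clause keeps them out of relevance scoring and makes them eligible for caching. A single `terms` filter on tags is an OR over the given tags. Requiring all tags would need one `term` filter per tag.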

Questions:

  • Index strategy: how should I organize my indexes?
  • Cluster requirements: how many nodes do I need?
  • Hardware: what storage, CPU, and RAM do you suggest?
