I have many content sites, and they produce about 1,000 content items per day. Many of them now have about 3 million records.
Sample document:
{
"website_id": 10,
"title": "Sample title",
"description": "Sample descriptions",
"body": "this is body of content",
"date": "2020-01-09T18:27:57.738Z",
"tags": ["sample", "tag"],
"users_visits": 123045,
"users_comments": 120,
"users_rate": 4.3
}
I need to know how to design my indexes for full-text search:
- There is no retention policy or historical tier: all data must always be searchable, so I cannot move old content into archive indexes or the like. Even old content may still be updated and must stay searchable.
- I want full-text search on the text fields.
- I want to filter by tags.
- The default filter is always website_id, but it could be removed or extended to several sites together.
- The default sort order is date descending, combined with the filters.
- Sorting could also be by the numeric fields: users_visits, users_comments, and users_rate.
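To make these requirements concrete, here is a rough sketch of the kind of request I expect to send. It assumes an Elasticsearch-style query DSL (bool, multi_match, terms, sort), which is my assumption and not a fixed choice. The function name build_search_body and its parameters are hypothetical, and I assumed the full-text fields are title, description, and body.

```python
from typing import List, Optional

# Fields that results may be sorted by, taken from the sample document.
SORTABLE_FIELDS = {"date", "users_visits", "users_comments", "users_rate"}


def build_search_body(
    text: str,
    website_ids: Optional[List[int]] = None,
    tags: Optional[List[str]] = None,
    sort_field: str = "date",
) -> dict:
    """Build a search request body (assumed Elasticsearch-style query DSL)."""
    if sort_field not in SORTABLE_FIELDS:
        raise ValueError(f"cannot sort by {sort_field!r}")

    filters = []
    if website_ids:
        # One site or several sites together; omitted means all sites.
        filters.append({"terms": {"website_id": website_ids}})
    if tags:
        # Assumption: a document matches if it carries any of the given tags.
        filters.append({"terms": {"tags": tags}})

    return {
        "query": {
            "bool": {
                # Scored full-text match on the assumed text fields.
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title", "description", "body"],
                        }
                    }
                ],
                # Filters restrict the result set without affecting the score.
                "filter": filters,
            }
        },
        # Default order is newest first; numeric fields also sort descending.
        "sort": [{sort_field: {"order": "desc"}}],
    }
```

For example, build_search_body("sample", website_ids=[10], tags=["tag"]) covers the default case (one site, date descending), while passing website_ids=None searches across all sites.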
Questions:
- What index strategy should I use?
- What are the cluster requirements, and how many nodes do I need?
- What hardware do you suggest (storage, CPU, and RAM)?