Most efficient mapping for a chat search engine

Hi,

I'm building a new cluster to allow search on our internal chat system.
Given the volume of data to index and the budget available, storage efficiency is key here.

Message structure:
channel_id: integer
user_id: integer
message_id: GUID
message: text
timestamp: currently date object, can be transformed into unix timestamp 
features: JSON 
profile: JSON
reply_parent_message_id: GUID
role: JSON
score: integer between 0 and 100

The use cases will be: filtering on timestamp, channel_id and user_id; full-text search on the message field; and filtering on features, profile or role containing a specific keyword.
The rest is only for display in search results.
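
To make the search use case concrete, here is a rough sketch of the kind of request body I expect to send to _search. The values (channel 42, user 1001, the search terms and the dates) are made-up placeholders, and this is only an illustration of the intended query shape, not something I have benchmarked.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "deployment" } }
      ],
      "filter": [
        { "term": { "channel_id": 42 } },
        { "term": { "user_id": 1001 } },
        { "match": { "role": "admin" } },
        {
          "range": {
            "timestamp": {
              "gte": "2024-01-01T00:00:00Z",
              "lt": "2024-02-01T00:00:00Z"
            }
          }
        }
      ]
    }
  }
}
```

Only the match on message needs to contribute to relevance scoring; everything else sits in filter context.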

In terms of aggregations, I need histograms of the message count per unit of time, per channel_id, between two timestamps, and an average of the score field over all messages for one channel_id between two timestamps.
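
As a sketch of what I have in mind for these aggregations, assuming the field names from the mapping further down: a terms aggregation on channel_id with a date_histogram and an avg sub-aggregation. The aggregation names (per_channel, messages_over_time, avg_score), the one-hour interval, the bucket count and the dates are placeholders I picked for illustration.

```json
{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "2024-01-01T00:00:00Z",
        "lt": "2024-01-02T00:00:00Z"
      }
    }
  },
  "aggs": {
    "per_channel": {
      "terms": { "field": "channel_id", "size": 50 },
      "aggs": {
        "messages_over_time": {
          "date_histogram": {
            "field": "timestamp",
            "fixed_interval": "1h"
          }
        },
        "avg_score": {
          "avg": { "field": "score" }
        }
      }
    }
  }
}
```

For the single-channel average, I would add a term filter on channel_id next to the range and keep only the avg aggregation.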

So I'm thinking of building my index template mappings the following way:

{
  "mappings": {
    "properties": {
      "channel_id": {
        "type": "integer",
        "index": true
      },
      "user_id": {
        "type": "integer",
        "index": true
      },
      "message_id": {
        "type": "keyword",
        "norms": false,
        "index_options": "freqs"
      },
      "message": {
        "type": "text"
      },
      "timestamp": {
        "type": "date"
      },
      "features": {
        "type": "text"
      },
      "profile": {
        "type": "text"
      },
      "reply_parent_message_id": {
        "type": "keyword",
        "norms": false,
        "index_options": "freqs"
      },
      "role": {
        "type": "text"
      },
      "score": {
        "type": "byte",
        "index": false
      }
    }
  }
}

My main questions are about the GUIDs and the message itself.
My understanding is that, since I will only store the GUIDs and never filter or search on them, this is the most space-efficient way to map them.
For the message, I will only search for specific keywords and return the full message, so text seems to be the right type.
Could you please confirm whether my assumptions are correct and my mapping is OK?
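
For comparison, here is an alternative I am weighing for the two GUID fields, shown as a fragment of the properties block. As far as I can tell from the docs, keyword fields already default to norms disabled and index_options set to docs, so freqs would add data rather than save space. Assuming the GUIDs really are display-only, turning off both the inverted index and doc values looks like the leaner option, with the values still returned from _source.

```json
{
  "message_id": {
    "type": "keyword",
    "index": false,
    "doc_values": false
  },
  "reply_parent_message_id": {
    "type": "keyword",
    "index": false,
    "doc_values": false
  }
}
```

The trade-off, if I read it correctly, is that these fields could then no longer be searched, sorted or aggregated on.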
