Most efficient mapping for a chat search engine

Baygon · August 14, 2021, 10:36am

Hi,

I'm building a new cluster to allow search on our internal chat system.
Given the volume of data to index and the budget for it, storage efficiency is key here.

Message structure:
channel_id: integer
user_id: integer
message_id: GUID
message: text
timestamp: currently a date object, can be converted to a Unix timestamp
features: JSON 
profile: JSON
reply_parent_message_id: GUID
role: JSON
score: integer between 0 and 100

The use cases are: filtering on timestamp, channel_id and user_id; full-text search on the message field; and filtering on features, profile or role containing a specific keyword.
The rest is only for display in search results.
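To make these use cases concrete, a search request body could look roughly like the sketch below. The values (`deployment`, `42`, `1001`, `admin`, and the date range) are made-up placeholders, and it assumes `features`, `profile` and `role` are indexed as analyzed text so that a `match` clause can find a keyword inside them.

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "deployment" } }
      ],
      "filter": [
        { "term": { "channel_id": 42 } },
        { "term": { "user_id": 1001 } },
        { "match": { "role": "admin" } },
        {
          "range": {
            "timestamp": {
              "gte": "2021-08-01T00:00:00Z",
              "lt": "2021-08-14T00:00:00Z"
            }
          }
        }
      ]
    }
  }
}
```

Clauses under `filter` do not contribute to relevance scoring, so only the full-text match on `message` affects the ranking here.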

In terms of aggregations, I would need histograms of the message count per unit of time per channel_id between two timestamps, and an average of the score field over all messages for one channel_id between two timestamps.
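As a rough sketch, both aggregations for a single channel could be combined in one request like this. The aggregation names `messages_over_time` and `avg_score`, the channel value `42`, the hourly interval and the date range are illustrative assumptions, and `calendar_interval` assumes Elasticsearch 7.2 or later.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "channel_id": 42 } },
        {
          "range": {
            "timestamp": {
              "gte": "2021-08-01T00:00:00Z",
              "lt": "2021-08-14T00:00:00Z"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "messages_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour"
      }
    },
    "avg_score": {
      "avg": { "field": "score" }
    }
  }
}
```

Both aggregations read from doc values, so they should keep working on a field mapped with `"index": false` as long as doc values stay enabled (the default for numeric and date fields).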

So I'm thinking of building my index template mappings the following way:

{
  "mappings": {
    "properties": {
      "channel_id": {
        "type": "integer",
        "index": true
      },
      "user_id": {
        "type": "integer",
        "index": true
      },
      "message_id": {
        "type": "keyword",
        "norms": false,
        "index_options": "freqs"
      },
      "message": {
        "type": "text"
      },
      "timestamp": {
        "type": "date"
      },
      "features": {
        "type": "text"
      },
      "profile": {
        "type": "text"
      },
      "reply_parent_message_id": {
        "type": "keyword",
        "norms": false,
        "index_options": "freqs"
      },
      "role": {
        "type": "text"
      },
      "score": {
        "type": "byte",
        "index": false
      }
    }
  }
}

My main questions are about the GUIDs and the message itself.
For the GUIDs, since I will only store them and never filter or search on them, I understand this is the way to use the least space.
For the message, I would only search for specific keywords and return the full message, so I assume text is the right type.
Could you please confirm whether my assumptions are correct and my mapping is OK?
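For comparison, here is a minimal sketch of how keyword fields that are only stored and returned could be mapped, assuming they are never searched, sorted or aggregated on. Setting `index` to `false` skips the inverted index and setting `doc_values` to `false` skips the columnar store, so the values remain available only through `_source`. Note that `norms` is already disabled by default on keyword fields and the default `index_options` is `docs`, so `"index_options": "freqs"` stores more than the default rather than less. This is an illustration under those assumptions, not a confirmed recommendation.

```json
{
  "mappings": {
    "properties": {
      "message_id": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      },
      "reply_parent_message_id": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      }
    }
  }
}
```

If looking up a message by its GUID or fetching the replies to a parent message ever becomes a requirement, `index` would need to stay enabled on the corresponding field.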

