Elasticsearch Aggregations

I have a Node.js project in TypeScript that uses the "@elastic/elasticsearch": "^8.7.0" library to connect the server to Elasticsearch 8.7.1.

I'm trying to write a query that returns the frequency of each word in the "message" field.
I have tried several approaches, but the query only ever returns the entire sentence!
I need it to return the individual words and how many times each one appeared in total.

This is my function that creates the index and field mapping:

    private async createIndexIfNotExists(indexName: string): Promise<void> {
        const indexExists = await this.client.indices.exists({ index: indexName });
        if (indexExists) return;

        await this.client.indices.create({
            index: indexName,
            body: {
                mappings: {
                    properties: {
                        id: {
                            type: 'keyword'
                        },
                        remoteJid: {
                            type: 'keyword'
                        },
                        participant: {
                            type: 'keyword'
                        },
                        fromMe: {
                            type: 'boolean'
                        },
                        message: {
                            type: 'text',
                            fields: {
                                raw: {
                                    type: 'keyword'
                                }
                            }
                        },
                    }
                }
            },
        });


    };

This is my function that inserts the documents:

    const handleMessage = async (
      msg: proto.IWebMessageInfo,
      wbot: Session
    ): Promise<void> => {
      try {
        // Skip messages from non-group chats (group JIDs contain "@g.us").
        if (msg.key.remoteJid && !msg.key.remoteJid.includes("@g.us")) return;

        const message = getBodyMessage(msg);
        const index = "groups";
        const document = {
          id: msg.key.id,
          remoteJid: msg.key.remoteJid,
          participant: msg.key.participant,
          fromMe: msg.key.fromMe,
          message: message,
        };

        await esService.insertDocument(index, document);
      } catch (err: any) {
        logger.error(`CATCH HANDLEMESSAGE: ${err}`);
      }
    };

    // Method of the Elasticsearch service class used above.
    public async insertDocument(index: string, document: any): Promise<any> {
        try {
            // Make sure the index and its mapping exist before indexing.
            await this.createIndexIfNotExists(index);
            const response = await this.client.index({
                index,
                refresh: true,
                document
            });
            return response;
        } catch (error) {
            console.error(error);
            throw new Error('Error inserting document into Elasticsearch');
        }
    };

This is my function that executes the query where the frequency of the words should appear:

    public async cloudWords(): Promise<Record<string, any>> {
        const indexName = 'groups';
        const query = {
            size: 0,
            aggs: {
                palavras: {
                    terms: {
                        field: 'message.raw',
                    }
                }
            }
        };

        const result = await this.client.search({
            index: indexName,
            body: query,
        });
        return result;
    };

But it always returns whole sentences as the bucket keys:

    "message": {
        "took": 11,
        "timed_out": false,
        "_shards": {
            "total": 1,
            "successful": 1,
            "skipped": 0,
            "failed": 0
        },
        "hits": {
            "total": {
                "value": 7,
                "relation": "eq"
            },
            "max_score": null,
            "hits": []
        },
        "aggregations": {
            "palavras": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "Alguém conseguiu",
                        "doc_count": 1
                    },
                    {
                        "key": "Atualizar?",
                        "doc_count": 1
                    },
                    {
                        "key": "Com Quepasa por enquanto, só vai funcionar se estiver tudo com localhost, pra url do webhook ficar menor",
                        "doc_count": 1
                    },
                    {
                        "key": "Imporam limite de caracteres",
                        "doc_count": 1
                    },
                    {
                        "key": "Meu esposo vc está com uma ótima fisionomia tá bonito ❤️ só Deus sabe o tanto de saudades q estou de vc meu velho ❤️ que Deus e Jesus te abençoe hoje e sempre 🙏🏻 como eu queria vc aqui comigo más pra Deus nada é impossível força e fé sempre🙏🏻 amém amém",
                        "doc_count": 1
                    },
                    {
                        "key": "Sim",
                        "doc_count": 1
                    },
                    {
                        "key": "To na 17 ja",
                        "doc_count": 1
                    }
                ]
            }
        }
    }
}

My goal is to assemble a cloud of words by frequency.
Can anyone help? Where am I going wrong?

Welcome to our community! :smiley:

Your message.raw field is mapped as a keyword, which means the value is not analysed: it is indexed and returned as a single term, exactly as you passed it in. You might want to run the aggregation on the message field instead.
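To illustrate the difference, here is a minimal sketch in plain TypeScript of the terms each mapping produces for one message. The function names are made up for this example, and the whitespace-and-punctuation splitting is only a rough stand-in for the analysis Elasticsearch applies to a text field, so treat it as an approximation rather than the real analyzer.

```typescript
// A `keyword` field is not analyzed: the whole value becomes a single term.
function keywordTerms(value: string): string[] {
  return [value];
}

// A `text` field is analyzed into word terms.
// Assumption: lowercasing, splitting on whitespace, and trimming ASCII
// punctuation only approximates what the real analyzer does.
function approximateTextTerms(value: string): string[] {
  return value
    .toLowerCase()
    .split(/\s+/)
    .map((word) => word.replace(/^[.,!?;:"'()]+|[.,!?;:"'()]+$/g, ""))
    .filter((word) => word.length > 0);
}

const sample = "Imporam limite de caracteres";
console.log(keywordTerms(sample)); // one term: the full sentence
console.log(approximateTextTerms(sample)); // one term per word
```

A terms aggregation creates one bucket per term, which is why aggregating on message.raw gives one bucket per sentence.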

I was using the message field before, but the query fails with an error, and the error message itself suggests two options:

  1. Enable fielddata on the field, which is not recommended because it consumes a lot of memory;
  2. Use a keyword field instead.

See the error below when trying to use the message field:

How can I solve this, given that enabling fielddata on the message field is discouraged because it increases RAM consumption?

If I enable fielddata on the message field, it works perfectly and the result is what I expect: I get the list of words with their frequency across all records.

    private async createIndexIfNotExists(indexName: string): Promise<void> {
        const indexExists = await this.client.indices.exists({ index: indexName });
        if (indexExists) return;

        await this.client.indices.create({
            index: indexName,
            body: {
                mappings: {
                    properties: {
                        id: {
                            type: 'keyword'
                        },
                        remoteJid: {
                            type: 'keyword'
                        },
                        participant: {
                            type: 'keyword'
                        },
                        fromMe: {
                            type: 'boolean'
                        },
                        message: {
                            type: 'text',
                            fielddata: true // with fielddata enabled, the aggregation works perfectly
                        },
                    }
                }
            },
        });


    };
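With this mapping, the terms aggregation on message returns one bucket per word. As a minimal sketch, the buckets could be turned into word-cloud entries like this. The WordCloudEntry shape, the function name, and the sample buckets are made up for illustration. Note that doc_count is the number of documents containing the word, not the total number of occurrences, and that a terms aggregation returns only the top 10 buckets unless its size parameter is set.

```typescript
// Shape of one bucket in a terms aggregation response.
interface TermsBucket {
  key: string;
  doc_count: number;
}

// Hypothetical shape for a word-cloud renderer; adapt to the library in use.
interface WordCloudEntry {
  text: string;
  weight: number;
}

// Keeps buckets with at least `minCount` documents and orders them by
// descending weight, breaking ties alphabetically.
function bucketsToWordCloud(
  buckets: TermsBucket[],
  minCount = 1
): WordCloudEntry[] {
  return buckets
    .filter((bucket) => bucket.doc_count >= minCount)
    .map((bucket) => ({ text: bucket.key, weight: bucket.doc_count }))
    .sort((a, b) => b.weight - a.weight || (a.text < b.text ? -1 : 1));
}

// Made-up buckets, only to show the call.
const exampleBuckets: TermsBucket[] = [
  { key: "sim", doc_count: 1 },
  { key: "deus", doc_count: 2 },
];
console.log(bucketsToWordCloud(exampleBuckets));
```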

My concern is that none of the Elasticsearch documentation recommends enabling fielddata. The error message does suggest it, but it also warns that the increase in memory usage is significant.
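One possible way to avoid fielddata, sketched here under assumptions, is to count words on the application side: send the message texts through the index's analyzer with the analyze API and tally the returned tokens. The AnalyzeClient interface below is a hypothetical minimal type for this example. It assumes the 8.x JavaScript client's indices.analyze call accepts index, field, and text and resolves to an object with a tokens array, so check it against the client version in use. Fetching the messages to analyze is left out.

```typescript
// Hypothetical minimal type covering the single client call this sketch uses.
interface AnalyzeClient {
  indices: {
    analyze(params: {
      index: string;
      field: string;
      text: string[];
    }): Promise<{ tokens?: Array<{ token: string }> }>;
  };
}

// Counts every occurrence of each analyzed token across the given messages,
// so no fielddata has to be loaded on the Elasticsearch side.
async function countTokens(
  client: AnalyzeClient,
  index: string,
  field: string,
  messages: string[]
): Promise<Map<string, number>> {
  const counts = new Map<string, number>();
  if (messages.length === 0) return counts;

  // Analyze with the same analyzer the field uses in the index mapping.
  const response = await client.indices.analyze({ index, field, text: messages });

  for (const { token } of response.tokens ?? []) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}
```

Unlike the doc_count of a terms aggregation, this counts total occurrences, at the cost of moving the text through the application. For large indices the messages would need to be processed in batches.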

Please don't post pictures of text, logs or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not be even able to see them :slight_smile:
