Tag Cloud for match phrases

I'm using Kibana to create a Tag Cloud, and it works fine at the moment. However, I want to show more than single words. I'm trying to show expressions such as "cartao de credito" (credit card), "mensagem de erro" (error message), etc.

I tried using the shingle token filter together with a custom stopword list, so when creating the index I defined a custom analyzer and its filters in the index settings.

The result is not what I wanted. To make testing easier, I used the _analyze API.

Here are the settings:

PUT /analisador_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "filtro_portugues",
            "asciifolding",
            "filtro_separador"
          ]
        }
      },
      "filter": {
        "filtro_separador": {
          "type": "shingle",
          "max_shingle_size": 3,
          "min_shingle_size": 3,
          "output_unigrams": false
        },
        "filtro_portugues": {
          "type": "stop",
          "stopwords": ["o", "cliente"]
        }
      }
    }
  }
}

POST /analisador_test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "O cliente informa problema no cartão de crédito"
}

The result is:
"_ _ informa"
"_ informa problema"
"informa problema no"
"problema no cartao"
"no cartao de"
"cartao de credito"

Part of this result is OK, but the first two tokens ("_ _ informa" and "_ informa problema") should not exist.
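What seems to be happening can be shown with a simplified Python simulation (not Lucene's actual code): the stop filter removes "o" and "cliente" but keeps the position gaps they leave behind, and the shingle filter fills every missing position with its filler token, which defaults to "_". The whitespace tokenizer and the `fold` helper below are rough stand-ins for the standard tokenizer plus lowercase/asciifolding, and runs of several consecutive stopwords may behave differently in real Lucene.

```python
import unicodedata

STOPWORDS = {"o", "cliente"}  # same list as filtro_portugues
FILLER = "_"                  # default filler_token of the shingle filter


def fold(word):
    """Rough stand-in for lowercase + asciifolding: 'Cartão' -> 'cartao'."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def tokenize(text):
    """Whitespace split standing in for the standard tokenizer; keeps positions."""
    return [(pos, fold(w)) for pos, w in enumerate(text.split())]


def stop(tokens):
    """Drop stopwords but keep the original positions, which leaves gaps."""
    return [(pos, term) for pos, term in tokens if term not in STOPWORDS]


def shingle(tokens, size=3):
    """Build fixed-size shingles, filling each position gap with FILLER."""
    if not tokens:
        return []
    by_pos = dict(tokens)
    last = tokens[-1][0]
    seq = [by_pos.get(pos, FILLER) for pos in range(last + 1)]
    return [" ".join(seq[i:i + size]) for i in range(len(seq) - size + 1)]


text = "O cliente informa problema no cartão de crédito"
result = shingle(stop(tokenize(text)))  # stop first, then shingle
for s in result:
    print(s)
```

The two unwanted tokens come from positions 0 and 1, where the removed stopwords used to be.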

If I change the order of the filters (putting the shingle filter before the stop filter), the stopwords are not applied and the result is:

"o cliente informa"
"cliente informa problema"
"informa problema no"
"problema no cartao"
"no cartao de"
"cartao de credito"
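The reason the stopwords stop working is that the stop filter compares each token as a whole. Once shingling has run, it only sees three-word tokens such as "o cliente informa", which never equal "o" or "cliente", so nothing is removed. A simplified Python simulation of that order, using the same rough stand-ins for tokenizing and folding:

```python
import unicodedata

STOPWORDS = {"o", "cliente"}  # same list as filtro_portugues


def fold(word):
    """Rough stand-in for lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def tokenize(text):
    """Whitespace split standing in for the standard tokenizer."""
    return [fold(w) for w in text.split()]


def shingle(terms, size=3):
    """Fixed-size shingles over adjacent terms (no gaps exist yet)."""
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


def stop(terms):
    """Remove a token only if the whole token is a stopword."""
    return [t for t in terms if t not in STOPWORDS]


text = "O cliente informa problema no cartão de crédito"
result = stop(shingle(tokenize(text)))  # shingle first, then stop
for s in result:
    print(s)
```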

Does anyone know a way to build a tag cloud of short expressions (three words)?

Note that the tag cloud is built from an analyzed text field that contains a lot of text.
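One possible workaround, sketched in Python under the same simplified model: keep the stop-then-shingle order and afterwards discard every shingle that contains the filler token, so no expression spans a removed stopword. In Elasticsearch, one candidate for that last step is a `predicate_token_filter` (a script-based token filter) placed after the shingle filter; that mapping is an assumption to verify against your Elasticsearch version, and it helps to set a `filler_token` that cannot occur in real text. The helper names below are hypothetical.

```python
import unicodedata

STOPWORDS = {"o", "cliente"}
FILLER = "_"  # assumed filler_token; pick one that never appears in real text


def fold(word):
    """Rough stand-in for lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def analyze(text, size=3):
    """Simulated stop -> shingle chain that fills position gaps with FILLER."""
    tokens = [(pos, fold(w)) for pos, w in enumerate(text.split())]
    kept = {pos: t for pos, t in tokens if t not in STOPWORDS}
    if not kept:
        return []
    seq = [kept.get(pos, FILLER) for pos in range(max(kept) + 1)]
    return [" ".join(seq[i:i + size]) for i in range(len(seq) - size + 1)]


def drop_filler_shingles(shingles):
    """Hypothetical post-filter: keep only shingles made of real words."""
    return [s for s in shingles if FILLER not in s.split(" ")]


text = "O cliente informa problema no cartão de crédito"
result = drop_filler_shingles(analyze(text))
for s in result:
    print(s)
```

Dropping whole shingles, rather than just stripping the filler, avoids producing "expressions" from words that were never adjacent in the original text.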

Recently discussed on LinkedIn

Check out the comments there, where we discussed pre-processing text with Python vs. Elasticsearch shingling and the significant_text aggregation.
