I am trying to extract keywords from a set of tweets in Spanish. The problem is that in the results the last vowel of most words has been removed. Any idea why this is happening?
Here is the query:
{
  "query": {
    "bool": {
      "must": {
        "terms": {
          "full_text_sentiment": ["positive"]
        }
      },
      "filter": {
        "range": {
          "created_at": {
            "gte": greaterThanTime,
            "lte": lessThanTime
          }
        }
      }
    }
  },
  "aggs": {
    "keywords": {
      "terms": { "field": "full_text_clean", "size": 10 }
    }
  }
}
The mapping for the field is the following:
"full_text_clean": {
  "type": "text",
  "analyzer": "spanish",
  "fielddata": true,
  "fielddata_frequency_filter": {
    "min": 0.1,
    "max": 1.0,
    "min_segment_size": 10
  },
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 512
    }
  }
}
And these are the buckets in the response:
[ { key: 'aquí', doc_count: 3 },
{ key: 'deport', doc_count: 3 },
{ key: 'informacion', doc_count: 3 },
{ key: '23', doc_count: 2 },
{ key: 'corazon', doc_count: 2 },
{ key: 'dios', doc_count: 2 },
{ key: 'mexic', doc_count: 2 },
{ key: 'mujer', doc_count: 2 },
{ key: 'quier', doc_count: 2 },
{ key: 'siempr', doc_count: 2 }]
Here "deport" should be "deporte", "mexic" should be "mexico", "quier" should be "quiero", etc.
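One way to narrow this down, assuming the truncation happens when the text is analyzed rather than in the aggregation itself, might be to run the same words through the field's analyzer with the `_analyze` API and look at the tokens it returns. The following is only a minimal sketch: the index name `tweets` and the helper `buildAnalyzeRequest` are hypothetical, and the commented-out call assumes the official Node.js client, whose exact parameter shape can differ between versions.

```javascript
// Hypothetical helper: builds the parameters for an _analyze request so the
// tokens produced by the "spanish" analyzer can be inspected directly.
function buildAnalyzeRequest(indexName, text) {
  return {
    index: indexName,        // assumption: the index that holds the tweets
    body: {
      analyzer: 'spanish',   // same analyzer as in the full_text_clean mapping
      text: text
    }
  };
}

// Build a request for the three words that come back shortened.
const request = buildAnalyzeRequest('tweets', 'deporte mexico quiero');
console.log(JSON.stringify(request, null, 2));

// Assumed usage with the official Node.js client (not executed here):
//   client.indices.analyze(request).then(result => console.log(result));
```

If the tokens returned for these words are already shortened, the change comes from the analyzer applied at index time and not from the terms aggregation.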
Any idea what is happening?
Thank you!