Hello,
I am having trouble writing a query that searches for smileys in tweets. Can anyone show me an example of a working query? I have tried several ways of escaping a simple smiley like ":)", but none of them matches it.
I should mention that I am indexing the tweet data with the Twitter river plugin.
Thank you,
You probably indexed your tweets without specifying an analyzer. In that case the default Standard Analyzer is used, which in turn uses the Standard Tokenizer. This tokenizer discards the punctuation characters that smileys are made of (":", ";", ")" and so on), so they never reach the index and you can't search for them later.
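To make the difference concrete, here is a small Python sketch. It does not call Elasticsearch or Lucene at all: `whitespace_tokens` and `word_only_tokens` are hypothetical helpers, and the second one is only a crude regex stand-in for what the Standard Analyzer does (the real tokenizer follows Unicode word-boundary rules). It is just meant to show why ":)" survives one kind of tokenization and not the other.

```python
import re


def whitespace_tokens(text):
    # Split on whitespace only, so punctuation stays attached to the tokens.
    return text.split()


def word_only_tokens(text):
    # Crude approximation (an assumption, not Lucene's real rules):
    # keep runs of letters/digits, allow inner apostrophes, then lowercase.
    pattern = r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*"
    return [token.lower() for token in re.findall(pattern, text)]


text = "Don't worry, be :) now!"

print(whitespace_tokens(text))  # ["Don't", 'worry,', 'be', ':)', 'now!']
print(word_only_tokens(text))   # ["don't", 'worry', 'be', 'now']
```

With whitespace splitting the smiley is a token of its own and can be matched; with word-only tokenization it is gone before anything is indexed. Note that whitespace splitting also keeps "worry," and "now!" with their punctuation attached, which is the trade-off of this approach.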
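Try adding a mapping that uses an analyzer which preserves punctuation, e.g. the Whitespace Analyzer. The mapping has to be in place before the tweets are indexed, because the analyzer is applied at index time. Here is a short example in Sense notation: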
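# Caution: this deletes ALL indices. Only run it on a throwaway test cluster.
DELETE /_all

# Create an empty test index.
PUT /test

# Map the "text" field so that it is analyzed with the whitespace analyzer.
PUT /test/_mapping/my_type
{
  "my_type": {
    "properties": {
      "text": {
        "type": "string",
        "analyzer": "whitespace"
      }
    }
  }
}

# Index a sample document that contains a smiley.
POST /test/my_type
{
  "text": "Don't worry, be :) now!"
}

# Search for the smiley; the query text is analyzed with the same analyzer.
GET /test/my_type/_search
{
  "query": {
    "match": {
      "text": ":)"
    }
  }
}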