But I am getting all the results containing 'sachin', not only those containing '#sachin'. Also, I am writing a regular expression to get the count of terms. The facet looks like this:
This is not returning any values. I think it has something to do with escaping the '#' inside the regular expression, but I am not sure how to do it. I have tried escaping it with \ and \, but it did not work. Can anyone help me with this?
You should not facet on analyzed fields if you don't want to run out of
memory pretty quickly (every term of the inverted index gets loaded
into memory for this field, which may be a lot, depending on the size of
the index).
But what I need to search really is "analyzed": if the content is something like '#sachin big news, retires from ODI', the search should return it, but if it is like 'sachin, you are awesome', the search should not return it.
"not_analyzed" will get only exact matches and is not useful for my use case (here, it would match only content whose full text is "#sachin", not content that contains "#sachin").
Also, please note Alex's warning regarding memory usage. I'm not sure what
your hardware and data look like, but it might be worth paying some price at
index time for more performance and less memory usage during searches. For
example, you can parse your text for hashtags and store them in a "tags"
field (which can be an array) that you index as not_analyzed and facet on
separately.
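To make the "tags" suggestion concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: the helper names (extract_hashtags, build_document), the "content" field name, the hashtag pattern, and the decision to lowercase tags are all mine, not something from your setup. The mapping uses the legacy string type with "index": "not_analyzed", so adjust it to whatever your Elasticsearch version expects.

```python
import re

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return the hashtags found in text, lowercased, without duplicates."""
    tags = []
    for tag in HASHTAG_RE.findall(text):
        tag = tag.lower()  # assumption: treat #Sachin and #sachin as one tag
        if tag not in tags:
            tags.append(tag)
    return tags


def build_document(text):
    """Build the document to index: the original text plus a separate tags array."""
    return {"content": text, "tags": extract_hashtags(text)}


# Hypothetical mapping fragment: "tags" is stored as-is (not_analyzed),
# so each hashtag stays one exact term that can be faceted on.
TAGS_MAPPING = {
    "properties": {
        "content": {"type": "string"},
        "tags": {"type": "string", "index": "not_analyzed"},
    }
}

if __name__ == "__main__":
    print(build_document("#sachin big news, retires from ODI"))
    print(build_document("sachin, you are awesome"))
```

Because not_analyzed terms are matched exactly, normalizing the case at index time (as above) is one way to keep the facet counts from splitting across '#Sachin' and '#sachin'. Whether that is what you want depends on your data.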
I am using Elasticsearch to search a table of selected Twitter posts, and we have a UI that lets users search for multiple terms with an OR condition. We need to display the count of records for each term, and we are using the facet to get those per-term counts. So it is not limited to hashtags; it applies to anything the user searches for.
So, is there a better way than using facets to get this count?
Cool! Then I think you should try that and see how it works for you. I
thought the whitespace analyzer alone is enough, but maybe it isn't. I
didn't test it.
I think you should run a test, see how it fits your use-case and if you
need something more you can always come back here and ask more questions.
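Before running the real test, a rough simulation can show what to look for. The sketch below is plain Python and does not call Lucene or Elasticsearch; whitespace_tokens only mimics splitting on whitespace, and word_tokens is my crude stand-in for an analyzer that splits on non-word characters and lowercases. Both function names are made up for this example.

```python
import re


def whitespace_tokens(text):
    """Mimic whitespace-only tokenization: split on whitespace, change nothing else."""
    return text.split()


def word_tokens(text):
    """Crude stand-in for a word-based analyzer: lowercase and keep only word characters."""
    return re.findall(r"\w+", text.lower())


if __name__ == "__main__":
    for sample in ("#sachin big news, retires from ODI", "sachin, you are awesome"):
        print(sample)
        print("  whitespace:", whitespace_tokens(sample))
        print("  word-based:", word_tokens(sample))
```

In this simulation the whitespace split keeps '#sachin' as a term, while the word-based split reduces it to 'sachin', which would explain why both posts match a search for 'sachin'. It also shows a possible catch: punctuation stays attached ('news,' and 'sachin,'), and nothing is lowercased, so the whitespace analyzer alone may need some extra filtering.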
Hmm... I don't see one. If you need counts for all the terms, then you need
enough memory to hold all the terms.
I would monitor the cluster with something like
SPM (http://sematext.com/spm/elasticsearch-performance-monitoring/).
You'll be able to see how your memory, field cache, etc goes up and down as
you use Elasticsearch. Then you can tell whether you have enough hardware
for the dataset and usage you're expecting.