Elasticsearch search for words having the '#' character

For example, I am right now searching like this:

http://localhost:9200/posts/post/_search?q=content:%23sachin

But I am getting all results containing 'sachin', not only those containing '#sachin'. I am also using a regular expression in a terms facet to get the count of terms. The facet looks like this:

"facets": {
    "content": {
        "terms": {
            "field": "content",
            "size": 1000,
            "all_terms": false,
            "regex": "#sachin",
            "regex_flags": [
                "DOTALL",
                "CASE_INSENSITIVE"
            ]
        }
    }
}

This does not return any values. I think it has something to do with escaping the '#' inside the regular expression, but I am not sure how to do it. I have tried escaping it with \ and \, but that did not work. Can anyone help me with this?

I have posted the same question on Stack Overflow (http://stackoverflow.com/questions/17526736/elasticsearch-search-fo-words-having-character)

Hello,

I think the standard analyzer (http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer/) will get rid of your #, and that's why it doesn't show up in searches.

If you want exact matches for that field, the easiest way is to index it as "not_analyzed". Here's a curl example that should work: https://gist.github.com/radu-gheorghe/5964537
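
To illustrate the point about the analyzer, here is a minimal Python sketch. It does not call Elasticsearch. `standard_like_tokens` and `not_analyzed_terms` are hypothetical helpers, and the first one only approximates the standard analyzer (it ignores stop words and other details). The sketch suggests that the %23 in the URL is decoded correctly and that the '#' is lost later, during analysis.

```python
import re
from urllib.parse import unquote


def standard_like_tokens(text):
    """Rough approximation of the standard analyzer (an assumption, not the
    real Lucene tokenizer): keep runs of word characters and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed_terms(text):
    """A not_analyzed field indexes the whole value as one single term."""
    return [text]


# The URL-encoded query from the question decodes to the intended text.
query_text = unquote("%23sachin")
print(query_text)

# Both the indexed content and the query text lose the '#' during analysis,
# so the search effectively becomes a search for 'sachin'.
print(standard_like_tokens("#sachin big news, retires from ODI"))
print(standard_like_tokens(query_text))

# With not_analyzed, the whole string is the only term in the index.
print(not_analyzed_terms("#sachin big news, retires from ODI"))
```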

On Wed, Jul 10, 2013 at 9:29 AM, prince prince@qburst.com wrote:

I have posted the same question on Stack Overflow (http://stackoverflow.com/questions/17526736/elasticsearch-search-fo-words-having-character)

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Elasticsearch-search-fo-words-having-character-tp4037822p4037823.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
http://sematext.com/ -- ElasticSearch -- Solr -- Lucene


Hey,

You should not facet on analyzed fields if you don't want to run out of memory pretty quickly. Every term of the inverted index for this field gets loaded into memory, which may be a lot, depending on the size of the index.

--Alex

On Wed, Jul 10, 2013 at 10:46 AM, Radu Gheorghe radu.gheorghe@sematext.com wrote:

Hello,

I think the standard analyzer (http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-analyzer/) will get rid of your #, and that's why it doesn't show up in searches.

If you want exact matches for that field, the easiest way is to index it as "not_analyzed". Here's a curl example that should work: https://gist.github.com/radu-gheorghe/5964537


Hi Radu,

But the field I need to search really has to be "analyzed". If the content is something like '#sachin big news, retires from ODI', the search should return it, but if it is like 'sachin, you are awesome', the search should not return it.

"not_analyzed" will return only exact matches and is not useful for my use case (here, it would match only content whose full text is "#sachin", not content that contains "#sachin").

Hello,

In that case, you need to change your analyzer (http://www.elasticsearch.org/guide/reference/index-modules/analysis/). Maybe the Whitespace Analyzer (http://www.elasticsearch.org/guide/reference/index-modules/analysis/whitespace-analyzer/) is more appropriate here?
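
A minimal sketch of what that change could look like, assuming the index and type names from the question (posts/post) and the string-type mapping syntax used elsewhere in this thread. `whitespace_tokens` is a hypothetical helper that only imitates splitting on whitespace; it does not call Elasticsearch.

```python
import json


def whitespace_tokens(text):
    """Imitates the whitespace analyzer: split on whitespace only, so '#'
    and '@' stay attached (and so do case and trailing punctuation)."""
    return text.split()


# Mapping that selects the built-in "whitespace" analyzer for the field.
mapping = {
    "post": {
        "properties": {
            "content": {"type": "string", "analyzer": "whitespace"}
        }
    }
}

print(json.dumps(mapping, indent=2))

# '#sachin' survives as its own token in the first text and is absent
# from the second, which is the behaviour asked for in the question.
print(whitespace_tokens("#sachin big news, retires from ODI"))
print(whitespace_tokens("sachin, you are awesome"))
```

Because nothing is lowercased or stripped in this setup, '#Sachin' and '#sachin,' would be different terms from '#sachin', so a custom analyzer with extra filters may still be needed.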

Also, please note Alex's warning regarding memory usage. I'm not sure what your hardware and data look like, but it might be worth paying some price at index time for more performance and less memory usage during searches. For example, you can parse your text for hashtags and store them in a separate "tags" field (which can be an array), which you can index as not_analyzed and facet on separately.
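
The hashtag-extraction idea could be sketched in Python as below. The regular expression, the helper names `extract_hashtags` and `build_document`, and the choice to lowercase tags are all assumptions made for illustration; the "tags" field name comes from the suggestion above.

```python
import re

# Assumption: a hashtag is '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(text):
    """Return lowercased hashtags in order of first appearance, no duplicates."""
    tags = []
    for tag in HASHTAG_RE.findall(text):
        tag = tag.lower()
        if tag not in tags:
            tags.append(tag)
    return tags


def build_document(content):
    """Build the document to index: the original text plus a separate tags array."""
    return {"content": content, "tags": extract_hashtags(content)}


# Mapping fragment for the extra field, indexed as exact, unanalyzed terms.
tags_mapping = {"tags": {"type": "string", "index": "not_analyzed"}}

print(build_document("#sachin big news, retires from ODI"))
print(build_document("sachin, you are awesome"))
```

A terms facet on "tags" would then only have to load the distinct hashtags into memory rather than every term of "content".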

Best regards,
Radu

On Wed, Jul 10, 2013 at 2:02 PM, prince prince@qburst.com wrote:

Hi Radu,

But the field I need to search really has to be "analyzed". If the content is something like '#sachin big news, retires from ODI', the search should return it, but if it is like 'sachin, you are awesome', the search should not return it.

"not_analyzed" will return only exact matches and is not useful for my use case (here, it would match only content whose full text is "#sachin", not content that contains "#sachin").



Hi Radu,

I have found this post http://webcache.googleusercontent.com/search?q=cache:http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html

My mapping, right now, is like this:

{
    "content": {"type": "string", "index": "analyzed"}
}

I am not sure how this mapping should change to match strings starting with "#" and "@", as I am new to Elasticsearch.
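
One possible direction, sketched in Python under assumptions: a custom analyzer that tokenizes on whitespace and uses a word_delimiter filter whose type_table treats '#' and '@' as letters. The names `hashtag_analyzer` and `keep_hash_and_at` are made up for this sketch, the index and type names are taken from the question, and the settings have not been checked against the linked post or tested on a cluster.

```python
import json

# Index creation body: analysis settings plus the mapping that uses them.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "keep_hash_and_at": {
                    "type": "word_delimiter",
                    # Treat '#' and '@' as letters so they are not split off.
                    "type_table": ["# => ALPHA", "@ => ALPHA"],
                }
            },
            "analyzer": {
                "hashtag_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "keep_hash_and_at"],
                }
            },
        }
    },
    "mappings": {
        "post": {
            "properties": {
                "content": {"type": "string", "analyzer": "hashtag_analyzer"}
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

A body like this would be sent when creating the index. Documents that are already indexed keep their old terms, so existing data would have to be reindexed for the new analyzer to apply.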

Hi Radu,

I am using Elasticsearch to search a table of selected Twitter posts. We have a UI that allows users to search for multiple terms using an OR condition, and we need to display the count of records for each term. We are using the facet to get the count of individual terms, so it is not limited to hashtags; it applies to anything the user searches for.

So, is there a better way than using facets to get this count?

Hello,

Cool! Then I think you should try that and see how it works for you. I thought the whitespace analyzer alone would be enough, but maybe it isn't; I didn't test it.

I think you should run a test, see how it fits your use-case and if you
need something more you can always come back here and ask more questions.

Best regards,
Radu

On Wed, Jul 10, 2013 at 3:12 PM, prince prince@qburst.com wrote:

Hi Radu,

I have found this post http://webcache.googleusercontent.com/search?q=cache:http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html

My mapping, right now, is like this:

{
    "content": {"type": "string", "index": "analyzed"}
}

I am not sure how this mapping should change to match strings starting with "#" and "@", as I am new to Elasticsearch.



On Wed, Jul 10, 2013 at 3:45 PM, prince prince@qburst.com wrote:

Hi Radu,

I am using Elasticsearch to search a table of selected Twitter posts. We have a UI that allows users to search for multiple terms using an OR condition, and we need to display the count of records for each term. We are using the facet to get the count of individual terms, so it is not limited to hashtags; it applies to anything the user searches for.

So, is there a better way than using facets to get this count?

Hmm... I don't see one. If you need counts for all the terms, then you have to have memory for all the terms :)

I would monitor the cluster with something like SPM (http://sematext.com/spm/elasticsearch-performance-monitoring/). You'll be able to see how your memory, field cache, etc. go up and down as you use Elasticsearch. Then you can tell whether you have enough hardware for the dataset and usage you're expecting.

Best regards,
Radu
