Having trouble when using search strings that have a hash symbol (#) in them

Rahul_Malireddy · December 11, 2013, 11:24pm

Hi,

I am trying to use elasticsearch to index my log files. Some of my
environment details are given below:

Version of elasticsearch that I am using: 0.90.3
Version of JVM that I am using: 1.7.0_17 JVM (64-bit)
Operating System: Windows Server 2008
Cluster Health:

cluster_name: logstash
status: yellow
timed_out: false
number_of_nodes: 2
number_of_data_nodes: 1
active_primary_shards: 15
active_shards: 15
relocating_shards: 0
initializing_shards: 0
unassigned_shards: 15

Right now, I am in the development stage and am playing around with
elasticsearch to see how it works. I am running into one problem. Some of
the documents in my index have the hash symbol (#) in them, for example:
FlexCubeCash_001#0103347CAD2203178.000000000. When I query for documents
that contain this string (FlexCubeCash_001#0103347CAD2203178.000000000),
the query returns all the documents that contain the string "FlexCubeCash_001"
(in other words, the text before the hash symbol).

How do I get around this problem? I must be doing something wrong. I have
created a small snippet of code to reproduce the problem in your environment.
This code snippet is in the Gist URL given below:

https://gist.github.com/rahul-m/7920182

# Remove old data
curl -XDELETE "http://localhost:9200/mrisk/"
 
# Create index with settings
curl -XPOST "http://localhost:9200/mrisk/" -d '
{
  "settings":{
    "index":{
      "analysis":{
        "analyzer":{

This file has been truncated.

As you can see from the code snippet, there is only one document that matches
the search string, but I get all three documents back.

If you have any questions or need any other information, please let me
know. Any help provided will be greatly appreciated.

Thank you.
Rahul

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8c57e164-0072-4196-a137-06800704a10c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Andrew_Cholakian_2 · December 12, 2013, 3:57am

The problem is that the 'mrisk' analyzer is using the 'standard' tokenizer,
which splits text at non-word characters such as #. You can verify this by
running:

curl -XPOST 'http://localhost:9200/_analyze?analyzer=standard' -d 'hello#there'

The output will show that the text is split at the #. You probably
want the 'keyword' tokenizer, which leaves the source text untouched.
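
For illustration, here is a rough sketch of what switching to the 'keyword'
tokenizer might look like. The analyzer name 'mrisk' and the index name come
from this thread; the rest of the definition (a custom analyzer with no token
filters) is an assumption, because the gist preview is truncated. The helper
names recreate_index and check_tokens are hypothetical.

```shell
# Sketch only: a custom analyzer named "mrisk" that uses the keyword tokenizer.
# The original analyzer definition is not visible in the thread, so the
# "type": "custom" setting and the absence of filters are assumptions.
SETTINGS='{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "mrisk": {
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      }
    }
  }
}'

# Sample value taken from the question.
SAMPLE='FlexCubeCash_001#0103347CAD2203178.000000000'

# Drop and recreate the index with the new settings.
# This deletes existing data, so use it on a development node only.
recreate_index() {
  curl -XDELETE "http://localhost:9200/mrisk/"
  curl -XPOST "http://localhost:9200/mrisk/" -d "$SETTINGS"
}

# Ask the index to analyze the sample with the "mrisk" analyzer.
# With the keyword tokenizer the response should list one token
# that still contains the '#'.
check_tokens() {
  curl -XPOST "http://localhost:9200/mrisk/_analyze?analyzer=mrisk" -d "$SAMPLE"
}
```

Run recreate_index first and then check_tokens. Keep in mind that the keyword
tokenizer emits the whole field value as a single token, so a query only
matches when it equals the entire value of the field. If the string is
embedded in a longer log line, a different tokenizer may be a better fit.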

Rahul_Malireddy · December 12, 2013, 12:10pm

Thank you.
