Having trouble when using search strings that have a hash symbol (#) in them


(Rahul Malireddy) #1

Hi,

I am trying to use elasticsearch to index my log files. Some of my
environment details are given below:

Version of elasticsearch that I am using: 0.90.3
Version of JVM that I am using: 1.7.0_17 JVM (64-bit)
Operating System: Windows Server 2008
Cluster Health:

  • cluster_name: logstash
  • status: yellow
  • timed_out: false
  • number_of_nodes: 2
  • number_of_data_nodes: 1
  • active_primary_shards: 15
  • active_shards: 15
  • relocating_shards: 0
  • initializing_shards: 0
  • unassigned_shards: 15

Right now, I am in the development stage and am playing around with
elasticsearch to see how it works. I am running into one problem. Some of
the documents in my index have the hash symbol (#) in them, for example
FlexCubeCash_001#0103347CAD2203178.000000000. When I query for documents
that contain this string (FlexCubeCash_001#0103347CAD2203178.000000000),
the search returns every document that contains "FlexCubeCash_001" (in
other words, the text before the hash symbol).

How do I get around this problem? I must be doing something wrong. I have
created a small snippet of code to reproduce the problem in your
environment. The snippet is in the Gist below:

https://gist.github.com/rahul-m/7920182

As you can see from the snippet, only one document matches the search
string, but I get all three documents back.
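To illustrate what is probably going on, here is a toy Python model of analysis and matching. It is not Elasticsearch code: `analyze` only approximates the `standard` analyzer by treating `#` and whitespace as separators and lowercasing terms, combining the query terms with OR is an assumption about how the query is run, and documents 2 and 3 are hypothetical stand-ins for the ones in the Gist.

```python
import re
from collections import defaultdict

def analyze(text):
    # Approximation of the 'standard' analyzer for this example only:
    # '#' and whitespace act as separators, and terms are lowercased.
    return [t.lower() for t in re.split(r"[#\s]+", text) if t]

# Hypothetical stand-ins for the three documents; only doc 1 holds the exact value.
docs = {
    1: "FlexCubeCash_001#0103347CAD2203178.000000000",
    2: "FlexCubeCash_001#HYPOTHETICAL_A",
    3: "FlexCubeCash_001#HYPOTHETICAL_B",
}

# Build an inverted index: term -> ids of the documents containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in analyze(text):
        inverted[term].add(doc_id)

# The query string is analyzed the same way, and its terms are OR-ed together.
query = "FlexCubeCash_001#0103347CAD2203178.000000000"
query_terms = analyze(query)
hits = set().union(*(inverted.get(t, set()) for t in query_terms))

print(query_terms)   # ['flexcubecash_001', '0103347cad2203178.000000000']
print(sorted(hits))  # [1, 2, 3]
```

Because every document shares the term `flexcubecash_001` with the analyzed query, all three match even though only one contains the full value.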

If you have any questions or need any other information, please let me
know. Any help will be greatly appreciated.

Thank you.
Rahul

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8c57e164-0072-4196-a137-06800704a10c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Andrew Cholakian-2) #2

The problem is that the 'mrisk' analyzer uses the 'standard' tokenizer,
which splits text at word boundaries, and '#' counts as a boundary. You can
verify this by running:

curl -XPOST 'http://localhost:9200/_analyze?analyzer=standard' -d 'hello#there'

The output will show the text split into two tokens at the #. You probably
want the 'keyword' tokenizer, which emits the entire input as a single token.
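A toy Python model of the suggested change (not Elasticsearch code) shows its effect: when the whole value is kept as one token at both index and query time, only the document with the exact value matches. The `matching_docs` helper, its OR semantics, and the documents are assumptions for illustration; documents 2 and 3 are hypothetical.

```python
def keyword_tokens(text):
    # Like the 'keyword' tokenizer: the entire input is kept as a single token.
    return [text]

def matching_docs(docs, query, analyze):
    # A document matches if it shares at least one analyzed term with the query.
    query_terms = set(analyze(query))
    return sorted(doc_id for doc_id, text in docs.items()
                  if query_terms & set(analyze(text)))

# Hypothetical documents; only doc 1 holds the exact value searched for.
docs = {
    1: "FlexCubeCash_001#0103347CAD2203178.000000000",
    2: "FlexCubeCash_001#HYPOTHETICAL_A",
    3: "FlexCubeCash_001#HYPOTHETICAL_B",
}
query = "FlexCubeCash_001#0103347CAD2203178.000000000"

print(keyword_tokens("hello#there"))               # ['hello#there']
print(matching_docs(docs, query, keyword_tokens))  # [1]
```

One trade-off worth noting: with the whole value indexed as a single token, a search for just FlexCubeCash_001 no longer finds these documents; in this model, `matching_docs(docs, "FlexCubeCash_001", keyword_tokens)` returns `[]`.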

(In reply to Rahul Malireddy, Wednesday, December 11, 2013 3:24:22 PM UTC-8)



(Rahul Malireddy) #3

Thank you.

(In reply to Andrew Cholakian, Wednesday, December 11, 2013 10:57:29 PM UTC-5)
