Indexing and searching on special characters

Hi all,

I'm using elasticsearch-1.2.1 to index the content of .xml and .jsp
files.

These are the analysis settings for my index:

    "settings": {
        "analysis": {
            "filter": {
                "word_delimiter": {
                    "type": "word_delimiter",
                    "preserve_original": true,
                    "split_on_case_change": false,
                    "stem_english_possessive": false,
                    "type_table": [
                        "# => ALPHA",
                        "@ => ALPHA",
                        "$ => ALPHA",
                        "& => ALPHA",
                        "? => ALPHA",
                        "= => ALPHA"
                    ]
                }
            },
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["word_delimiter", "lowercase"]
                }
            }
        }
    }

One of the lines in the file is:

    <%@ page import="java.util.Vector" %>

When I search the index for import="java.* I get the result I expect.
But when I search for the keyword "import", I don't get any results.

Analyzing the text with the kopf plugin, I found that my content
import="java.util.Vector" was indexed as the tokens
import="java.util.Vector", import=, java, util and vector.

What I want is for import= to be indexed as is and also as import, so
that both searches match.

I've also tried the other option, i.e. leaving out the type_table. In that
case my search results are reversed: the keyword search for "import" works
and the search for import="java.* doesn't.

Does anybody have an idea how the index should be analyzed to produce the
tokens I want?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/27051ee9-7b25-461c-96ee-36cc1747ed11%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

In the type_table mentioned above, assigning "= => DIGIT" instead of
"= => ALPHA" yields the indexing I expected.
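
For reference, a sketch of the filter with that one change applied, assuming the rest of the settings stay exactly as posted in the original message:

    "filter": {
        "word_delimiter": {
            "type": "word_delimiter",
            "preserve_original": true,
            "split_on_case_change": false,
            "stem_english_possessive": false,
            "type_table": [
                "# => ALPHA",
                "@ => ALPHA",
                "$ => ALPHA",
                "& => ALPHA",
                "? => ALPHA",
                "= => DIGIT"
            ]
        }
    }

A likely explanation, not verified against the 1.2.1 source: with = typed as DIGIT, the filter sees a letter-to-digit boundary between import and =, and since split_on_numerics defaults to true it emits import as its own token, while preserve_original still keeps the full import="java.util.Vector" token.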

(In reply to the original post by Anand kumar, Monday, 24 November 2014 23:18:58 UTC+5:30.)