I run forum software called Xenforo, which uses Elasticsearch as an add-on. 
It works great, and I have enjoyed learning all about ES.
What I would like to be able to do is search for messages that contain 
parentheses. For example, a message will contain:
This is a picture of Andy (Andy).
I would like to be able to search for (Andy), including the parentheses.
From my research, it looks like the only way to accomplish this is to 
create a custom analyzer as described here:
http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html
Would these be the right steps to set that up?
1. Delete the existing index
2. Run the analyzer script
3. Re-index my forum
 
Thank you kindly for your assistance.
Andy
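
The three steps listed above can be sketched as HTTP requests. This is only an outline under assumptions: a node at http://localhost:9200, a placeholder index name (my_forum_index), and an empty placeholder body where the analyzer settings would go. The requests are built but not sent, and the re-index step is presumably triggered from the forum software rather than from a script like this.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: default local node
INDEX = "my_forum_index"           # hypothetical index name


def delete_index_request(index):
    """Step 1: build (but do not send) the request that deletes the old index."""
    return urllib.request.Request(f"{ES_URL}/{index}", method="DELETE")


def create_index_request(index, body):
    """Step 2: build the request that recreates the index with custom settings."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder body; the real analyzer and mapping definitions would go here.
body = {"settings": {"analysis": {}}, "mappings": {}}

requests_to_send = [
    delete_index_request(INDEX),
    create_index_request(INDEX, body),
]

for req in requests_to_send:
    print(req.get_method(), req.full_url)
    # To actually run a step: urllib.request.urlopen(req)

# Step 3 (re-indexing the forum content) is not covered by this sketch.
```

Deleting the index is destructive, so it is worth trying the settings on a throwaway index first.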
-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group. 
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com . 
For more options, visit https://groups.google.com/groups/opt_out .

When I do a _mapping I get the following information:
{
  "xenforo113" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string"
        },
        "node" : {
          "type" : "long"
        },
        "thread" : {
          "type" : "long"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    },
What exactly do I need to do to create a new index with the above mapping plus a char map that 
changes ( to an underscore? Or is there a better way to get the parentheses indexed?
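
For illustration, the character-mapping idea could look roughly like the settings below, using Elasticsearch's "mapping" char filter. The names paren_to_underscore and paren_analyzer are made up for this sketch, and the choice of the whitespace tokenizer is an assumption. The helper function only mimics the replacement locally so the effect is visible; it is not the real filter.

```python
import json

# Hypothetical names; only the "mapping" char filter type itself comes from Elasticsearch.
settings = {
    "analysis": {
        "char_filter": {
            "paren_to_underscore": {
                "type": "mapping",
                "mappings": ["(=>_", ")=>_"],
            }
        },
        "analyzer": {
            "paren_analyzer": {
                "type": "custom",
                "char_filter": ["paren_to_underscore"],
                "tokenizer": "whitespace",   # assumption for this sketch
                "filter": ["lowercase"],
            }
        },
    }
}


def apply_char_map(text, rules):
    """Mimic the character replacement locally (not the real Lucene filter)."""
    for rule in rules:
        source, target = rule.split("=>")
        text = text.replace(source, target)
    return text


rules = settings["analysis"]["char_filter"]["paren_to_underscore"]["mappings"]
print(apply_char_map("This is a picture of Andy (Andy).", rules))
print(json.dumps(settings, indent=2))
```

A search for (Andy) only matches if the query text goes through the same replacement, so the same analyzer would have to apply at search time as well.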

On Sunday, April 14, 2013 2:15:08 PM UTC-7, Andy Bajka wrote:
I run a forum software called Xenforo and it uses Elasticsearch as a 
addon. It works great and I have enjoyed learning all about ES.
What I would like to be able to do is search messages that contain 
parentheses. For example a message will contain:
This is a picture of Andy (Andy).
So I would like to be able to search for (Andy) including the parenthesis.
In researching this, it looks like the only way to accomplish this is to 
create an analyzer as described here:
http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html
If I'm not mistaken would these be the steps to create what I would like 
to do?
Delete existing index 
Run the analyzer script 
Re-index my forum 
 
Thank you kindly for your assistance.
Andy
 
By the way, the developer of Xenforo wrote the following when I asked how I 
can have parentheses indexed:
That's getting into tokenizers and analysis: 
http://www.elasticsearch.org/guide/reference/index-modules/analysis/ 
So it looks like I need to do several things in order to re-index in a way 
that duplicates what is already there but adds the char mapping.

Looks like I need to create an analyzer that uses the array type property.
http://www.elasticsearch.org/guide/reference/mapping/array-type/ 

Looking at the Xenforo code, I need to replicate this mapping.
public static $optimizedGenericMapping = array(
    "_source" => array("enabled" => false),
    "properties" => array(
        "title" => array("type" => "string"),
        "message" => array("type" => "string"),
        "date" => array("type" => "long", "store" => "yes"),
        "user" => array("type" => "long", "store" => "yes"),
        "discussion_id" => array("type" => "long", "store" => "yes")
    )
); 
 
I've taken a stab at creating my own analyzer mapping:
{
    "settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 1
        },
        "analysis" : {
            "filter" : {
                "tweet_filter" : {
                    "type" : "word_delimiter",
                    "type_table": ["( => ALPHA", ") => ALPHA"]
                }
            },
            "analyzer" : {
                "tweet_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["lowercase", "tweet_filter"]
                }
            }
        }
    },
    "mappings" : {
        "source" : {"enabled" : "false"},
        "properties" : {
            "title" : {"type" : "string"},
            "message" : {"type" : "string"},
            "date" : {"type" : "long", "store" : "yes"},
            "user" : {"type" : "long", "store" : "yes"},
            "discussion_id" : {"type" : "long", "store" : "yes"}
        }
    }
}
 
Here is the resulting _mapping, which is not correct:
curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "source" : {
      "enabled" : false,
      "properties" : { }
    },
    "properties" : {
      "properties" : { }
    }
  }
}
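
The output above suggests that the keys placed directly under "mappings" were read as type names, so "source" and "properties" each became a type. A sketch of a request body that nests the field definitions under a type name instead is shown below. It assumes the type name "post" from the Xenforo mapping earlier in the thread and assumes the custom analyzer should be attached to the message field. It uses the typed mapping format and the "string" field type of Elasticsearch versions from that period; later versions changed both.

```python
import json

body = {
    "settings": {
        "index": {"number_of_shards": 1, "number_of_replicas": 1},
        "analysis": {
            "filter": {
                "tweet_filter": {
                    "type": "word_delimiter",
                    # Treat parentheses as letters so they are not stripped.
                    "type_table": ["( => ALPHA", ") => ALPHA"],
                }
            },
            "analyzer": {
                "tweet_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "tweet_filter"],
                }
            },
        },
    },
    "mappings": {
        "post": {  # type name (assumed); field definitions live one level below it
            "_source": {"enabled": False},
            "properties": {
                "title": {"type": "string"},
                # Assumption: only "message" needs to preserve parentheses.
                "message": {"type": "string", "analyzer": "tweet_analyzer"},
                "date": {"type": "long", "store": "yes"},
                "user": {"type": "long", "store": "yes"},
                "discussion_id": {"type": "long", "store": "yes"},
            },
        }
    },
}

# This JSON would be sent as the body when creating the index.
print(json.dumps(body, indent=2))
```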

Also, it said I could not use the underscore in _source, so I changed it to 
source.

I'm making progress. The mapping still doesn't match the one Xenforo creates in 
Elasticsearch, but it's getting closer:
{
  "twitter" : {
    "tweet" : {
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}

This is a good sign: the filter works.
curl -XGET 'localhost:9200/twitter/_analyze?field=message&pretty=1' -d '(andy)'
{
  "tokens" : [ {
    "token" : "(andy)",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
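
To see why (andy) survives as a single token, here is a very rough local approximation of the chain whitespace tokenizer, lowercase filter, word_delimiter with parentheses mapped to ALPHA. It is not Lucene's algorithm: it handles ASCII only and ignores word_delimiter behaviours such as splitting at letter/digit boundaries. The function name approx_analyze is made up for this sketch.

```python
import re

KEEP = "()"  # characters the type_table marks as ALPHA


def approx_analyze(text):
    """Rough stand-in for: whitespace tokenizer -> lowercase -> word_delimiter."""
    tokens = []
    for chunk in text.split():            # whitespace tokenizer
        chunk = chunk.lower()             # lowercase filter
        # Split on anything that is not an ASCII letter, a digit, or a kept character.
        parts = re.split(r"[^0-9a-z" + re.escape(KEEP) + r"]+", chunk)
        tokens.extend(p for p in parts if p)
    return tokens


print(approx_analyze("This is a picture of Andy (Andy)."))
# ['this', 'is', 'a', 'picture', 'of', 'andy', '(andy)']
```

In this approximation the trailing period is dropped while the parentheses stay attached, so andy and (andy) end up as distinct tokens.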

I think I got it!!
curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}

Ivan (Ivan Brusic), April 15, 2013, 3:16pm

Glad we can help you out. 
You will get more flexibility by switching from the whitespace tokenizer to a 
pattern tokenizer, which lets you split on additional characters such as 
commas and periods in addition to whitespace.
-- 
Ivan
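
As a sketch of the pattern tokenizer suggestion, the settings fragment below defines a tokenizer that splits on runs of whitespace, commas, and periods. The names punct_tokenizer and punct_analyzer and the exact regular expression are assumptions. Elasticsearch interprets the pattern as a Java regular expression; Python's re module is used here only to preview the split, which behaves the same way for this simple character class.

```python
import re

# Hypothetical names; "pattern" is the tokenizer type, the regex is an example.
analysis = {
    "tokenizer": {
        "punct_tokenizer": {
            "type": "pattern",
            "pattern": r"[\s,.]+",   # split on whitespace, commas, periods
        }
    },
    "analyzer": {
        "punct_analyzer": {
            "type": "custom",
            "tokenizer": "punct_tokenizer",
            "filter": ["lowercase"],
        }
    },
}

pattern = analysis["tokenizer"]["punct_tokenizer"]["pattern"]
text = "See the photo (Andy), taken by Andy."
# Preview the split locally and lowercase, as the lowercase filter would.
tokens = [t.lower() for t in re.split(pattern, text) if t]
print(tokens)
# ['see', 'the', 'photo', '(andy)', 'taken', 'by', 'andy']
```

Parentheses are kept because they are not part of the split pattern. One trade-off is that splitting on periods also breaks up things like decimal numbers and host names.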
On Sun, Apr 14, 2013 at 6:59 PM, Andy Bajka andybajka2012@gmail.com  wrote:
I think I got it!!
curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true ' 
{ 
"twitter" : { 
"post" : { 
"_source" : { 
"enabled" : false 
}, 
"properties" : { 
"date" : { 
"type" : "long", 
"store" : "yes" 
}, 
"discussion_id" : { 
"type" : "long", 
"store" : "yes" 
}, 
"message" : { 
"type" : "string", 
"analyzer" : "tweet_analyzer" 
}, 
"title" : { 
"type" : "string" 
}, 
"user" : { 
"type" : "long", 
"store" : "yes" 
} 
} 
} 
} 
}

Hi Ivan,
Thank you for the suggestion. So far I'm pretty happy with the results that 
the whitespace tokenizer produces. Most of the terms we search for 
on my forum have whitespace around them, so it's probably 
fine the way it is. I'll continue to monitor my results.
On Monday, April 15, 2013 8:16:35 AM UTC-7, Ivan Brusic wrote:
Glad we can help you out. 
You will get more flexibility by switching from whitespace tokenizer to a 
pattern tokenizer so that you can split on additional characters such as 
commas and periods in addition to whitespace.
-- 
Ivan
 