I would like to be able to search for parentheses

I run forum software called Xenforo, and it uses ElasticSearch as an add-on.
It works great and I have enjoyed learning all about ES.

What I would like to be able to do is search for messages that contain
parentheses. For example, a message might contain:

This is a picture of Andy (Andy).

So I would like to be able to search for (Andy), including the parentheses.

In researching this, it looks like the only way to accomplish this is to
create an analyzer as described here:

http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html

If I'm not mistaken, would these be the steps to accomplish this?

  1. Delete existing index
  2. Run the analyzer script
  3. Re-index my forum

Thank you kindly for your assistance.

Andy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

When I do a _mapping I get the following information.

{
  "xenforo113" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string"
        },
        "node" : {
          "type" : "long"
        },
        "thread" : {
          "type" : "long"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    },

What exactly do I need to do to create a new index with the above mapping and a char map that
changes the ( to an underscore? Or is there a better way to index the parentheses?
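For illustration, the "char map" idea can be sketched in Python. This assumes the Elasticsearch `mapping` char filter, which takes a `mappings` list of `source => target` rules; the filter name `paren_map` and the helper `apply_char_mappings` are hypothetical and not taken from the thread or from Xenforo. The helper only imitates the substitution step on a plain string and is not how Elasticsearch applies the filter internally.

```python
# Hypothetical analysis fragment: a "mapping" char filter that rewrites
# parentheses to underscores before the tokenizer runs.
PAREN_CHAR_FILTER = {"type": "mapping", "mappings": ["( => _", ") => _"]}
ANALYSIS_FRAGMENT = {"char_filter": {"paren_map": PAREN_CHAR_FILTER}}


def apply_char_mappings(text, rules):
    """Imitate the substitution step by applying each 'a => b' rule in order."""
    for rule in rules:
        source, target = (part.strip() for part in rule.split("=>"))
        text = text.replace(source, target)
    return text


print(apply_char_mappings("This is a picture of Andy (Andy).",
                          PAREN_CHAR_FILTER["mappings"]))
# This is a picture of Andy _Andy_.
```

One trade-off of this approach: both parentheses collapse to the same character, and the search text would need to pass through the same analysis so that a query for (Andy) is rewritten the same way.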


(In reply to the original post by Andy Bajka, Sunday, April 14, 2013 2:15:08 PM UTC-7.)

By the way, the developer of Xenforo wrote the following when I asked how I
could have parentheses indexed:

That's getting into tokenizers and analysis:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/

So it looks like I need to do several things in order to re-index in a way
that duplicates what is already there but adds the char mapping.


Looks like I need to create an analyzer that uses the array type property.

http://www.elasticsearch.org/guide/reference/mapping/array-type/


Looking at the Xenforo code, I need to replicate this mapping.

public static $optimizedGenericMapping = array(
    "_source" => array("enabled" => false),
    "properties" => array(
        "title" => array("type" => "string"),
        "message" => array("type" => "string"),
        "date" => array("type" => "long", "store" => "yes"),
        "user" => array("type" => "long", "store" => "yes"),
        "discussion_id" => array("type" => "long", "store" => "yes")
    )
); 


I've taken a stab at creating my own analyzer mapping:

"settings" : {
    "index" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 1
    }, 
    "analysis" : {
        "filter" : {
            "tweet_filter" : {
                "type" : "word_delimiter",
                "type_table": ["( => ALPHA", ") => ALPHA"]
            } 
        },
        "analyzer" : {
            "tweet_analyzer" : {
                "type" : "custom",
                "tokenizer" : "whitespace",
                "filter" : ["lowercase", "tweet_filter"]
            }
        }
    }
},
"mappings" : {
    "source" : {"enabled" : "false"},
    "properties" : {
        "title" : {"type" : "string"},
        "message" : {"type" : "string"},
        "date" : {"type" : "long", "store" : "yes"},
        "user" : {"type" : "long", "store" : "yes"},
        "discussion_id" : {"type" : "long", "store" : "yes"}
    }
}
}

Here is the resulting _mapping, which is not correct:

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "source" : {
      "enabled" : false,
      "properties" : { }
    },
    "properties" : {
      "properties" : { }
    }
  }
}


Also, it said I could not use the underscore in _source, so I changed it to
source.


I'm making progress. It still doesn't match the mapping of the Xenforo
ElasticSearch index, but it's getting closer:

{
  "twitter" : {
    "tweet" : {
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}


This is a good sign: the filter works.

curl -XGET 'localhost:9200/twitter/_analyze?field=message&pretty=1' -d '(andy)'
{
  "tokens" : [ {
    "token" : "(andy)",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
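To see why the token keeps its parentheses, here is a rough Python approximation of the analyzer chain from this thread (whitespace tokenizer, then lowercase, then a word_delimiter filter whose type_table treats ( and ) as ALPHA). The function `approx_tweet_analyzer` is a hypothetical stand-in, not Elasticsearch code. It handles ASCII letters and digits only and ignores other word_delimiter behaviour, such as splitting at letter/digit boundaries.

```python
import re

# Characters the thread's type_table remaps to ALPHA so they stay inside tokens.
EXTRA_ALPHA = "()"


def approx_tweet_analyzer(text, extra_alpha=EXTRA_ALPHA):
    """Rough stand-in for: whitespace tokenizer -> lowercase -> word_delimiter."""
    # Any run of characters that is neither an ASCII letter/digit nor listed
    # in extra_alpha is treated as a delimiter inside a chunk.
    delimiter = re.compile("[^0-9a-z" + re.escape(extra_alpha) + "]+")
    tokens = []
    for chunk in text.split():      # whitespace tokenizer
        chunk = chunk.lower()       # lowercase filter
        tokens.extend(part for part in delimiter.split(chunk) if part)
    return tokens


print(approx_tweet_analyzer("This is a picture of Andy (Andy)."))
# ['this', 'is', 'a', 'picture', 'of', 'andy', '(andy)']
print(approx_tweet_analyzer("(Andy)", extra_alpha=""))
# ['andy']
```

With the parentheses remapped, `(andy)` and `andy` become distinct tokens, so a search for one would not be expected to match the other.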


I think I got it!!

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}
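The thread does not show the request body that produced this mapping. The sketch below is one guess at its shape, assuming two changes from the earlier attempt: `_source` and `properties` are nested under a type name (`post`), and `tweet_analyzer` is attached to the `message` field. The helper `build_index_body` is hypothetical. The `string` type and `"store" : "yes"` follow the 2013-era syntax used in this thread and may not apply to later Elasticsearch versions.

```python
import json


def build_index_body(type_name="post", analyzer_name="tweet_analyzer"):
    """Assemble index settings plus a mapping nested under a type name."""
    settings = {
        "index": {"number_of_shards": 1, "number_of_replicas": 1},
        "analysis": {
            "filter": {
                "tweet_filter": {
                    "type": "word_delimiter",
                    # Treat parentheses as letters so they stay in the token.
                    "type_table": ["( => ALPHA", ") => ALPHA"],
                }
            },
            "analyzer": {
                analyzer_name: {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "tweet_filter"],
                }
            },
        },
    }
    mappings = {
        type_name: {
            "_source": {"enabled": False},
            "properties": {
                "title": {"type": "string"},
                # Only "message" uses the custom analyzer, as in the output above.
                "message": {"type": "string", "analyzer": analyzer_name},
                "date": {"type": "long", "store": "yes"},
                "user": {"type": "long", "store": "yes"},
                "discussion_id": {"type": "long", "store": "yes"},
            },
        }
    }
    return {"settings": settings, "mappings": mappings}


body = build_index_body()
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent as the body of an index-creation request. The exact request used in the thread is not shown, so that step is left out here.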


Glad we can help you out. :)

You will get more flexibility by switching from the whitespace tokenizer to a
pattern tokenizer, so that you can split on additional characters such as
commas and periods in addition to whitespace.

--
Ivan
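As a rough illustration of the difference Ivan describes, the Python sketch below compares a plain whitespace split with a regex split on whitespace, commas, and periods. The settings fragment assumes the Elasticsearch `pattern` tokenizer with its `pattern` parameter; the tokenizer name `paren_tokenizer`, the regex, and the sample text are made up for the example.

```python
import re

# Hypothetical analysis fragment: a pattern tokenizer whose regex matches the
# separators (whitespace, commas, periods) rather than the tokens themselves.
PATTERN_TOKENIZER_SETTINGS = {
    "tokenizer": {
        "paren_tokenizer": {"type": "pattern", "pattern": "[\\s,.]+"}
    }
}


def whitespace_split(text):
    """Imitate the whitespace tokenizer: split on runs of whitespace only."""
    return text.split()


def pattern_split(text, pattern=r"[\s,.]+"):
    """Imitate a pattern tokenizer: split wherever the separator regex matches."""
    return [token for token in re.split(pattern, text) if token]


sample = "See (Andy),then (Bob).Done"
print(whitespace_split(sample))  # ['See', '(Andy),then', '(Bob).Done']
print(pattern_split(sample))     # ['See', '(Andy)', 'then', '(Bob)', 'Done']
```

The parentheses survive in both cases because neither split treats them as separators; only the handling of commas and periods differs.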

(In reply to Andy Bajka's message of Sun, Apr 14, 2013 at 6:59 PM.)

Hi Ivan,

Thank you for the suggestion. So far I'm pretty happy with the results from
the whitespace tokenizer. Most of what we search for on my forum has
whitespace around the word, so it may be fine the way it is. I'll continue
to monitor my results.

(In reply to Ivan Brusic's message of Monday, April 15, 2013 8:16:35 AM UTC-7.)