I would like to be able to search for parentheses

Andy_Bajka_2 · April 14, 2013, 9:15pm

I run forum software called Xenforo, which uses ElasticSearch as an add-on.
It works great and I have enjoyed learning all about ES.

What I would like to be able to do is search messages that contain
parentheses. For example a message will contain:

This is a picture of Andy (Andy).

So I would like to be able to search for (Andy), including the parentheses.

In researching this, it looks like the only way to accomplish this is to
create an analyzer as described here:

http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html

If I'm not mistaken, these would be the steps to do what I want. Is that
right?

Delete existing index
Run the analyzer script
Re-index my forum

Thank you kindly for your assistance.

Andy

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Andy_Bajka_2 · April 14, 2013, 9:54pm

When I do a _mapping I get the following information.

{
  "xenforo113" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string"
        },
        "node" : {
          "type" : "long"
        },
        "thread" : {
          "type" : "long"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    },

What exactly do I need to do to create a new index with the above mapping and a char map that
changes the ( to an underscore? Or is there a better way to get the parentheses indexed?
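
One way to read the "char map" idea is a mapping char filter that rewrites ( and ) to underscores before the text is tokenized. The following is only a sketch under assumptions: the names paren_to_underscore and paren_analyzer are made up for illustration, and the whitespace tokenizer is an assumed choice so that the underscores are not stripped. The Python code just builds the settings body and mimics the replacement locally; it does not contact Elasticsearch.

```python
import json

# Hypothetical names: "paren_to_underscore" and "paren_analyzer" are illustrative only.
settings = {
    "analysis": {
        "char_filter": {
            "paren_to_underscore": {
                "type": "mapping",
                "mappings": ["(=>_", ")=>_"],
            }
        },
        "analyzer": {
            "paren_analyzer": {
                "type": "custom",
                "char_filter": ["paren_to_underscore"],
                "tokenizer": "whitespace",  # assumption, not taken from the thread
                "filter": ["lowercase"],
            }
        },
    }
}


def apply_char_map(text, rules):
    """Mimic the char filter locally: replace each left-hand side with its right-hand side."""
    for rule in rules:
        src, dst = rule.split("=>")
        text = text.replace(src, dst)
    return text


rules = settings["analysis"]["char_filter"]["paren_to_underscore"]["mappings"]
print(json.dumps({"settings": settings}, indent=2))
print(apply_char_map("This is a picture of Andy (Andy).", rules))
```

A drawback of this approach is that a literal underscore and a parenthesis become indistinguishable once indexed, so a search for (Andy) would also match text written as _Andy_.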


Andy_Bajka_2 · April 14, 2013, 10:02pm

By the way, the developer of Xenforo wrote the following when I asked how I
could get parentheses indexed:

That's getting into tokenizers and analysis:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/

So it looks like I need to do several things in order to re-index in a way
that duplicates what is already there but adds the char mapping.


Andy_Bajka_2 · April 14, 2013, 10:46pm

Looks like I need to create an analyzer that uses the array type property.

http://www.elasticsearch.org/guide/reference/mapping/array-type/


Andy_Bajka_2 · April 15, 2013, 12:59am

Looking at the Xenforo code, I need to replicate this mapping.

public static $optimizedGenericMapping = array(
    "_source" => array("enabled" => false),
    "properties" => array(
        "title" => array("type" => "string"),
        "message" => array("type" => "string"),
        "date" => array("type" => "long", "store" => "yes"),
        "user" => array("type" => "long", "store" => "yes"),
        "discussion_id" => array("type" => "long", "store" => "yes")
    )
);


Andy_Bajka_2 · April 15, 2013, 1:12am

I've taken a stab at creating my own analyzer mapping:

"settings" : {
    "index" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 1
    }, 
    "analysis" : {
        "filter" : {
            "tweet_filter" : {
                "type" : "word_delimiter",
                "type_table": ["( => ALPHA", ") => ALPHA"]
            } 
        },
        "analyzer" : {
            "tweet_analyzer" : {
                "type" : "custom",
                "tokenizer" : "whitespace",
                "filter" : ["lowercase", "tweet_filter"]
            }
        }
    }
},
"mappings" : {
    "source" : {"enabled" : "false"},
    "properties" : {
        "title" : {"type" : "string"},
        "message" : {"type" : "string"},
        "date" : {"type" : "long", "store" : "yes"},
        "user" : {"type" : "long", "store" : "yes"},
        "discussion_id" : {"type" : "long", "store" : "yes"}
    }
}
}

Here is the resulting _mapping, which is not correct:

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "source" : {
      "enabled" : false,
      "properties" : { }
    },
    "properties" : {
      "properties" : { }
    }
  }
}


Andy_Bajka_2 · April 15, 2013, 1:14am

Also, it said I could not use the underscore in _source, so I changed it to
source.


Andy_Bajka_2 · April 15, 2013, 1:50am

I'm making progress. It's still not like the mapping of the Xenforo
ElasticSearch, but getting closer:

{
  "twitter" : {
    "tweet" : {
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}


Andy_Bajka_2 · April 15, 2013, 1:53am

This is a good sign: the filter works.

curl -XGET 'localhost:9200/twitter/_analyze?field=message&pretty=1' -d '(andy)'
{
  "tokens" : [ {
    "token" : "(andy)",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
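
To see why the token keeps its parentheses, the analyzer chain (whitespace tokenizer, then lowercase, then a word_delimiter filter that treats ( and ) as letters) can be approximated locally. This is a rough stand-in written for illustration, not Lucene's implementation: it only handles ASCII and ignores the case-change and letter/digit splitting that the real filter also performs. The function names are hypothetical.

```python
import re

# Characters that the thread's type_table re-classifies as ALPHA.
EXTRA_ALPHA = "()"


def whitespace_tokenize(text):
    """Split on runs of whitespace only, as the whitespace tokenizer does."""
    return text.split()


def simple_word_delimiter(token, extra_alpha=EXTRA_ALPHA):
    """Rough approximation of word_delimiter: split on any character that is
    neither an ASCII letter or digit nor listed in extra_alpha."""
    separators = "[^0-9A-Za-z" + re.escape(extra_alpha) + "]+"
    return [part for part in re.split(separators, token) if part]


def analyze(text):
    """Whitespace tokenizer -> lowercase -> simplified word_delimiter."""
    tokens = []
    for token in whitespace_tokenize(text):
        tokens.extend(simple_word_delimiter(token.lower()))
    return tokens


print(analyze("This is a picture of Andy (Andy)."))
# Without the type_table entries the parentheses are treated as separators:
print(analyze.__name__, simple_word_delimiter("(andy).", extra_alpha=""))
```

In this approximation the trailing period is dropped while "(andy)" survives as a single token, which matches the _analyze output above.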


Andy_Bajka_2 · April 15, 2013, 1:59am

I think I got it!!

curl -XGET 'http://localhost:9200/twitter/_mapping?pretty=true'
{
  "twitter" : {
    "post" : {
      "_source" : {
        "enabled" : false
      },
      "properties" : {
        "date" : {
          "type" : "long",
          "store" : "yes"
        },
        "discussion_id" : {
          "type" : "long",
          "store" : "yes"
        },
        "message" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        },
        "title" : {
          "type" : "string"
        },
        "user" : {
          "type" : "long",
          "store" : "yes"
        }
      }
    }
  }
}
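
The thread does not show the request that produced this mapping. Below is a sketch of an index-creation body that would be consistent with it, assuming the analysis settings from the earlier attempt are reused unchanged. The main difference from that attempt is the extra type level ("post") under "mappings": without it, the keys "source" and "properties" appear to have been read as type names, which would explain the empty mapping seen earlier and why "_source" was rejected at that level.

```python
import json

# Analysis settings copied from the earlier attempt in this thread.
settings = {
    "index": {"number_of_shards": 1, "number_of_replicas": 1},
    "analysis": {
        "filter": {
            "tweet_filter": {
                "type": "word_delimiter",
                "type_table": ["( => ALPHA", ") => ALPHA"],
            }
        },
        "analyzer": {
            "tweet_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase", "tweet_filter"],
            }
        },
    },
}

# Field definitions as reported by the _mapping output above.
properties = {
    "title": {"type": "string"},
    "message": {"type": "string", "analyzer": "tweet_analyzer"},
    "date": {"type": "long", "store": "yes"},
    "user": {"type": "long", "store": "yes"},
    "discussion_id": {"type": "long", "store": "yes"},
}

body = {
    "settings": settings,
    "mappings": {
        # The type name sits between "mappings" and the field definitions.
        "post": {
            "_source": {"enabled": False},
            "properties": properties,
        }
    },
}

print(json.dumps(body, indent=2))
```

Saved to a file, the printed JSON could be supplied when creating the index, for example with curl -XPUT 'http://localhost:9200/twitter' -d @body.json (the file name is arbitrary).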


Ivan · April 15, 2013, 3:16pm

Glad we can help you out.

You will get more flexibility by switching from the whitespace tokenizer to a
pattern tokenizer, so that you can split on additional characters such as
commas and periods in addition to whitespace.

--
Ivan
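
A sketch of what that suggestion might look like, under assumptions: the names punct_tokenizer and punct_analyzer and the exact regular expression are illustrative choices, not something given in the thread. The Python code builds the analysis settings and imitates the splitting locally with the same pattern; it is an approximation, not the tokenizer itself.

```python
import json
import re

# One possible separator pattern: whitespace, commas, and periods.
SPLIT_PATTERN = r"[\s,.]+"

# Hypothetical tokenizer and analyzer names, for illustration only.
analysis = {
    "tokenizer": {
        "punct_tokenizer": {"type": "pattern", "pattern": SPLIT_PATTERN}
    },
    "analyzer": {
        "punct_analyzer": {
            "type": "custom",
            "tokenizer": "punct_tokenizer",
            "filter": ["lowercase"],
        }
    },
}


def pattern_tokenize(text, pattern=SPLIT_PATTERN):
    """Local approximation: the pattern marks separators; empty pieces are dropped."""
    return [piece for piece in re.split(pattern, text) if piece]


print(json.dumps({"analysis": analysis}, indent=2))
print(pattern_tokenize("Hello, this is Andy (Andy)."))
```

Because the parentheses are not part of the separator pattern, "(Andy)" stays in one piece. The trade-off is that splitting on periods also breaks up things like file names and URLs at each dot.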


Andy_Bajka_2 · April 15, 2013, 4:17pm

Hi Ivan,

Thank you for the suggestion. So far I'm pretty happy with the results from
the whitespace tokenizer. Most of what we search for on my forum has
whitespace around it, so perhaps it's fine the way it is. I'll continue to
monitor my results.

