RegEx Filter Not Matching on Hash tag (#)

Hi,
I'm trying to use a RegEx filter matching on .*#.* to find all the items (in
a field) that contain a hash tag. I looked around and thought maybe the
analyzer was stripping the character, so I took a cue from a previous post (
https://groups.google.com/forum/#!searchin/elasticsearch/hashtag/elasticsearch/TNTehyS5lL8/EIk_HptjAQoJ)
which led me to the following _settings on the index:

{
   "analysis":{
      "filter":{
         "tweet_filter":{
            "type":"word_delimiter",
            "type_table":[
               "# => ALPHA",
               "@ => ALPHA"
            ]
         }
      },
      "analyzer":{
         "lowercase":{
            "type":"custom",
            "tokenizer":"keyword",
            "filter":[
               "lowercase"
            ]
         },
         "tweet_analyzer":{
            "type":"custom",
            "tokenizer":"whitespace",
            "filter":[
               "lowercase",
               "tweet_filter"
            ]
         }
      }
   }
}

The mapping:

{
   "contacts":{
      "analyzer":"lowercase",
      "date_detection":false,
      "dynamic_templates":[
         {
            "sourceName":{
               "match":"sourceName",
               "mapping":{
                  "type":"string",
                  "analyzer":"tweet_analyzer"
               }
            }
         },
         {
            "template_1":{
               "match":"*",
               "mapping":{
                  "type":"string",
                  "index":"analyzed"
               }
            }
         }
      ],
      "properties":{
         "contactBirthday":{
            "type":"date",
            "format":"dateOptionalTime"
         },
         "dateCreated":{
            "type":"date",
            "format":"dateOptionalTime"
         },
         "lastActivityDate":{
            "type":"date",
            "format":"dateOptionalTime"
         }
      }
   }
}

This didn't change much when searching for exact matching terms with the
hashtag (that seems to work fine with the lowercase analyzer anyhow) using a
query like the following:

{
   "query":{
      "filtered":{
         "query":{
            "match_all":{}
         },
         "filter":{
            "and":[
               {
                  "term":{
                     "customerID":201
                  }
               },
               {
                  "range":{
                     "dateCreated":{
                        "from":0,
                        "to":1426775109000
                     }
                  }
               },
               {
                  "or":[
                     {
                        "and":[
                           {
                              "term":{
                                 "sourceName":"#offertl"
                              }
                           }
                        ]
                     }
                  ]
               }
            ]
         }
      }
   }
}

However, a regex filter doesn't return any matches when I use a pattern
such as
"sourceName":".*#[oO][fF][fF][eE][rR][tT][lL].*"

I want to try and match on terms with hash tags in them with the RegEx
filter. Is this possible?

Thanks for any insight!

Mahesh

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ce06e9dd-94f9-4599-a9d0-a487520d01ad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Try escaping the hash tag. It has a special meaning in the Lucene dialect
of regular expressions:
https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/util/automaton/RegExp.html?is-external=true
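To make the escaping concrete, here is a minimal Python sketch that builds the request body with the hash escaped. It is only an illustration under a few assumptions: the helper names (escape_optional_operators, regexp_filter) are made up for this example, the surrounding ".*" wildcards are assumed to be what the original pattern intended, and the operator set is the optional Lucene syntax ("#", "@", "&", "<", ">", "~") that Elasticsearch enables by default. Note that the backslash has to be doubled once more when the pattern is written as JSON, which json.dumps takes care of.

```python
import json

# Optional Lucene RegExp operators; Elasticsearch enables them by default
# for regexp queries/filters, so they must be escaped to match literally.
LUCENE_OPTIONAL_OPERATORS = "#@&<>~"


def escape_optional_operators(literal):
    """Backslash-escape the optional Lucene regex operators in literal text."""
    return "".join(
        "\\" + ch if ch in LUCENE_OPTIONAL_OPERATORS else ch for ch in literal
    )


def regexp_filter(field, pattern):
    """Build a regexp filter clause for a single field (hypothetical helper)."""
    return {"regexp": {field: pattern}}


# The regex itself is  .*\#offertl.*  (the leading/trailing .* are an assumption)
pattern = ".*" + escape_optional_operators("#offertl") + ".*"

body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": regexp_filter("sourceName", pattern),
        }
    }
}

# json.dumps doubles the backslash, so the JSON text contains ".*\\#offertl.*"
print(json.dumps(body, indent=3))
```

Since tweet_analyzer lowercases tokens at index time, a lowercase pattern may be enough for the sourceName field, but that depends on how the field was actually indexed.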

On Thu, Mar 19, 2015 at 11:44 AM, Mahesh Kommareddi <
mahesh.kommareddi@gmail.com> wrote:

Interesting. Thanks for the heads up! That worked. Looks like I'll have to
escape the "@" as well.

Really appreciate that.
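For anyone who generates these patterns from user input, a rough Python sketch of a helper that handles "#", "@" and the other reserved characters while also producing the [oO][fF]... style character classes might look like the following. The function name and the two character sets are assumptions made for illustration (they follow the reserved characters documented for Lucene/Elasticsearch regular expressions), and only ASCII letters get a case-insensitive class.

```python
# Characters with a special meaning in Lucene regular expressions.
STANDARD_OPERATORS = set('.?+*|{}[]()"\\')
OPTIONAL_OPERATORS = set("#@&<>~")  # enabled by default in Elasticsearch


def case_insensitive_pattern(literal):
    """Turn literal text into a Lucene regex fragment that matches it
    regardless of ASCII letter case, escaping all reserved characters."""
    parts = []
    for ch in literal:
        if ch in STANDARD_OPERATORS or ch in OPTIONAL_OPERATORS:
            # Reserved character: match it literally
            parts.append("\\" + ch)
        elif ch.isascii() and ch.isalpha():
            # ASCII letter: accept both lower and upper case
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            # Digits and everything else pass through unchanged
            parts.append(ch)
    return "".join(parts)


hashtag = case_insensitive_pattern("#offertl")
mention = case_insensitive_pattern("@mk1")
print(hashtag)
print(mention)
```

The fragment can then be wrapped in whatever prefix and suffix the search needs before being placed in the regexp filter.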

On Thu, Mar 19, 2015 at 11:59 AM, Nikolas Everett <nik9000@gmail.com> wrote:

Try escaping the hash tag. It has a special meaning in the Lucene
dialect of regular expressions:
https://lucene.apache.org/core/4_1_0/core/org/apache/lucene/util/automaton/RegExp.html?is-external=true
