Am I reading it wrong, or has something changed recently? From
http://www.elasticsearch.org/guide/reference/index-modules/analysis/standard-tokenizer.html: 
"[A standard tokenizer] also splits words at hyphens, unless there’s a 
number in the token, in which case the whole token is interpreted as a 
product number and is not split."
curl -XGET 'http://localhost:9200/twitter/_analyze?pretty=true&analyzer=standard' -d '123-456-7890'
{ 
"tokens" : [ { 
"token" : "123", 
"start_offset" : 0, 
"end_offset" : 3, 
"type" : "", 
"position" : 1 
}, { 
"token" : "456", 
"start_offset" : 4, 
"end_offset" : 7, 
"type" : "", 
"position" : 2 
}, { 
"token" : "7890", 
"start_offset" : 8, 
"end_offset" : 12, 
"type" : "", 
"position" : 3 
} ]
}
"It recognizes email addresses and internet hostnames as one token."
curl -XGET 'http://localhost:9200/twitter/_analyze?pretty=true&analyzer=standard' -d 'somebody@example.com'
{ 
"tokens" : [ { 
"token" : "somebody", 
"start_offset" : 0, 
"end_offset" : 8, 
"type" : "", 
"position" : 1 
}, { 
"token" : "example.com",
"start_offset" : 9, 
"end_offset" : 20, 
"type" : "", 
"position" : 2 
} ]
}

kimchy (Shay Banon), January 19, 2012, 6:36pm

You are right, the documentation describes the old behavior of the tokenizer. You can retain the email behavior by using the uax tokenizer (uax_url_email). I will fix it shortly.
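
For anyone who wants to see why the two inputs come out the way they do, here is a rough Python sketch. It is only an approximation I put together to reproduce the two outputs in the question, not Lucene's actual grammar: the regex, the function name approx_standard_tokens and the ASCII-only character classes are all assumptions made for illustration. The idea is that a token is a run of letters/digits that may continue across a "." or "'" when more letters/digits follow, while "-" and "@" always end the token.

```python
import re

# Simplified stand-in for the newer word-boundary behavior (an assumption,
# not the real grammar): letters/digits, optionally joined by "." or "'"
# when more letters/digits follow. "-" and "@" are never part of a token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[.'][A-Za-z0-9]+)*")


def approx_standard_tokens(text):
    """Return dicts shaped like the _analyze output shown in the question."""
    tokens = []
    for position, match in enumerate(TOKEN_RE.finditer(text), start=1):
        tokens.append({
            "token": match.group(0).lower(),  # the standard analyzer lowercases
            "start_offset": match.start(),
            "end_offset": match.end(),        # exclusive, as in the output above
            "position": position,
        })
    return tokens


if __name__ == "__main__":
    for sample in ("123-456-7890", "somebody@example.com"):
        print(sample, "->", [t["token"] for t in approx_standard_tokens(sample)])
```

With these two samples the sketch yields the same tokens and offsets as the responses in the question. It ignores stop words and non-ASCII text, so it should not be relied on for anything beyond getting an intuition for the split points.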

In reply to George Sakkis (george.sakkis@gmail.com), Thu, Jan 19, 2012 at 3:05 PM.