Uax_url_email tokenizer

smhdiu · November 6, 2012, 1:18pm

Hello

Can we use the uax_url_email tokenizer from the Java API?
If so, how do I do it?
Please advise.

Thanks in advance,
Mohsin H

Ivan · November 6, 2012, 11:22pm

The easiest way is to create a custom analyzer that uses the
uax_url_email tokenizer and map that analyzer to a field. When that field
is indexed or searched, the analyzer will be used.

--
Ivan
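A minimal sketch of what Ivan describes, assuming the index `test`, type `1` and field `email` that appear later in this thread. The Java below only assembles a create-index request body that defines a custom analyzer on top of the `uax_url_email` tokenizer and maps it to the field; the body could then be sent with `curl -XPUT 'http://localhost:9200/test' -d '...'` or through your client's create-index request. The `lowercase` filter is an addition of this sketch, since the tokenizer does not lowercase by itself.

```java
final class EmailAnalyzerIndexBody {

    /**
     * Builds a create-index body: a custom analyzer using the uax_url_email tokenizer,
     * assigned to one field of one type. Type and field names are supplied by the caller.
     */
    static String createIndexBody(String type, String field) {
        return "{\n"
            + "  \"settings\" : {\n"
            + "    \"analysis\" : {\n"
            + "      \"analyzer\" : {\n"
            + "        \"email_analyzer\" : {\n"
            + "          \"type\" : \"custom\",\n"
            + "          \"tokenizer\" : \"uax_url_email\",\n"
            // Assumption of this sketch: lowercase so that matching ignores case
            + "          \"filter\" : [ \"lowercase\" ]\n"
            + "        }\n"
            + "      }\n"
            + "    }\n"
            + "  },\n"
            + "  \"mappings\" : {\n"
            + "    \"" + type + "\" : {\n"
            + "      \"properties\" : {\n"
            // A single "analyzer" is used both when indexing and when searching this field
            + "        \"" + field + "\" : { \"type\" : \"string\", \"analyzer\" : \"email_analyzer\" }\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Print the body; e.g. send it with curl -XPUT 'http://localhost:9200/test' -d '<body>'
        System.out.println(createIndexBody("1", "email"));
    }
}
```

Because one `analyzer` is set on the field, analyzed queries such as `text`/`match` run the same analysis as indexing. Term filters still bypass analysis, so their value must equal an indexed term exactly.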

smhdiu · November 7, 2012, 10:15am

Thanks for your reply, Ivan.
I have the following Java code for fetching data from Elasticsearch:

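import static org.elasticsearch.index.query.QueryBuilders.boolQuery;
import static org.elasticsearch.index.query.QueryBuilders.textQuery;

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.FilterBuilders;

// client, indexName, email, firstName, lastName and keyword are defined elsewhere
SearchRequestBuilder searchRequestBuilder = client.prepareSearch(indexName);
searchRequestBuilder.setSearchType(SearchType.DEFAULT);
// At least one of the four "should" clauses has to match
searchRequestBuilder.setQuery(boolQuery()
        .should(textQuery("email", email))
        .should(textQuery("firstName", firstName))
        .should(textQuery("lastName", lastName))
        .should(textQuery("keyword", keyword)));

// Keep only hits whose email field contains this exact, unanalyzed term
searchRequestBuilder.setFilter(FilterBuilders.andFilter()
        .add(FilterBuilders.termFilter("email", email)));
searchRequestBuilder.setFrom(0).setSize(40).setExplain(true);
SearchResponse response = searchRequestBuilder.execute().actionGet();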

The problem is with the email filter.
If I search by email alone, or by email combined with any other field, I get
an empty result, even though there is data whose email field contains that text.
In other words, any combination that includes email returns no results when
the filter is applied.

Please advise.

Thanks in advance

smhdiu · November 8, 2012, 9:54am

Just to add: I am using the default analyzer.
Could the analyzer be causing this?

Please advise.

Thanks
Mohsin

Ivan · November 8, 2012, 4:55pm

You would need a custom analyzer in order to use that tokenizer.

Analyzers and tokenizers are Lucene concepts, so you can learn more about
them from the Lucene documentation. Remember to keep the analyzer consistent
between indexing and querying; Elasticsearch mappings make this easy.

--
Ivan

Igor_Motov · November 9, 2012, 7:48pm

The standard analyzer splits email addresses into terms,
so elasticsearch@googlegroups.com is indexed as two terms: "elasticsearch"
and "googlegroups.com". The email filter that you use,
FilterBuilders.termFilter("email", email), is not analyzed, so it searches for the entire
email address as a single term and finds nothing. There are two ways to solve this problem. One is to
use the uax_url_email tokenizer (you can find an example
here: http://stackoverflow.com/questions/13173185/not-analyzed-is-not-working-as-expected). The other is to replace
termFilter with queryFilter(matchPhraseQuery("email", email)).
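To make this explanation concrete, here is a toy model in plain Java. It is only an approximation: `standardLike` stands in for the standard analyzer by splitting on whitespace and `@` (the real tokenizer follows Unicode word-break rules, and the analyzer also lowercases), and `emailLike` stands in for `uax_url_email` by keeping anything shaped like `x@y` as one token. What it shows is that a term filter compares the raw value against indexed terms, while a phrase query analyzes its input first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class EmailMatchingModel {

    // Crude "something@something" check; far simpler than what uax_url_email really recognizes.
    private static final Pattern EMAIL_LIKE = Pattern.compile("[^\\s@]+@[^\\s@]+");

    /** Toy stand-in for standard tokenization: split on whitespace and '@' (lowercasing omitted). */
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[\\s@]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Toy stand-in for uax_url_email: email-shaped tokens stay whole, other tokens as above. */
    static List<String> emailLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (t.isEmpty()) {
                continue;
            }
            if (EMAIL_LIKE.matcher(t).matches()) {
                terms.add(t);
            } else {
                terms.addAll(standardLike(t));
            }
        }
        return terms;
    }

    /** Term filter: the raw value is NOT analyzed and must equal one indexed term. */
    static boolean termFilterMatches(List<String> indexedTerms, String rawValue) {
        return indexedTerms.contains(rawValue);
    }

    /** Phrase query: the query text is analyzed, then its terms must appear consecutively. */
    static boolean phraseMatches(List<String> indexedTerms, List<String> analyzedQuery) {
        return !analyzedQuery.isEmpty()
            && Collections.indexOfSubList(indexedTerms, analyzedQuery) >= 0;
    }

    public static void main(String[] args) {
        String email = "elasticsearch@googlegroups.com";
        List<String> standardTerms = standardLike(email);
        System.out.println(standardTerms);                                     // [elasticsearch, googlegroups.com]
        System.out.println(termFilterMatches(standardTerms, email));           // false
        System.out.println(phraseMatches(standardTerms, standardLike(email))); // true
        System.out.println(termFilterMatches(emailLike(email), email));        // true
    }
}
```

One caveat of the phrase approach: it matches any text that analyzes to the same term sequence (for example "elasticsearch googlegroups.com"), so it is looser than a term filter on a field indexed with uax_url_email.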

smhdiu · November 12, 2012, 5:43pm

Thanks, Igor and Ivan.
With your guidance I created the following mappings (the "1" mapping is for type 1):

curl -XPUT 'http://localhost:9200/test/?pretty=1' -d '
{
   "mappings" : {
      "1" : {
         "properties" : {
            "authorOfAbstracts" : {
               "type" : "string"
            },
            "firstName" : {
               "type" : "string",
               "search_analyzer" : "name_analyzer",
               "index_analyzer" : "name_analyzer"
            },
            "lastName" : {
               "type" : "string",
               "search_analyzer" : "name_analyzer",
               "index_analyzer" : "name_analyzer"
            },
            "email" : {
               "type" : "string",
               "search_analyzer" : "email_analyzer",
               "index_analyzer" : "email_analyzer"
            }
         }
      }
   },
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "name_analyzer" : {
               "filter" : [
                  "standard",
                  "lowercase",
                  "asciifolding"
               ],
               "type" : "custom",
               "tokenizer" : "standard"
            },
            "email_analyzer" : {
               "type" : "custom",
               "tokenizer" : "uax_url_email"
            }
         }
      }
   }
}'

The problem is that when I add data to index test, type 1, it works fine,
but when I insert data into type 2 the email tokenizer is not applied as expected.

Getting the mapping returns the following, in which we can clearly see that type 1
data will be indexed and searched according to my mapping, but what about type 2?
Can we have the same mapping for all types in one index?

{"test" : {
   "2" : {"properties" : {
      "authorOfAbstracts" : {"type" : "string"},
      "email" : {"type" : "string"},
      "firstName" : {"type" : "string"},
      "lastName" : {"type" : "string"}
   }},
   "1" : {"properties" : {
      "authorOfAbstracts" : {"type" : "string"},
      "email" : {"type" : "string", "analyzer" : "email_analyzer"},
      "firstName" : {"type" : "string", "analyzer" : "name_analyzer"},
      "lastName" : {"type" : "string", "analyzer" : "name_analyzer"}
   }}
}}

Thanks
Mohsin

Igor_Motov · November 12, 2012, 11:22pm

Sorry, I don't think I quite understand what you are trying
to achieve here. If you are looking for some automation in assigning
mappings to fields, take a look at Dynamic Templates
(http://www.elasticsearch.org/guide/reference/mapping/root-object-type.html) and the
Index Templates API (http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html).
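For the "same mapping for every type" question, one option in Elasticsearch versions of that era is the `_default_` mapping: types created on the fly inherit it. Index templates can carry the same settings and mappings for every index whose name matches a pattern. A minimal sketch, assuming an index named `books` and the fields from the books example later in this thread (the field list and the `lowercase` filter are illustrative assumptions), that prints a create-index body:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class DefaultMappingBody {

    /** Hypothetical field list based on the books example in this thread. */
    static Map<String, String> booksFields() {
        Map<String, String> fields = new LinkedHashMap<>();
        // Plain "string" fields use the standard analyzer, which already lowercases
        fields.put("bookid", "{ \"type\" : \"string\" }");
        fields.put("bookname", "{ \"type\" : \"string\" }");
        fields.put("price", "{ \"type\" : \"long\" }");
        fields.put("author", "{ \"type\" : \"string\" }");
        fields.put("author_email", "{ \"type\" : \"string\", \"analyzer\" : \"email_analyzer\" }");
        return fields;
    }

    /** Create-index body whose "_default_" mapping is applied to types created later in the index. */
    static String createIndexBody(Map<String, String> fields) {
        StringJoiner properties = new StringJoiner(",\n", "{\n", "\n      }");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            properties.add("        \"" + field.getKey() + "\" : " + field.getValue());
        }
        return "{\n"
            + "  \"settings\" : { \"analysis\" : { \"analyzer\" : {\n"
            + "    \"email_analyzer\" : { \"type\" : \"custom\", \"tokenizer\" : \"uax_url_email\", \"filter\" : [ \"lowercase\" ] }\n"
            + "  } } },\n"
            + "  \"mappings\" : {\n"
            + "    \"_default_\" : {\n"
            + "      \"properties\" : " + properties + "\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // e.g. curl -XPUT 'http://localhost:9200/books' -d '<printed body>'
        System.out.println(createIndexBody(booksFields()));
    }
}
```

With this in place, indexing the first document into a new type (say department 2) should create that type from `_default_`, so `author_email` would use `email_analyzer` there as well. Check the mapping afterwards to confirm this on your version.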

smhdiu · November 13, 2012, 10:41am

Sorry for the confusion, Igor.
Suppose I have an index called books whose types are department IDs.

I want to create a mapping for bookid, bookname, price, author,
author_email, etc., like this:

bookid - string, lowercase
bookname - string, lowercase
price - long
author - string, lowercase
author_email - string, uax_url_email tokenizer

The problem is that creating the mapping and settings requires a type.
If I create the mapping for department ID 1, then everything that goes into that
department uses the mappings and settings we created.

But when we add data to the second department, ID 2, it gets the default
mapping, which is expected since we haven't created mappings and settings for
that type.

So my question is: how can I create mappings and settings that apply to
all types by default?

Let me know if anything is still unclear.

Thanks in advance.

Igor_Motov · November 13, 2012, 11:49am

Why can't you just use a field for departmentid instead of creating a
whole new type per department? A type is a somewhat heavy construct, because its
mapping has to be created, maintained and sent between nodes.
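A sketch of this suggestion, assuming a single type with a hypothetical `departmentid` field: the department becomes a filter value instead of a type, so adding a department needs no new mapping. The Java below only builds a search body that wraps a full-text `match` query on `bookname` (`match` is the later name of the `text` query used earlier in this thread) in a `filtered` query with a `term` filter on `departmentid`. Mapping `departmentid` with `"index" : "not_analyzed"` would keep the term filter independent of analysis.

```java
final class DepartmentQueryBody {

    /**
     * Wraps a value in double quotes, escaping backslashes and quotes for JSON.
     * Control characters are not handled; this is only a sketch.
     */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Full-text match on bookname, restricted to one department by a term filter.
     * The filter does not contribute to scoring; it only narrows the candidate documents.
     */
    static String searchBody(String departmentId, String bookName) {
        return "{\n"
            + "  \"query\" : {\n"
            + "    \"filtered\" : {\n"
            + "      \"query\" : { \"match\" : { \"bookname\" : " + quote(bookName) + " } },\n"
            + "      \"filter\" : { \"term\" : { \"departmentid\" : " + quote(departmentId) + " } }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // e.g. curl -XGET 'http://localhost:9200/books/_search' -d '<printed body>'
        System.out.println(searchBody("2", "distributed systems"));
    }
}
```

Filters of this kind can be cached and reused across searches, which is part of why a field is usually cheaper than one type per department.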

smhdiu · November 13, 2012, 12:27pm

Hello Igor,

Thanks for getting back to me.
I am a little confused about creating types; here is my situation.

I have a book database with books from different departments, such as Computer,
Account, etc., and each department has millions of books. My idea was to
create a main index called books and use one type per department, so
that when searching for a particular book in a department we only go to that
department's ID (type) rather than searching the entire books index.

So I'm curious about this approach:
will it perform worse than using a single type?

Thanks in advance
Mohsin

On Tuesday, 13 November 2012 11:49:04 UTC, Igor Motov wrote:

Why cannot you just use a field for departmentid instead of creating a
whole new type? Type is somewhat heavy construct because of mappings that
are created, maintained and sent between nodes.

On Tuesday, November 13, 2012 5:41:18 AM UTC-5, mohsin husen wrote:

Sorry for the confusion Igor.
suppose i have a index as books and their types are departmentid.

so if i want to create the mapping for bookid bookname price author
author_email etc like below

bookid - string , lowercase
bookname - string, lowercase
price - long
author - string lowercase
author_email - string, uax_url_email tokenizer

now problem with this mapping is that it is asking for the type while
creating the mapping and settings.
if i will create mapping with departmentid - 1 then whatever goes in that
department has those mappings and settings which we created.

but now when we add data to second depatmentid -2 it takes by default
values. which is quite right as we havent created mappings and settings for
this type.

so my question is how to create the mapping and the settings that can be
available for all the types by default.

let me know if you still have doubt.

Thanks in advanced.

On Monday, 12 November 2012 23:22:52 UTC, Igor Motov wrote:

Sorry, I don't think I quite understand what you are trying
to achieve here. If you are looking for some automation in assigning
mappings to fields, take a look at Dynamic Templateshttp://www.elasticsearch.org/guide/reference/mapping/root-object-type.htmland Index
Templates APIhttp://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html
.

On Monday, November 12, 2012 12:43:45 PM UTC-5, mohsin husen wrote:

Thanks Igor and Ivan...
with your guidence i creted the mappings as follows
curl -XPUT 'http://localhost:9200/test/?pretty=1' -d ' { "mappings" : { "1" : { // this is for type 1 "properties" : { "authorOfAbstracts" : { "type" : "string" }, "firstName" : { "type" : "string","search_analyzer" : "name_analyzer","index_analyzer" : "name_analyzer"}, "lastName" : { "type" : "string","search_analyzer" : "name_analyzer","index_analyzer" : "name_analyzer"}, "email" : { "type" : "string", "search_analyzer" : "email_analyzer", "index_analyzer" : "email_analyzer" } } } }, "settings" : { "analysis" : { "analyzer" : { "name_analyzer" : { "filter" : [ "standard", "lowercase", "asciifolding" ], "type" : "custom", "tokenizer" : "standard" }, "email_analyzer" : { "type" : "custom", "tokenizer" : "uax_url_email" }
} }} }'

now problem with this is when i am adding data to index=test and
type=1, its working fine but when i m inserting the data to type=2 its not
working as expected to be work with email tokenizer.

its shows following mapping... in which we can clearly see for type=1
data will be indexed and searched as per the mapping but what about the
type=2 ?
can we achieve same mapping in all the type in one index?

{"test":{

"2":{"properties":{"authorOfAbstracts":{"type":"string"},"email":{"type":"string"},"firstName":{"type":"string"},"lastName":{"type":"string"}}},

"1":{"properties":{"authorOfAbstracts":{"type":"string"},"email":{"type":"string","analyzer":"email_analyzer"},"firstName":{"type":"string","analyzer":"name_analyzer"},"lastName":{"type":"string","analyzer":"name_analyzer"}}}
}}

Thanks
Mohsin

On Friday, 9 November 2012 19:48:11 UTC, Igor Motov wrote:

The standard analyzer splits email addresses into terms. So
elasti...@googlegroups.com is indexed as two terms: "elasticsearch"
and "googlegroups.com". The filter that you use,
FilterBuilders.termFilter("email", email), searches for the entire
email address as a single term, so it finds nothing. There are two ways
to solve this problem. One is to use the uax_url_email tokenizer (you can
find an example here:
http://stackoverflow.com/questions/13173185/not-analyzed-is-not-working-as-expected).
Another solution is to replace termFilter with
queryFilter(matchPhraseQuery("email", email)).
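To see why the term filter comes back empty, here is a toy, stdlib-only Java model of the analysis behaviour described above. The tokenizers are crude stand-ins and an assumption for illustration: standardLike lowercases and splits on whitespace and '@', while emailPreserving splits on whitespace only, so an address survives as one token the way it does with uax_url_email. Lucene's real tokenizers follow the UAX#29 rules and handle far more cases. The EmailTermDemo class and the address someone@example.com are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class EmailTermDemo {
    // Crude stand-in for the standard analyzer: lowercase, split on whitespace and '@'.
    static List<String> standardLike(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[\\s@]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Crude stand-in for uax_url_email: split on whitespace only, so an address stays one token.
    static List<String> emailPreserving(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // A term filter does not analyze its input: the raw value must equal one indexed term.
    static boolean termFilterMatches(List<String> indexedTerms, String rawValue) {
        return indexedTerms.contains(rawValue);
    }

    // A phrase query analyzes its input with the same (toy) analyzer as the field,
    // then requires the resulting terms to appear consecutively and in order.
    static boolean phraseMatches(List<String> indexedTerms, String rawValue) {
        List<String> queryTerms = standardLike(rawValue);
        return !queryTerms.isEmpty() && Collections.indexOfSubList(indexedTerms, queryTerms) >= 0;
    }

    public static void main(String[] args) {
        String email = "someone@example.com"; // hypothetical address
        List<String> standardTerms = standardLike(email);
        System.out.println(standardTerms);                                    // [someone, example.com]
        System.out.println(termFilterMatches(standardTerms, email));          // false
        System.out.println(phraseMatches(standardTerms, email));              // true
        System.out.println(termFilterMatches(emailPreserving(email), email)); // true
    }
}
```

The two fixes correspond to the last two lines: either keep the address as a single indexed term, or analyze the query value the same way the field was analyzed.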

On Thursday, November 8, 2012 4:54:05 AM UTC-5, mohsin husen wrote:

Just to add: I am using the default analyzer.
Is this happening because of the analyzer?

Please advise.

Thanks
Mohsin

--

--

Igor_Motov · November 13, 2012, 1:06pm

Documents of all types are indexed into the same index. There is a special
_type field (http://www.elasticsearch.org/guide/reference/mapping/type-field.html)
that is indexed and used to query documents of a certain type. So
using different types for different departments doesn't give you any
performance advantage compared to indexing departmentid as a not_analyzed
field, but there is some overhead in handling the additional types.
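The point that a type restriction is itself just a lookup on an indexed field can be made concrete with a toy, stdlib-only Java model. It is not how Lucene stores postings; it only shows that filtering on the _type value and filtering on a not_analyzed departmentid value are the same kind of term lookup. The ToyTermIndex class and the sample documents are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyTermIndex {
    // field name -> exact (not_analyzed) term -> ids of documents containing that term
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    void add(String docId, Map<String, String> fields) {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            postings.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(field.getValue(), k -> new TreeSet<>())
                    .add(docId);
        }
    }

    // A term filter is a single postings lookup, whichever field it targets.
    Set<String> termFilter(String field, String term) {
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        ToyTermIndex books = new ToyTermIndex();
        // Hypothetical documents: the type name and a departmentid field carry the same value.
        books.add("b1", Map.of("_type", "computer", "departmentid", "computer"));
        books.add("b2", Map.of("_type", "account", "departmentid", "account"));
        books.add("b3", Map.of("_type", "computer", "departmentid", "computer"));
        System.out.println(books.termFilter("_type", "computer"));        // [b1, b3]
        System.out.println(books.termFilter("departmentid", "computer")); // [b1, b3]
    }
}
```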

If you want to limit your searches to a single department, you will need to
create one index per department or consider using departmentid as a routing
value. My advice, however, would be to just use a single index for now, get
more comfortable with Elasticsearch, and optimize later, when it becomes
an issue and you have some real data and queries to test different
optimizations on.
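Routing by departmentid helps because the target shard is derived from the routing value (by default the document id), roughly shard = hash(routing) % number_of_primary_shards. The Java sketch below assumes that formula but uses String.hashCode() as a stand-in; Elasticsearch uses its own hash function, so real shard numbers will differ. The RoutingSketch class, the sample books and the shard count are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

final class RoutingSketch {
    // Assumed formula: shard = hash(routing) % number_of_primary_shards.
    // String.hashCode() is a stand-in; Elasticsearch uses its own hash function.
    static int shardFor(String routingValue, int numberOfPrimaryShards) {
        return Math.floorMod(routingValue.hashCode(), numberOfPrimaryShards);
    }

    // books[i] = {bookId, departmentId}. Returns the shards holding one department's books.
    static Set<Integer> shardsForDepartment(String[][] books, String department,
                                            int numberOfPrimaryShards, boolean routeByDepartment) {
        Set<Integer> shards = new TreeSet<>();
        for (String[] book : books) {
            if (!book[1].equals(department)) {
                continue;
            }
            // Without an explicit routing value, the document id is the routing value.
            String routing = routeByDepartment ? book[1] : book[0];
            shards.add(shardFor(routing, numberOfPrimaryShards));
        }
        return shards;
    }

    public static void main(String[] args) {
        // Hypothetical sample data and shard count.
        String[][] books = {
            {"b1", "computer"}, {"b2", "computer"}, {"b3", "account"}, {"b4", "computer"},
        };
        System.out.println("routed by id:         " + shardsForDepartment(books, "computer", 5, false)); // [0, 2, 3]
        System.out.println("routed by department: " + shardsForDepartment(books, "computer", 5, true));
    }
}
```

Routed by department, all of a department's books land on one shard, so a search carrying the same routing value needs to visit only that shard. The same model also shows the flip side: a very large department concentrates all of its books on that single shard.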

On Tuesday, November 13, 2012 7:27:39 AM UTC-5, mohsin husen wrote:

Hello Igor,

Thanks for the reply.
I am a little confused about creating types. I have the following situation.

I have a book database with books from different departments, like
Computer, Account, etc.
Each department has millions of books, so my idea was to create a main
index called books and make each department a type, so that when searching
for a particular book in a department we only go to that specific
department id (type) rather than searching the entire books index.

So I am curious about this approach:
will it degrade performance compared to a single type?

Thanks in advance
Mohsin

On Tuesday, 13 November 2012 11:49:04 UTC, Igor Motov wrote:

Why can't you just use a field for departmentid instead of creating a
whole new type? A type is a somewhat heavy construct because of the mappings
that are created, maintained and sent between nodes.

On Tuesday, November 13, 2012 5:41:18 AM UTC-5, mohsin husen wrote:

Sorry for the confusion, Igor.
Suppose I have an index called books whose types are department ids.

Say I want to create a mapping for bookid, bookname, price, author,
author_email, etc., like below:

bookid - string, lowercase
bookname - string, lowercase
price - long
author - string, lowercase
author_email - string, uax_url_email tokenizer

The problem is that creating the mapping and settings asks for a type.
If I create the mapping for departmentid 1, then whatever goes into
that department gets the mappings and settings we created.

But when we add data to the second department (departmentid 2), it gets
the default values, which is expected, as we haven't created mappings and
settings for this type.

So my question is: how can I create mappings and settings that are
available to all types by default?

Let me know if anything is still unclear.

Thanks in advance.

--

--

smhdiu · November 14, 2012, 12:31pm

Igor!
Thanks a ton!
I really appreciate your response; it was very helpful.
I will try the single-index approach for now, as you suggested.
Thanks again.

--

--
