The Analyzers and Terms

I'm really confused about the relationship between an analyzer and the terms it produces.
I created a new index with a custom analyzer:
curl -XPOST 'http://localhost:9200/curt' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma": {
          "type": "custom",
          "tokenizer": "commatokenizer"
        }
      },
      "tokenizer": {
        "commatokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  }
}'

The analyzer is meant to split tokens on commas only. Then I indexed one document:
curl -XPUT 'http://localhost:9200/curt/jdbc/1' -d '{
  "keywords": "foo,bar baz"
}'

But when I run the following query with a terms facet:
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {
    "termscount": {
      "terms": {
        "field": "keywords",
        "size": 25
      }
    }
  }
}

Why do I get the following output?
"facets" : {
  "termscount" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "foo",
      "count" : 1
    }, {
      "term" : "baz",
      "count" : 1
    }, {
      "term" : "bar",
      "count" : 1
    } ]
  }
}

So there are 3 terms? Since my analyzer splits only on commas, I expected 2 terms ("foo" and "bar baz"). Can anyone explain this?

The problem is that you created an analyzer but did not apply it to your jdbc type.
So Elasticsearch uses the default mapping, and the keywords field is analyzed with the default analyzer instead of yours.

Use the PUT Mapping API: http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping.html
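To make the difference concrete, here is a small Python sketch that imitates the two analysis paths on the sample document. It is not Elasticsearch code: `comma_analyzer` and `default_like_analyzer` are hypothetical helpers, and the second one is only a rough approximation of the default analyzer (split on anything that is not a letter or digit, then lowercase).

```python
import re


def comma_analyzer(text):
    """Imitate the custom "comma" analyzer from the thread: split on ","
    and apply no token filters, so case and inner spaces are preserved.
    Empty pieces are dropped here."""
    return [tok for tok in re.split(",", text) if tok]


def default_like_analyzer(text):
    """Rough stand-in for the default analyzer (an assumption, not the real
    implementation): split on non-alphanumeric characters, then lowercase."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


doc = "foo,bar baz"
print(default_like_analyzer(doc))  # ['foo', 'bar', 'baz']
print(comma_analyzer(doc))         # ['foo', 'bar baz']
```

The first line of output matches the three facet terms reported in the question, which is what you would expect if the field was indexed without the custom analyzer. The second line is what the question expected to see.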

(In reply to the message from Curt Hu zhongting.hu@gmail.com of 30 Jan 2013, 10:10.)

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/The-Analyzers-and-Terms-tp4029003.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi dadoonet,
Thanks very much for your reply. I'm still a little confused about this.
Do I need to apply the analyzer to a specific type (in my example, jdbc)?
How can I do that with the PUT Mapping API? Thanks.

What did you not understand in the link I mentioned before?

Did you try it? What command did you send, and what response did you get?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
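For readers following the thread, here is a minimal Python sketch of the kind of request the PUT Mapping API expects for this case. It is an illustration, not a command taken from the thread: the helper `build_put_mapping_request` is hypothetical, and the body assumes the typed mapping layout and the `string` field type of the Elasticsearch releases of that period (later versions dropped mapping types and replaced `string` with `text`). The request is only built, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # host and port used in the thread


def build_put_mapping_request(index, doc_type, field, analyzer):
    """Build (but do not send) a PUT Mapping request that binds
    `analyzer` to `field` of `doc_type` in `index`."""
    body = {
        doc_type: {
            "properties": {
                # "string" is the era-specific field type (assumption).
                field: {"type": "string", "analyzer": analyzer}
            }
        }
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/{doc_type}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )


request = build_put_mapping_request("curt", "jdbc", "keywords", "comma")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(request)
```

As far as I know, the analyzer of a field that already exists in the mapping cannot be changed in place, so the mapping would need to be put right after creating the index and before indexing the first document (or the index recreated and the document reindexed).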

(In reply to the message from Curt Hu zhongting.hu@gmail.com of 1 Feb 2013, 11:35.)
