The Analyzers and Terms

Curt_Hu · January 30, 2013, 9:10am

I'm really confused. I want to understand the relationship between analyzers and terms.
I created a custom analyzer with a new index.
curl -XPOST 'http://localhost:9200/curt' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma": {
          "type": "custom",
          "tokenizer": "commatokenizer"
        }
      },
      "tokenizer": {
        "commatokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  }
}'

It just uses a comma to separate the tokens. Then I indexed a document:
curl -XPUT 'http://localhost:9200/curt/jdbc/1' -d '{
  "keywords": "foo,bar baz"
}'

But when I run the following query with a terms facet:
{
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "must_not": [],
      "should": []
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {
    "termscount": {
      "terms": {
        "field": "keywords",
        "size": 25
      }
    }
  }
}

Why do I get the following output?
"facets" : {
  "termscount" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 3,
    "other" : 0,
    "terms" : [ {
      "term" : "foo",
      "count" : 1
    }, {
      "term" : "baz",
      "count" : 1
    }, {
      "term" : "bar",
      "count" : 1
    } ]
  }
}

So there are 3 terms? Since my analyzer splits the text only on commas, I expected 2 terms ("foo" and "bar baz"). Can anyone explain this to me?

dadoonet · January 30, 2013, 9:14am

The problem here is that you created an analyzer but did not apply it to your jdbc type.
So Elasticsearch uses the default mapping, and with it the default analyzer, for the keywords field.
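
To see why the facet reports three terms, here is a rough sketch in Python. It is not Elasticsearch code: `default_like_tokens` is only a hypothetical stand-in that approximates the default analyzer by lowercasing and splitting on non-word characters, and `comma_tokens` approximates a pattern tokenizer whose pattern is a comma.

```python
import re

TEXT = "foo,bar baz"


def default_like_tokens(text):
    """Approximate the default analyzer: lowercase, then split on
    anything that is not a word character (commas and spaces alike)."""
    return re.findall(r"\w+", text.lower())


def comma_tokens(text):
    """Approximate a pattern tokenizer with pattern ",": split on commas
    only. Empty pieces are dropped here to keep the output tidy."""
    return [token for token in re.split(",", text) if token]


print(default_like_tokens(TEXT))  # ['foo', 'bar', 'baz']
print(comma_tokens(TEXT))         # ['foo', 'bar baz']
```

The first list matches the three terms in the facet output, which is what you would expect if the keywords field was indexed without the comma analyzer.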

Use the PUT Mapping API (see the Elasticsearch reference documentation).
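
As a minimal sketch of such a request, the Python snippet below builds a PUT mapping call that assigns the comma analyzer to the keywords field. It assumes the per-type mapping URL and the "string" field type used by Elasticsearch releases of that period; later versions use different syntax. The helper name `build_put_mapping_request` is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_put_mapping_request(base_url, index, doc_type, field, analyzer):
    """Build (but do not send) a PUT mapping request that sets the
    analyzer for one string field of the given type."""
    body = {
        doc_type: {
            "properties": {
                # "string" is the field type assumed for this era of Elasticsearch.
                field: {"type": "string", "analyzer": analyzer}
            }
        }
    }
    url = "{}/{}/{}/_mapping".format(base_url.rstrip("/"), index, doc_type)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_put_mapping_request(
    "http://localhost:9200", "curt", "jdbc", "keywords", "comma"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it to a running node: urllib.request.urlopen(request)
```

The analyzer of a field that already exists generally cannot be changed, so the mapping would need to be in place before the first document is indexed, for example on a freshly recreated index.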

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/The-Analyzers-and-Terms-tp4029003.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Curt_Hu · February 1, 2013, 10:35am

Hi dadoonet,
Thanks very much for your reply. I am still a little confused about this.
Do I need to apply the analyzer to the specific type (in my example, jdbc)?
How can I do that with the PUT Mapping API? Thanks.

dadoonet · February 1, 2013, 11:01pm

What did you not understand in the link I mentioned before?

Did you try it? What command did you send, and what response did you get?

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
