Problem: Facets tokenize tags with spaces. Is there a solution?


(Royce) #1

Hi,

I'm using facets to do filters on search results. One tag, for
example, is a city name "Kansas City." The facet interprets "Kansas
City" as two separate counts, "Kansas" and "City".

How can I configure facets to recognize "Kansas City" as one tag?

Take care,

Royce


(Royce) #2

Similar topics:

http://groups.google.com/group/elasticsearch/browse_thread/thread/ec24d56db34275b1/ccdaa025a3bb2481?lnk=gst&q=facet+tokenize#ccdaa025a3bb2481


(Royce) #3

To be clear, my tag isn't "Kansas City". The tag is "location", and
one of the cities that shows up in it is "Kansas City."


(Ivan Brusic) #4

The topic you referenced has the answer. If the field you are faceting
on is a string, it needs to be either not analyzed or analyzed with
something like the KeywordAnalyzer, which treats the whole value as a
single token. Can you gist the mapping you are using? In your example,
it appears that location is being analyzed and indexed as two tokens,
"Kansas" and "City", which is the default behavior. The facet treats
the two tokens as separate terms.

http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-analyzer.html
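
To illustrate the explanation above, here is a toy model in plain Java (not Elasticsearch code) of how a terms facet counts indexed tokens. The class name FacetTokenDemo and the method termCounts are made up for this sketch, and splitting on whitespace is only a crude stand-in for an analyzer (the real standard analyzer also lowercases, for instance).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetTokenDemo {

    // Counts terms roughly the way a terms facet counts indexed tokens.
    // analyzed == true : split each value on whitespace (stand-in for an analyzer).
    // analyzed == false: keep each value as one token (not_analyzed / keyword).
    public static Map<String, Integer> termCounts(List<String> values, boolean analyzed) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            String[] tokens = analyzed
                    ? value.trim().split("\\s+")
                    : new String[] { value };
            for (String token : tokens) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("Kansas City", "Kansas City", "Boston");
        System.out.println(termCounts(locations, true));
        System.out.println(termCounts(locations, false));
    }
}
```

Running main prints {Boston=1, City=2, Kansas=2} for the analyzed case and {Boston=1, Kansas City=2} for the not-analyzed case, which is the difference between the two facet results described in this thread.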

--
Ivan

http://www.elasticsearch.org/guide/reference/api/search/facets/

On Wed, Jan 11, 2012 at 10:01 AM, Royce royce.haynes@gmail.com wrote:

Hi,

I'm using facets to do filters on search results. One tag, for
example, is a city name "Kansas City." The facet interprets "Kansas
City" as two separate counts, "Kansas" and "City".

How can I configure facets to recognize "Kansas City" as one tag?

Take care,

Royce


(Suraj) #5

Hi,

I have the same problem as Royce. I have the following mappings:

curl -XPOST "http://localhost:9200/pictures" -d '
{
  "mappings" : {
    "pictures" : {
      "properties" : {
        "id": { "type": "string" },
        "description": {"type": "string", "index": "not_analyzed"},
        "featured": { "type": "boolean" },
        "categories": { "type": "string", "index": "not_analyzed" },
        "tags": { "type": "string", "index": "not_analyzed",
                  "analyzer": "keyword" },
        "created_at": { "type": "double" }
      }
    }
  }
}'

And my data is:

curl -X POST "http://localhost:9200/pictures/picture" -d '{
  "picture": {
    "id": "4defe0ecf02a8724b8000047",
    "title": "Victoria Secret PhotoShoot",
    "description": "From France and Italy",
    "featured": true,
    "categories": [
      "Fashion",
      "Girls",
    ],
    "tags": [
      "girl",
      "photoshoot",
      "supermodel",
      "Victoria Secret"
    ],
    "created_at": 1405784416.04672
  }
}'

And my query is:

curl -X POST "http://localhost:9200/pictures/_search?pretty=true" -d '
{
  "query": {
    "text": {
      "tags": {
        "query": "Victoria Secret"
      }
    }
  },
  "facets": {
    "tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'

The output is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 0,
      "other" : 0,
      "terms" : [ ]
    }
  }
}

So I get a total of 0 in both the facets and the hits.

Any idea why it's not working? I know that when I remove the keyword analyzer
from tags and leave it as "not_analyzed", I get results, but there is still a
problem with case sensitivity.
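
On the case question: as far as I know, a common way to get single-token terms that are also case-insensitive is a custom analyzer that combines the keyword tokenizer with a lowercase token filter. The sketch below is only a toy model of that idea in plain Java, not Elasticsearch code; the class name LowercaseKeywordDemo and its methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class LowercaseKeywordDemo {

    // Toy equivalent of "keyword tokenizer + lowercase filter":
    // the whole value stays one token, but its case is folded.
    public static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Counts normalized tags, so values differing only in case share one bucket.
    public static Map<String, Integer> facet(List<String> tags) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tag : tags) {
            counts.merge(normalize(tag), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("Victoria Secret", "victoria secret", "girl");
        System.out.println(facet(tags));
    }
}
```

Running main prints {girl=1, victoria secret=2}: the space is preserved, and the two spellings end up in one bucket. The trade-off is that the facet then returns the lowercased form rather than the original spelling.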

Cheers!
Suraj

On Thursday, January 12, 2012 7:39:24 AM UTC+5:45, Ivan Brusic wrote:

The topic you referenced has the answer. If the field you are faceting
on is a string, then it needs to either be not analyzed or analyzed
with something like the KeywordAnalyzer which treats the term as a
single token. Can you gist the mapping you are using? In your example,
it appears that location is being analyzed and is indexed as two
tokens "Kansas" and "City", which is the default behavior. The facet
will treat the two tokens as unique terms.

http://www.elasticsearch.org/guide/reference/index-modules/analysis/keyword-analyzer.html

--
Ivan

http://www.elasticsearch.org/guide/reference/api/search/facets/


(mohammad) #6

Hello everyone,

I am new to Elasticsearch and am facing difficulties similar to those mentioned above. I tried implementing some of the suggested solutions, but to no avail.
I am posting part of my code and would be very grateful if somebody could help me out. Thanks in advance.

The code is written in Java:
// I have the following in the mapping part
CreateIndexRequestBuilder builder = client.admin().indices().prepareCreate(index)
        .setSettings(ImmutableSettings.settingsBuilder().loadFromSource(configIndex));

builder.addMapping("StatTest", "{\n" +
        "  \"StatTest\" : {\n" +
        "    \"_all\" : { \n" +
        "      \"analyzer\":\"francais\" \n" +
        "    },\n" +
        "    \"properties\" : {\n" +
        "      \"idUser\" : {\"type\" : \"string\", \"analyzer\":\"francais\"},\n" +
        "      \"loginOfUser\" : {\"type\" : \"string\", \"analyzer\":\"francais\"},\n" +
        "      \"nameOfUser\" : {\"type\" : \"string\", \"analyzer\":\"francais\"},\n" +
        "    }\n" +
        "  }\n" +
        "}");

// The sample data stored is the following
{idUser: "0121", loginOfUser: "login0121", nameOfUser :"mona lisa"},
{idUser: "0122", loginOfUser: "login0122", nameOfUser :"James Dean"},

// I am trying to get facets based on the name of the user
//TermsFacetBuilder fb = FacetBuilders.termsFacet("idOfUser").field("loginOfUser");
TermsFacetBuilder fb = FacetBuilders.termsFacet("idOfUser").field("nameOfUser");
SearchRequestBuilder srb1 = client.prepareSearch().setIndices(index).addFacet(fb);
AndFilterBuilder myFilters = FilterBuilders.andFilter();
myFilters.add(FilterBuilders.termFilter("year", "2014"));
FilterBuilder fbBuilder = FilterBuilders.andFilter(myFilters);
FilteredQueryBuilder q = QueryBuilders.filteredQuery(QueryBuilders.matchAllQuery(), fbBuilder);
SearchResponse sr = srb1.setQuery(q).execute().actionGet();
TermsFacet f = (TermsFacet) sr.getFacets().facetsAsMap().get("idOfUser");
for (TermsFacet.Entry entry : f) {
    String type = entry.getTerm().toString();
    //System.out.println("....enter type : "+type);
    //System.out.println("....enter entry.getCount() : "+entry.getCount());
}

The problem: whenever I facet on loginOfUser, everything works well.
The variable type returns:
login0121
login0122

However, when I facet on nameOfUser, the following is returned:
mona
lisa
James
Dean

I want to retrieve each user name as a single token.
Am I missing some code somewhere?
I would be very thankful if anyone could help me with this.
Thanks in advance.


(jsbonline2006) #7

Hi All,

Here is the solution for all of you:

  1. You have to define the field you facet on as a multi_field, as follows:

"mappings": {
  "data": {
    "properties": {
      "name": {
        "type": "multi_field",
        "fields": {
          "name": {
            "type": "string",
            "index": "analyzed"
          },
          "untouched": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      },

Here my "name" field is a multi_field. I can use "name" for searching
and "name.untouched" for faceting.
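
As a rough sketch of how the same idea might be applied to the Java mapping from post #6, the helper below builds a multi_field mapping string for a single property. The class name MultiFieldMappingSketch and its mapping method are hypothetical; the type, field, and analyzer names (StatTest, nameOfUser, francais) are taken from that post, and the result has not been run against a real cluster.

```java
public class MultiFieldMappingSketch {

    // Builds a multi_field mapping for one property: the sub-field named after
    // the property is analyzed for search, "untouched" is not_analyzed for faceting.
    public static String mapping(String type, String field, String analyzer) {
        return "{\n"
            + "  \"" + type + "\" : {\n"
            + "    \"properties\" : {\n"
            + "      \"" + field + "\" : {\n"
            + "        \"type\" : \"multi_field\",\n"
            + "        \"fields\" : {\n"
            + "          \"" + field + "\" : {\"type\" : \"string\", \"analyzer\" : \"" + analyzer + "\"},\n"
            + "          \"untouched\" : {\"type\" : \"string\", \"index\" : \"not_analyzed\"}\n"
            + "        }\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Prints the mapping so it can be inspected before use.
        System.out.println(mapping("StatTest", "nameOfUser", "francais"));
    }
}
```

The resulting string could presumably be passed to builder.addMapping("StatTest", ...) in place of the hand-written one, with the facet then pointed at the untouched sub-field, e.g. FacetBuilders.termsFacet("idOfUser").field("nameOfUser.untouched").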

I was facing the same issue described earlier in this thread, and the
above mapping and usage resolved it for me.

Regards,
Jayesh Bhoyar
http://www.linkedin.com/in/jayeshbhoyar


