Faceted search grouped by full term

Hi,
I've been working with ES for the last two months, and each day I discover new fantastic features.

I would like to know if it is possible to implement the following case:

I have many docs indexed in the ES server, with a location field that stores location information (locality, region and more). One example of the location field (type string) is:
location: 'Brooklyn, New York, USA'

I would like to do a faceted search and have the results grouped by location information. However, if I send the following request:

curl -XGET 'http://localhost:9200/test/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": ["title", "description", "location"],
      "query": "xxx"
    }
  },
  "facets": {
    "location": {
      "terms": {
        "field": "location"
      }
    }
  }
}'

I do get the results grouped by location, but by each individual term in the location field, that is:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : { ...
  },
  "facets" : {
    "location" : {
      "_type" : "terms",
      "missing" : 15,
      "terms" : [ {
        "term" : "USA",
        "count" : 10
      }, {
        "term" : "New",
        "count" : 4
      }, {
        "term" : "York",
        "count" : 4
      }, {
        "term" : "Brooklyn",
        "count" : 2
      } ]
    }
  }
}

If I declare the location field as not analyzed in the mapping, then the whole field value is taken as a single facet term.
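For comparison, here is a sketch of the "not analyzed" mapping described above, assuming the `test` index is created from scratch. The type name `doc` is hypothetical (the thread does not give the real one). With this mapping each location value is indexed as one term, so the terms facet would group by the complete string, e.g. 'Brooklyn, New York, USA', rather than by its parts.

```shell
# Hypothetical type name "doc". "index": "not_analyzed" stores the whole
# string as a single term, so facets group by the complete value.
curl -XPUT 'http://localhost:9200/test' -d '{
  "mappings": {
    "doc": {
      "properties": {
        "location": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
```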
I would like an intermediate solution, where the field is analyzed but split on commas instead of whitespace; that is, I would like to get a response similar to:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : { ...
  },
  "facets" : {
    "location" : {
      "_type" : "terms",
      "missing" : 15,
      "terms" : [ {
        "term" : "USA",
        "count" : 10
      }, {
        "term" : "New York",
        "count" : 4
      }, {
        "term" : "Brooklyn",
        "count" : 2
      } ]
    }
  }
}

I have been searching for this in the ES guide, and maybe setting an appropriate analyzer could do it, but I can't figure out how, or even whether it is the correct approach.
I appreciate any help or guidance. Thanks in advance!
Tania

Heya,

You need to create a custom analyzer to split data by commas. One such
analyzer can be built using the pattern analyzer
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern-analyzer.html),
or maybe the pattern tokenizer and your own custom token filters.
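Following the pattern tokenizer route suggested above, a minimal sketch (not tested in this thread) might look like the request below. The index name `test` comes from the question; the type name `doc` and the names `comma_tokenizer` and `comma_list` are hypothetical. The pattern `,\s*` (written `,\\s*` inside the JSON string) splits on a comma plus any whitespace after it, so 'Brooklyn, New York, USA' should produce the terms Brooklyn, New York and USA. No token filters are listed, so case should be preserved, as in the desired facet output; a filter such as lowercase could be added in a "filter" array if case-insensitive grouping is wanted.

```shell
# Sketch only: assumes the "test" index does not exist yet.
# "doc", "comma_tokenizer" and "comma_list" are hypothetical names.
# The tokenizer splits on a comma followed by optional whitespace.
curl -XPUT 'http://localhost:9200/test' -d '{
  "settings": {
    "analysis": {
      "tokenizer": {
        "comma_tokenizer": {
          "type": "pattern",
          "pattern": ",\\s*"
        }
      },
      "analyzer": {
        "comma_list": {
          "type": "custom",
          "tokenizer": "comma_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "location": {
          "type": "string",
          "analyzer": "comma_list"
        }
      }
    }
  }
}'
```

Because analysis happens at index time, documents that were already indexed would need to be reindexed before the facet reflects the new analyzer.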

On Wed, Jul 20, 2011 at 12:42 PM, Tania yosoythania@hotmail.com wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Faceted-search-grouped-by-full-term-tp3185007p3185007.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

Thanks! I did what you explained and it worked!
In case anyone needs it, here is a brief explanation of how I solved it.
When creating the index, in addition to specifying the mappings you need, you can also define the analyzers for that index.
In this case:

curl -XPOST 'localhost:9200/test' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  },
  "mappings": {
    "photos": {
      "properties": {
        "tags": {
          "type": "string",
          "analyzer": "comma"
        }
      }
    }
  }
}'

Once the index settings and mappings specify that the "tags" field is analyzed with the pattern analyzer named "comma" (whose pattern is a single comma), searching, faceting and so on split these strings on commas instead of whitespace.
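One detail worth checking, offered as an assumption rather than something verified in this thread: the pattern "," does not remove the space that follows each comma, and the pattern analyzer is documented as lowercasing terms by default (its `lowercase` setting), so a value like 'Brooklyn, New York, USA' may end up as the terms "brooklyn", " new york" and " usa". The offline sketch below only approximates the splitting step with awk (no Elasticsearch involved; `split_on` is a hypothetical helper, and the awk regex ', *' stands in for Java's ',\s*' for plain spaces). Brackets mark token boundaries so leading spaces are visible.

```shell
#!/bin/sh
# Offline approximation only: awk's regex field splitting stands in for the
# Java regex used by Elasticsearch's pattern analyzer/tokenizer.
# Lowercasing and other token filtering are NOT simulated.

# split_on PATTERN VALUE (hypothetical helper): print each non-empty piece
# of VALUE, split on PATTERN, wrapped in brackets to show leading spaces.
split_on() {
  printf '%s\n' "$2" | awk -F "$1" '{
    for (i = 1; i <= NF; i++)
      if ($i != "") printf "[%s]\n", $i
  }'
}

value='Brooklyn, New York, USA'

echo 'pattern ",":'
split_on ',' "$value"

echo 'pattern ", *" (approximates ",\s*"):'
split_on ', *' "$value"
```

With "," the second and third pieces keep a leading space; with ", *" they do not. If that matters, the analyzer's "pattern" could be set to ",\\s*" in the JSON (and "lowercase" to false to keep the original case); on Elasticsearch versions of this era, a request such as curl -XGET 'localhost:9200/test/_analyze?analyzer=comma&pretty=true' -d 'Brooklyn, New York, USA' should show the tokens actually produced.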

Many thanks kimchy!

Date: Sat, 23 Jul 2011 19:28:21 -0700

Heya,

You need to create a custom analyzer to split data by commas. One such analyzer can be built using the pattern analyzer (http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern-analyzer.html), or maybe the pattern tokenizer and your own custom token filters.

