Faceted search grouped by full term


(Tania) #1

Hi,
I've been working with ES for the last two months, and each day I discover fantastic new features.

I would like to know if it is possible to implement the following case:

I have many docs indexed in the ES server, with a location field that stores location information (locality, region and more). One example of the location field (type string) is:
location: 'Brooklyn, New York, USA'

I would like to do a faceted search and have the results grouped by location information. However, if I run the following request:

curl -XGET 'http://localhost:9200/test/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "fields": ["title", "description", "location"],
      "query": "xxx"
    }
  },
  "facets": {
    "location": {
      "terms": {
        "field": "location"
      }
    }
  }
}'

I do get the results grouped by location, but by each individual term in the location field, that is:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : { ...
  },
  "facets" : {
    "location" : {
      "_type" : "terms",
      "missing" : 15,
      "terms" : [ {
        "term" : "USA",
        "count" : 10
      }, {
        "term" : "New",
        "count" : 4
      }, {
        "term" : "York",
        "count" : 4
      }, {
        "term" : "Brooklyn",
        "count" : 2
      } ]
    }
  }
}

If I declare the location field as not analyzed in the mapping, then the whole field value is taken as a single facet term.
I would like an intermediate solution, where the field is analyzed but split on commas instead of whitespace. That is, I would like to get a response similar to:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : { ...
  },
  "facets" : {
    "location" : {
      "_type" : "terms",
      "missing" : 15,
      "terms" : [ {
        "term" : "USA",
        "count" : 10
      }, {
        "term" : "New York",
        "count" : 4
      }, {
        "term" : "Brooklyn",
        "count" : 2
      } ]
    }
  }
}
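
The difference between the two responses above comes down to how each field value is cut into tokens before the facet counts them. The following is a minimal Python sketch of that idea, not Elasticsearch code: the sample documents (apart from the first value, which is from the post) and the `facet_counts` helper are made up for illustration, and lowercasing is ignored.

```python
from collections import Counter
import re

# Hypothetical sample values; only the first one appears in the post.
docs = [
    "Brooklyn, New York, USA",
    "Manhattan, New York, USA",
    "Austin, Texas, USA",
]

def facet_counts(values, split_pattern):
    """Count how many documents contain each token, roughly like a terms facet."""
    counts = Counter()
    for value in values:
        # A set, so a token is counted once per document.
        tokens = {t.strip() for t in re.split(split_pattern, value) if t.strip()}
        counts.update(tokens)
    return counts

# Splitting on whitespace and punctuation breaks "New York" into two terms.
by_whitespace = facet_counts(docs, r"[\s,]+")
# Splitting on commas only keeps "New York" together.
by_comma = facet_counts(docs, r",")

print(by_whitespace.most_common())
print(by_comma.most_common())
```

With comma splitting, "New York" is counted as one term, which is the grouping asked for here.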

I have been searching for this in the ES guide. Maybe setting an appropriate analyzer would do it, but I can't work out how to set it up, or even whether that is the right approach.
I appreciate any help or guidance. Thanks in advance!
Tania


(Shay Banon) #2

Heya,

You need to create a custom analyzer to split data by commas. One such
analyzer can be built using the pattern analyzer
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/pattern-analyzer.html),
or maybe the pattern tokenizer, and your own custom token filters.
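
As an illustration of the second option (a pattern tokenizer plus token filters), here is a hedged Python sketch that only builds the index settings body as JSON. The names `comma_tokenizer` and `comma_separated` are hypothetical, and the choice of the built-in `trim` token filter is an assumption about what "your own custom token filters" might be; check that it is available in your Elasticsearch version.

```python
import json

# Hypothetical names: "comma_tokenizer" and "comma_separated" are made up here.
settings = {
    "analysis": {
        "tokenizer": {
            # Pattern tokenizer that splits the field value on commas.
            "comma_tokenizer": {"type": "pattern", "pattern": ","}
        },
        "analyzer": {
            "comma_separated": {
                "type": "custom",
                "tokenizer": "comma_tokenizer",
                # Assumption: "trim" strips the space left after each comma.
                "filter": ["trim"],
            }
        },
    }
}

# This string could be sent as the body of the index-creation request.
body = json.dumps({"settings": settings}, indent=2)
print(body)
```

Unlike the pattern analyzer, a custom analyzer built this way does not lowercase tokens unless a lowercase filter is added to the `filter` list.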


(Tania) #3

Thanks! I did what you explained and it worked!
In case anyone needs it, here is a brief explanation of how I solved it.
When you create the index and specify the mappings it needs, you can also define the analyzers that belong to that index.
In this case:

curl -XPOST 'localhost:9200/test' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  },
  "mappings": {
    "photos": {
      "properties": {
        "tags": {
          "type": "string",
          "analyzer": "comma"
        }
      }
    }
  }
}'

With these index settings and mappings, the "tags" field is analyzed using the pattern analyzer named "comma" (whose pattern is a single comma), so search, facets and so on split the field's strings on commas instead of whitespace.
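
One detail worth checking with a pattern of just ",": values such as 'Brooklyn, New York, USA' have a space after each comma, and the pattern analyzer is documented to lowercase tokens by default. The sketch below is a rough Python emulation of that behaviour, not the Elasticsearch implementation; `pattern_analyze` is a hypothetical helper, and the alternative pattern `,\s*` is a suggestion that does not come from the thread.

```python
import re

def pattern_analyze(text, pattern, lowercase=True):
    """Rough emulation: split on the regex, drop empty tokens, optionally lowercase."""
    if lowercase:
        text = text.lower()
    return [tok for tok in re.split(pattern, text) if tok]

location = "Brooklyn, New York, USA"

# Pattern "," keeps the space that follows each comma.
print(pattern_analyze(location, ","))      # ['brooklyn', ' new york', ' usa']

# Pattern ",\s*" also consumes the whitespace after the comma.
print(pattern_analyze(location, r",\s*"))  # ['brooklyn', 'new york', 'usa']
```

If the facet terms come back with leading spaces, switching the analyzer's pattern to something like `,\s*` is one possible fix.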

Many thanks kimchy!


(system) #4