Elasticsearch splits field by space on facets


(pere roca ristol) #1

I am trying to run a simple facet request on a field that holds more than
one word (typically 'Name1 Name2', sometimes with dots and commas inside),
but what I get is...

"terms" : [{
"term" : "Name1",
"count" : 15
},
{
"term" : "Name2",
"count" : 15
}]

so my field is split on spaces before the facet is computed...

Query example:

curl -XGET 'http://my_server:9200/idx_occurrence/Occurrence/_search?pretty=true' \
-d '{ "query": { "query_string" :{ "fields" : ["dataset"], "query": "2",
"default_operator" : "AND" } }, "facets": { "test": { "terms":
{ "field" :["speciesName"],"size" : 50000 } } } }'

thanks in advance

--


(Rio) #2

So you want to get the facets returned based on the full key field?

I'm actually trying to do the same thing and didn't get any response:
https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/TEcXc4UhXTI

Hopefully someone has some helpful suggestions.

-Mario

--


(Ivan Brusic) #3

The speciesName field needs to be not analyzed. The individual tokens
are appearing because the field is being analyzed.
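
A minimal sketch of what a not-analyzed mapping could look like, using the
mapping syntax of the Elasticsearch releases current at the time of this
thread (later versions changed it). The host default, the SEND switch and the
index name idx_occurrence_v2 are assumptions for illustration; the type and
field names come from the original query. The mapping goes into a new index
because the data has to be indexed again for the change to take effect.

```shell
# Hypothetical host; override with ES_URL=http://my_server:9200
ES_URL="${ES_URL:-http://localhost:9200}"

# Index-creation body: speciesName is indexed as one untokenized term,
# so a terms facet returns 'Name1 Name2' as a single entry.
MAPPING='{
  "mappings": {
    "Occurrence": {
      "properties": {
        "speciesName": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

# Show the body; it is only sent when SEND=1 is set explicitly.
echo "$MAPPING"
if [ "${SEND:-0}" = "1" ]; then
  # idx_occurrence_v2 is a hypothetical new index name
  curl -XPUT "$ES_URL/idx_occurrence_v2/" -d "$MAPPING"
fi
```

Note that with this mapping the field is no longer tokenized for searching
either, so it only fits if speciesName is used for exact matches and facets.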

--
Ivan

--


(pere roca ristol) #4

thanks Ivan.

In fact I need it to be analyzed for other operations, so I think I
should use the multi-field type:
http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html

I'm afraid I have no alternative to rebuilding the data with this
multi-field option. Or is there some way to avoid that?

thanks,

Pere

--


(Ivan Brusic) #5

Pere,

Your use case is exactly why multi-field is used: the field needs to
be analyzed for some situations and not analyzed for others.
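
A sketch of how the multi-field approach might look for this index, again in
the mapping syntax of that era. The sub-field name "untouched", the index name
idx_occurrence_v2, the host default and the SEND switch are assumptions for
illustration; the query is the one from the original post with only the facet
field changed.

```shell
# Hypothetical host and index name; adjust to your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="idx_occurrence_v2"

# speciesName stays analyzed for search; speciesName.untouched keeps the
# whole value as a single term for faceting.
MAPPING='{
  "mappings": {
    "Occurrence": {
      "properties": {
        "speciesName": {
          "type": "multi_field",
          "fields": {
            "speciesName": { "type": "string", "index": "analyzed" },
            "untouched":   { "type": "string", "index": "not_analyzed" }
          }
        }
      }
    }
  }
}'

# Same query as in the original post, but the facet targets the sub-field.
FACET_QUERY='{
  "query": { "query_string": { "fields": ["dataset"], "query": "2",
                               "default_operator": "AND" } },
  "facets": { "test": { "terms": { "field": "speciesName.untouched",
                                   "size": 50000 } } }
}'

echo "$MAPPING"
echo "$FACET_QUERY"

# Nothing is sent unless SEND=1 is set explicitly.
if [ "${SEND:-0}" = "1" ]; then
  curl -XPUT "$ES_URL/$INDEX/" -d "$MAPPING"
  # ... index the documents into $INDEX here, then run the facet query ...
  curl -XGET "$ES_URL/$INDEX/Occurrence/_search?pretty=true" -d "$FACET_QUERY"
fi
```

Searches keep using speciesName as before; only the facet refers to
speciesName.untouched.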

You can simply create another field in your indexing code instead of
using ElasticSearch's built-in multi-field, but either way you
would need to reindex your data.
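
For the "another field" variant, the indexing code would write the same value
twice, roughly as sketched here. The field name speciesName_raw and the sample
value are hypothetical, and speciesName_raw would still have to be mapped as
not_analyzed before indexing so that facets on it return whole values.

```shell
# Hypothetical sample value, for illustration only.
SPECIES='Name1 Name2'

# The same value goes into two fields: speciesName (analyzed, for search)
# and speciesName_raw (to be mapped not_analyzed, for faceting).
# printf does no JSON escaping; real indexing code should use a JSON library.
DOC=$(printf '{ "speciesName": "%s", "speciesName_raw": "%s" }' "$SPECIES" "$SPECIES")
echo "$DOC"
```

The facet request would then use speciesName_raw as its field.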

--
Ivan

--

