Mapping, Searching, Ordering


(Nathan Ekstrom) #1

I'm new to Elasticsearch and Lucene and am rather confused on a couple of
points.

First, when creating a mapping, what does the "index" option do when set to
"analyzed" or "not_analyzed"? How does it affect the index?
Next, I have the following mapping. Only the properties are shown below:

{
    "path": {
        "type": "multi_field",
        "fields": {
            "path": {"type": "string", "index": "not_analyzed"},
            "sub_path": {"type": "string", "index": "not_analyzed"},
            "name": {"type": "string", "index": "not_analyzed"},
            "extension": {"type": "string", "index": "not_analyzed"},
        },
    },
    "owner": {
        "type": "multi_field",
        "fields": {
            "owner": {"type": "integer", "index": "analyzed"},
            "name": {"type": "string", "index": "not_analyzed"},
        },
    },
    "data_store": {
        "type": "integer",
        "index": "analyzed",
    },
    "size": {
        "type": "integer",
        "index": "analyzed",
    },
    "sharing": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "index": "not_analyzed"},
            "id": {"type": "integer", "index": "analyzed"},
            "permission": {"type": "integer", "index": "analyzed"},
        },
    },
}

When I try to sort on path or path.sub_path it throws an error saying,

"java.io.IOException: Can't sort on string types with more than one value
per doc, or more than one token per field"

path.sub_path is an array of strings but path is just a string. I don't
understand what I'm doing wrong. If I try to sort on size it doesn't throw
an error but doesn't sort it either.

My sort looks like

[{"size": {"order": "desc"},},] <--- Doesn't throw an error but doesn't sort either

[{"path": {"order":"desc"},},] <--- Causes the error to be thrown

Any advice or help is appreciated. I've tried reading the docs but I feel
like I'm missing some foundational pieces that would explain this. So if
there are any blog posts or articles covering basics/foundations that
someone can point me to I would appreciate it.

Thanks,

Nathan


(Shay Banon) #2

The fact that path.sub_path is an array means it has more than one value per document, and you can't sort on a field with more than one value.

It's the same with analyzed fields: if the analysis process generates more than one token for a field, then the field has several values.

Setting index to "analyzed" means that the field will be broken into tokens, and you will be able to search on its tokenized form. This is how search engines work: you take text, break it down into tokens, and then search across those tokens. This process is the analysis of a field. If you specify the field as not_analyzed, the field is still indexed and searchable, but it is not tokenized; its whole value is treated as a single token.
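
To make the difference concrete, here is a rough Python sketch of what ends up in the index for one field value. The function name index_terms and the splitting rule (lowercase, split on non-alphanumeric characters) are assumptions for illustration only; the real analyzers in Elasticsearch and Lucene are configurable and behave differently in detail.

```python
import re

def index_terms(value, index="analyzed"):
    """Return the terms a toy indexer would store for one field value.

    Illustration only: this is not Elasticsearch's actual analysis chain.
    """
    if index == "not_analyzed":
        # The whole value is kept as exactly one term.
        return [value]
    # "analyzed": lowercase, then split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

path = "/home/nathan/Report.txt"
print(index_terms(path, "analyzed"))      # ['home', 'nathan', 'report', 'txt']
print(index_terms(path, "not_analyzed"))  # ['/home/nathan/Report.txt']
```

With "analyzed" the single string becomes four terms, which is why sorting on it fails; with "not_analyzed" there is exactly one term per value.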


(Nathan Ekstrom) #3

So to sort on a string, the field needs to be not_analyzed and must not be
an array. For others who may have the same question, I found this helped:
http://wiki.apache.org/lucene-java/ConceptsAndDefinitions
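
As a rough illustration of that rule, the Python sketch below checks a field before building a sort clause. The helper names can_sort_on_string and sort_clause are hypothetical, and the check is deliberately conservative: an analyzed field that happens to produce a single token would also sort, but this sketch does not try to detect that case.

```python
def can_sort_on_string(field_mapping, doc_value):
    """Conservative check: the field is not_analyzed and holds a single value."""
    is_array = isinstance(doc_value, (list, tuple))
    return field_mapping.get("index") == "not_analyzed" and not is_array

def sort_clause(field, order="desc"):
    """Build a sort clause in the same shape as the ones used in this thread."""
    return [{field: {"order": order}}]

mapping = {"type": "string", "index": "not_analyzed"}

print(can_sort_on_string(mapping, "/home/nathan/report.txt"))  # True
print(can_sort_on_string(mapping, ["/home", "/home/nathan"]))  # False
print(sort_clause("path.name"))  # [{'path.name': {'order': 'desc'}}]
```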

Thanks,

Nathan

