Terms facet is tokenizing a field with special characters


(jason) #1

Hi,

I am running a terms facet query on a non-numeric field, and it is
working fine. However, if the field contains a % symbol, the results in
a TermsFacet are tokenized into separate terms. For example, if I
have a JSON document:

{
"url": "http%3Fwww.myserver.com/someaction/%3Fname%3DBob"
}

and if I run: curl -X GET http://localhost:9200/_river/my_idx/_search
-d
{
"query" : {
"match_all" : {}
},
"facets" : {
"facet1" : {
"terms" : {
"field" : "url"
}
}
}
}

then it returns {"terms": [ {"term": "www.myserver.com",
"count":1}, {"term": "http", "count":1}, {"term":"3Fname", "count":1},
{"term":"3DBob", "count":1} ]}

I think it is supposed to return {"terms": [
{"term":"http%3Fwww.myserver.com/someaction/%3Fname%3DBob","count":1} ]}

Please correct me if I am wrong. It never happens if a field doesn't
contain %.


(Shay Banon) #2

You need to have the field you run the terms facet on marked as not_analyzed in its mapping. You will need to reindex the data in order for that to take effect (and of course, create the mappings before you index data).
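
To make the analyzed / not_analyzed difference concrete, here is a rough sketch in plain Java with no Elasticsearch dependency. It is only an illustration: `analyzedTerms` is a crude stand-in that lowercases the value and splits it on non-alphanumeric characters, which is not the exact behaviour of the real analyzer, and all class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is not Elasticsearch code.
class FacetTokenizationSketch {

    // Crude stand-in for an analyzed field (assumption): lowercase the value
    // and split it on every run of non-alphanumeric characters.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // A not_analyzed field is indexed as one single term: the whole value.
    static List<String> notAnalyzedTerms(String value) {
        return Collections.singletonList(value);
    }

    // Counts, per term, how many field values (documents) contain that term,
    // which is the kind of number a terms facet reports.
    static Map<String, Integer> facetCounts(List<String> fieldValues, boolean analyzed) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : fieldValues) {
            // De-duplicate terms within one document before counting.
            Set<String> docTerms = new LinkedHashSet<>(
                    analyzed ? analyzedTerms(value) : notAnalyzedTerms(value));
            for (String term : docTerms) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> urls = Collections.singletonList(
                "http%3Fwww.myserver.com/someaction/%3Fname%3DBob");
        System.out.println("analyzed:     " + facetCounts(urls, true));
        System.out.println("not_analyzed: " + facetCounts(urls, false));
    }
}
```

With the analyzed variant the single URL turns into several facet entries; with the not_analyzed variant it stays one entry holding the full value.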


(jason) #3

Thank you Shay.

Is there any way to add mappings to an existing index? I searched the
source code and only found examples for the CreateIndexBuilder class.

Eugene.


(Shay Banon) #4

There is a "putMapping" API on an index, but you can't change a field that is already analyzed to be "not_analyzed" (it's part of the indexing process, so all currently indexed docs would be meaningless).

(jason) #5

Ok, I am doing this:

String mappings = XContentFactory.jsonBuilder()
    .startObject()
        .startObject("type1")
            .startObject("myfield")
                .field("type", "string")
                .field("store", "yes")
                .field("index", "not_analyzed")
            .endObject()
        .endObject()
    .endObject()
    .toString();

client.admin().indices().preparePutMapping()
    .setType("type1")
    .setSource(mappings)
    .execute().actionGet();

I am getting the following exception (note: I replaced the real IP
address with "ip_address" on the first line).
What do you think it indicates? Thank you!

Eugene.

org.elasticsearch.transport.RemoteTransportException: [Stonecutter][inet[/ip_address:9300]][indices/mapping/put]
Caused by: org.elasticsearch.ElasticSearchParseException: Failed to derive xcontent from org.elasticsearch.common.xcontent.XContentBuilder@1c94b8f
    at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:136)
    at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.extractMapping(XContentDocumentMapperParser.java:316)
    at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.parse(XContentDocumentMapperParser.java:114)
    at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.parse(XContentDocumentMapperParser.java:54)
    at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:209)
    at org.elasticsearch.cluster.metadata.MetaDataMappingService$3.execute(MetaDataMappingService.java:193)
    at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:175)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)


(jason) #6

I solved this problem by creating a new index first, and only then
putting the mappings on it. I guess mappings should be set only once, on a
new index/type pair.

-Eugene.


(Shay Banon) #7

I'm not sure that is what solved your problem. More likely it was a different change you made, where you no longer call toString on the builder that represents the mapping. You can pass the builder directly to the putMapping API.
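
The `XContentBuilder@1c94b8f` in the exception message earlier in this thread has the shape that Java's default `Object.toString()` produces: a class name plus a hash code, not the JSON content. A small sketch in plain Java shows the effect. `TinyJsonBuilder` and its `string()` method are hypothetical stand-ins written for this example; they are not the Elasticsearch builder.

```java
// Hypothetical stand-in for a JSON builder; it deliberately does NOT override toString().
class TinyJsonBuilder {
    private final StringBuilder json = new StringBuilder("{");
    private boolean first = true;

    // Appends one "name":"value" pair (no escaping; this is only a sketch).
    TinyJsonBuilder field(String name, String value) {
        if (!first) {
            json.append(',');
        }
        json.append('"').append(name).append("\":\"").append(value).append('"');
        first = false;
        return this;
    }

    // Returns the JSON text built so far, closed with a brace.
    String string() {
        return json + "}";
    }
}

class ToStringPitfall {
    public static void main(String[] args) {
        TinyJsonBuilder builder = new TinyJsonBuilder().field("index", "not_analyzed");
        // Default Object.toString(): class name, '@', hash code. No JSON in it.
        System.out.println(builder.toString());
        // The content accessor returns the actual JSON text.
        System.out.println(builder.string());
    }
}
```

Passing the first string to an API that expects JSON gives it nothing to parse, which fits the "Failed to derive xcontent" message.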

(jason) #8

Shay,

It appears I didn't fix the error. I ran the following:

{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "url" : {
            "terms" : {
                "field" : "url"
            }
        }
    }
}

I am still getting tokenized terms for the "url" field. I checked the
url field and it is set to not_analyzed.

C:>curl -X GET http://localhost:9200/test/type1/_mapping
{"test":{"type1":{"properties":{"timestamp":{"store":"yes","type":"string"},"url":{"index":"not_analyzed","store":"yes","type":"string"}}}}}

Am I using the wrong facet?

Thank you,
Eugene.


(Shay Banon) #9

Please post a full recreation (see http://www.elasticsearch.org/help), including putting the mapping and indexing the data. It is simpler to help then.

(jason) #10

Here is how I created index 'test' and put mappings for 'type1':

client.admin().indices().prepareCreate("test").execute().actionGet();

String mappings = XContentFactory.jsonBuilder()
    .startObject()
        .startObject("type1")
            .startObject("properties")
                .startObject("url")
                    .field("type", "string").field("store", "yes").field("index", "not_analyzed")
                .endObject()
                .startObject("timestamp")
                    .field("type", "string").field("store", "yes")
                .endObject()
            .endObject()
        .endObject()
    .string();

client.admin().indices().preparePutMapping("test").setType("type1").setSource(mappings).execute().actionGet();

Then I write data to ES under the 'test' index (note: the URL is
percent-encoded, where %3F corresponds to ?, %26 to &, and %3D to =):

String url = "http%3A//www.someurl.com/someservice%3FcityVegas%26State%3DNevada";
SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
Calendar cal = Calendar.getInstance();
IndexResponse response = client.prepareIndex("test", "type1")
    .setSource(jsonBuilder()
        .startObject()
            .field("url", url)
            .field("timestamp", sdf2.format(cal.getTime()))
        .endObject()
    )
    .execute()
    .actionGet();
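
For reference, the escapes mentioned in the note above (%3F, %26, %3D) can be checked with the JDK's `java.net.URLDecoder`. This is a small self-contained sketch; the class name `UrlEscapeCheck` and the helper `decode` are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class for this example only.
class UrlEscapeCheck {

    // Decodes percent-escapes as UTF-8. The checked exception is wrapped
    // because UTF-8 is a charset every Java platform is required to support.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints "? & ="
        System.out.println(decode("%3F") + " " + decode("%26") + " " + decode("%3D"));
        // Prints the decoded form of the sample URL indexed above.
        System.out.println(decode("http%3A//www.someurl.com/someservice%3FcityVegas%26State%3DNevada"));
    }
}
```

The facet terms reported in this thread still carry the undecoded fragments (for example 3F... and 3D...), so the value is being split at the % signs rather than decoded.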

Here is my response to: curl -X GET http://localhost:9200/test/type1/_mapping
(assuming I have indexed more than one document as above):

..."facets":{"url":{"_type":"terms","missing":0,"terms":
[{"term":"www.someurl.com","count":4},{"term":"photoflipper","count":4},
{"term":"http","count":4},{"term":"3fcity","count":4},{"term":"3a","count":4},
{"term":"26state","count":4},{"term":"26model","count":4},
{"term":"3dNevada","count":2}]}}}


(Shay Banon) #11

Can you provide a curl recreation?
On Tuesday, March 15, 2011 at 9:54 AM, eugene wrote:

Here is how I created index 'test' and put mappings for 'type1':

client.admin().indices().prepareCreate("test").execute().actionGet();

String mappings = XContentFactory.jsonBuilder()
    .startObject()
        .startObject("type1")
            .startObject("properties")
                .startObject("url").field("type", "string").field("store", "yes").field("index", "not_analyzed").endObject()
                .startObject("timestamp").field("type", "string").field("store", "yes").endObject()
            .endObject()
        .endObject()
    .endObject() // closes the root object, which the original chain left open
    .string();

client.admin().indices().preparePutMapping("test").setType("type1").setSource(mappings).execute().actionGet();

Then I write data to ES under the 'test' index (note: the url is
URL-encoded, where %3F corresponds to ?, %26 to &, and %3D to =):

String url = "http%3A//www.someurl.com/someservice%3FcityVegas%26State%3DNevada";
SimpleDateFormat sdf2 = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
Calendar cal = Calendar.getInstance();
IndexResponse response = client.prepareIndex("test", "type1")
    .setSource(jsonBuilder()
        .startObject()
            .field("url", url)
            .field("timestamp", sdf2.format(cal.getTime()))
        .endObject())
    .execute()
    .actionGet();

Here is my response to >curl -X GET http://localhost:9200/test/type1/_mapping
(assuming I indexed more than one document as above):

..."facets":{"url":{"_type":"terms","missing":0,"terms":[
{"term":"www.someurl.com","count":4},
{"term":"photoflipper","count":4},
{"term":"http","count":4},
{"term":"3fcity","count":4},
{"term":"3a","count":4},
{"term":"26state","count":4},
{"term":"26model","count":4},
{"term":"3dNevada","count":2}]}}}
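To make the difference between an analyzed and a not_analyzed field concrete, here is a minimal Java sketch. It is only a rough approximation and not Lucene's tokenizer: it assumes an analyzed field is split on every character that is not a letter, digit or dot and then lowercased, while a not_analyzed field is kept as one term. The class and method names (FacetTermsSketch, analyzedTerms, notAnalyzedTerms) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rough illustration only: this is NOT the analyzer Elasticsearch uses.
final class FacetTermsSketch {

    // Approximates an analyzed string field: split on anything that is not a
    // letter, digit or dot, then lowercase each piece.
    static List<String> analyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.split("[^A-Za-z0-9.]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A not_analyzed field is indexed as a single term, exactly as supplied.
    static List<String> notAnalyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    public static void main(String[] args) {
        String url = "http%3A//www.someurl.com/someservice%3FcityVegas%26State%3DNevada";
        System.out.println(analyzedTerms(url));    // several short terms such as "http" and "3a"
        System.out.println(notAnalyzedTerms(url)); // one term: the whole encoded URL
    }
}
```

Under these assumptions the '%' and '/' characters act as separators, which would explain fragments such as "3a" and "26state" showing up as separate facet terms when a field is analyzed.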

On Mar 15, 12:00 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Please post a full recreation: http://www.elasticsearch.org/help (including putting the mapping and indexing data); it is simpler to help then.

On Monday, March 14, 2011 at 7:59 PM, eugene wrote:

Shay,

It appears I didn't fix the error. I ran the following:

"query" : {
"match_all": {}
},
"facets" : {
"url" : {
"terms" : {
"field" : "url"
}
}
}

I am still getting tokenized terms for the "url" field. I checked the
url field and it is set to not_analyzed.

C:>curl -X GET http://localhost:9200/test/type1/_mapping
{"test":{"type1":{"properties":{"timestamp":{"store":"yes","type":"string"},"url":{"index":"not_analyzed","store":"yes","type":"string"}}}}}

Am I using the wrong facet?

Thank you,
Eugene.

On Mar 11, 6:11 am, Shay Banon shay.ba...@elasticsearch.com wrote:

I'm not sure that is what solved your problem; more likely it was a different change you made, where you no longer call toString on the builder that represents the mapping. You can pass the builder directly to the putMapping API.
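The "Failed to derive xcontent from org.elasticsearch.common.xcontent.XContentBuilder@1c94b8f" message quoted in this thread looks like the default Object.toString() output (class name, '@', hash code) rather than JSON. The sketch below shows that Java behaviour with a hypothetical stand-in class; JsonBuilderSketch is not the Elasticsearch builder, and its string() method only mimics the call used elsewhere in the thread.

```java
// Hypothetical stand-in for a JSON builder; it is not the Elasticsearch class.
final class ToStringSketch {

    static final class JsonBuilderSketch {
        private final StringBuilder json = new StringBuilder();

        // Appends a raw JSON fragment and returns this for chaining.
        JsonBuilderSketch append(String fragment) {
            json.append(fragment);
            return this;
        }

        // Returns the JSON text that was built.
        String string() {
            return json.toString();
        }
        // toString() is deliberately not overridden, so Object.toString() applies.
    }

    public static void main(String[] args) {
        JsonBuilderSketch builder = new JsonBuilderSketch().append("{\"type1\":{}}");
        System.out.println(builder.string());   // the JSON text
        System.out.println(builder.toString()); // class name, '@', hash code
    }
}
```

A string of the second form cannot be parsed as a mapping, which would be consistent with the parse exception reported for the toString() variant of the code.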

On Friday, March 11, 2011 at 4:14 AM, eugene wrote:

I solved this problem by creating a new index first, and only then
putting the mappings on it. I guess mappings should be set only once, on a
new index/type pair.

-Eugene.

On Mar 10, 3:03 pm, eugene efur...@gmail.com wrote:

Ok, I am doing this:

String mappings = XContentFactory.jsonBuilder().startObject().startObject("type1")
    .startObject("myfield").field("type", "string").field("store", "yes").field("index", "not_analyzed").endObject()
    .endObject().endObject().toString();

client.admin().indices().preparePutMapping().setType("type1")
    .setSource(mappings).execute().actionGet();

I am getting the following exception (notice: I replaced real ip
address with "ip_address" on the first line).
What do you think it indicates? Thank you!

Eugene.

org.elasticsearch.transport.RemoteTransportException: [Stonecutter][inet[/ip_address:9300]][indices/mapping/put]
Caused by: org.elasticsearch.ElasticSearchParseException: Failed to derive xcontent from org.elasticsearch.common.xcontent.XContentBuilder@1c94b8f
at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:136)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.extractMapping(XContentDocumentMapperParser.java:316)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.parse(XContentDocumentMapperParser.java:114)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.parse(XContentDocumentMapperParser.java:54)
at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:209)
at org.elasticsearch.cluster.metadata.MetaDataMappingService$3.execute(MetaDataMappingService.java:193)
at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

(jason) #12

I created it using Java. Could there be a difference if I do this in
Java instead of curl?

(Shay Banon) #13

No, there isn't a difference; it's just simpler to recreate it with curl. The REST API is built on top of the Java API.

If you have problems with curl, then gist a simple test case that recreates it with Java, and I can give it a go as well.

(jason) #14

Here is how I created it with curl:

C:>curl -X PUT http://localhost:9200/test/type1/_mappings -d @mappings.json
{"ok":true,"_index":"test","_type":"type2","_id":"_mappings","_version":1}

where the contents of mappings.json file are as follows:

{
  "type1" : {
    "properties" : {
      "url" : {"type" : "string", "store" : "yes", "index" : "not_analyzed"},
      "timestamp" : {"type" : "string", "store" : "yes"}
    }
  }
}
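One thing worth checking here: the response above carries "_id":"_mappings", which appears to be the shape of an index-document response rather than a put-mapping acknowledgement, and the GET requests elsewhere in this thread use the singular _mapping endpoint. The Java sketch below builds the same PUT against the singular path. It assumes Java 11+ for java.net.http, assumes the singular path is the put-mapping endpoint for this version, and uses hypothetical names (PutMappingSketch, putMappingRequest); it only constructs the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a PUT to the singular _mapping path.
final class PutMappingSketch {

    // Same content as the mappings.json file shown in the post.
    static final String MAPPING_JSON =
            "{\"type1\":{\"properties\":{"
            + "\"url\":{\"type\":\"string\",\"store\":\"yes\",\"index\":\"not_analyzed\"},"
            + "\"timestamp\":{\"type\":\"string\",\"store\":\"yes\"}}}}";

    // Assumption: the put-mapping endpoint is /{index}/{type}/_mapping (singular).
    static HttpRequest putMappingRequest(String baseUrl, String index, String type) {
        URI uri = URI.create(baseUrl + "/" + index + "/" + type + "/_mapping");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(MAPPING_JSON))
                .build();
    }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the mapping verified afterwards with the GET on /test/type1/_mapping that is already used in this thread.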

(Shay Banon) #15

Eugene,

I can see that curl is problematic. Just gist a simple (yet complete) Java test case and I will have a look.

-shay.banon
On Tuesday, March 15, 2011 at 11:31 PM, eugene wrote:

Here is how I created it with curl:

C:>curl -X PUT http://localhost:9200/test/type1/_mappings -d
@mappings.json
{"ok":true,"_index":"test","_type":"type2","_id":"_mappings","_version":
1}

where the contents of mappings.json file are as follows:

{
"type1" : {
"properties" : {
"url" : {"type" : "string", "store" : "yes", "index" :
"not_analyzed"},
"timestamp" : {"type" : "string", "store" : "yes"}
}
}
}

On Mar 15, 9:12 am, Shay Banon shay.ba...@elasticsearch.com wrote:

No, there isn't a difference, its just simpler to recreate it with curl. The REST API is built on top of the Java API.

If you have problems with curl, then ->gist<- a simple test case that recreates it with Java, I can give it a go as well.

On Tuesday, March 15, 2011 at 5:57 PM, eugene wrote:

I created it using java. Could there a difference if I do this in
java instead of curl?

On Mar 15, 12:56 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Can you provide a curl recreation?

On Tuesday, March 15, 2011 at 9:54 AM, eugene wrote:

Here is how I created index 'test' and put mappings for 'type1':

client.admin().indices().prepareCreate("test").execute().actionGet();

String mappings =
XContentFactory.jsonBuilder().startObject().startObject("type1").startObjec­­t("properties")
.startObject("url").field("type", "string").field("store",
"yes").field("index", "not_analyzed").endObject()
.startObject("timestamp").field("type",
"string").field("store", "yes").endObject()
.endObject().endObject().string();

client.admin().indices().preparePutMapping("test").setType("type1").setSour­­ce(mappings).execute().actionGet();

Then, I am writing data to the ES under 'test' index (note: url is
encrypted, where %3F corresponds to ?, %26 to &, and %3D to =.

String url = "http%3A//
www.someurl.com/someservice%3FcityVegas%26State%3DNevada";
SimpleDateFormat sdf2 = new
SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
Calendar cal =
Calendar.getInstance();
IndexResponse response = client.prepareIndex("test",
"type1").setSource(jsonBuilder()
.startObject()
.field("url", url)
.field("timestamp",
sdf2.format(cal.getTime()))
.endObject()
)
.execute()
.actionGet();

Here is my response to :>curl -X GEThttp://localhost:9200/test/type1/_mapping
(assuming I put more than one documents as above):

..."facets":{"url":{"_type":"terms","missing":0,"terms":
[{"term":"www.someurl.com","count":4},{"term":"photoflipper","count":
4},{"term":"http
,"count":4},{"term":"3fcity","count":4},{"term":"3a","count":4},
{"term":"26state","count":4},{"term":"26model","count":4},
{"term":"3dNevada","count":2}]}}}

On Mar 15, 12:00 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Please post a full recreation:http://www.elasticsearch.org/help(includingputtingthe mapping and indexing data), simpler to help then.

On Monday, March 14, 2011 at 7:59 PM, eugene wrote:

Shay,

It appears I didn't fix the error. I run the following:

"query" : {
"match_all": {}
},
"facets" : {
"url" : {
"terms" : {
"field" : "url"
}
}
}

I am still getting tokenized terms for "url" field. I checked the
url field and it is set not_analyzed.

C:>curl -X GEThttp://localhost:9200/test/type1/_mapping
{"test":{"type1":{"properties":{"timestamp":
{"store":"yes","type":"string"},"url":
{"index":"not_analyzed","store":"yes","type":"string"}}}}}

Am I using the wrong facet?

Thank you,
Eugene.

On Mar 11, 6:11 am, Shay Banon shay.ba...@elasticsearch.com wrote:

Not sure that this is what solved your problem; another change you made is that you no longer call toString on the builder that represents the mapping. You can pass the builder directly to the putMapping API.
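
The `XContentBuilder@1c94b8f` in the exception quoted further down has the shape of Java's default Object.toString() output (class name, `@`, hex hash code), which suggests that toString() on the builder returned an object identity string and not the JSON. A minimal sketch of that pitfall, using a hypothetical FakeBuilder stand-in and not the real XContentBuilder:

```java
class ToStringPitfall {
    // Hypothetical stand-in for a builder type that does not override toString().
    static final class FakeBuilder {
        private final StringBuilder json = new StringBuilder();

        FakeBuilder append(String s) {
            json.append(s);
            return this;
        }

        // Returns the accumulated text; the name mirrors the string() call used elsewhere in this thread.
        String string() {
            return json.toString();
        }
    }

    public static void main(String[] args) {
        FakeBuilder b = new FakeBuilder().append("{\"type1\":{}}");
        System.out.println(b.toString()); // class name, '@', hex hash code; not JSON
        System.out.println(b.string());   // the JSON text itself
    }
}
```

So the mapping source needs to be either the builder itself or the text it produced, not the result of toString().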

On Friday, March 11, 2011 at 4:14 AM, eugene wrote:

I solved this problem by creating a new index first, and only then
putting the mappings on it. I guess mappings should be done only once on a
new index/type pair.

-Eugene.

On Mar 10, 3:03 pm, eugene efur...@gmail.com wrote:

Ok, I am doing this:

String mappings =
XContentFactory.jsonBuilder().startObject().startObject("type1")
.startObject("myfield").field("type",
"string").field("store", "yes").field("index",
"not_analyzed").endObject().endObject().endObject().toString();

client.admin().indices().preparePutMapping().setType("type1")
.setSource(mappings).execute().actionGet();

I am getting the following exception (notice: I replaced real ip
address with "ip_address" on the first line).
What do you think it indicates? Thank you!

Eugene.

org.elasticsearch.transport.RemoteTransportException: [Stonecutter][inet[/ip_address:9300]][indices/mapping/put]
Caused by: org.elasticsearch.ElasticSearchParseException: Failed to derive xcontent from org.elasticsearch.common.xcontent.XContentBuilder@1c94b8f
at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:136)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.extractMapping(XContentDocumentMapperParser.java:316)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.parse(XContentDocumentMapperParser.java:114)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapperParser.parse(XContentDocumentMapperParser.java:54)
at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:209)
at org.elasticsearch.cluster.metadata.MetaDataMappingService$3.execute(MetaDataMappingService.java:193)
at org.elasticsearch.cluster.service.InternalClusterService$2.run(InternalClusterService.java:175)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

On Mar 10, 4:16 am, Shay Banon shay.ba...@elasticsearch.com wrote:

There is an API to "putMapping" on an index, but you can't change a field that is already analyzed to be "not_analyzed" (as it's part of the indexing process, so all currently indexed docs would be meaningless).

On Thursday, March 10, 2011 at 11:03 AM, eugene wrote:

Thank you Shay.

Is there any way to add mappings to existing index? I searched the
source code and only found the examples for CreateIndexBuilder class.

Eugene.

On Mar 9, 10:18 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

You need to have the field you run the terms facet on marked as not_analyzed in its mapping. You will need to reindex the data in order for that to take effect (and of course, create the mappings before you index data).
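
To illustrate the difference, here is a rough sketch in plain Java. It is not the Elasticsearch analyzer: the analyzed method is a crude, made-up stand-in that lowercases and splits on anything that is not a letter or digit, so its tokens will not match the real analyzer in every case. It only shows why a terms facet counts fragments for an analyzed field and the whole value for a not_analyzed one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class AnalyzedVsNotAnalyzed {
    // Crude stand-in for an analyzer (an assumption, not the real one):
    // lowercase the value, then split on every run of non-alphanumeric characters.
    static List<String> analyzed(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // not_analyzed: the whole value is kept as a single term.
    static List<String> notAnalyzed(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String value = "WEB-MISC /etc/passwd";
        System.out.println(analyzed(value));    // [web, misc, etc, passwd]
        System.out.println(notAnalyzed(value)); // [WEB-MISC /etc/passwd]
    }
}
```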

On Thursday, March 10, 2011 at 5:12 AM, eugene wrote:

Hi,

I am running a terms facet query on a non-numeric field, and it is
working fine. However, if the field contains the % symbol, the results in
a TermsFacet are tokenized into separate terms. For example, if I
have a json document:

{
url: http%3Fwww.myserver.com/someaction/%3Fname%3DBob
}

and if I run: curl -X GET http://localhost:9200/_river/my_idx/_search
-d
{
"query" : {
"match_all" : {}
},
"facets" : {
"facet1" : {
"terms" : {
"field" : "url"
}
}
}
}

then it is returning {"terms": [ {"term": "www.myserver.com",
"count":1}, {"term": "http", "count":1}, {"term":"3Fname", "count":1},
{"term":"3DBob", "count":1}

I think it is supposed to return {"terms": [
{"term":"http%3Fwww.myserver.com/someaction/%3Fname%3DBob","count":1}

Please correct me if I am wrong. It never happens if a field doesn't
contain %.

(SANDPATH7) #16

Hi,

I am also facing the same problem. I am using a facet to get all the unique values and their counts for a field, and I am getting the wrong result.

term: web
Count: 1191979
term: misc
Count: 1191979
term: passwd
Count: 1191979
term: etc
Count: 1191979

While the actual result should be:
term: WEB-MISC /etc/passwd
Count: 1191979

The term is getting truncated. Is there something else I need to do?

Thanks,


(system) #17