How to find a field with more than one value per doc?


(dbenson) #1

We have a field we're trying to sort on, which is configured so that each
value is analyzed as a single token.

Field mapping:
provider: {
    omit_norms: true
    analyzer: "lowercase_keyword"
    type: "string"
}

Analyzer definition from the yml file:
lowercase_keyword :
    type : custom
    filter : [lowercase]
    tokenizer : keyword

When we attempt to sort on this field we get the following error:
SearchPhaseExecutionException[Failed to execute phase [query], total failure;
shardFailures {[mmq2E8f3QUyBXbDrAgLINA][maestro-report1_20101115173210][0]:
QueryPhaseExecutionException[[maestro-report1_20101115173210][0]:
query[ConstantScore(*:*)],from[0],size[10],sort[<custom:"provider":
org.elasticsearch.index.field.data.strings.StringFieldDataType$1@5b9537da>]:
Query Failed [Failed to execute main query]]; nested:
IOException[Can't sort on string types with more than one value per doc,
or more than one token per field]; }{[mmq2E8f3QUyBXbDrAgLINA]...

How do I find the documents which have multiple values in this field?

If we unintentionally indexed the same field twice could it end up as
an array in the JSON?

ES 0.13 Snapshot, using the Java API for indexing. The error is returned
by both the REST API and the Java API.

Thanks,

David


(ppearcy) #2

I work with David and wanted to mention that this may be a regression.
We still have a setup on 0.12 that doesn't show this issue with the same
analyzer, though the data may be slightly different, so I can't be 100%
sure.

We do have this tokenizer working great on a headline field that would
contain more varied data than the provider field.

While it will be helpful to track down the field causing the problem,
I don't understand how using a keyword tokenizer would ever result in
a multi-term field.
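
One way this could happen, sketched below in plain Java: a keyword
tokenizer yields one token per value, but a document that carries two
values for the same field would still end up with two tokens. This is
only a simulation under that assumption. KeywordAnalysisDemo, analyze and
tokensFor are hypothetical names, and nothing here calls Lucene or
Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simulation only: mimics a keyword tokenizer followed by a lowercase filter.
final class KeywordAnalysisDemo {

    // A keyword tokenizer emits the whole input as one token; lowercase folds case.
    static String analyze(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Each value stored under the field is analyzed on its own,
    // so N values for one field produce N tokens for that document.
    static List<String> tokensFor(List<String> fieldValues) {
        List<String> tokens = new ArrayList<>();
        for (String value : fieldValues) {
            tokens.add(analyze(value));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Field added once to the document: a single token.
        System.out.println(tokensFor(Arrays.asList("Acme News")).size());
        // Field added twice to the same document: two tokens, even if identical.
        System.out.println(tokensFor(Arrays.asList("Acme News", "Acme News")).size());
    }
}
```

If the data looks like the second case, the analyzer is behaving as
configured and the extra token comes from the document itself.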

As always, thanks a ton for any guidance.

Best Regards,
Paul


(ppearcy) #3

Actually, I take that back. I was able to reproduce this in 0.12.


(dbenson) #4

We never found a way to query for this, but it appears that we were
indexing the same field twice using the Java API. Looking at the
source fields returned via the REST API, there was just a single
value. But if you faceted on that field for a single doc, you would get
back more than one value. We put a small check in our indexing code to
only permit a single value per field.
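
A guard of this kind might look roughly like the sketch below. It is not
the code from this project: SingleValueFields, put and asMap are
hypothetical names, and the sketch assumes field values are collected in
a map before the index request is built.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical guard: collects the fields of one document and rejects duplicates.
final class SingleValueFields {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    // Adds a field, failing fast if the same name was already set on this document.
    SingleValueFields put(String name, Object value) {
        if (fields.containsKey(name)) {
            throw new IllegalStateException("Field '" + name + "' already has a value");
        }
        fields.put(name, value);
        return this;
    }

    // Returns a copy of the collected fields for whatever builds the index request.
    Map<String, Object> asMap() {
        return new LinkedHashMap<>(fields);
    }

    public static void main(String[] args) {
        SingleValueFields doc = new SingleValueFields().put("provider", "Acme News");
        try {
            doc.put("provider", "Acme News"); // second value for the same field
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing at indexing time surfaces the duplicate where it is introduced,
instead of later as a sort error on the whole shard.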

David

