Error on missing field query


(vpunski) #1

Hi,
I have a working 6-node cluster with 26M entries and the following configuration:

Index definition:
curl -XPUT "http://HOST:9200/my_index/_settings" -d '{
  "index": {
    "number_of_shards": 10,
    "number_of_replicas": 3,
    "analysis": {
      "analyzer": {
        "parent_hierarchy_analyzer": {
          "type": "custom",
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}'

Mapping definition:
curl -XPUT "http://HOST:9200/my_index/my_object/_mapping?pretty=true" -d '{
  "infoclone": {
    "properties": {
      "parent_hierarchy": {
        "type": "string",
        "store": "no",
        "omit_term_freq_and_positions": true,
        "analyzer": "parent_hierarchy_analyzer",
        "index": "analyzed",
        "omit_norms": true,
        "boost": 1.0,
        "term_vector": "no"
      }
    }
  }
}'

I'm trying to add a new field to every object, so I use the filter
below to fetch all objects that do not yet have the "parent_hierarchy"
field and update them.

curl -XGET "http://HOST:9200/my_index/my_object/_search?pretty=1" -d '{
  "from": 0,
  "size": 1000,
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": {
            "missing": {
              "field": "parent_hierarchy"
            }
          }
        }
      }
    }
  }
}'
I've successfully updated ~16M entries (from the Java client, using the above
query with bulk updates and refresh=true).
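
The update loop described above could be sketched roughly as follows. This is only an illustration in Python, not the Java client code that was actually used: `search` and `bulk_update` are hypothetical callables standing in for the real client calls, and `missing_field_query` simply rebuilds the request body shown earlier. Because updated documents stop matching the `missing` filter once the index is refreshed, the sketch always reads from offset 0.

```python
def missing_field_query(field, size=1000):
    """Build the constant_score / missing-filter body from the post."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "constant_score": {
                "filter": {"bool": {"must": {"missing": {"field": field}}}}
            }
        },
    }


def backfill(search, bulk_update, field, size=1000):
    """Page through documents lacking `field` until none are left.

    `search(body)` must return a search response as a dict, and
    `bulk_update(ids)` must make the updated documents visible to the
    next search (for example with refresh=true). Both are hypothetical
    stand-ins for real client calls.
    """
    updated = 0
    while True:
        response = search(missing_field_query(field, size))
        failed = response["_shards"]["failed"]
        if failed > 0:
            # Partial results: stop instead of silently skipping a shard.
            raise RuntimeError("search failed on %d shard(s)" % failed)
        hits = response["hits"]["hits"]
        if not hits:
            return updated
        bulk_update([hit["_id"] for hit in hits])
        updated += len(hits)
```

Checking `_shards.failed` matters in a loop like this: when a shard fails and `hits` comes back empty, a simpler loop would mistake the empty page for completion.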

Now, every time I execute the query, I get one of two errors in the response
body:

  1. The IP changes on every query:
    {
      "took" : 91,
      "timed_out" : false,
      "_shards" : {
        "total" : 10,
        "successful" : 9,
        "failed" : 1,
        "failures" : [ {
          "status" : 500,
          "reason" : "RemoteTransportException[[Schmidt, Johann][inet[/10.11.10.74:9300]][search/phase/fetch/id]]; nested: FieldReaderException[Invalid numeric type: 38]; "
        } ]
      },
      "hits" : {
        "total" : 686244,
        "max_score" : 1.0,
        "hits" : [ ]
      }
    }

  2. Another version of the response:
    {
      "took" : 49,
      "timed_out" : false,
      "_shards" : {
        "total" : 10,
        "successful" : 9,
        "failed" : 1,
        "failures" : [ {
          "status" : 500,
          "reason" : "FieldReaderException[Invalid numeric type: 38]"
        } ]
      },
      "hits" : {
        "total" : 686244,
        "max_score" : 1.0,
        "hits" : [ ]
      }
    }
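
To compare the two variants, the `failures` entries can be reduced to their innermost exception. The following Python sketch does that with plain string handling. The helper names `root_cause`, `failing_node` and `summarize` are made up for this illustration, and the parsing assumes the `nested:` and `inet[/host:port]` layout seen in the messages above rather than any documented format.

```python
import re


def root_cause(reason):
    """Return the innermost exception of an 'A; nested: B; ' chain."""
    return reason.split("nested: ")[-1].strip().rstrip(";").strip()


def failing_node(reason):
    """Return (host, port) from an inet[/host:port] fragment, or None."""
    match = re.search(r"inet\[/([\d.]+):(\d+)\]", reason)
    return (match.group(1), int(match.group(2))) if match else None


def summarize(response):
    """List (status, node, root cause) for each failed shard."""
    return [
        (f["status"], failing_node(f["reason"]), root_cause(f["reason"]))
        for f in response["_shards"].get("failures", [])
    ]
```

Applied to the two responses above, both reduce to the same root cause, `FieldReaderException[Invalid numeric type: 38]`; only the first one names a remote node.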

Health status below:
curl -s -XGET 'http://HOST:9200/_cluster/health?pretty=1'
{
  "cluster_name" : "CMWELL_INDEX_CLUSTER",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 12,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 10,
  "active_shards" : 30,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
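
The health output also shows how many copies of each shard are active. A small Python sketch of the arithmetic, assuming every primary has the same number of copies (the function name `copies_per_shard` is hypothetical):

```python
def copies_per_shard(health):
    """Return (copies, replicas) per primary shard from cluster health.

    Assumes shards are evenly replicated, so active_shards is a
    multiple of active_primary_shards.
    """
    primaries = health["active_primary_shards"]
    copies, remainder = divmod(health["active_shards"], primaries)
    if remainder:
        raise ValueError("shards are not evenly replicated")
    return copies, copies - 1


health = {"active_primary_shards": 10, "active_shards": 30}
print(copies_per_shard(health))  # (3, 2)
```

For the figures above that is three copies per shard: one primary plus two replicas.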

elasticsearch.yml:

cluster.name : MY_INDEX

path:
  home: data/elasticsearch
  logs: data/elasticsearch/logs

gateway:
  recover_after_nodes: 5
  recover_after_time: 1m
  expected_nodes: 6
  local:
    initial_shards: 1

#default 10% of memory
#indices.memory.index_buffer_size : 1024m
#default false
index.compound_format : true
#default 1s
index.refresh_interval : 10s
#default 128
index.term_index_interval: 128

#Merging
#default 10
index.merge.policy.merge_factor: 30
#default 1.6mb
index.merge.policy.min_merge_size: 16mb
#default unbounded
#index.merge.policy.max_merge_size: 1024mb
#default unbounded
#index.merge.policy.maxMergeDocs

#Transaction log settings
#After how many operations to flush/ Defaults to 20000/
#index.translog.flush_threshold_ops: 20000

#Once the translog hits this size, a flush will happen/ Defaults to 500mb/
#index.translog.flush_threshold_size

#The period with no flush happening to force a flush/ Defaults to 60m/
#index.translog.flush_threshold_period

#Cache configurations

#default 20%
indices.cache.filter.size: 10%

#default -1
#1 entry ~1MB
#index.cache.filter.max_size: 100

#default -1
index.cache.filter.expire: 1m

#default -1
#index.cache.field.max_size: -1

#default -1
index.cache.field.expire: 1m

Any ideas would be appreciated.
Thanks


(vpunski) #2

Please note:
"total" : 10,
"successful" : 9,

One shard fails.



(vpunski) #3

Any ideas?
Even just regarding the error message:

"reason" : "RemoteTransportException[[Schmidt, Johann][inet[/10.11.10.74:9300]][search/phase/fetch/id]]; nested: FieldReaderException[Invalid numeric type: 38]; "

What is FieldReaderException?
I tried to find the cause in the Lucene code, but it is too low-level for me.
Can anyone explain the meaning of the error?

Thanks



(vpunski) #4

I'm still trying to solve the problem, and I have made significant progress.
I noticed that the number of different IPs returned in the error equals the
number of replicas.
Using elasticsearch-head, I found that the failing shard is shard 9.
I verified that this particular shard is broken by opening all
three replicas in Luke.
Indeed, the same FieldReaderException is thrown in all
replicas of shard 9 when requesting some documents with
"problematic" IDs.

A number of questions arise:
How could it happen that a broken shard was replicated to the two others?
Can someone explain the flow of system startup, data consistency
checking, and the shard relocation process?
Could initial_shards=1 be a problem?
Does the system check file consistency on startup or during
relocation to make sure that no corrupted data is transferred?
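
One manual way to look into the last question is to compare the index files of two copies of the same shard by checksum. The Python sketch below is only an illustration under assumptions: the directories passed in are whatever paths hold the shard copies on each node, and it is not a description of what Elasticsearch itself does on startup or relocation. Identical checksums show only that the files are byte-for-byte copies, not that they are free of corruption.

```python
import hashlib
from pathlib import Path


def file_digests(index_dir):
    """Map each file name in a shard's index directory to its SHA-1."""
    digests = {}
    for path in sorted(Path(index_dir).iterdir()):
        if not path.is_file():
            continue
        sha1 = hashlib.sha1()
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large segment files are not
            # loaded into memory all at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                sha1.update(chunk)
        digests[path.name] = sha1.hexdigest()
    return digests


def compare_copies(dir_a, dir_b):
    """Classify the files of two shard copies by name and checksum."""
    a, b = file_digests(dir_a), file_digests(dir_b)
    common = set(a) & set(b)
    return {
        "identical": sorted(n for n in common if a[n] == b[n]),
        "different": sorted(n for n in common if a[n] != b[n]),
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }
```

Called with the index directories of two copies of shard 9, files listed under `identical` would suggest they were copied from one node to another rather than built independently.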

Thanks



(Shay Banon) #5

Is it something that you can recreate? Basically, when an operation occurs
on a shard, it is also executed on the replica. When shards move around,
they sync against the primary shard in terms of actual index files.



(vpunski) #6

I don't think it's possible to reproduce... maybe by changing a
file manually to simulate a damaged file...
I have had a lot of full cluster restarts during the last few weeks, some of
them emergency stops with KILL...

Let me understand the flow:
If some file of a primary shard gets damaged, will it be replicated across
the cluster during startup?
Is there any flow to validate file consistency on startup, for
example?
Can you elaborate on the life cycle, please?
It's very

Thanks


