Hi,
I have a working 6-node cluster with 26M entries and the following configuration:
Index definition:
curl -XPUT "http://HOST:9200/my_index/_settings" -d '{
  "index": {
    "number_of_shards": 10,
    "number_of_replicas": 3,
    "analysis": {
      "analyzer": {
        "parent_hierarchy_analyzer": {
          "type": "custom",
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}'
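(As a sanity check, the analyzer can be exercised directly with the _analyze API; the sample path below is just an illustration:
curl -XGET "http://HOST:9200/my_index/_analyze?analyzer=parent_hierarchy_analyzer&pretty=1" -d '/root/child/grandchild'
path_hierarchy should emit one token per path prefix, i.e. /root, /root/child, /root/child/grandchild.)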
Mapping definition:
curl -XPUT "http://HOST:9200/my_index/my_object/_mapping?pretty=true" -
d '
{
"infoclone": {
"properties": {
"parent_hierarchy": {
"type": "string",
"store": "no",
"omit_term_freq_and_positions" : true,
"analyzer": "parent_hierarchy_analyzer",
"index": "analyzed",
"omit_norms" : true,
"boost" : 1.0,
"term_vector" : "no"
}
}
}
}
'
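The applied mapping can be read back to confirm it took effect:
curl -XGET "http://HOST:9200/my_index/my_object/_mapping?pretty=true"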
I'm trying to add an additional field to every object, so I use the filter
below to fetch all objects that are still missing the "parent_hierarchy"
field and update them.
curl -XGET "http://HOST:9200/my_index/my_object/_search?pretty=1" -d '{
  "from" : 0,
  "size" : 1000,
  "query" : {
    "constant_score" : {
      "filter" : {
        "bool" : {
          "must" : {
            "missing" : {
              "field" : "parent_hierarchy"
            }
          }
        }
      }
    }
  }
}
'
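As a progress check, the same filter can be handed to the _count API, which (in this version, as far as I can tell) takes the bare query as its body without the top-level "query" wrapper:
curl -XGET "http://HOST:9200/my_index/my_object/_count?pretty=1" -d '{
  "constant_score" : {
    "filter" : {
      "missing" : { "field" : "parent_hierarchy" }
    }
  }
}'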
I've successfully updated ~16M entries so far (from the Java client, using
the above query, with bulk updates and refresh=true).
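Over REST, each bulk round-trip amounts to something like the following (the _id and field values here are made up for illustration):
curl -XPOST "http://HOST:9200/_bulk?refresh=true" --data-binary '
{ "index" : { "_index" : "my_index", "_type" : "my_object", "_id" : "1" } }
{ "some_existing_field" : "value", "parent_hierarchy" : "/root/child" }
'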
Now, every time I execute the query, I get one of two errors in the
response body:
- The IP changes with every query:
{
"took" : 91,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 9,
"failed" : 1,
"failures" : [ {
"status" : 500,
"reason" : "RemoteTransportException[[Schmidt, Johann][inet[/
10.11.10.74:9300]][search/phase/fetch/id]]; nested:
FieldReaderException[Invalid numeric type: 38]; "
} ]
},
"hits" : {
"total" : 686244,
"max_score" : 1.0,
"hits" : [ ]
}
}
- Another version of the response:
{
"took" : 49,
"timed_out" : false,
"_shards" : {
"total" : 10,
"successful" : 9,
"failed" : 1,
"failures" : [ {
"status" : 500,
"reason" : "FieldReaderException[Invalid numeric type: 38]"
} ]
},
"hits" : {
"total" : 686244,
"max_score" : 1.0,
"hits" : [ ]
}
}
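To match the changing IP to a specific shard copy, the per-shard routing can be inspected with the index status API (assuming _status is the right endpoint on this version):
curl -XGET "http://HOST:9200/my_index/_status?pretty=1"
which lists, for every shard, its state and the node it is allocated on.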
Health status below:
curl -s -XGET 'http://HOST:9200/_cluster/health?pretty=1'
{
"cluster_name" : "CMWELL_INDEX_CLUSTER",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 12,
"number_of_data_nodes" : 6,
"active_primary_shards" : 10,
"active_shards" : 30,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0
}
elasticsearch.yml:
cluster.name : MY_INDEX
path:
  home: data/elasticsearch
  logs: data/elasticsearch/logs
gateway:
  recover_after_nodes: 5
  recover_after_time: 1m
  expected_nodes: 6
  local:
    initial_shards: 1
#default 10% of memory
#indices.memory.index_buffer_size : 1024m
#default false
index.compound_format : true
#default 1s
index.refresh_interval : 10s
#default 128
index.term_index_interval: 128
#Merging
#default 10
index.merge.policy.merge_factor: 30
#default 1.6mb
index.merge.policy.min_merge_size: 16mb
#default unbounded
#index.merge.policy.max_merge_size: 1024mb
#default unbounded
#index.merge.policy.maxMergeDocs
#Transaction log settings
#After how many operations to flush. Defaults to 20000.
#index.translog.flush_threshold_ops: 20000
#Once the translog hits this size, a flush will happen. Defaults to 500mb.
#index.translog.flush_threshold_size
#The period with no flush happening to force a flush. Defaults to 60m.
#index.translog.flush_threshold_period
#Cache configurations
#default 20%
indices.cache.filter.size: 10%
#default -1
#1 entry ~1MB
#index.cache.filter.max_size: 100
#default -1
index.cache.filter.expire: 1m
#default -1
#index.cache.field.max_size: -1
#default -1
index.cache.field.expire: 1m
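Since several of these are index-level settings set in the node config, the settings the index actually ended up with can be verified directly:
curl -XGET "http://HOST:9200/my_index/_settings?pretty=1"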
Any ideas would be appreciated.
Thanks