Missing index issue in a cluster

I have a 5-node cluster. What I noticed is that when I create the ES external table on node1, the index gets created on node5 (where Hive is not installed), and then when I try to select from this table I get an "index missing" error.
I know the index is there, since I can query it on that node using curl.
How can I make Hive create the index on node1 and not node5?

hive> select * from pa_lane_txn_es ;
OK
Failed with exception java.io.IOException:org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Index [lane_txn/txn_id] missing and settings [es.index.read.missing.as.empty] is set to false
Time taken: 0.055 seconds
hive>

CREATE EXTERNAL TABLE pa_lane_txn_es (
txn_id BIGINT,
txn_process_date TIMESTAMP,
transp_id STRING,
ext_plaza_id STRING,
ext_lane_id STRING,
ext_date_time TIMESTAMP,
toll_amt_charged FLOAT,
toll_amt_collected FLOAT,
toll_amt_full FLOAT,
ent_plaza_id STRING,
ent_date_time TIMESTAMP,
ent_lane_id STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES( 'es.nodes.wan.only' = 'true','es.resource' = 'lane_txn/txn_id','es.mapping.id'='txn_id');

[root@hadoop5 ~]# curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana TE5kvKgeRx6mmU0x3kbN4g 1 1 2 0 8.5kb 8.5kb
yellow open lane_txn WQRSyvgBSKqUiuBgRcfI2g 5 1 10 0 32kb 32kb
[root@hadoop5 ~]#

The connector should be able to read data from the index on node5 if it is indeed part of the Elasticsearch cluster. Could you share your cluster setup (Hive and Elasticsearch), as well as any failure logs from the Hive workers/server?
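
For example, the output of something like the commands below, run on a couple of the machines, would show whether the Elasticsearch nodes actually see each other (illustrative commands, assuming Elasticsearch is listening on the default port):

# nodes known to this Elasticsearch instance
curl 'localhost:9200/_cat/nodes?v'
# overall cluster health, including the node count
curl 'localhost:9200/_cluster/health?pretty'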

I did not configure Elasticsearch as a cluster; I have a 5-node Hadoop cluster running Hortonworks.

I wanted to upload the log files, but it won't let me.

elasticsearch.yml file contents

cluster.name: elasticsearch
node.name: "node1"
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: 127.0.0.1
http.port: 9200

tail -300 elasticsearch.log

[2017-07-31T10:23:36,608][WARN ][o.e.b.JNANatives ] unable to install syscall filter:
java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:363) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.SystemCallFilter.init(SystemCallFilter.java:638) ~[elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.JNANatives.tryInstallSystemCallFilter(JNANatives.java:215) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Natives.tryInstallSystemCallFilter(Natives.java:99) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:111) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:194) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:351) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:114) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:67) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:122) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.cli.Command.main(Command.java:88) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:91) [elasticsearch-5.5.0.jar:5.5.0]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:84) [elasticsearch-5.5.0.jar:5.5.0]
[2017-07-31T10:23:36,707][INFO ][o.e.n.Node ] [node1] initializing ...
[2017-07-31T10:23:36,762][INFO ][o.e.e.NodeEnvironment ] [node1] using [1] data paths, mounts [[/ (/dev/mapper/vg_hadoop1-lv_root)]], net usable_space [12.3gb], net total_space [49gb], spins? [possibly], types [ext4]
[2017-07-31T10:23:36,762][INFO ][o.e.e.NodeEnvironment ] [node1] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-07-31T10:23:36,764][INFO ][o.e.n.Node ] [node1] node name [node1], node ID [foP4x5AWSJKBvq-yCSo4-A]
[2017-07-31T10:23:36,764][INFO ][o.e.n.Node ] [node1] version[5.5.0], pid[11538], build[260387d/2017-06-30T23:16:05.735Z], OS[Linux/2.6.32-642.6.2.el6.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b15]
[2017-07-31T10:23:36,764][INFO ][o.e.n.Node ] [node1] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/share/elasticsearch]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [aggs-matrix-stats]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [ingest-common]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-expression]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-groovy]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-mustache]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [lang-painless]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [parent-join]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [percolator]
[2017-07-31T10:23:37,429][INFO ][o.e.p.PluginsService ] [node1] loaded module [reindex]
[2017-07-31T10:23:37,430][INFO ][o.e.p.PluginsService ] [node1] loaded module [transport-netty3]
[2017-07-31T10:23:37,430][INFO ][o.e.p.PluginsService ] [node1] loaded module [transport-netty4]
[2017-07-31T10:23:37,430][INFO ][o.e.p.PluginsService ] [node1] no plugins loaded
[2017-07-31T10:23:38,505][INFO ][o.e.d.DiscoveryModule ] [node1] using discovery type [zen]
[2017-07-31T10:23:38,908][INFO ][o.e.n.Node ] [node1] initialized
[2017-07-31T10:23:38,908][INFO ][o.e.n.Node ] [node1] starting ...
[2017-07-31T10:23:39,027][INFO ][o.e.t.TransportService ] [node1] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2017-07-31T10:23:39,036][WARN ][o.e.b.BootstrapChecks ] [node1] max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]
[2017-07-31T10:23:39,037][WARN ][o.e.b.BootstrapChecks ] [node1] system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2017-07-31T10:23:42,104][INFO ][o.e.c.s.ClusterService ] [node1] new_master {node1}{foP4x5AWSJKBvq-yCSo4-A}{Q6Vtni35R0WfWr8rh_LmPw}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-07-31T10:23:42,120][INFO ][o.e.h.n.Netty4HttpServerTransport] [node1] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2017-07-31T10:23:42,120][INFO ][o.e.n.Node ] [node1] started
[2017-07-31T10:23:42,138][INFO ][o.e.g.GatewayService ] [node1] recovered [0] indices into cluster_state

I don't know if this will be of any help: on node1 I commented out the cluster.name parameter. I also noticed that on node1 curl does not list any indices, even though it shows Elasticsearch is up.
On the other nodes it does show the indices.

[root@hadoop1 elasticsearch]# curl '127.0.0.1:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

[root@hadoop1 elasticsearch]# curl 127.0.0.1:9200
{
"name" : "node1",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "T7Lvu_xIRpmT4zBjdyKGBQ",
"version" : {
"number" : "5.5.0",
"build_hash" : "260387d",
"build_date" : "2017-06-30T23:16:05.735Z",
"build_snapshot" : false,
"lucene_version" : "6.6.0"
},
"tagline" : "You Know, for Search"
}

[root@hadoop5 ~]# curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana TE5kvKgeRx6mmU0x3kbN4g 1 1 2 0 8.5kb 8.5kb
yellow open lane_txn VDSqRpIaTlKVBdNSuwIwYg 5 1 10 0 32kb 32kb
[root@hadoop5 ~]#

[root@hadoop3 ~]# curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana -LYFpMNjRUmIHFiI24U18A 1 1 1 0 3.2kb 3.2kb

I added this parameter to elasticsearch.yml on node1, but I still don't see it adding the other nodes to the cluster, and strangely enough it still creates the index on node5.

discovery.zen.ping.unicast.hosts: ["10.100.44.16", "10.100.44.17", "10.100.44.18", "10.100.44.19", "10.100.44.20"]

Are you running a separate Elasticsearch node on each Hadoop node?

You will need to configure the Elasticsearch nodes to form a cluster amongst themselves. If an index is created on node 5 and node 5 is not in a cluster with the other nodes, then those nodes will not be aware of the index created on node 5.
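
For example, a minimal elasticsearch.yml along these lines on every node is usually enough for them to discover each other (the cluster name and hostnames here are placeholders; substitute your own):

cluster.name: my-es-cluster        # must be identical on every node (placeholder name)
node.name: hadoop1                 # unique per node
network.host: hadoop1              # this node's resolvable hostname or IP
discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]
discovery.zen.minimum_master_nodes: 3   # (master-eligible nodes / 2) + 1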

@aliyesami Please follow the instructions here on configuring Elasticsearch to run as a cluster.

Hi James, I followed the instructions as best I could and also tried to apply the information I found on the web, but the nodes are not discovering each other. Please see my configuration files and the Elasticsearch log.

[root@hadoop1 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: ftes
transport.host: 127.0.0.1
http.host: 0.0.0.0
network.bind_host: "hadoop2,hadoop3,hadoop4,hadoop5"
network.publish_host: non_loopback:ipv4
node.name: hadoop1
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: hadoop1
bootstrap.system_call_filter: false
discovery.zen.minimum_master_nodes: 3
#discovery.zen.ping.unicast.hosts: ["10.100.44.16" ,"10.100.44.17" ,"10.100.44.18" ,"10.100.44.19" ,"10.100.44.20"]
discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]

[root@hadoop2 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: ftes
transport.host: 127.0.0.1
http.host: 0.0.0.0
network.bind_host: "hadoop1,hadoop3,hadoop4,hadoop5"
network.publish_host: non_loopback:ipv4
node.name: hadoop2
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: hadoop2
bootstrap.system_call_filter: false
discovery.zen.minimum_master_nodes: 3
#discovery.zen.ping.unicast.hosts: ["10.100.44.16" ,"10.100.44.17" ,"10.100.44.18" ,"10.100.44.19" ,"10.100.44.20"]
discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]

[root@hadoop3 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: ftes
transport.host: 127.0.0.1
http.host: 0.0.0.0
network.bind_host: "hadoop1,hadoop2,hadoop4,hadoop5"
network.publish_host: non_loopback:ipv4
node.name: hadoop3
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: hadoop3
bootstrap.system_call_filter: false
discovery.zen.minimum_master_nodes: 3
#discovery.zen.ping.unicast.hosts: ["10.100.44.16" ,"10.100.44.17" ,"10.100.44.18" ,"10.100.44.19" ,"10.100.44.20"]
discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]

[root@hadoop4 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: ftes
transport.host: 127.0.0.1
http.host: 0.0.0.0
network.bind_host: "hadoop1,hadoop2,hadoop3,hadoop5"
network.publish_host: non_loopback:ipv4
node.name: hadoop4
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: hadoop4
bootstrap.system_call_filter: false
discovery.zen.minimum_master_nodes: 3
#discovery.zen.ping.unicast.hosts: ["10.100.44.16" ,"10.100.44.17" ,"10.100.44.18" ,"10.100.44.19" ,"10.100.44.20"]
discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]

[root@hadoop5 ~]# cat /etc/elasticsearch/elasticsearch.yml
cluster.name: ftes
transport.host: 127.0.0.1
http.host: 0.0.0.0
network.bind_host: "hadoop1,hadoop2,hadoop3,hadoop4"
network.publish_host: non_loopback:ipv4
node.name: hadoop5
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: hadoop5
bootstrap.system_call_filter: false
discovery.zen.minimum_master_nodes: 3
#discovery.zen.ping.unicast.hosts: ["10.100.44.16" ,"10.100.44.17" ,"10.100.44.18" ,"10.100.44.19" ,"10.100.44.20"]
discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]

[2017-08-02T09:11:22,652][INFO ][o.e.n.Node ] [hadoop1] initializing ...
[2017-08-02T09:11:22,730][INFO ][o.e.e.NodeEnvironment ] [hadoop1] using [1] data paths, mounts [[/ (/dev/mapper/vg_hadoop1-lv_root)]], net usable_space [12gb], net total_space [49gb], spins? [possibly], types [ext4]
[2017-08-02T09:11:22,731][INFO ][o.e.e.NodeEnvironment ] [hadoop1] heap size [1.9gb], compressed ordinary object pointers [true]
[2017-08-02T09:11:22,733][INFO ][o.e.n.Node ] [hadoop1] node name [hadoop1], node ID [sdZ-rxXISYuu4gkVf__-mw]
[2017-08-02T09:11:22,733][INFO ][o.e.n.Node ] [hadoop1] version[5.5.0], pid[26216], build[260387d/2017-06-30T23:16:05.735Z], OS[Linux/2.6.32-642.6.2.el6.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_111/25.111-b15]
[2017-08-02T09:11:22,733][INFO ][o.e.n.Node ] [hadoop1] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+DisableExplicitGC, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/share/elasticsearch]
[2017-08-02T09:11:23,502][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [aggs-matrix-stats]
[2017-08-02T09:11:23,502][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [ingest-common]
[2017-08-02T09:11:23,502][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [lang-expression]
[2017-08-02T09:11:23,502][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [lang-groovy]
[2017-08-02T09:11:23,502][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [lang-mustache]
[2017-08-02T09:11:23,502][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [parent-join]
[2017-08-02T09:11:23,503][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [percolator]
[2017-08-02T09:11:23,503][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [reindex]
[2017-08-02T09:11:23,503][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [transport-netty3]
[2017-08-02T09:11:23,503][INFO ][o.e.p.PluginsService ] [hadoop1] loaded module [transport-netty4]
[2017-08-02T09:11:23,503][INFO ][o.e.p.PluginsService ] [hadoop1] no plugins loaded
[2017-08-02T09:11:24,856][INFO ][o.e.d.DiscoveryModule ] [hadoop1] using discovery type [zen]
[2017-08-02T09:11:25,316][INFO ][o.e.n.Node ] [hadoop1] initialized
[2017-08-02T09:11:25,316][INFO ][o.e.n.Node ] [hadoop1] starting ...
[2017-08-02T09:11:25,460][INFO ][o.e.t.TransportService ] [hadoop1] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2017-08-02T09:11:25,471][WARN ][o.e.b.BootstrapChecks ] [hadoop1] max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]
[2017-08-02T09:11:28,505][WARN ][o.e.d.z.ZenDiscovery ] [hadoop1] not enough master nodes discovered during pinging (found [[Candidate{node={hadoop1}{sdZ-rxXISYuu4gkVf__-mw}{Q1_mobLlTP6CHxHsbyICYA}{127.0.0.1}{127.0.0.1:9300}, clusterStateVersion=-1}]], but needed [3]), pinging again

I think your network settings need to be fixed up a little bit. In this case your network.host property should be fine (assuming hadoop1 is a resolvable hostname).

Your entries for transport.host and http.host are binding to local addresses. Those two settings normally default to whatever network.host is configured to. In setting them, you are overriding each network module to bind to the local address, causing the inability to communicate over the network.

I would also remove the network.bind_host and network.publish_host since those do not seem to be correctly set either. The bind host setting is for telling the node which addresses to listen on, which should be the node's network addresses. The publish host setting is what the node uses to tell other nodes where they can reach it. These settings are here in the event that you are running Elasticsearch with a very particular network setup, and for most cases, they can be ignored. They will default to the value of network.host if unset.
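
Concretely, a trimmed-down config for hadoop1 would look something like the sketch below (same idea on the other nodes, each with its own node.name and network.host); this is illustrative, not a drop-in file:

cluster.name: ftes
node.name: hadoop1
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: hadoop1              # transport and http bind/publish here by default
bootstrap.system_call_filter: false
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]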

For more information on how the networking is configured in Elasticsearch, we have the general network settings documentation here, documentation about the transport level communication here, and documentation about the http level communication here.

What should I do with the transport.host and http.host settings?

hadoop1 to hadoop5 are not in DNS, but they are resolvable via the hosts file.

And if I remove all four parameters that you mentioned, Elasticsearch refuses to start at all.

[elasticsearch@hadoop2 config]$ more elasticsearch.yml
cluster.name: ftes
node.name: hadoop2
node.master: true
node.data: true
path.data: /elastic/data
path.logs: /elastic/logs
network.host: hadoop2
bootstrap.system_call_filter: false
discovery.zen.minimum_master_nodes: 3
#discovery.zen.ping.unicast.hosts: ["10.100.44.16" ,"10.100.44.17" ,"10.100.44.18" ,"10.100.44.19" ,"10.100.44.20"]
#discovery.zen.ping.unicast.hosts: ["hadoop1","hadoop2","hadoop3","hadoop4","hadoop5"]

[2017-08-02T16:14:38,411][INFO ][o.e.p.PluginsService ] [hadoop1] no plugins loaded
[2017-08-02T16:14:39,551][INFO ][o.e.d.DiscoveryModule ] [hadoop1] using discovery type [zen]
[2017-08-02T16:14:40,111][INFO ][o.e.n.Node ] [hadoop1] initialized
[2017-08-02T16:14:40,112][INFO ][o.e.n.Node ] [hadoop1] starting ...
[2017-08-02T16:14:40,241][INFO ][o.e.t.TransportService ] [hadoop1] publish_address {10.100.44.17:9300}, bound_addresses {10.100.44.17:9300}
[2017-08-02T16:14:40,250][INFO ][o.e.b.BootstrapChecks ] [hadoop1] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-08-02T16:14:40,251][ERROR][o.e.b.Bootstrap ] [hadoop1] node validation exception
[1] bootstrap checks failed
[1]: max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]
[2017-08-02T16:14:40,253][INFO ][o.e.n.Node ] [hadoop1] stopping ...
[2017-08-02T16:14:40,316][INFO ][o.e.n.Node ] [hadoop1] stopped
[2017-08-02T16:14:40,316][INFO ][o.e.n.Node ] [hadoop1] closing ...
[2017-08-02T16:14:40,326][INFO ][o.e.n.Node ] [hadoop1] closed
[elasticsearch@hadoop1 config]$

I did more tests and narrowed it down to this parameter; without it, Elasticsearch does not start:

transport.host: 127.0.0.1

@aliyesami You can leave transport.host and http.host off entirely. They'll default to the network.host property value.

The errors you are seeing in your logs indicate that you are failing on the bootstrap checks that run during the Elasticsearch node startup. You can look here for a breakdown for why these checks exist, as well as here for your specific issue in the logs.
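
For the specific check in your log (max number of threads for the elasticsearch user), the usual fix is to raise the limits for that user, for example in /etc/security/limits.conf. The values below are a sketch, not exact requirements: your log asks for at least 2048 threads, and Elasticsearch 5.x also expects at least 65536 open file descriptors.

# domain        type  item   value
elasticsearch   soft  nproc  4096
elasticsearch   hard  nproc  4096
elasticsearch   soft  nofile 65536
elasticsearch   hard  nofile 65536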

Awesome, thanks!
After I removed the two parameters you mentioned and added the nproc and nofile limits to /etc/security/limits.conf, the cluster came up.
It also fixed my other issue of Hive not being able to read the created index.

Appreciate your help!

Glad to hear! Cheers 🙂
