Hi Everyone,
First, I set up the ES server on AWS with the following commands, and TCP ports 9200 and 9300 are allowed in the security group:
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.5.2.deb
sudo dpkg -i elasticsearch-1.5.2.deb
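(With the deb package the node is then started via the bundled init script, something like:)
sudo service elasticsearch start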
Then I used sudo netstat -atnp to make sure the above ports are listening:
tcp6 0 0 :::9200 :::* LISTEN
tcp6 0 0 :::9300 :::* LISTEN
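To double-check that the node is actually reachable from outside AWS, a quick sanity test from the laptop against the public IP (the same 52.68.202.80 used in the code below) would be:
curl http://52.68.202.80:9200/
which should return the node's name/version JSON if the security group really lets the traffic through.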
Then, my Scala code:
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // brings the esRDD() implicit into scope

val sparkConf = new SparkConf().setAppName("Test")
  .setMaster("local[2]")
  .set("es.nodes", "52.68.202.80:9200")
  .set("es.nodes.discovery", "false")
val sc = new SparkContext(sparkConf)
// total 267 hits
val query = """{"query":{"bool":{"must":[{"range":{"scan_time":{"from":"2014-09-01T00:00:00","to":"2014-09-01T00:00:59"}}}]}}}"""
val data = sc.esRDD("wifi-collection/final_data", query)
data.collect().foreach(println)
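For reference, the same query can be run directly against the REST API to verify the hit count, e.g.:
curl -XGET 'http://52.68.202.80:9200/wifi-collection/final_data/_search?pretty' -d '{"query":{"bool":{"must":[{"range":{"scan_time":{"from":"2014-09-01T00:00:00","to":"2014-09-01T00:00:59"}}}]}}}'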
It's weird that when I run the code on my laptop (locally), I always get the following messages:
15/05/14 00:23:36 INFO SparkContext: Running Spark version 1.3.1
15/05/14 00:23:37 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/05/14 00:23:37 INFO SecurityManager: Changing view acls to: jeremy
15/05/14 00:23:37 INFO SecurityManager: Changing modify acls to: jeremy
15/05/14 00:23:37 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(jeremy); users
with modify permissions: Set(jeremy)
15/05/14 00:23:37 INFO Slf4jLogger: Slf4jLogger started
15/05/14 00:23:37 INFO Remoting: Starting remoting
15/05/14 00:23:37 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@orion:58665]
15/05/14 00:23:37 INFO Utils: Successfully started service 'sparkDriver' on
port 58665.
15/05/14 00:23:37 INFO SparkEnv: Registering MapOutputTracker
15/05/14 00:23:37 INFO SparkEnv: Registering BlockManagerMaster
15/05/14 00:23:37 INFO DiskBlockManager: Created local directory at
/var/folders/lz/bc5hqqsn1gvg2hl4b8svwd_w0000gn/T/spark-4d9ee290-78d6-4537-8975-33886ece0b86/blockmgr-469a0ac4-e62d-4e55-b848-c3ddc5bf121f
15/05/14 00:23:37 INFO MemoryStore: MemoryStore started with capacity
1966.1 MB
15/05/14 00:23:37 INFO HttpFileServer: HTTP File server directory is
/var/folders/lz/bc5hqqsn1gvg2hl4b8svwd_w0000gn/T/spark-81548d67-dcb2-4e79-b382-79a2a0a32a76/httpd-e77bb5ee-c571-4a0b-bdab-8e8037a3205e
15/05/14 00:23:37 INFO HttpServer: Starting HTTP Server
15/05/14 00:23:37 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/14 00:23:37 INFO AbstractConnector: Started
SocketConnector@0.0.0.0:58666
15/05/14 00:23:37 INFO Utils: Successfully started service 'HTTP file
server' on port 58666.
15/05/14 00:23:37 INFO SparkEnv: Registering OutputCommitCoordinator
15/05/14 00:23:37 INFO Server: jetty-8.y.z-SNAPSHOT
15/05/14 00:23:37 INFO AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
15/05/14 00:23:37 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
15/05/14 00:23:37 INFO SparkUI: Started SparkUI at http://orion:4040
15/05/14 00:23:38 INFO Executor: Starting executor ID <driver> on host localhost
15/05/14 00:23:38 INFO AkkaUtils: Connecting to HeartbeatReceiver:
akka.tcp://sparkDriver@orion:58665/user/HeartbeatReceiver
15/05/14 00:23:38 INFO NettyBlockTransferService: Server created on 58667
15/05/14 00:23:38 INFO BlockManagerMaster: Trying to register BlockManager
15/05/14 00:23:38 INFO BlockManagerMasterActor: Registering block manager
localhost:58667 with 1966.1 MB RAM, BlockManagerId(<driver>, localhost, 58667)
15/05/14 00:23:38 INFO BlockManagerMaster: Registered BlockManager
15/05/14 00:23:39 INFO Version: Elasticsearch Hadoop v2.1.0.Beta4
[2c62e273d2]
15/05/14 00:23:39 INFO ScalaEsRDD: Reading from [wifi-collection/final_data]
15/05/14 00:23:39 INFO ScalaEsRDD: Discovered mapping
{wifi-collection=[mappings=[final_data=[bssid=STRING, gps_lat=DOUBLE,
gps_lng=DOUBLE, imei=STRING, m_lat=DOUBLE, m_lng=DOUBLE, net_lat=DOUBLE,
net_lng=DOUBLE, no=LONG, rss=DOUBLE, s_no=LONG, scan_time=DATE,
source=STRING, ssid=STRING, trace=LONG]]]} for [wifi-collection/final_data]
15/05/14 00:23:39 INFO SparkContext: Starting job: collect at App.scala:19
15/05/14 00:23:39 INFO DAGScheduler: Got job 0 (collect at App.scala:19)
with 5 output partitions (allowLocal=false)
15/05/14 00:23:39 INFO DAGScheduler: Final stage: Stage 0(collect at
App.scala:19)
15/05/14 00:23:39 INFO DAGScheduler: Parents of final stage: List()
15/05/14 00:23:39 INFO DAGScheduler: Missing parents: List()
15/05/14 00:23:39 INFO DAGScheduler: Submitting Stage 0 (ScalaEsRDD[0] at
RDD at AbstractEsRDD.scala:17), which has no missing parents
15/05/14 00:23:39 INFO MemoryStore: ensureFreeSpace(1496) called with
curMem=0, maxMem=2061647216
15/05/14 00:23:39 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 1496.0 B, free 1966.1 MB)
15/05/14 00:23:39 INFO MemoryStore: ensureFreeSpace(1148) called with
curMem=1496, maxMem=2061647216
15/05/14 00:23:39 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 1148.0 B, free 1966.1 MB)
15/05/14 00:23:39 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on localhost:58667 (size: 1148.0 B, free: 1966.1 MB)
15/05/14 00:23:39 INFO BlockManagerMaster: Updated info of block
broadcast_0_piece0
15/05/14 00:23:39 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:839
15/05/14 00:23:39 INFO DAGScheduler: Submitting 5 missing tasks from Stage
0 (ScalaEsRDD[0] at RDD at AbstractEsRDD.scala:17)
15/05/14 00:23:39 INFO TaskSchedulerImpl: Adding task set 0.0 with 5 tasks
15/05/14 00:23:39 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0, localhost, ANY, 3352 bytes)
15/05/14 00:23:39 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
1, localhost, ANY, 3352 bytes)
15/05/14 00:23:39 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/05/14 00:23:39 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/05/14 00:24:54 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:24:54 INFO HttpMethodDirector: Retrying request
15/05/14 00:24:54 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:24:54 INFO HttpMethodDirector: Retrying request
15/05/14 00:26:09 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:26:09 INFO HttpMethodDirector: Retrying request
15/05/14 00:26:09 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:26:09 INFO HttpMethodDirector: Retrying request
15/05/14 00:27:25 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:27:25 INFO HttpMethodDirector: Retrying request
15/05/14 00:27:25 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:27:25 INFO HttpMethodDirector: Retrying request
15/05/14 00:28:40 ERROR NetworkClient: Node [Operation timed out] failed
(172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:28:40 ERROR NetworkClient: Node [Operation timed out] failed
(172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:28:40 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
18184 bytes result sent to driver
15/05/14 00:28:40 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID
2, localhost, ANY, 3352 bytes)
15/05/14 00:28:40 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
15/05/14 00:28:40 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 301234 ms on localhost (1/5)
15/05/14 00:28:40 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1).
19532 bytes result sent to driver
15/05/14 00:28:40 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID
3, localhost, ANY, 3352 bytes)
15/05/14 00:28:40 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
15/05/14 00:28:40 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 301315 ms on localhost (2/5)
15/05/14 00:29:55 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:29:55 INFO HttpMethodDirector: Retrying request
15/05/14 00:29:56 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:29:56 INFO HttpMethodDirector: Retrying request
15/05/14 00:31:11 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:31:11 INFO HttpMethodDirector: Retrying request
15/05/14 00:31:11 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:31:11 INFO HttpMethodDirector: Retrying request
15/05/14 00:32:26 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:32:26 INFO HttpMethodDirector: Retrying request
15/05/14 00:32:26 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:32:26 INFO HttpMethodDirector: Retrying request
15/05/14 00:33:42 ERROR NetworkClient: Node [Operation timed out] failed
(172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:33:42 ERROR NetworkClient: Node [Operation timed out] failed
(172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:33:42 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2).
18177 bytes result sent to driver
15/05/14 00:33:42 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID
4, localhost, ANY, 3352 bytes)
15/05/14 00:33:42 INFO Executor: Running task 4.0 in stage 0.0 (TID 4)
15/05/14 00:33:42 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 301864 ms on localhost (3/5)
15/05/14 00:33:42 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3).
18176 bytes result sent to driver
15/05/14 00:33:42 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 301882 ms on localhost (4/5)
15/05/14 00:34:58 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:34:58 INFO HttpMethodDirector: Retrying request
15/05/14 00:36:13 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:36:13 INFO HttpMethodDirector: Retrying request
15/05/14 00:37:29 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:37:29 INFO HttpMethodDirector: Retrying request
15/05/14 00:38:44 ERROR NetworkClient: Node [Operation timed out] failed
(172.31.14.100:9200); selected next node [52.68.202.80:9200]
15/05/14 00:38:45 INFO Executor: Finished task 4.0 in stage 0.0 (TID 4).
19190 bytes result sent to driver
15/05/14 00:38:45 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 302573 ms on localhost (5/5)
15/05/14 00:38:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
15/05/14 00:38:45 INFO DAGScheduler: Stage 0 (collect at App.scala:19)
finished in 905.680 s
15/05/14 00:38:45 INFO DAGScheduler: Job 0 finished: collect at
App.scala:19, took 905.884621 s
- Finally, after that long wait, the foreach(println) output starts:
(AU1KNAQHKoN_e7xsz2J7,Map(no -> 38344049, s_no -> 2988722, source ->
wifiscan2, bssid -> d850e6d5a770, ssid -> jasonlan, rss -> -91.0, gps_lat
-> 24.99175413, gps_lng -> 121.28153416, net_lat -> -10000.0, net_lng ->
-10000.0, m_lat -> 24.9916650318, m_lng -> 121.281452128, imei ->
352842060663324, scan_time -> Mon Sep 01 08:00:06 CST 2014, trace -> 1))
- and then it keeps trying to connect for the next job:
15/05/14 00:38:45 INFO SparkContext: Starting job: count at App.scala:20
15/05/14 00:38:45 INFO DAGScheduler: Got job 1 (count at App.scala:20) with
5 output partitions (allowLocal=false)
15/05/14 00:38:45 INFO DAGScheduler: Final stage: Stage 1(count at
App.scala:20)
15/05/14 00:38:45 INFO DAGScheduler: Parents of final stage: List()
15/05/14 00:38:45 INFO DAGScheduler: Missing parents: List()
15/05/14 00:38:45 INFO DAGScheduler: Submitting Stage 1 (ScalaEsRDD[0] at
RDD at AbstractEsRDD.scala:17), which has no missing parents
15/05/14 00:38:45 INFO MemoryStore: ensureFreeSpace(1464) called with
curMem=2644, maxMem=2061647216
15/05/14 00:38:45 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 1464.0 B, free 1966.1 MB)
15/05/14 00:38:45 INFO MemoryStore: ensureFreeSpace(1121) called with
curMem=4108, maxMem=2061647216
15/05/14 00:38:45 INFO MemoryStore: Block broadcast_1_piece0 stored as
bytes in memory (estimated size 1121.0 B, free 1966.1 MB)
15/05/14 00:38:45 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on localhost:58667 (size: 1121.0 B, free: 1966.1 MB)
15/05/14 00:38:45 INFO BlockManagerMaster: Updated info of block
broadcast_1_piece0
15/05/14 00:38:45 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:839
15/05/14 00:38:45 INFO DAGScheduler: Submitting 5 missing tasks from Stage
1 (ScalaEsRDD[0] at RDD at AbstractEsRDD.scala:17)
15/05/14 00:38:45 INFO TaskSchedulerImpl: Adding task set 1.0 with 5 tasks
15/05/14 00:38:45 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
5, localhost, ANY, 3352 bytes)
15/05/14 00:38:45 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
6, localhost, ANY, 3352 bytes)
15/05/14 00:38:45 INFO Executor: Running task 0.0 in stage 1.0 (TID 5)
15/05/14 00:38:45 INFO Executor: Running task 1.0 in stage 1.0 (TID 6)
15/05/14 00:40:00 INFO HttpMethodDirector: I/O exception
(java.net.ConnectException) caught when processing request: Operation timed
out
15/05/14 00:40:00 INFO HttpMethodDirector: Retrying request
It's weird: when I sbt package the project, deploy it to the ES server, and run it there with spark-submit, everything is OK, with no waiting and no error messages.
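(For reference, on the server I just run it the usual way; the class and jar names below are only placeholders for my actual project:)
spark-submit --class Test target/scala-2.10/test_2.10-1.0.jar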
I am sure I am missing something, but I can't figure out what. >_<
Thanks for your help!!