Hi everyone,
I have successfully employed elasticsearch-jdbc plugin to pump my data (in river mode) from a Postgres db. Everything ran smoothly until I tried to map my data from joined tables into nested documents - I found the river to be fetching only a part of the embedded objects.
The SQL query, run independently from my postgres console, yields 15 rows. In ES index there seem to be only 6 of them merged into nested documents. (even though ES console claims to have imported 15 rows at a time)
Have I messed up my sql query or is it some bug in elasticsearch-jdbc merging the documents?
Cheers,
Anna
-
river definition:
curl -XPUT 'localhost:9200/_river/nexit_river/_meta' --data @my_river.json
-
my_river.json:
{
"type": "jdbc",
"jdbc": [{
"url": "jdbc:postgresql://localhost/hello",
"user": "hi",
"sql": "SELECT
v1.id as _id,
c.id || 'c' as "tag[id_type]"
FROM exit e
INNER JOIN
venue v1 ON e.id = v1.exit_id
INNER JOIN category_venue cv ON v1.id = cv.venue_id
INNER JOIN category c on c.id = cv.category_id
UNION SELECT
v2.id as _id, a.id || 'a' as "tag[id_type]"
FROM exit e
INNER JOIN venue v2 ON e.id = v2.exit_id
INNER JOIN amenity a ON v2.id = a.venue_id",
"password": "yo",
"treat_binary_as_string": true,
"index": "hello",
"type": "tags",
"fetchsize" : 100000
}
],
"schedule": "0 0-59 0-23 ? * *"
} -
SQL query results:
_id; tag[id_type]
1;"2c"
1;"4c"
1;"22a"
1;"1a"
2;"3c"
3;"3c"
1;"2a"
2;"1c"
3;"1c"
4;"4c"
2;"2c"
2;"4c"
3;"4c"
1;"3c"
1;"1c"
-
elasticsearch query result:
curl 'localhost:9200/hello/tags/_search?pretty'
{"took" : 21, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 4, "max_score" : 1.0, "hits" : [ { "_index" : "hello", "_type" : "tags", "_id" : "4", "_score" : 1.0, "_source":{"tag":[{"id_type":"4c"}]} }, { "_index" : "hello", "_type" : "tags", "_id" : "1", "_score" : 1.0, "_source":{"tag":[{"id_type":"3c"},{"id_type":"1c"}]} }, { "_index" : "hello", "_type" : "tags", "_id" : "2", "_score" : 1.0, "_source":{"tag":[{"id_type":"2c"},{"id_type":"4c"}]} }, { "_index" : "hello", "_type" : "tags", "_id" : "3", "_score" : 1.0, "_source":{"tag":[{"id_type":"4c"}]} } ] } }
-
Elasticsearch console:
[2015-05-13 09:25:00,089][INFO ][river.jdbc.RiverMetrics ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@7ddc13ca complete: river jdbc/hello_river metrics: 15 rows, 0.2862853578324369 mean, (0.0 0.0 0.0), ingest metrics: elapsed 0 seconds, 846.0 bytes bytes, 52.0 bytes avg, 0.009 MB/s
[2015-05-13 09:26:00,002][INFO ][index.shard ] [Kraken] [hello][0] updating refresh_interval from [1s] to [-1]
[2015-05-13 09:26:00,002][INFO ][index.shard ] [Kraken] [hello][1] updating refresh_interval from [1s] to [-1]
[2015-05-13 09:26:00,002][INFO ][index.shard ] [Kraken] [hello][2] updating refresh_interval from [1s] to [-1]
[2015-05-13 09:26:00,002][INFO ][index.shard ] [Kraken] [hello][3] updating refresh_interval from [1s] to [-1]
[2015-05-13 09:26:00,002][INFO ][index.shard ] [Kraken] [hello][4] updating refresh_interval from [1s] to [-1]
[2015-05-13 09:26:00,048][INFO ][index.shard ] [Kraken] [hello][0] updating refresh_interval from [-1] to [1s]
[2015-05-13 09:26:00,048][INFO ][index.shard ] [Kraken] [hello][1] updating refresh_interval from [-1] to [1s]
[2015-05-13 09:26:00,048][INFO ][index.shard ] [Kraken] [hello][2] updating refresh_interval from [-1] to [1s]
[2015-05-13 09:26:00,048][INFO ][index.shard ] [Kraken] [hello][3] updating refresh_interval from [-1] to [1s]
[2015-05-13 09:26:00,048][INFO ][index.shard ] [Kraken] [hello][4] updating refresh_interval from [-1] to [1s]
[2015-05-13 09:26:00,089][INFO ][river.jdbc.RiverMetrics ] pipeline org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@a23e789 complete: river jdbc/hello_river metrics: 30 rows, 0.2669148855001983 mean, (0.0 0.0 0.0), ingest metrics: elapsed 0 seconds, 846.0 bytes bytes, 27.0 bytes avg, 0.009 MB/s