I have about a billion records on 20 nodes and would like to run a custom map/reduce or "aggregation" (word count, sentiment analysis, etc.) immediately after the ES result set is determined.

I first tried using the plugin system to implement a custom aggregation, like this one:
https://github.com/algolia/elasticsearch-cardinality-plugin/tree/1.0.X/src/main/java/org/alg/elasticsearch/search/aggregations/cardinality

But since I was not sure about the memory usage or how far it could be customized, I decided to run Hazelcast or Spark on the same node (or JVM) and use their map/reduce framework instead. I use a filter phase to push the ES data into them, but it takes quite a long time to load the data into that in-memory middleware...

Is there a best practice for loading ES data into in-memory middleware, so the same data can be reused efficiently by a subsequent program?

I don't think I can use the ES query result set (on each shard), which seems to be held in memory, directly in my program. Am I right?
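One common pattern for moving a large ES result set into another system is to pull it in cursor-driven batches (scroll- or search_after-style) rather than one huge fetch. As a minimal sketch, assuming a hypothetical `fetch_page(cursor)` callable that returns `(hits, next_cursor)` and signals the end with a `None` cursor (with a real ES client this would wrap the scroll API; here it is pluggable so the loop can be tested without a cluster):

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical sketch: generic cursor-style pagination over an ES-like
# result set. fetch_page(cursor) returns (hits, next_cursor); a
# next_cursor of None means there are no more pages.
def scroll_all(
    fetch_page: Callable[[Optional[str]], Tuple[List[dict], Optional[str]]],
) -> Iterator[dict]:
    cursor: Optional[str] = None
    while True:
        hits, cursor = fetch_page(cursor)
        for hit in hits:
            yield hit               # stream hits one by one to the consumer
        if cursor is None:          # last page reached
            return

# Stub standing in for an ES scroll: three pages, five hits total.
pages = {
    None: ([{"id": 1}, {"id": 2}], "c1"),
    "c1": ([{"id": 3}, {"id": 4}], "c2"),
    "c2": ([{"id": 5}], None),
}
docs = list(scroll_all(lambda c: pages[c]))  # five documents, in order
```

Streaming the hits like this lets the consumer (e.g. a Hazelcast map loader) ingest batches as they arrive instead of buffering the whole result set first.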
I use Hazelcast on the same JVM and run the map/reduce in memory. It works really well: for about 100,000 blog documents and a word count, the ES request plus the Hazelcast map/reduce finishes in less than 3 seconds.
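The word-count map/reduce described here can be sketched in plain Python (no Hazelcast, which is an in-memory data grid with its own Java API): the "map" phase emits a `(word, 1)` pair per token, and the "reduce" phase sums the pairs per key. This is only an illustration of the shape of the computation, assuming the documents have already been fetched from ES:

```python
from collections import Counter
from typing import Dict, Iterable

# Illustrative word-count map/reduce in plain Python. The map phase
# tokenizes each document into lowercase words; the reduce phase folds
# the implicit (word, 1) pairs into per-word totals.
def word_count(docs: Iterable[str]) -> Dict[str, int]:
    counts: Counter = Counter()
    for doc in docs:                     # map: one document at a time
        for word in doc.lower().split():
            counts[word] += 1            # reduce: sum counts per key
    return dict(counts)

totals = word_count(["ES stores docs", "Hazelcast counts docs"])
# e.g. totals["docs"] == 2
```

In Hazelcast the same split would run distributed, with the map phase executing next to each partition of the loaded data and the reduce phase combining partial counts.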