What I am doing:
I parse 50-100 GB of data per day from an API. Most of it is duplicated, so I store less than 1/1000, or even 1/10000, of what I receive.
Question:
Does Elasticsearch have a built-in algorithm that can deliver information in real time to clients that have each registered a specific query? If not, how would you do that?
PS: I'm currently in the planning stage of my development environment. I plan to use Node.js + MongoDB (for static data) + Elasticsearch, but I might change that if there is a better way to implement this feature.
It looks to me like this is what the percolator feature is built for, but you will have to write some code to get the exact feature you want, as it does not stream data to users out of the box.
It's "just" a system which compares a document to previously registered queries.
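To make the "compare a document to previously registered queries" idea concrete, here is a minimal in-memory sketch. It is not the Elasticsearch API, only a toy model of the reverse-search principle: queries are stored first, and each incoming document is checked against them. The names `ToyPercolator`, `StoredQuery`, `register` and `percolate` are hypothetical, and the matching rule (one term in one field) is an assumption chosen for brevity.

```typescript
// Toy model of percolation. NOT the Elasticsearch API; all names are hypothetical.
type Doc = Record<string, string>;

interface StoredQuery {
  id: string;    // e.g. one id per client subscription
  field: string; // document field to inspect
  term: string;  // term that must appear in that field
}

class ToyPercolator {
  private queries: StoredQuery[] = [];

  // Store a query ahead of time, before any document arrives.
  register(query: StoredQuery): void {
    this.queries.push(query);
  }

  // Return the ids of every stored query that matches the given document.
  percolate(doc: Doc): string[] {
    const matches: string[] = [];
    for (const q of this.queries) {
      // Naive tokenization: lowercase and split on non-word characters.
      const tokens = (doc[q.field] || "").toLowerCase().split(/\W+/);
      if (tokens.indexOf(q.term.toLowerCase()) !== -1) {
        matches.push(q.id);
      }
    }
    return matches;
  }
}
```

Calling `percolate` on a freshly parsed document returns the ids of the queries it satisfies, which tells you which clients should be notified. Delivering the document to them is still your own code.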
I do understand that I will have to write 100% of the code, and I'm not scared of that. The question is more "How do I do that in the most efficient way?". I have only one idea: send data to the clients over WebSocket as I parse it, but that seems not very good and very resource-intensive. I'm a student and I don't have experience building this kind of architecture. I'm scared to create a poop (sorry). Can you help me with the architecture a little bit?
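One way to keep the WebSocket idea from being resource-intensive is to drop duplicates first and push each new item only to the clients whose query it matches. Below is a rough sketch of that pipeline under stated assumptions: every item has a stable `id` usable for deduplication, a subscription is a single keyword, and `send` is a stand-in for a real WebSocket send call. The names `FanOut`, `Subscription` and `ingest` are hypothetical, not part of any library.

```typescript
// Sketch of "dedupe, match, then push". All names are hypothetical.
type Item = { id: string; text: string };
type Send = (item: Item) => void; // stand-in for a real WebSocket send

interface Subscription {
  clientId: string;
  keyword: string; // assumption: one keyword per subscription
  send: Send;
}

class FanOut {
  // Ids already processed. Unbounded here; a real system would need eviction.
  private seen: Record<string, boolean> = Object.create(null);
  private subs: Subscription[] = [];

  subscribe(sub: Subscription): void {
    this.subs.push(sub);
  }

  unsubscribe(clientId: string): void {
    this.subs = this.subs.filter((s) => s.clientId !== clientId);
  }

  // Process one parsed item; returns how many clients received it.
  ingest(item: Item): number {
    if (this.seen[item.id]) return 0; // duplicate: nothing stored, nothing sent
    this.seen[item.id] = true;

    const text = item.text.toLowerCase();
    let delivered = 0;
    for (const sub of this.subs) {
      if (text.indexOf(sub.keyword.toLowerCase()) !== -1) {
        sub.send(item); // only matching clients get the item
        delivered++;
      }
    }
    return delivered;
  }
}
```

With this shape, the cost per parsed item is one duplicate check plus one pass over the subscriptions, and network traffic is limited to matching clients. The keyword loop is the part a percolator-style index could replace if the number of subscriptions grows large.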