Based on my understanding, ES has four out-of-the-box options to index data.
Although I understand in theory how they work, I'm wondering if someone
has done a real-world implementation and comparison of what works best
for huge volumes of data (100K records per hour) arriving at regular
intervals, where error handling matters in case indexing fails.
curl -XPUT - Perhaps the simplest way to index a document: you just
perform a PUT on a REST endpoint. This is best seen as a development-time
option for indexing documents while testing.
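A minimal sketch of that single-document PUT; the index name "twitter", type "tweet", id 1, and the document fields are made up for illustration, and the actual curl call assumes a node on the default port:

```shell
# Hypothetical document; field names are illustrative only.
DOC='{"user":"kimchy","post_date":"2014-01-01","message":"trying out ES"}'
echo "$DOC" > doc.json
# With a node running locally (not executed here):
#   curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d @doc.json
cat doc.json
```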
HTTP Bulk API - Push approach to index data; useful if you have an
external application that consolidates the data periodically and formats
it as JSON to be indexed. This is much more reliable than the UDP bulk
import because you get an acknowledgement of the index operation and can
take corrective steps based on the response.
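A sketch of that push-plus-check flow. The newline-delimited payload (action line followed by source line) is the real bulk format; the response string below is a hand-made sample of its shape, used only to show how the top-level "errors" flag drives corrective action:

```shell
# Bulk payload: one action line, then one source line, per document.
cat > bulk.txt <<'EOF'
{"index":{"_index":"logs","_type":"event","_id":"1"}}
{"message":"first record"}
{"index":{"_index":"logs","_type":"event","_id":"2"}}
{"message":"second record"}
EOF
# Real call (not executed here):
#   curl -s -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.txt
# Simulated response; per-item "status" tells you which documents to retry.
RESPONSE='{"took":3,"errors":true,"items":[{"index":{"_id":"1","status":201}},{"index":{"_id":"2","status":429}}]}'
if echo "$RESPONSE" | grep -q '"errors":true'; then
  echo "bulk had failures: retry items with non-2xx status"
fi
```

Because every item gets its own status in the response, a failed batch can be partially retried rather than resent wholesale.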
UDP Bulk API - Connectionless datagram protocol. This is faster but not
as reliable: there is no acknowledgement, so documents that fail to index
are silently lost.
E.g. cat bulk.txt | nc -w 0 -u localhost 9700
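For completeness, the bulk.txt streamed by that nc command is the same newline-delimited bulk format; a small sketch of building it (the index/type names are made up, and port 9700 assumes the UDP bulk endpoint is enabled):

```shell
# Same NDJSON bulk format; with UDP, delivery is fire-and-forget.
printf '%s\n' \
  '{"index":{"_index":"logs","_type":"event"}}' \
  '{"message":"udp sample"}' > bulk.txt
# Then (not executed here):
#   cat bulk.txt | nc -w 0 -u localhost 9700
wc -l < bulk.txt
```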
River plugin - Pull approach; runs within an ES node and can pull data
from an external source (a database, queue, etc.). It can be used when we
expect a constant stream of data changes that need to be indexed and we
don't want to write another external application to push data into ES.
The river plugin also supports importing via the Bulk API, which is useful
when the river wants to accumulate data up to a certain threshold before
performing an import / indexing.
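As a sketch of how a river is wired up: a river is registered by PUTting a _meta document into the special _river index. The "couchdb" type is just one example plugin, the host/db values are made up, and bulk_size / bulk_timeout are the settings that control how much the river accumulates before a bulk import:

```shell
# Hypothetical river config; values are illustrative only.
cat > river.json <<'EOF'
{
  "type": "couchdb",
  "couchdb": {"host": "localhost", "port": 5984, "db": "mydb"},
  "index": {"bulk_size": 500, "bulk_timeout": "10ms"}
}
EOF
# Register it (not executed here):
#   curl -XPUT 'http://localhost:9200/_river/my_couch_river/_meta' -d @river.json
grep -c 'bulk_size' river.json
```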