Hi,
In my first post to this list, I'd like to start a brief discussion to
make sure I properly understand how the Elasticsearch (ES) clients
work (consider just a single cluster). By clients, I mean the three
types I've seen so far:
- Node:
When creating this client from our application, our application
becomes a node of the ES cluster, but it doesn't store any data; i.e.
my application will execute indexing/searching/etc. operations, but the
data will actually live on some other node (which in turn can be on the
same machine/same JVM, same machine/different JVM, or different
machine/different JVM).
- Local:
When creating this client from our application, our application
behaves like a node and also stores data locally.
- Transport:
When we create this type of client, we're just getting a kind of
pointer to the ES cluster, but we don't act as a node at all. We can
index/search/etc., but our application is not part of the cluster; it
just delegates the operations to some other node.
If all my assumptions are correct, the typical scenarios for each type
of client would be:
- Node or Local: Imagine an application that basically scrapes text
from web pages and indexes it. I could distribute this application
across several JVMs (whether on the same or different machines) and
give each JVM a node client, so that my application as a whole serves
as the indexer. Then I could create another node client (or several)
to perform the searches.
- Transport: This is actually what I'm looking for. My real scenario
is the following: I have a web site with a directory of shops in
general. I want to provide search over all the products of all the
shops, classified by category, price, the shop they belong to, etc. I
get the product information through an HTTP request in CSV format, so
basically I want my application to read the CSV file, parse it, and
then index every product it finds. However, for performance reasons, I
don't want my application to be a node or part of the ES cluster
itself; instead, I want the ES cluster running in a separate JVM. So
I'd have one JVM for my web application, and I would delegate the
indexing to the other JVM running the ES cluster using the transport
client. That way my application would neither be impacted by the
indexing/searching process nor have to store the product information.
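A minimal sketch of the application side of this CSV import, showing that the parsing work can be kept separate from the choice of client. It assumes a CSV with a header row and no quoted fields containing commas. The class `ProductCsvImporter`, the `ProductIndexer` interface, the batch size, and the column names are all hypothetical; in a real setup, whatever client talks to the cluster (e.g. the transport client) would sit behind `ProductIndexer`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProductCsvImporter {

    /** Hypothetical hook: a real version would forward each batch to the ES cluster via a client. */
    public interface ProductIndexer {
        void indexBatch(List<Map<String, String>> products);
    }

    private final ProductIndexer indexer;
    private final int batchSize;

    public ProductCsvImporter(ProductIndexer indexer, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.indexer = indexer;
        this.batchSize = batchSize;
    }

    /** Parses a CSV whose first line is a header; returns how many products were handed to the indexer. */
    public int importCsv(Reader csv) {
        try (BufferedReader reader = new BufferedReader(csv)) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return 0; // empty input: nothing to index
            }
            String[] header = headerLine.split(",", -1);
            List<Map<String, String>> batch = new ArrayList<>();
            int total = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Naive split: assumes no quoted fields containing commas.
                String[] values = line.split(",", -1);
                Map<String, String> product = new LinkedHashMap<>();
                for (int i = 0; i < header.length && i < values.length; i++) {
                    product.put(header[i].trim(), values[i].trim());
                }
                batch.add(product);
                total++;
                if (batch.size() == batchSize) {
                    indexer.indexBatch(new ArrayList<>(batch)); // send a full batch
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                indexer.indexBatch(new ArrayList<>(batch)); // send the remainder
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "name,category,price,shop\n"
                + "Kettle,kitchen,19.90,Shop A\n"
                + "Toaster,kitchen,24.50,Shop B\n"
                + "Lamp,lighting,12.00,Shop A\n";
        List<Integer> batchSizes = new ArrayList<>();
        // Stand-in indexer that only records batch sizes instead of contacting a cluster.
        ProductCsvImporter importer = new ProductCsvImporter(batch -> batchSizes.add(batch.size()), 2);
        int count = importer.importCsv(new StringReader(csv));
        System.out.println(count + " products in batches " + batchSizes);
    }
}
```

Running `main` prints `3 products in batches [2, 1]`. Handing rows over in batches rather than one by one is a choice of this sketch, not something the post specifies; it simply keeps the web application's side of the work to parsing and delegating, which is the split the transport-client scenario is after.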
What do you think? Is my preferred scenario feasible with the
transport client? If not, which client should I use, and why?
Thanks a lot for your support, and thanks to the ES team for such an
awesome tool.