Should I use the Java client or the REST API? Plus a few other architectural questions

I am undecided whether to use the Java client or the REST API in my case
and would greatly appreciate your opinions.

We have a fairly small number of records - less than 1M (currently 0.3M).
They are around 2K in size and have over 100 fields (size and number of
fields will grow).
It will be a read-heavy application with maybe 100-200 docs updated every
minute (every minute, or at any predefined interval, batch updates will be
posted from our main application).

The main ExtJS rich application currently connects directly from the user's
browser to ES via REST (data is not channeled via our app server) and does
lots of faceting and heavy searches. In the future, due to security
requirements, we may need to enforce access control on groups of documents.
For that we may need to use RestChannel or some other layer, so either data
will be channeled via the main app server, or the whole ES node will run
embedded, with authentication and ACL performed by the web container.
Data can be reindexed fairly easily, so loss of the index is an operational
concern (it affects users) but not a concern as far as loss of data goes.

First of all, we are planning to use a single node with one shard, mostly
because with a small number of documents and precision requirements for
facet counts and searches, sharding would be detrimental, and partially
because the server will be supported by "generic" Unix support staff and I
cringe thinking about all the potential clustering issues after lurking on
this group for a while.

Secondly, at least for now, I am planning to host the code which pushes
updates from the main system to ES in the main application. The chief reason
is that I am most concerned with the effect that indexing the updated
records has on our Oracle database. The domain objects to be indexed involve
dozens of tables, producing heavy load on Oracle, and doing it in the main
app server will let me use in-process caches (JDO secondary cache) and
significantly reduce load on the database.

Now the question:

  1. Use the REST bulk API for bulk updates - this will let me avoid ES
    dependencies and upgrade the ES server without having to release the
    application. Same with the search REST API, although currently our ExtJS
    JavaScript application mostly consumes ES data directly from the browser.
    Independent deployments are very attractive. I do not want ES upgrades to
    force application releases with full regression and acceptance testing
    and client involvement.

  2. Use the native Java client - this tightly couples our app and ES. If I
    use it, I would rather move the batch update from the app to either our ES
    data node (we are pretty low volume, as I described above) or a separate
    indexer node. The downside is a much heavier impact on the Oracle database,
    due to having to read domain objects from the database rather than from
    the app server caches.

So while I would love to use the native client, my current thinking is to use
the REST API via Apache HttpClient to do bulk indexing and whatever searches
we need until we have a clear need for the native client. By that time ES
may have a lightweight native client that is less sensitive to ES server
version changes.
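
For illustration, the REST bulk option could look roughly like the sketch
below. It only builds the newline-delimited body that the _bulk endpoint
expects (one action line, then one source line per document); the body would
then be POSTed to /_bulk with whatever HTTP client is used, e.g. Apache
HttpClient. The class name BulkBody, the index name "records" and the type
name "record" are made up for the example, and the documents are assumed to
be already serialized as single-line JSON.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkBody {

    /**
     * Builds a newline-delimited body for the _bulk endpoint.
     * Each document contributes two lines: an "index" action line and its JSON source.
     * docsById maps a document id to its already-serialized, single-line JSON source.
     * Assumption: index, type and ids contain no characters that need JSON escaping.
     */
    public static String build(String index, String type, Map<String, String> docsById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : docsById.entrySet()) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type)
              .append("\",\"_id\":\"").append(e.getKey())
              .append("\"}}\n");
            // The source must stay on one line; the final newline is required by the bulk format.
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical documents standing in for the batch collected each minute.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"first\"}");
        docs.put("2", "{\"name\":\"second\"}");
        System.out.print(build("records", "record", docs));
    }
}
```

Because nothing here depends on ES jars, the application and the ES server
can be upgraded independently, which is the point of option 1.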

Your thoughts will be greatly appreciated
Alex

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

It boils down to whether you want to use different (older) clients with
(newer) servers, whether you want failsafe clients, and whether you care
about performance. With the Java API you can't use different ES versions or
different Java VMs between server and client without getting into trouble.
If you can't use non-HTTP ports, but must use HTTP for web apps, that is
also an argument for an HTTP client. With HTTP you can proxy the connection,
which may help in some network settings.

With a TransportClient, you can automatically connect to many nodes, for
fault tolerance. This is not possible with an HTTP client, where you must
configure the nodes manually.
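
To make the "configure the nodes manually" part concrete, here is a minimal
sketch of what an HTTP client has to do by itself: keep a fixed list of node
URLs and try them in order until one answers. The class name
HttpNodeFailover and the node URLs are invented for the example, and the
actual HTTP call is passed in as a function so the sketch stays independent
of any particular HTTP library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class HttpNodeFailover {

    private final List<String> nodes;

    public HttpNodeFailover(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = nodes;
    }

    /**
     * Tries the request against each configured node in order and returns the
     * first successful result. 'request' receives a node base URL and either
     * returns a response or throws a RuntimeException if the node is unreachable.
     */
    public <T> T execute(Function<String, T> request) {
        RuntimeException last = null;
        for (String node : nodes) {
            try {
                return request.apply(node);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next node
            }
        }
        throw new IllegalStateException("all configured nodes failed", last);
    }

    public static void main(String[] args) {
        // Hypothetical node addresses; the first one is simulated as down.
        HttpNodeFailover failover = new HttpNodeFailover(
                Arrays.asList("http://es1:9200", "http://es2:9200"));
        String result = failover.execute(url -> {
            if (url.contains("es1")) {
                throw new RuntimeException("connection refused");
            }
            return "response from " + url;
        });
        System.out.println(result);
    }
}
```

Unlike the TransportClient, this list never changes by itself: nodes that
join or leave the cluster have to be added or removed in the application
configuration.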

Note that the REST API may change, too. I think more features will be added,
but it may also be the case that some REST parameters will go away. I hope
the ES team will mark parameters or endpoints as obsolete before they
get dropped or become unsupported. There are many good things going on.

Some months ago I started an effort to wrap the Java API into a
Netty HTTP client, but it's not complete, and I hope to continue the work
after ES 1.0. With this, a Java programmer can switch
between TransportClient and an HttpTransportClient transparently, without
changing a single line of code. Note that there is an extra overhead in
serializing and deserializing data for the REST channel, which can be
significant. It's an extra Jackson ObjectMapper run to parse JSON into
Java. Bulk indexing will be awkward with a Java HTTP client - every
request will get serialized (and maybe compressed) twice on the way to
the target shard.

My motivation for an HTTP client is ease of use for Java client
application programmers. I think a Java HTTP client for ES is the
best way to provide a client version that can connect to many
different ES versions, over the Web, with a small subset of the full
command set - just simple index, search, and get operations, which
will be sufficient for connecting third-party web apps to ES using
standard port 80. This ease of use will come at a price: less
performance and less fault tolerance.

Jörg

On 22.04.13 at 16:41, AlexR wrote:
