IndexTank vs Elasticsearch


(Damien Hardy) #1

Hello,
LinkedIn has released IndexTank. Its feature set looks pretty similar to
Elasticsearch's.

What do you think about this new player?

Cheers,

--
Damien


(Clinton Gormley) #2

LinkedIn has released IndexTank. Its feature set looks pretty similar to
Elasticsearch's.

What do you think about this new player?
nosql.mypopescu.com/post/14584603231/linkedin-open-sources-indextank-what-is-indextank-an

All I know about indextank comes from a few pages on the web. It looks
like quite a nice SaaS service, and easy to start using, but I think the
feature set of ES is a good deal richer.

just my 0.02€

clint


(Lukáš Vlček) #3

Hi,

I think it looks interesting, but it is quite hard to say exactly how it
compares to Elasticsearch right now. It reminds me of when Elasticsearch was
first made available and many people started asking how it compared to
SolrCloud. That is a legitimate question, but it is hard to answer if you
look only at the high-level API, because from that point of view the two
might look quite similar. In my opinion the "devil" is in the details, which
in this case means a detailed understanding of the underlying design.
Fortunately, the IndexTank sources are now available. On the other hand, I
was not able to find a single unit test in indextank-engine, so it is hard
to say. Apart from that, one would have to either compile it and try it out
to do the comparison, or ask on some forum or mailing list (but I am not
sure there is one at this point).

Based on
http://indextank.com/_static/papers/IndexTank%20WhitePaper%20Technical.pdf
it seems that IndexTank could have some interesting features. In particular,
it seems to have some kind of fast (possibly distributed?) in-memory storage
for a portion of the document data (they call it document variables), which
can be updated frequently and is made available for search, scoring
calculations and faceting. Personally, I think Elasticsearch will get a
similar feature one day, and the fact that IndexTank has already built
something like that means Elasticsearch can learn from it and maybe deliver
something similar (or even better). But this is probably not a trivial
feature to build, especially in a distributed search engine, and the current
IndexTank documentation does not reveal many design details.
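
To make the idea of document variables concrete, here is a toy Python sketch. It is not IndexTank's implementation or API: the class `ToyIndex`, its method names, and the scoring formula are all hypothetical, and the "text score" is just a stand-in for whatever relevance the real index would compute. It only illustrates the general pattern of keeping a small, frequently updated value per document in memory and reading it at scoring time, without reindexing the document.

```python
class ToyIndex:
    """Toy model: fixed text relevance plus mutable per-document variables."""

    def __init__(self):
        self.text_scores = {}  # doc id -> relevance fixed at index time
        self.variables = {}    # doc id -> {variable slot: value}, memory only

    def add_document(self, doc_id, text_score):
        self.text_scores[doc_id] = text_score
        self.variables[doc_id] = {}

    def update_variable(self, doc_id, slot, value):
        # Cheap update: only the in-memory map changes, nothing is reindexed.
        self.variables[doc_id][slot] = value

    def search(self, boost_slot=0):
        # Hypothetical scoring formula: relevance * (1 + variable value).
        def score(doc_id):
            boost = self.variables[doc_id].get(boost_slot, 0.0)
            return self.text_scores[doc_id] * (1.0 + boost)

        return sorted(self.text_scores, key=score, reverse=True)


index = ToyIndex()
index.add_document("a", text_score=1.0)
index.add_document("b", text_score=0.8)
before = index.search()
index.update_variable("b", slot=0, value=0.5)  # e.g. a popularity signal
after = index.search()
print(before, after)
```

In this sketch the ranking changes after a single variable update, which is the property that makes such a feature attractive. What the sketch leaves out (distribution across nodes, durability) is exactly the hard part.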

From the documentation that is available, I was not able to understand how
exactly IndexTank distributes indexing and search (do they distribute
Lucene shards or individual documents?), how it does sharding, and whether
it is possible to have replicas and change their number dynamically. It also
seems to me that a single node (in Elasticsearch terms) provides more
functionality out of the box, which in IndexTank is implemented "externally"
(autodiscovery, load balancing, recovery...).

It also seems to me that the faceting capabilities are not that rich and
one has to predefine a lot of things up front; in ES they are more flexible
IMO. The same applies to the query DSL.

I also noted that IndexTank seems to use Lucene 3.0.1
https://github.com/linkedin/indextank-engine/blob/master/pom.xml#L19 which
is more than a year old. I am not sure how hard it would be to migrate to a
newer Lucene version, but it is an interesting observation IMO.

Regards,
Lukas

On Thu, Dec 22, 2011 at 1:45 PM, Damien Hardy damienhardy.bal@gmail.com wrote:


(Karussell) #4

Thanks Lukas for this quick review! I think the missing tests are bad not
only in terms of manageability: tests would also show the features and
stand in for the missing documentation.

Peter.

On 22 Dec., 16:06, Lukáš Vlček lukas.vl...@gmail.com wrote:


(Shay Banon) #5

As Lukas noted, it's hard to compare... I had a quick look at the source
code, and it's, well..., interesting :). Regarding the fast "variables"
that allow for frequent updates: yes, that is planned for Elasticsearch (I
alluded to it in several IRC conversations and answers on the mailing
list). So I had a quick look at how it's implemented in IndexTank, and it's
very problematic (it flushes all of it to disk periodically; I'm not sure
how it survives failures or, of course, how it works in a distributed
environment).
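
The durability concern with a "keep it in memory, flush everything periodically" design can be illustrated with a small Python sketch. This is a toy under stated assumptions, not IndexTank's code: the class `FlushingVariableStore`, the JSON file format, and the explicit `flush()` call (standing in for a timer) are all hypothetical. It shows only the general failure window of such a scheme, where any update made after the last flush is lost if the process dies.

```python
import json
import os
import tempfile


class FlushingVariableStore:
    """Toy store: updates hit memory only; flush() writes the whole map to disk."""

    def __init__(self, path):
        self.path = path
        self.values = {}
        if os.path.exists(path):
            # Recovery: reload whatever the last completed flush wrote.
            with open(path) as fh:
                self.values = json.load(fh)

    def update(self, doc_id, value):
        self.values[doc_id] = value  # not durable until the next flush

    def flush(self):
        # Write to a temp file and rename, so a crash mid-write
        # does not leave a half-written file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as fh:
            json.dump(self.values, fh)
        os.replace(tmp, self.path)


path = os.path.join(tempfile.mkdtemp(), "variables.json")
store = FlushingVariableStore(path)
store.update("doc1", 1.0)
store.flush()
store.update("doc1", 2.0)  # made after the last flush

# Simulate a crash: discard the in-memory state and recover from disk.
recovered = FlushingVariableStore(path)
print(store.values, recovered.values)
```

After the simulated crash the recovered store still holds the older value, so the second update is gone. A real system would need something extra (a write-ahead log, replication, or similar) to close that window, and the sketch says nothing about how any of this behaves across multiple nodes.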

I hope people will actually start putting it to good use so that proper,
knowledgeable comparisons can be made.

On Thu, Dec 22, 2011 at 7:20 PM, Karussell tableyourtime@googlemail.com wrote:


(Lukáš Vlček) #6

Exactly, I was looking for unit tests because I was hoping they would show
me how to use it. On the other hand, chances are that they simply have not
open sourced the tests (yet) for some reason, because I cannot imagine
implementing something like that completely without tests. So maybe they
will be added later...?

On Thu, Dec 22, 2011 at 6:20 PM, Karussell tableyourtime@googlemail.com wrote:


