Elasticsearch


(thomasmueller2@hushmail.com) #1

Hello,

We had a look on GSA and OpenSearchServer and are wondering if
elasticsearch is something similar (a standalone full-text retrieval system),
e.g. where the end user can enter search terms in a browser-based search mask?

Thank you for any feedback!

Thomas


(Michael Sick) #2

Hi Thomas,

I'll help as best I can. First, I'm assuming GSA means Google Search
Appliance, rather than that you were looking for solutions on the General
Services Administration (which could make sense with the word "on").

ElasticSearch is best described on the home page: "Distributed, RESTful,
Search Engine built on top of Apache Lucene". It is not an appliance like
GSA, and it is not a crawler either.

Compared with OSS, here is what I see on the OSS page:

  • A crawler allows you to index web pages, documents from files on local
    and remote systems and contents from any JDBC Database, such as Oracle,
    MySql, and Microsoft SQL Server and more,

ES does not have crawler modules AFAIK. ES is built for near real-time
indexing and retrieval. The projects that I've done and seen focus less on
nightly updates in batch than they do continuous feeds. ES' abstraction for
this type of work is the River API - see
http://www.elasticsearch.org/guide/reference/river/ as there are Rivers for
several data sources. Most often, people just use the Index API and submit
content from their application.
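
As a rough illustration of that last point, here is a minimal sketch of
submitting one document through the Index API over HTTP. The host
localhost:9200, the index name "articles", the type name "article", the
document fields, and the helper build_index_request are all assumptions made
up for this example. The /{index}/{type}/{id} URL shape is the one ES used at
the time of this thread; later versions changed the type segment.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical document; the field names are illustrative only.
request = build_index_request(
    "articles", "article", "1",
    {"title": "Hello search", "body": "Full-text content goes here."},
)

# To actually send it to a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The request is only built here, not sent, so the sketch can be read and run
without a cluster. Uncomment the last lines to try it against a real node.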

  • Full text analyzers and filters allowing optimized and efficient
    searches in 16 languages and indexing performance,

Yes. ES leverages Lucene, which is a very mature text analysis / search
system. I'd be surprised (though it's happened before) to see ES/Lucene be
outdone on the search basics.

  • An indexer that creates, updates the index and presents the answers to
    queries using the most efficient algorithms for best performance and
    response times,

Yes, ES has an indexer, and yes, it is fast.

  • Html renderer allowing an easy integration of the Search box in an
    html/xhtml page, working with php and .Net, client library and xml over
    http API.

No. ES works over REST using JSON, and I'd argue there are some benefits
to not incorporating a 3rd-party HTML requirement into your application.
Likely just a style preference, but where I can, I like to build open.
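
To tie this back to the original question about a browser-based search mask:
you would build the search form yourself and have your application translate
what the user types into a JSON query for the _search endpoint. A minimal
sketch, assuming a local node, a hypothetical "articles" index with a "title"
field, and made-up helper names (build_search_request, extract_titles). The
query_string query is just one of several query types that could be used:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port


def build_search_request(index, user_input, size=10):
    """Build (but do not send) a _search request for text typed into a search box."""
    body = {"query": {"query_string": {"query": user_input}}, "size": size}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def extract_titles(response_json):
    """Pull the (hypothetical) 'title' field out of each hit in a parsed response."""
    return [hit["_source"].get("title") for hit in response_json["hits"]["hits"]]


# To run against a real node and print titles for your own HTML template:
# with urllib.request.urlopen(build_search_request("articles", "lucene")) as r:
#     print(extract_titles(json.loads(r.read().decode("utf-8"))))
```

Note that query_string interprets the input as query syntax, so an
application would normally validate or escape what end users type.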

  • Parsers allowing you to get content and metadata from most documents
    and formats, such as MS Office, OpenOffice, html/xhtml, xml, Adobe pdf,
    rtf, txt, mp3/4, wav, torrents and more.

Yes to some. ES leverages Apache Tika for parsing documents. Not sure
what's available for audio files.

  • A series of caches to accelerate processes and deliver powerful search
    applications,

Yes. See the Filter API - it has quite powerful parsing, and I've found ES's
performance to be solid; it holds up under continuous updates to the
index.

  • A monitoring and administration module offering an alerting service
    which checks that your index is always updated and working well and that
    the necessary hardware resources are available.

ES can integrate with monitoring solutions directly or via JMX. The
monitoring is "good enough" - I'd like to see more bright, shiny monitoring
tools going forward.

  • An integrated Scheduler service can be used to create simple or
    complex jobs and run them automatically.

No. Not sure why an internal scheduler would be an advantage. ES does keep
track of TTL (Time to Live) for submitted documents and will delete/evict
on a schedule.

  • Comprehensive online documentation to provide you all the help you
    might need when learning to use features and creating your applications,

I wish there were an ES book, but the documentation works.

  • Advanced functionality: faceting, clustering, filters, snippets,
    synonyms, stopwords, categorization, “find similar”, automatic thumbnail
    screenshot inclusion,

Facets - yes, quite strong. Filters/Snippets - not sure what is being
described. Synonyms/Stop Words/Similar - yes, in Lucene. Thumbnailing - I
don't think so.
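
For a concrete picture of faceting, here is a sketch of a search body that
uses the facets API as it existed when this thread was written (later ES
releases replaced facets with aggregations). The field name "tags" and the
helper build_terms_facet_body are assumptions for illustration only:

```python
import json


def build_terms_facet_body(field, size=10):
    """Return a search body asking for the most frequent terms in `field`."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # The facet name is arbitrary; the field name is reused for clarity.
            field: {"terms": {"field": field, "size": size}},
        },
    }


# Print the JSON that would be sent as the body of a _search request.
print(json.dumps(build_terms_facet_body("tags"), indent=2))
```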

  • An OpenSearchServer Drupal Module, a Wordpress module.

No, but there are PHP clients, and the core API is REST/JSON, so there's
nothing you can't do in PHP.

Hope this helps --Mike

(In reply to the message from thomasmueller2@hushmail.com of Mon, Mar 12,
2012 at 2:55 PM, shown in full as #1 above.)


(system) #3