In Elasticsearch, indexes are more of a logical namespace. They are
somewhat akin to a database table, except a lot more flexible. If you were
referring to "indexes" as in the type of index you specify in a relational
database (e.g. indexing a column, which builds a B-Tree under the covers),
there is no concept of a pre-specified index in Elasticsearch. All fields
are searchable by default, because all fields are converted into an inverted index (http://en.wikipedia.org/wiki/Inverted_index), which enables fast lookups.
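To make the inverted index idea concrete, here is a toy sketch in Python. It is only an illustration of the data structure, not how Lucene stores it: the function name, the sample documents, and the whitespace/lowercase "analysis" are all assumptions made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Toy analysis step: lowercase and split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Hypothetical video descriptions, keyed by document id.
docs = {
    1: "Alien invasion trailer",
    2: "Cooking with Alison",
    3: "The alien returns",
}

index = build_inverted_index(docs)

# Looking up a token is a single dictionary access, not a scan of every document.
print(sorted(index["alien"]))
```

The point is that a search for "alien" never touches document 2 at all; it goes straight from the token to the list of matching ids.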
If your search simply requires any field that has the token "alien" inside
it, a simple Match query (http://www.elasticsearch.org/guide/reference/query-dsl/match-query/) will work for you. If you don't need scoring, a Term
filter (http://www.elasticsearch.org/guide/reference/query-dsl/term-filter/) will be even faster; note that the term is not analyzed, so it must match the indexed token exactly. If you need prefix matching, so all documents that
have a term starting with "ali" (which will match "alien", "alison", "alias", etc.), you
can use a Prefix query (http://www.elasticsearch.org/guide/reference/query-dsl/prefix-query/).
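As a rough sketch, the request bodies for those three options could look like the following, built here as Python dicts. The field name "description" is taken from the question; the helper name is hypothetical, and the `filtered` wrapper around the term filter is the query DSL form as I understand it for the releases current at the time of this thread, so check it against the docs for your version.

```python
import json


def build_search_bodies(field, word, prefix):
    """Build example request bodies for a match query, a term filter, and a prefix query."""
    # Match query: the input is analyzed and matching documents are scored.
    match_body = {"query": {"match": {field: word}}}

    # Term filter: no scoring, exact token match, wrapped in a filtered query.
    term_filter_body = {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {field: word}},
            }
        }
    }

    # Prefix query: matches any indexed term that starts with the given prefix.
    prefix_body = {"query": {"prefix": {field: prefix}}}

    return {
        "match": match_body,
        "term_filter": term_filter_body,
        "prefix": prefix_body,
    }


bodies = build_search_bodies("description", "alien", "ali")
for name, body in bodies.items():
    print(name, json.dumps(body))
```

Each body would be sent to the `_search` endpoint of whatever index holds the videos; the index name and the transport are left out on purpose.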
There are plenty of more advanced queries you can use too, such as phrase
matching or partial, fuzzy matches. Take a few minutes and look over the
queries that elasticsearch offers. With the right combination of analyzers
and queries, almost any behavior can be created.
Lastly, all of the above queries are very fast. Even the
prefix/fuzzy/wildcard queries are fast, thanks to some fairly
advanced finite state automata that operate under the covers.
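For intuition on why a prefix lookup does not need to scan every term, here is a much simpler stand-in: a binary search over a sorted term list. To be clear, this is not the finite state machinery Lucene uses, just a swapped-in technique that shows the same "jump to the right spot, then stop early" behaviour. The function name and the term list are made up for the example.

```python
from bisect import bisect_left


def terms_with_prefix(sorted_terms, prefix):
    """Return all terms that start with prefix, given a sorted list of terms."""
    # Binary search finds the first term >= prefix in O(log n).
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        # Terms sharing the prefix are contiguous in sorted order, so stop at the first miss.
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


# Hypothetical term dictionary for one field.
terms = sorted(["alias", "alien", "alison", "bob", "video"])
print(terms_with_prefix(terms, "ali"))
```

The work done is proportional to the number of matching terms plus a logarithmic seek, rather than to the total number of terms in the index.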
On Monday, September 16, 2013 3:58:02 PM UTC-4, Doug Wolfgram wrote:
Thanks. That explains a lot. GenieDB has solved the performance issue for
the most part for MySQL with their self-healing methodology, but of course
the two servers are never perfectly in sync. But for most applications, a 1
or 2 second delay is acceptable. The trick is a private VPN between the
servers. For searching across multiple data centers, that could prove to be
Thanks for the info. One quick question, do I have to create indexes for
every possible keyword search, or is free-form search reasonably fast? The
primary problem I am trying to solve is this: "Select all the videos whose
description contains the word 'alien%'." Your basic, old-fashioned, slow
text comparison, wildcard search.
On Monday, September 16, 2013 11:48:35 AM UTC-7, Zachary Tong wrote:
Data is persisted to disk after it has been indexed...you don't lose data
when you restart nodes. If the server goes up in flames, you may lose data
if you don't have replicas spreading the data across the cluster, but
that's a different problem =)
Elasticsearch is built on top of Lucene, which is arguably the most
advanced information retrieval and search library available (including both
open source and commercial products). Elasticsearch benefits directly from
the search capabilities of Lucene - it's pretty powerful.
There isn't really a good tool for multi-datacenter push replication at
the moment. When 1.0 is released, the Snapshot/Restore feature can fill
that role. For now, you'll have to do it manually with something like
rsync. It should be noted this is for delayed synchronization between two
clusters - it isn't recommended to span a single cluster between two
datacenters. The latency makes distributed systems very difficult to work
with, even under the best circumstances.
On Monday, September 16, 2013 11:35:48 AM UTC-4, Doug Wolfgram wrote:
I am new to ES. I have a need to index and search hundreds of millions
of video files (think YouTube). I am assuming that I could create the
entire system in ES with one of the keys being the URL to the video file
itself. Pretty straightforward. One thing that confuses me is the
persistence of ES. From the videos, it seems that if I stop all instances I
lose my data. Is that true? Seems rather odd. Sorry to be so infantile in
my questions but I have very little exposure to ES at this point. I am
assuming that I would simply use ES as I would MongoDB but ES has better,
faster searching. Is this the general consensus?
Also, I have been working with GenieDB for really fast multi-datacenter
replication. Is something like this available for ES?
Cheers! Looking forward to a solid relationship with ES!