Hi guys,
I've just discovered the potential of ES as a scalable multi-purpose cache or
even as a standalone data store. So far, I've been using an RDBMS with
Memcached or Redis (for simple queries in the application layer). I've decided
to give ES a try by building a prototype, but before I dive in I'd much
appreciate your opinions on how I plan to get the data out of ES.
The issue might be that my data is highly relational and I need to work mainly
with large structures. ES's main task in this regard would be to support a
server process that collects all data items belonging to such a large
structure. These items are sent to a rich client, where the actual
structured views are built.
Here's some example data:

root = {"id": 1,
        "name": "Plane",
        "subassemblies": [2, 3, 4]}
body = {"id": 2,
        "name": "Body",
        "subassemblies": [5, 6]}
left_wing = {"id": 3,
             "name": "Wing",
             "subassemblies": []}
right_wing = {"id": 4,
              "name": "Wing",
              "subassemblies": []}
upper_body_structure = {"id": 5,
                        "name": "Upper Body",
                        "subassemblies": []}
lower_body_structure = {"id": 6,
                        "name": "Lower Body",
                        "subassemblies": []}
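To get these items into ES in the first place, I'd probably use the bulk API. A minimal sketch of how I'd build the bulk request body (the index and type names "assemblies"/"item" are just placeholders I made up):

```python
import json

def build_bulk_body(items, index="assemblies", doc_type="item"):
    """Build a newline-delimited JSON body for the ES _bulk endpoint."""
    lines = []
    for item in items:
        # action line: index each item under its own id
        lines.append(json.dumps({"index": {"_index": index,
                                           "_type": doc_type,
                                           "_id": item["id"]}}))
        # source line: the document itself
        lines.append(json.dumps(item))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline

items = [{"id": 1, "name": "Plane", "subassemblies": [2, 3, 4]},
         {"id": 2, "name": "Body", "subassemblies": [5, 6]}]
bulk_body = build_bulk_body(items)
```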
So, I would query ES iteratively to get all items, starting with the root
item. Roughly like this in Python pseudocode:

all_item_ids = []
current_root_id = 1
all_item_ids.append(current_root_id)
current_item_ids = [current_root_id]
while len(current_item_ids) > 0:
    # fetch the "subassemblies" field for the current level of ids
    # (here would come some more advanced query options)
    current_item_ids = query_ES_for_items_by_given_ids_and_return_given_field(
        current_item_ids, "subassemblies")
    all_item_ids.extend(current_item_ids)
# there's a client cache for the item data, so I send the ids only
send_ids_to_client(all_item_ids)
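Concretely, I imagine each round of the loop translating into a query with a terms clause on the ids, returning only the subassemblies field. A sketch of the request body I have in mind (field names as in the example data above; whether "fields" is the best way to limit the response is exactly the kind of thing I'd like feedback on):

```python
def build_level_query(item_ids):
    """Request body fetching only the subassemblies field for exact id matches."""
    return {
        "query": {"terms": {"id": item_ids}},  # exact matches only
        "fields": ["subassemblies"],           # return just the child ids
        "size": len(item_ids),                 # at most one hit per requested id
    }

level_query = build_level_query([2, 3, 4])
```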
The amount of data is quite large: up to 100,000 rows with up to 50 levels.
So I could end up with queries taking 10,000 arguments (though only exact
matches need to be considered). Those could be split up into batches, but
that's where I hope to get your opinions. (Hitting ES 50 times wouldn't make
me nervous, but when it gets to a couple of thousand times, something seems
off. Then again, if it took only a couple of seconds overall, I wouldn't
complain :-)).
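The batching itself would be trivial; a sketch of what I mean (the batch size of 1,000 is an arbitrary guess to stay well under any query-size limits):

```python
def chunked(ids, batch_size=1000):
    """Split a list of ids into batches of at most batch_size ids each."""
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]

# 10,000 ids at one level would mean 10 queries instead of 1
batches = chunked(list(range(10000)), batch_size=1000)
```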
Is this the right approach to handling large structures? Do you see any
general showstoppers or flaws (like limits on query size)?
Another question is about storing Thrift- or Protocol Buffers-encoded data.
How would you store those for simple get/mget operations? (These formats are
used for transport and in the client cache, which is basically a key-value
store.)
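The one option I can think of is base64-encoding the serialized bytes so they survive the trip through JSON, and storing them in a field that isn't analyzed. A sketch of the round trip (the document layout and field name "blob" are just assumptions):

```python
import base64

def encode_blob(raw_bytes):
    """Base64-encode serialized Thrift/protobuf bytes for safe storage in JSON."""
    return base64.b64encode(raw_bytes).decode("ascii")

def decode_blob(stored):
    """Recover the original serialized bytes after a get/mget."""
    return base64.b64decode(stored)

payload = b"\x08\x96\x01"  # stand-in for some serialized bytes
doc = {"id": 42, "blob": encode_blob(payload)}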
On top of that, I would use full-text search and general combined searches
across the whole data set. But I have no doubt that ES is the right choice
there. So if I were able to retrieve the structured data in a performant
way, ES would be an awesome, powerful all-in-one solution.
Cheers and thanks for any comments and opinions,
Jan
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.