Newbie questions

Jondow · November 10, 2010, 5:37pm

I need a few pointers with some (no doubt) very basic questions:

1. Using the Java API, the best approach seems to be using NodeBuilder
and closing this node on shutdown. In a web application, does shutdown
mean the shutdown of the web application (which would mean NodeBuilder's
node is thread-safe) or does shutdown mean the end of the request-response
lifecycle? What is the best way to manage this?

2. I really like the multi-tenancy feature indicated in the product
docs. However, looking at the examples, multi-tenancy would allow
for (simplified for clarity):

PUT localhost/tenant1/foo
PUT localhost/tenant2/foo
PUT localhost/tenant3/foo

I have a basic object model design that follows the hierarchy of:

Account -> Profile -> Device

This means that an Account (tenant) has 1 or more Profiles which each
have 0 or more Devices.

If I wanted to be able to do a text search on, say, "Acme" and find
all devices where a field of a Device or a field of the Device's
Profile has the word "Acme" in it, how would I structure my indexes?
Looking at the examples I put above, tenant would be the account, but
would foo be 'profile' or 'device', or do I do both, i.e. /account1/profile
and /account1/device? Probably too many questions and not
enough context.

3. Following on from 2, how would I structure the JSON that I submit
to index a set of Devices to satisfy such a search? Do I first index
the Profiles and then index the Devices with the id of their Profile in each
document? But if I do that, I don't see how searching on Acme would
return only 'device' types, which is what I want.

Thanks,
Darryl

kimchy · November 10, 2010, 11:16pm

Hey,

On Wed, Nov 10, 2010 at 7:37 PM, Jondow <djpentz@gmail.com> wrote:

> I need a few pointers with some (no doubt) very basic questions:
>
> 1. Using the Java API, the best approach seems to be using NodeBuilder
> and closing this node on shutdown. In a web application, does shutdown
> mean the shutdown of the web application (which would mean NodeBuilder's
> node is thread-safe) or does shutdown mean the end of the request-response
> lifecycle? What is the best way to manage this?

When using the Java API, you want to get a Client reference and use it
throughout the lifecycle of your app. It's completely thread-safe. If you
build the client from a Node in client mode, make sure to close both the
client and the node when shutting down. If you're using the TransportClient,
just close the client.
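A minimal sketch of that lifecycle, with hypothetical names throughout: `Resource` stands in for the Elasticsearch `Node` and `Client`, and `SearchClientHolder` is not part of any library. The holder is created once at application startup, every request reuses `client()`, and `shutdown()` is meant to be called once when the web application stops (for example from a `ServletContextListener.contextDestroyed` method, assuming a servlet container), not at the end of each request.

```java
// Hypothetical stand-in for anything that must be released on shutdown
// (in real code: the Elasticsearch Node and Client).
interface Resource {
    void close();
}

// One instance per web application: create it at startup, share it across all
// request threads, and shut it down only when the application stops.
final class SearchClientHolder {
    private final Resource node;    // null when using a TransportClient
    private final Resource client;
    private boolean closed = false;

    SearchClientHolder(Resource node, Resource client) {
        this.node = node;
        this.client = client;
    }

    // Every request reuses the same client instead of building a new one.
    Resource client() {
        return client;
    }

    // Call from the application's shutdown hook, never at the end of a request.
    // Repeated calls are ignored, so the resources are closed at most once.
    synchronized void shutdown() {
        if (closed) {
            return;
        }
        closed = true;
        client.close();              // always close the client
        if (node != null) {
            node.close();            // a node-based client also owns a node
        }
    }
}
```

Passing `null` for the node models the TransportClient case, where only the client is closed.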

> 2. I really like the multi-tenancy feature indicated in the product
> docs. However, looking at the examples, multi-tenancy would allow
> for (simplified for clarity):
>
> PUT localhost/tenant1/foo
> PUT localhost/tenant2/foo
> PUT localhost/tenant3/foo
>
> I have a basic object model design that follows the hierarchy of:
>
> Account -> Profile -> Device
>
> This means that an Account (tenant) has 1 or more Profiles which each
> have 0 or more Devices.
>
> If I wanted to be able to do a text search on, say, "Acme" and find
> all devices where a field of a Device or a field of the Device's
> Profile has the word "Acme" in it, how would I structure my indexes?
> Looking at the examples I put above, tenant would be the account, but
> would foo be 'profile' or 'device', or do I do both, i.e. /account1/profile
> and /account1/device? Probably too many questions and not
> enough context.

First of all, note that there is a limit to the number of indices you can
create on a box, since each shard (within each index) is a Lucene index, and
each one comes with some overhead.

You can go with a single index or multiple indices, but in the end you
will need to denormalize your data into documents and then search on it.
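To make the trade-off concrete, here is a small sketch; the class and method names are hypothetical, and the path shape simply follows the `/tenant/foo` URLs in the question, with `device` as the type. It assumes 5 primary shards and 1 replica per index (the Elasticsearch defaults of that era); substitute your own settings.

```java
// Hypothetical helper comparing the two layouts for device documents.
final class IndexLayouts {
    // Each index is split into primary shards, and each primary has
    // `replicas` extra copies; every copy is a separate Lucene index.
    static long luceneIndices(long indices, int shardsPerIndex, int replicas) {
        return indices * shardsPerIndex * (1L + replicas);
    }

    // Index per account (tenant), following the question: /account1/device/d1
    static String perAccountPath(String accountId, String deviceId) {
        return "/" + accountId + "/device/" + deviceId;
    }

    // One shared index: the account is no longer in the URL, so it has to be
    // stored as a field in each document and used to filter searches.
    static String sharedIndexPath(String index, String deviceId) {
        return "/" + index + "/device/" + deviceId;
    }
}
```

Under those assumptions, 1,000 accounts with an index each come to `luceneIndices(1000, 5, 1)`, i.e. 10,000 shard copies spread across the cluster, against 10 for one shared index.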

> 3. Following on from 2, how would I structure the JSON that I submit
> to index a set of Devices to satisfy such a search? Do I first index
> the Profiles and then index the Devices with the id of their Profile in each
> document? But if I do that, I don't see how searching on Acme would
> return only 'device' types, which is what I want.

You have two options: either embed the profile data in each device, or the
other way around. And yes, sadly, it does mean that you will need to reindex
a chunk of your data if part of the data changes.
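A minimal sketch of the first option (embedding the profile in each device document), using hypothetical `Profile`/`Device` records and a hypothetical `DeviceDocuments` helper; the field names are illustrative, and the JSON is built by hand only to keep the example dependency-free (real code would use a JSON library). Because only device documents are indexed, a search for "Acme" over them can only return devices, whether the word sits in a device field or in the embedded profile. The last method shows the reindexing cost: when a profile changes, every device that embedded it has to be rebuilt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model mirroring Account -> Profile -> Device.
record Profile(String id, String name, String company) {}
record Device(String id, String profileId, String name) {}

final class DeviceDocuments {
    // Minimal JSON string quoting: escapes quotes, backslashes and control chars.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // One document per device, with a copy of its profile's fields embedded.
    static String deviceJson(Device d, Profile p) {
        return "{\"name\":" + quote(d.name())
            + ",\"profile\":{\"id\":" + quote(p.id())
            + ",\"name\":" + quote(p.name())
            + ",\"company\":" + quote(p.company()) + "}}";
    }

    // Ids of the device documents that must be rebuilt and reindexed
    // after the given profile changes.
    static List<String> devicesToReindex(List<Device> devices, String changedProfileId) {
        List<String> ids = new ArrayList<>();
        for (Device d : devices) {
            if (d.profileId().equals(changedProfileId)) {
                ids.add(d.id());
            }
        }
        return ids;
    }
}
```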
