A lightweight partial index

Hi,

We have an ES index of around 12.5 GB and growing. We use the index in
various ways.

For one use case we hit the index with an 'ids' query - passing around 500
ids typically and then pulling the documents back. The documents are fairly
large and we don't need all the data in them.
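For readers following along, the request described above could look roughly like the sketch below. It only builds the JSON body for an `ids` query; the helper name, the example IDs and the choice to set `size` to the number of IDs are assumptions for illustration, not details from this thread.

```python
import json


def build_ids_query(doc_ids, doc_type=None):
    """Build a search body that fetches documents by ID with the `ids` query."""
    ids = list(doc_ids)
    ids_clause = {"values": ids}
    if doc_type is not None:
        # Optional: restrict the lookup to one mapping type.
        ids_clause["type"] = doc_type
    # Assumption: ask for as many hits as there are IDs, since the default
    # page size of a search is much smaller than 500.
    return {"query": {"ids": ids_clause}, "size": len(ids)}


# Hypothetical IDs, standing in for the ~500 mentioned above.
example_ids = ["doc-%d" % i for i in range(500)]
ids_body = build_ids_query(example_ids)
ids_body_json = json.dumps(ids_body)
```

The resulting JSON string would be sent as the body of a search request against the index.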

Performance is extremely important here, so I ran a test where I hit a
lightweight version of the same index (I re-indexed everything, minus all
the fields we don't need, into a new index), and I got a 3x speed increase.
The new index is around 2 GB.

Obviously this creates complexity because now we have 2 indexes - a
lightweight one and a 'full fat' one.

I'm wondering if anyone has gone with this approach before - is it a
good/bad idea? And it would be great if ES has any smarts and plugins that
would allow us to index into the full-fat one, and have the other
lightweight one indexed also (with just the fields we want).
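One way to keep the two indexes in step, assuming the application controls all writes, is to send each document to both indexes in a single bulk request and strip the unwanted fields on the client side. The sketch below only builds the newline-delimited bulk payload; the index names, type name and field names are hypothetical, and this is an application-level workaround rather than a built-in ES feature.

```python
import json

# Hypothetical list of the fields the lightweight index should keep.
LIGHT_FIELDS = ("title", "price")


def project(doc, fields=LIGHT_FIELDS):
    """Return a copy of doc containing only the lightweight fields."""
    return {name: doc[name] for name in fields if name in doc}


def bulk_payload(doc_id, doc, full_index="full", light_index="light",
                 doc_type="doc"):
    """Build a bulk body that indexes doc into both indexes."""
    pairs = [
        ({"index": {"_index": full_index, "_type": doc_type, "_id": doc_id}},
         doc),
        ({"index": {"_index": light_index, "_type": doc_type, "_id": doc_id}},
         project(doc)),
    ]
    # The bulk format is one JSON object per line: an action line followed
    # by the document source, with a newline after every line.
    return "".join(
        json.dumps(action) + "\n" + json.dumps(source) + "\n"
        for action, source in pairs
    )


payload = bulk_payload("42", {"title": "t", "price": 3, "body": "long text"})
```

Because both writes travel in one request, they at least share the same round trip, though a bulk request is not transactional and one of the two actions can still fail on its own.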

BTW I did try just fetching back the fields we care about from the full-fat
one rather than having a completely separate index, but this didn't seem to
give us much gain.

thanks,

Jon.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Have you considered using partial_fields (http://www.elasticsearch.org/guide/reference/api/search/fields/) in
your queries? This will lighten the network burden.
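As a rough sketch of that suggestion, a search body using `partial_fields` could be built as below. The entry name `slim` and the field patterns are hypothetical, and this request shape is assumed to match the ES version discussed in this thread, so check the search documentation for the version you run.

```python
import json


def build_partial_fields_search(doc_ids, include, exclude=None, name="slim"):
    """Build an `ids` search that returns only part of each document."""
    ids = list(doc_ids)
    partial = {"include": list(include)}
    if exclude:
        partial["exclude"] = list(exclude)
    return {
        "query": {"ids": {"values": ids}},
        "size": len(ids),
        # Each named entry selects the parts of the source to return.
        "partial_fields": {name: partial},
    }


# Hypothetical field patterns for the data the use case actually needs.
partial_body = build_partial_fields_search(
    ["1", "2", "3"], include=["title", "price.*"], exclude=["price.history"]
)
partial_json = json.dumps(partial_body, indent=2)
```

Note that this trims what goes over the network; the full document is still stored in the index and has to be read before it can be trimmed.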

(In reply to Jon Pither's message of Tuesday, 30 April 2013 06:18:02 UTC-4.)

One thing to note is that the `ids` filter is not cached by default, but using a `terms` filter on the `_id` field IS cached, so you may want to try
that instead.
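A minimal sketch of that swap, assuming the `filtered` query wrapper available in ES versions of this era (later versions express filters differently), could look like this. The helper name and the example IDs are hypothetical.

```python
import json


def build_id_terms_filter(doc_ids):
    """Build a search that selects documents with a `terms` filter on `_id`."""
    ids = list(doc_ids)
    return {
        "query": {
            "filtered": {
                # Match everything, then narrow the result with the filter.
                "query": {"match_all": {}},
                "filter": {"terms": {"_id": ids}},
            }
        },
        "size": len(ids),
    }


terms_body = build_id_terms_filter(["1", "2", "3"])
terms_json = json.dumps(terms_body)
```

Caching only pays off when the same set of IDs is requested repeatedly, so it is worth measuring both forms against real traffic.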

clint

(In reply to btiernay's message of Tue, Apr 30, 2013 at 1:49 PM.)

For our application, I hand-select which fields get indexed and which get
stored in _source. Depending on your application, you are unlikely to
need to query on every field, and not every field needs to be retrieved.
The two sets need not overlap either, so you can index fields
a, d, j and retrieve fields a, b, c.

There is also a difference between storing content in _source (http://www.elasticsearch.org/guide/reference/mapping/source-field/) and using "store": "yes" for persistence. store is good for retrieving
individual fields, and if you need to get multiple fields you're probably
better off just relying on _source.
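To make the a, d, j versus a, b, c split concrete, here is a sketch of a mapping that indexes a, d and j but keeps only a, b and c in _source. The type name `doc`, the `string` field type and the exact option values are assumptions based on the mapping syntax of ES versions from this period, so treat it as an illustration rather than a drop-in mapping.

```python
import json


def build_mapping(doc_type="doc"):
    """Build a mapping where searchable and retrievable fields differ."""

    def searchable():
        # Indexed for querying.
        return {"type": "string", "index": "analyzed"}

    def fetch_only():
        # Not indexed; only useful when read back from _source.
        return {"type": "string", "index": "no"}

    return {
        doc_type: {
            # d and j are searchable but dropped from the stored _source.
            "_source": {"excludes": ["d", "j"]},
            "properties": {
                "a": searchable(),
                "b": fetch_only(),
                "c": fetch_only(),
                "d": searchable(),
                "j": searchable(),
            },
        }
    }


mapping = build_mapping()
mapping_json = json.dumps(mapping, indent=2)
```

With this layout, queries can hit a, d and j, while a fetch returns a, b and c from _source.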

With some careful balancing of what goes where we are reducing 100GB of raw
data into 2GB of index.

(In reply to Clinton Gormley's message of Tuesday, April 30, 2013 12:48:49 PM UTC-7.)