Designing an index that holds updating product data feeds

Ori_P · November 3, 2014, 9:31pm

I would appreciate your suggestions in helping me design my elasticsearch
index.

I intend to index product feeds from about 20 online stores, each
store having no more than 20,000 products. Each product has about 15 basic
fields.
Most searches will be on specific product categories, not on
specific stores.

Each store feed is updated every few days (each store separately), by
receiving an XML file containing all the products in the store (no deltas).
Each update, I need to remove from my index all the existing products from
that store and add the new ones.

I thought of two possible approaches:

1. Create a single index + an alias to that index. Once a new feed is
   received, clone the existing index to a new index, remove from the new
   index all the old products, add the new products and finally change the
   alias to point to the new index.
2. Create an index for each store, and an alias that points to all of the
   indices. Once a new feed is received, just index it from scratch, remove
   the old store index from the alias and add the new one.

I'm not sure which way will give me faster search results. Or maybe there
is an even better approach I didn't think of...

Thanks in advance,

Ori

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/34f2766d-cada-4ba9-a4fa-961c34aa2f8b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dadoonet · November 3, 2014, 9:54pm

I don't see any benefit of solution 1.

I would definitely do solution 2.

I don't think you would see a difference in search time, but in terms of I/O, solution 2 is better.
Also, you should set the refresh interval to -1 while indexing and call refresh after the bulk load.
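
To make the refresh tip concrete, here is a minimal sketch of the request sequence around a bulk load. It only builds the HTTP method, path and body for each step; sending them is left to whatever client you use. The index name and product fields are made up for illustration, and restoring the interval to "1s" afterwards is my assumption (that is the default), not something stated above. Older releases may also expect a document type in the bulk path.

```python
import json


def refresh_settings(interval):
    """Body for PUT /<index>/_settings that sets the refresh interval."""
    return {"index": {"refresh_interval": interval}}


def bulk_body(products):
    """Build the newline-delimited JSON payload for POST /<index>/_bulk.

    Each product (a dict with a hypothetical "id" field) becomes two lines:
    an "index" action line followed by the document source.
    """
    lines = []
    for product in products:
        lines.append(json.dumps({"index": {"_id": product["id"]}}))
        lines.append(json.dumps(product))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def bulk_load_plan(index, products):
    """Return the ordered (method, path, body) requests for one bulk load."""
    return [
        # 1. Disable automatic refresh while indexing.
        ("PUT", f"/{index}/_settings", json.dumps(refresh_settings("-1"))),
        # 2. Send the documents in one bulk request.
        ("POST", f"/{index}/_bulk", bulk_body(products)),
        # 3. Restore the refresh interval (assumed default of 1s).
        ("PUT", f"/{index}/_settings", json.dumps(refresh_settings("1s"))),
        # 4. Refresh explicitly so the new documents become searchable.
        ("POST", f"/{index}/_refresh", ""),
    ]
```

For a real feed of 20,000 products you would probably split the bulk step into several smaller requests, but the order of the steps stays the same.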

HTH

--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Ori_P · November 3, 2014, 10:01pm

Thanks for replying David.

I thought approach 2 might be problematic since the alias on multiple
indices would cause a query to run on every index separately, which I
thought might slow things down. Apparently I was wrong?

And thanks for the tip about the refresh interval.


dadoonet · November 3, 2014, 10:21pm

Hmmm. Sounds like I misread what you explained in 2.

I missed the fact that you want to have one index per store. So let me change my answer.
If a single index with one shard can hold your 400,000 docs, which sounds reasonable to me, then a single index will be faster than querying 20 indices.
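
As a rough illustration of why, here is a small sketch that builds the settings for a one-shard index, a category query, and counts how many shards a search has to visit in each layout. The "category" field name is hypothetical, and the figure of 5 shards per index assumes the default primary shard count Elasticsearch used at the time; your indices may be configured differently.

```python
def create_index_body(shards=1):
    """Body for PUT /<index> that creates an index with the given shard count."""
    return {"settings": {"number_of_shards": shards}}


def category_query(category):
    """Body for a search that matches one product category exactly.

    "category" is a hypothetical field name for this sketch.
    """
    return {"query": {"term": {"category": category}}}


def shards_searched(index_count, shards_per_index):
    """Number of shards a search across all the indices has to visit."""
    return index_count * shards_per_index


# 20 stores with at most 20,000 products each.
total_docs = 20 * 20_000

single_index = shards_searched(index_count=1, shards_per_index=1)
# One index per store, assuming 5 primary shards per index.
per_store = shards_searched(index_count=20, shards_per_index=5)
```

Every shard visited is a separate Lucene search whose results have to be merged, so one shard holding all 400,000 docs avoids that fan-out.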

My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs


Ori_P · November 3, 2014, 10:43pm

And if I may ask, do you have a suggestion on how to update the single
index? I need to replace a batch of about 20,000 documents at once on a
daily basis, with as little impact on performance and data availability as
possible.


dadoonet · November 3, 2014, 11:43pm

Well. I'm used to running demos where I can inject around 8k to 10k docs per second on my laptop (SSD drives).
I think your biggest bottleneck will be reading your source documents, not writing them to Elasticsearch.

With a single index, I would probably reindex the 400,000 docs every day into a new, clean index and then switch the alias from the old index to the new one.

But it depends on your read rate I guess.
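
The alias switch can be sketched as follows. This builds the body for a single POST /_aliases request, which applies the remove and add actions atomically, so searches through the alias never see a moment without an index behind it. The alias name and the date-based index naming scheme are assumptions for the example, not something you have to follow.

```python
import datetime
import json


def dated_index_name(prefix, day):
    """Hypothetical naming scheme: one index per rebuild, e.g. products_20141104."""
    return f"{prefix}_{day:%Y%m%d}"


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases that moves the alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


old_index = dated_index_name("products", datetime.date(2014, 11, 3))
new_index = dated_index_name("products", datetime.date(2014, 11, 4))
swap = alias_swap_body("products", old_index, new_index)
swap_json = json.dumps(swap)
```

Once the alias points at the new index, the old one can be deleted. Searches always go through the alias, so clients never need to know the dated index names.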

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

