What's the difference between Solr and Elasticsearch in the HDFS case?


(Jianyi) #1

Hi~

I'm new to both Solr and Elasticsearch. I have read that both support creating indexes on HDFS.

So, what's the difference between Solr and Elasticsearch in the HDFS case?


(David Pilato) #2

I would not store my indices on HDFS. :slight_smile:
Too slow IMHO.

Use local storage and let Elasticsearch distribute your data over multiple machines. Basically, with Elasticsearch you don't need HDFS to scale out.
I don't know about Solr.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 12 August 2014 at 10:22:03, Jianyi (phoenix.w.2010@gmail.com) wrote:

Hi~

I'm new to both solr and elasticsearch. I have read that both the two
support creating index on hdfs.

So, what's the difference between solr and elasticsearch in hdfs case?

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/what-s-the-difference-between-solr-and-elasticsearch-in-hdfs-case-tp4061659.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1407811639795-4061659.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.


(Jianyi) #3

Hi David,

Thanks for your reply.

I'm not talking about scaling out.

Our team has a project that provides web services based on Lucene. The current
architecture indexes documents into Lucene through Kafka-like queues.
But we have a problem: when a massive number of documents arrives for indexing
(e.g. when building the index for an application for the first time from a
large dataset), the service becomes too slow. So we want to do offline
indexing, that is, to build the index offline and swap it in for the old
online index when needed. We are looking for a solution based on HDFS.

Do you have any ideas about this kind of offline indexing?


(David Pilato) #5

And did you try it with Elasticsearch? I mean, indexing while still using the service?

If you really hit an issue, you can allocate the new index on dedicated nodes and then move it to the "live" nodes.
Using aliases would be even better, so you can switch from the old index to the new one without any downtime.
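
A minimal sketch of the two ideas above, assuming the index-level allocation filtering setting (`index.routing.allocation.require.<attribute>`) and the `_aliases` endpoint are used. It only builds the JSON request bodies and does not talk to a cluster. The alias `docs`, the index names `docs_v1` and `docs_v2`, and the node attribute `box_type` are hypothetical names chosen for illustration.

```python
import json


def allocation_settings(attr, value):
    """Body for PUT /<index>/_settings that pins an index to nodes
    whose custom node attribute `attr` equals `value`."""
    return {f"index.routing.allocation.require.{attr}": value}


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from the old index
    to the new one in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # 1. Build docs_v2 on the dedicated indexing nodes (hypothetical attribute).
    print(json.dumps(allocation_settings("box_type", "indexing"), indent=2))
    # 2. Once indexing is done, let the shards move to the live nodes.
    print(json.dumps(allocation_settings("box_type", "live"), indent=2))
    # 3. Point the search alias at the new index.
    print(json.dumps(alias_swap_body("docs", "docs_v1", "docs_v2"), indent=2))
```

Clients would search against the alias `docs` rather than a concrete index name, so the switch in step 3 needs no change on the client side.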

Another solution could be to index in a dedicated cluster and then use snapshot and restore to copy the index from the indexing cluster to the search cluster.
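
As a rough sketch of that snapshot and restore flow, the following lists the REST calls each cluster would receive, assuming a shared-filesystem (`fs`) repository that both clusters can reach. The repository name, location, snapshot name, and index name are hypothetical, and nothing is sent over the network.

```python
def snapshot_restore_plan(repo, location, snapshot, index):
    """Return the (method, path, body) calls for each cluster.

    Both clusters register the same repository; the indexing cluster
    takes the snapshot and the search cluster restores from it.
    """
    register = (
        "PUT",
        f"/_snapshot/{repo}",
        {"type": "fs", "settings": {"location": location}},
    )
    take = (
        "PUT",
        f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        {"indices": index},
    )
    restore = (
        "POST",
        f"/_snapshot/{repo}/{snapshot}/_restore",
        {"indices": index},
    )
    return {
        "indexing_cluster": [register, take],
        "search_cluster": [register, restore],
    }


if __name__ == "__main__":
    plan = snapshot_restore_plan("offline_repo", "/mnt/es_backups", "snap_1", "docs_v2")
    for cluster, calls in plan.items():
        for method, path, body in calls:
            print(cluster, method, path, body)
```

Note that a restore only works if an index with the same name does not already exist in an open state on the search cluster, so restoring under a new name and then switching an alias is one way to combine the two approaches.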

But again, I'd make sure that I really have problems before trying to solve problems I don't have.

My 2 cents.

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

(In reply to Jianyi's message of 12 August 2014, 13:43:13.)


(Jianyi) #6

Hi David,

I'm afraid it would be a little expensive to move to ES.

I came across Twitter's elephant-bird-lucene yesterday, and I will try it.

Thanks for your reply.


(David Pilato) #7

Why would it be expensive?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

(In reply to Jianyi's message of 13 August 2014, 03:43.)


Offline indexing with multiple shards
(system) #8