Client nodes - mapper-attachment plugin


(juanlegrand) #1

Hi

Would it be possible to dedicate certain Elasticsearch client nodes solely to analyzing documents via the mapper-attachment plugin?
Afterwards, the indexing should be performed on the data nodes.
The goal would be to offload the nodes that hold the indexes, since analyzing many large documents consumes a lot of resources.
Any thoughts or experiences would be very much appreciated.

adTHANKSvance,

Jan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d2c3d13b-f803-4f63-8a2e-ef70c93cfc90%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(David Pilato) #2

I think you'd be better off doing this in your own process, outside of any Elasticsearch node.
You don't need the mapper attachment plugin: if you're a Java developer, you can use Tika directly, or any other library, to extract content and metadata.

Actually, I moved FSRiver from mapper attachment to using Tika directly. Now I have fine-grained control over my documents.
Better still, I'm no longer forced to send a full PDF document (10 MB) over the wire when it mostly contains pictures and I only extract a small amount of data from it (metadata, for example).
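To make the wire-size point concrete: the mapper-attachment plugin expects the file's bytes base64-encoded inside the JSON document, which inflates them by about a third, whereas extracting on the client means only the resulting text and metadata are sent. The stdlib-only Java sketch below (Java 14+) builds both kinds of request body. The class `PayloadSize` and the field names `file`, `title` and `content` are hypothetical, and the extraction step itself (Tika or another library) is assumed to have already happened elsewhere.

```java
import java.util.Base64;

// Hypothetical comparison of the two request bodies discussed above.
final class PayloadSize {

    // mapper-attachment style: the raw file travels base64-encoded inside the JSON source.
    static String attachmentDoc(byte[] rawFile) {
        return "{\"file\":\"" + Base64.getEncoder().encodeToString(rawFile) + "\"}";
    }

    // Client-side extraction (assumed done beforehand): only the extracted title and text are sent.
    static String extractedDoc(String title, String content) {
        return "{\"title\":\"" + escape(title) + "\",\"content\":\"" + escape(content) + "\"}";
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }
}
```

For a 3,000,000-byte file, `attachmentDoc` produces a 4,000,011-character body (4,000,000 base64 characters plus the JSON wrapper), while the size of `extractedDoc` depends only on how much text was actually extracted.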

Makes sense?

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

In reply to juanlegrand@gmail.com, 17 February 2014, 10:33:52.



(juanlegrand) #3

Hi David,

Thanks for your answer! It does make sense.

The reason for my question was that I wanted to take advantage of the mapper-attachment plugin on the one hand, and of the high availability and scalability features of Elasticsearch nodes on the other, to perform the analysis of the documents. For example, we have experienced situations where Tika was eating all the memory of a machine and eventually died…

My thought was that, in these situations, Elasticsearch could detect the affected node and remove it from the cluster.

I have no trouble with development, but if we can use available software for which we have a support contract, I prefer that route.

Any thoughts?

adTHANKSvance,
Jan



(David Pilato) #4

Ah! Causing an out-of-memory exception on a node is certainly not best practice! :slight_smile:
That's one of the reasons I would not put Tika directly in the nodes.

One of my TODO items is to move FSRiver to Logstash. Content extraction would then be done by Logstash (probably using Tika), but in a separate process from Elasticsearch.
So once that's in, it will be covered by support contracts.
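Keeping extraction out of the Elasticsearch JVM can be sketched as running each extraction in its own child JVM with a capped heap and a timeout, so a document that makes Tika exhaust memory only kills that child, not a node. The sketch below uses only `java.lang.ProcessBuilder` and needs Java 16+ (it uses a record). The class and method names are hypothetical, and the `tika-app.jar` path and `--text` invocation in `tikaCommand` are assumptions about how Tika is installed, not something described in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a content extractor in a separate JVM so that an
// out-of-memory error or a hang kills only that child process.
final class IsolatedExtractor {

    /** Outcome of one extraction attempt. */
    record Result(int exitCode, boolean timedOut, String output) {
        boolean ok() { return !timedOut && exitCode == 0; }
    }

    /** Assumed tika-app invocation; the jar path is a placeholder. */
    static List<String> tikaCommand(String javaBin, String tikaAppJar, String maxHeap, String file) {
        return List.of(javaBin, "-Xmx" + maxHeap, "-jar", tikaAppJar, "--text", file);
    }

    static Result run(List<String> command, long timeoutSeconds) {
        final Process p;
        try {
            // Merge stderr into stdout so error messages are captured too.
            p = new ProcessBuilder(command).redirectErrorStream(true).start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Drain output on another thread so a chatty child cannot block on a full pipe.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Thread drainer = new Thread(() -> {
            try (InputStream in = p.getInputStream()) {
                in.transferTo(buf);
            } catch (IOException ignored) {
                // Stream closed because the child was killed; keep whatever was read.
            }
        });
        drainer.setDaemon(true);
        drainer.start();
        try {
            boolean finished = p.waitFor(timeoutSeconds, TimeUnit.SECONDS);
            if (!finished) {
                p.destroyForcibly(); // runaway extractor: kill only this child
                p.waitFor();
            }
            drainer.join();
            String out = buf.toString(StandardCharsets.UTF_8);
            return new Result(finished ? p.exitValue() : -1, !finished, out);
        } catch (InterruptedException e) {
            p.destroyForcibly();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for extractor", e);
        }
    }
}
```

A caller would treat `!result.ok()` as "skip or quarantine this document" and move on to the next one. The `-Xmx` flag bounds the child's Java heap, which is what keeps a pathological PDF from exhausting the whole machine in the way Jan described.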

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

In reply to juanlegrand@gmail.com, 18 February 2014, 08:23.



(juanlegrand) #5

In reply to David Pilato, Wednesday, February 19, 2014, 7:42:49 AM UTC+1:

That looks promising! I hope you have a short TODO list :wink:
Thanks again,
Jan


