I'm new to Elasticsearch and I'm currently playing with the mapper-attachments
plugin. My idea is to use ES to index file contents and NOT to store the
file contents. Is that possible? What would the mapping be? I cannot get it
working...
I forgot to mention that I used the latest releases of both ES and the
mapper-attachments plugin.
On Friday, 7 March 2014 16:37:51 UTC+1, Pavel Hloušek wrote:
Hello,
I'm new to Elasticsearch and I'm currently playing with the mapper-attachments
plugin. My idea is to use ES to index file contents and NOT to store the
file contents. Is that possible? What would the mapping be? I cannot get it
working...
The best approach is not to use mapper-attachments at all, but to extract the content before indexing, for example with Apache Tika, which mapper-attachments also uses.
That's what I did in the FSRiver project. At first I used mapper-attachments, but dropping it definitely gives more flexibility and more control over what you send over the wire.
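To make the "over the wire" point concrete, here is a minimal sketch in Python using only the standard library. It compares a JSON body that carries the whole file as base64 (what an attachment field expects) with a body that carries only text extracted beforehand. The field names `file`, `filename`, and `content` are hypothetical, and the `text` value simply stands in for whatever Tika would return; none of this is taken from FSRiver or the plugin.

```python
import base64
import json


def attachment_body(raw_bytes: bytes) -> str:
    """JSON body carrying the whole file as base64 (field name `file` is hypothetical)."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({"file": encoded})


def extracted_body(filename: str, extracted_text: str) -> str:
    """JSON body carrying only text that was extracted before indexing."""
    return json.dumps({"filename": filename, "content": extracted_text})


# Stand-in data: a fake 30,720-byte binary file and the short text it "contains".
raw = bytes(range(256)) * 120
text = "Quarterly report: revenue grew in all regions."

with_attachment = attachment_body(raw)
text_only = extracted_body("report.pdf", text)

# Print both payload sizes in characters for comparison.
print(len(with_attachment), len(text_only))
```

Base64 inflates the payload by roughly a third on top of the original file size, whereas the text-only body grows only with the amount of extracted text.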
Thanks for your answer. I get your point. However, for a PHP project it
seemed easier to use mapper-attachments. Otherwise I would need to keep an
Apache Tika server running or run Tika from the CLI each time, which is
rather expensive.
Still, exclude is nice, but I'd like to save space on the servers - there can
be a lot of content.
Pavel
On Friday, 7 March 2014 17:03:00 UTC+1, David Pilato wrote:
The best approach is not to use mapper-attachments at all, but to extract
the content before indexing, for example with Apache Tika, which
mapper-attachments also uses.
That's what I did in the FSRiver project. At first I used
mapper-attachments, but dropping it definitely gives more flexibility and
more control over what you send over the wire.
On 7 March 2014 at 16:59:23, Pavel Hloušek (pavel....@gmail.com) wrote:
I forgot to mention that I used the latest releases of both ES and the
mapper-attachments plugin.
Exclude did the job, thank you. But could you please elaborate on why the plugin stores the base64 by default? I guess you would need it only if something in the Tika extraction process changed. Are there any other reasons why I shouldn't disable it?
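For readers following along, here is a sketch of what an exclude-based mapping could look like, written as a Python dict and serialized to JSON. It assumes "exclude" refers to the `excludes` setting of the `_source` field in the mapping; the type name `document` and the field name `file` are hypothetical, and the thread does not show the mapping that was actually used.

```python
import json

# Hypothetical names: mapping type `document`, attachment field `file`.
mapping = {
    "document": {
        # Keep the base64 payload out of the stored _source.
        "_source": {"excludes": ["file"]},
        "properties": {
            # Field type provided by the mapper-attachments plugin.
            "file": {"type": "attachment"},
        },
    }
}

# Serialize to the JSON body that would be sent as the mapping.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```

The trade-off is that a field excluded from `_source` cannot be fetched back from Elasticsearch later, so any re-extraction or reindexing would have to start from the original files.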