I'd like to use Elasticsearch to index my documents.
I have documents such as .doc files, .pdf files, and others.
Is there any way or tool to index this kind of document?
Thanks
Tullio
Hello!
Take a look at the Apache Tika framework (http://tika.apache.org/). You can
extract the text from files like PDF or DOC and then index that text into
Elasticsearch.
--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch
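A minimal sketch of the second half of that workflow, assuming the text has already been extracted (by Tika or any other tool): it builds a request body for the Elasticsearch `_bulk` endpoint. The index name `docs`, type `doc`, and field names `filename` and `content` are hypothetical. The `_type` entry follows the 2012-era API; recent Elasticsearch versions no longer use mapping types.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build a newline-delimited _bulk request body.

    docs: iterable of (doc_id, filename, text) tuples, where `text` is the
    plain text already extracted from the file (e.g. by Apache Tika).
    Index, type and field names are hypothetical examples.
    """
    lines = []
    for doc_id, filename, text in docs:
        # Action line: tells Elasticsearch where the next source line goes.
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Source line: the document itself, holding the extracted text.
        lines.append(json.dumps({"filename": filename, "content": text}))
    # A bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("docs", "doc", [("1", "report.pdf", "Quarterly report text")])
print(body)
```

The resulting string would be POSTed to `/_bulk`; batching many documents per request is usually much cheaper than sending one request per document.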
You could also use the attachment plugin, which runs Tika for you.
David
Twitter : @dadoonet / @elasticsearchfr
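A rough sketch of what the plugin expects, based on the mapper-attachments README of that time: a field mapped with `"type": "attachment"` and, at index time, the raw file supplied as a base64-encoded string in that field. The type name `doc` and the field name `file` are hypothetical, and the sketch only builds the JSON; it does not talk to a cluster.

```python
import base64
import json

# Mapping that declares the hypothetical "file" field as an attachment,
# so the plugin extracts its text (via Tika) at index time.
MAPPING = {"doc": {"properties": {"file": {"type": "attachment"}}}}


def attachment_document(raw_bytes):
    """Wrap raw file bytes as a document for the attachment field."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return {"file": encoded}


if __name__ == "__main__":
    # In practice raw_bytes would come from open("report.pdf", "rb").read().
    doc = attachment_document(b"%PDF-1.4 example bytes")
    print(json.dumps(MAPPING))
    print(json.dumps(doc))
```

The mapping would be sent to the type's `_mapping` endpoint and the document indexed like any other; the plugin then decodes the bytes and extracts the text itself.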
Where can I find the attachment plugin ?
Thanks
Tullio
https://github.com/elasticsearch/elasticsearch-mapper-attachments
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet
If you do not want to use Tika, you could probably also base64-encode
the file and put it into one giant string field yourself.
I have never tried it, but I think it should work.
--Andrew
Nothing against Tika, but it's quite slow (I tried a 3 MB PDF file and
the extraction took 4 minutes).
Base64 encoding doesn't seem like a good idea to me, because every string
would be indexed, including escape sequences and meaningless strings.
I was hoping for a built-in Elasticsearch tool that avoids such complexity.
Thanks
Tullio
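To illustrate the objection to indexing raw base64 (assuming it were stored in an ordinary analyzed string field, with no plugin decoding it), this sketch compares the words of a short text with the tokens of its base64 encoding. The `tokens` helper is a hypothetical, very rough stand-in for an analyzer, not Elasticsearch's own tokenizer.

```python
import base64
import re


def tokens(text):
    """Rough analyzer stand-in: lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


plain = "Invoice total due in March"
encoded = base64.b64encode(plain.encode("utf-8")).decode("ascii")

print(sorted(tokens(plain)))
print(encoded)
# Prints set(): no original word survives as a token of the base64 string,
# so a search for "invoice" would not match the encoded field.
print(tokens(plain) & tokens(encoded))
```

Searching such a field for real words would therefore find nothing useful, which is why the text has to be decoded and extracted before indexing.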
With the ES attachment plugin, I indexed more than 100 documents per second
on a "small cluster" (2 nodes, 8 GB RAM).
The documents were PDF, OpenOffice (oOo), JPEG, ...
So may I suggest you give it a try?
David.
Is there a Maven repository for the attachment mapper?
Where can I find it?
Thanks
Tullio
It's in the same Maven repository as the main Elasticsearch jar files.
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Document-indexing-tp3977177p3984083.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.