Attachments as top-level documents

Hi list,

I'm indexing a website which has a lot of files on it.

I found the attachment plugin, which handles all the file types we have, but our
files are not "attached" to (associated with) a particular web page; in many
cases the same file is attached to multiple pages. So we want files to show up
in the search results alongside other items.

I can extract the text from each file myself using Apache Tika and index it
like any other document in the system; but given that Tika already runs inside
the attachment plugin, is there any way to use that built-in extraction to
index files as documents in their own right?
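A rough sketch of the built-in route, assuming the Elasticsearch 1.x-era mapper attachments plugin. A file does not have to belong to a web page. It can be indexed as its own top-level document whose `attachment`-typed field holds the base64-encoded file, and Tika then runs inside Elasticsearch at index time. The code below only builds the mapping and document bodies. The type name `file`, the field names `content`, `filename` and `pages`, and both helper functions are hypothetical.

```python
import base64
import json


def attachment_mapping() -> dict:
    """Mapping for a hypothetical 'file' type; 'content' uses the plugin's attachment type."""
    return {
        "file": {
            "properties": {
                "content": {"type": "attachment"},
            }
        }
    }


def file_document(raw: bytes, filename: str, pages: list) -> dict:
    """One top-level document per file, listing every page that links to it."""
    return {
        "filename": filename,
        # Deduplicate and sort so a page listed twice is recorded once.
        "pages": sorted(set(pages)),
        # The attachment field takes the file's bytes as a base64 string.
        "content": base64.b64encode(raw).decode("ascii"),
    }


if __name__ == "__main__":
    doc = file_document(b"%PDF-1.4 example bytes", "report.pdf", ["/about", "/news", "/about"])
    print(json.dumps(attachment_mapping()))
    print(json.dumps(doc))
```

The `pages` list lets one stored file show up once in the results while still recording every page it is attached to.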

Thanks,
Peter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fa853486-ef71-48f2-a711-998b792cfb35%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

If you can do it yourself and use Tika directly, I’d definitely do that rather than use the mapper attachment plugin.
You will have more control over exactly what you want to do than with the mapper attachment plugin.
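Here is a minimal sketch of the do-it-yourself route. It assumes the text has already been extracted with Tika outside Elasticsearch, and that extraction step is not shown. Each file becomes one document with whatever fields you choose, and the request is written in the newline-delimited format of the `_bulk` API. The index name `site`, the type `file`, the field names, and `build_bulk` are hypothetical. The `_type` in each action line matches the Elasticsearch 1.x versions current at the time.

```python
import hashlib
import json
from collections import defaultdict


def build_bulk(files: dict, links: list, index: str = "site", doc_type: str = "file") -> str:
    """files: {file_url: text extracted by Tika}; links: [(page_url, file_url), ...]."""
    # Collect every page that links to each file, so each file is indexed once.
    pages_for = defaultdict(set)
    for page_url, file_url in links:
        pages_for[file_url].add(page_url)

    lines = []
    for file_url, text in sorted(files.items()):
        # Stable id derived from the file URL: re-indexing overwrites instead of duplicating.
        doc_id = hashlib.sha1(file_url.encode("utf-8")).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps({
            "url": file_url,
            "content": text,
            "pages": sorted(pages_for[file_url]),
        }))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk(
        {"/files/report.pdf": "Annual report text..."},
        [("/about", "/files/report.pdf"), ("/news", "/files/report.pdf")],
    )
    print(body, end="")
```

Deriving `_id` from the file URL is one example of the control this approach gives you. A file attached to several pages ends up as a single search result, and re-running the indexer updates it in place.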

My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr | @scrutmydocs https://twitter.com/scrutmydocs

In reply to Peter Bowyer <peter@mapledesign.co.uk>, 10 Dec 2014 at 12:42.