Indexing files from filesystem


(Dan Ber) #1

Hey,

I was wondering whether it is possible to index files from a directory
on a hard drive, including their contents if they are text files, Word
documents, or maybe even PDFs.
I read about FSRiver but could not test it because it does not seem to
work with ES 1.2.1 due to a bug.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1e250b52-19a3-48b8-b11d-687317160930%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

Check out Logstash, it'll do most of what you want.
http://logstash.net/

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Dan Ber) #3

I just had a look at their website and the YouTube video of their own
presentation, and I read a bit about how it works in general.
To me, it looks like I give it a file such as C:\Apache\logs.txt and it
works with that.
What I am looking for is something I can tell, for example: check our
company's drive, which has sub folders like marketing and projects that in
turn have sub folders, and so on, and index into Elasticsearch the path and
name of each file in each of those subfolders; if a file is a Word document
or a PDF, also put its content into Elasticsearch. That way we can search
not only file names and paths but also file contents.
I wrote a small tool for this in Delphi (because we develop in Delphi),
but it uses some libraries we want to get rid of so that we can also use the
system in our product for indexing documents. Logstash doesn't look like it
is made for that.
So is there a plugin or something else that can do this?
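
To make the requirement concrete, here is a minimal sketch of that crawl in Python. It is not FSRiver or any existing plugin. The function names, the `files` index name, and the list of text extensions are all made up for illustration. Word and PDF extraction is left as a stub, since it would need an external extractor. The output is newline-delimited JSON in the shape the Elasticsearch bulk API expects; depending on the Elasticsearch version, the action line may also need a `_type`.

```python
import json
import os

# Assumption: these extensions are read directly as text. Everything else
# (including Word and PDF) is indexed by path and name only in this sketch.
TEXT_EXTENSIONS = {".txt", ".log", ".md"}


def extract_content(path):
    """Return the text of a file, or None if this sketch cannot read it.

    Word documents and PDFs would need an external extractor plugged in
    here (hypothetical hook, not implemented).
    """
    ext = os.path.splitext(path)[1].lower()
    if ext in TEXT_EXTENSIONS:
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            return handle.read()
    return None


def crawl(root):
    """Yield one document per file below root, recursing into sub folders."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            doc = {"path": path, "name": name}
            content = extract_content(path)
            if content is not None:
                doc["content"] = content
            yield doc


def to_bulk_body(docs, index="files"):
    """Serialize documents as newline-delimited JSON for a bulk request.

    Each document is preceded by an action line naming the target index
    (the index name "files" is an assumption).
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Crawl the current directory and print the bulk request body.
    print(to_bulk_body(crawl(".")), end="")
```

The resulting body could then be sent to the cluster's `_bulk` endpoint with any HTTP client.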



(Ivan Brusic) #4

Never used FSRiver, but from what I read, it should be exactly what you
want. The code is open-sourced, so I would just check out the project,
update the Elasticsearch version to 1.2.1 and find whatever bugs come up.
Then submit a pull request and contribute back to the project. :slight_smile:

Cheers,

Ivan



(David Pilato) #5

I love your plan Ivan! :slight_smile:

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs



(Ivan Brusic) #6

Hopefully you accept pull requests faster than the core team. :slight_smile:

--
Ivan



(system) #7