[ANN] FSRiver 0.3.0 for elasticsearch 0.90.3


(David Pilato) #1

Heya,

I just released File System River (aka FSRiver) 0.3.0: https://github.com/dadoonet/fsriver

This version contains:

  • Add SSH support
  • Update to Elasticsearch 0.90.3 / Mapper Attachment 1.8.0
  • Use BulkProcessor feature

Issues: https://github.com/dadoonet/fsriver/issues?milestone=4&state=closed

PRs, feedback, and comments are welcome :wink:

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


(Fatima Castiglione Maldonado 发) #2

Hi
Well done

Do you think this would fix ScrutMyDocs automatic indexing?
Have you tried?

Thanks in advance

2013/8/9 David Pilato david@pilato.fr

--

Fátima Castiglione Maldonado
castiglionemaldonado@gmail.com


--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(pellyadolfo) #3

I am new to ES, so sorry if this is a trivial question.

I have some JSON files already in my file system. If I use FSRiver to index
them into ES, will they be replicated somewhere on the hard drive, or does ES
just create and store indices for these original JSON documents
(so that they are not replicated)?

Thanks

(My documents are actually XML docs, but converting them to JSON is not a
problem; if FSRiver cannot handle that, I will write some code. But I hate the
idea of having the documents replicated again by ES on the hard drive.)



(Mark Walkom) #4

ES will store the source within its own data store (i.e. on disk); however,
it does compress it.

This might help explain things -

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 20 November 2013 14:15, pellyadolfo@yahoo.es wrote:



(pellyadolfo) #5

Thanks Mark, it definitely helps.

This is a related thread.
https://groups.google.com/forum/#!topicsearchin/elasticsearch/store$20json|sort:date/elasticsearch/ms3A6vWTAVk

You can configure whether a field is stored and/or indexed, but in any case the
original JSON document is stored compressed by ES (as you said).

What I am missing in ES is an option to tell ES: "do not store the
JSON document; take it from here instead whenever it is needed" (optionally
by writing some kind of provider/format converter).

That clarifies it, thanks.



(Mark Walkom) #6

The only issue with making it read from an external source is that you're
adding extra delays and potential points of failure. You can do this, but
it'd be a custom script.

I understand the desire to use disk space more efficiently, but the
compression in ES is very good. For example, we get around 50% compression
with logging data.
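
As a rough illustration of why repetitive JSON such as logging data tends to compress well, here is a minimal sketch. It uses Python's zlib purely as a stand-in (an assumption, not the codec ES itself uses), and the documents and field names are made up. Real ratios depend on your data and on ES's own compression.

```python
import json
import zlib

# Hypothetical log-style JSON documents; the field names are made up for illustration.
docs = [
    json.dumps({
        "level": "INFO",
        "host": "web-%02d" % (i % 4),
        "message": "GET /index.html 200",
        "seq": i,
    })
    for i in range(1000)
]
raw = "\n".join(docs).encode("utf-8")

# zlib is only a stand-in here; it is not claimed to be what Elasticsearch uses.
compressed = zlib.compress(raw)
ratio = len(compressed) / len(raw)

print("raw=%d bytes, compressed=%d bytes, ratio=%.2f" % (len(raw), len(compressed), ratio))
```

The more the documents share keys and values, the smaller the ratio should be.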

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 20 November 2013 14:49, pellyadolfo@yahoo.es wrote:



(David Pilato) #7

You could also consider disabling the _all and _source fields.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html#mapping-all-field
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html#mapping-source-field
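
A minimal sketch of what such a mapping could look like, built here as a Python dict and printed as JSON. The type name "doc" is hypothetical (use whatever type your river writes to), and this only builds the request body; it does not talk to a cluster.

```python
import json

# Hypothetical type name; replace it with the type your river writes to.
TYPE_NAME = "doc"

mapping = {
    TYPE_NAME: {
        # Do not build the catch-all _all field.
        "_all": {"enabled": False},
        # Do not keep the original JSON document in the index.
        "_source": {"enabled": False},
    }
}

# JSON body to send to the put mapping API before indexing any documents.
body = json.dumps(mapping, indent=2)
print(body)
```

Keep in mind that with _source disabled, ES can no longer return the original JSON, so you would have to fetch the document from the file system yourself.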

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr

On 20 November 2013 at 05:01:42, Mark Walkom (markw@campaignmonitor.com) wrote:


