FS river gives an error when reading a large text file


(coder.ajit) #1

Hello All,

I am using Elasticsearch with the fs-river plugin to index a local
directory. One of the files is a text file of about 2.5 GB. While reading
this file, the river fails with an error and a heap dump is written to the
Elasticsearch folder. I started the ES server with 6 GB of memory, as set
in the Elasticsearch configuration.

I looked at the code in the fs-river plugin; it loads the file using the
following code.

    FileInputStream fileReader = new FileInputStream(file);

    // write it to a byte[] using a buffer since we don't know the exact
    // image size
    byte[] buffer = new byte[1024];
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    int i = 0;
    while (-1 != (i = fileReader.read(buffer))) {
        bos.write(buffer, 0, i);
    }
    byte[] data = bos.toByteArray();

    fileReader.close();
    bos.close();

Is there any way to index a large text file in the ES server using
fs-river? Has anyone succeeded in loading very large text files into ES?

There are not many settings to configure, so the setup is really easy. If I
missed a property, please let me know.

Thanks,
APS
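
A note on why the loading code quoted above can fail even with a 6 GB heap: it collects the whole file in a ByteArrayOutputStream, which is backed by a single byte[], and a Java array cannot hold more than Integer.MAX_VALUE (2,147,483,647) elements. The buffer also grows by reallocating, and toByteArray() makes a further copy, so peak memory use is a multiple of the file size. The sketch below only illustrates the size limit; the class and method names are hypothetical, and it assumes "2.5 gb" means roughly 2.5 × 10^9 bytes.

```java
class ByteArrayLimit {
    // Java arrays are indexed by int, so no array can have more elements than this.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE;

    /** Returns true if a file of this size could, in principle, fit in one byte[]. */
    static boolean fitsInSingleByteArray(long fileSizeBytes) {
        return fileSizeBytes >= 0 && fileSizeBytes <= MAX_ARRAY_LENGTH;
    }

    public static void main(String[] args) {
        long reportedSize = 2_500_000_000L; // assumption: "2.5 gb" is about 2.5e9 bytes
        System.out.println(fitsInSingleByteArray(reportedSize)); // prints false
    }
}
```

If that assumption holds, raising the heap size alone would not be enough; the file has to be read in pieces or excluded.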

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4e8a2f87-e2ee-41dd-b67d-f7e68a78c65c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Do you want a 2.5 GB file to be a single hit?

The correct method is to write code that reads the file in a stream-like
manner and extracts the relevant content into JSON documents that serve as
search hits.

If not, you have to prepare the file and partition it into docs with a
domain-specific parser, a task the fs river was not built for.

Jörg
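
The stream-like approach described above could look roughly like the following. This is a minimal sketch, not fs-river code: it assumes each non-empty line of the text file is one record, and the class name, the `line` and `content` field names, and the `sink` callback are all hypothetical. A real file would need its own domain-specific parser in place of the per-line split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingPartitioner {

    /** Escapes a string so it can be placed inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Reads the input line by line and passes one small JSON document per
     * non-empty line to the sink. Only the current line is held in memory.
     * Returns the number of documents emitted.
     */
    static long partition(Reader in, Consumer<String> sink) throws IOException {
        long lineNo = 0;
        long docs = 0;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) continue; // skip blank lines
                sink.accept("{\"line\":" + lineNo
                        + ",\"content\":\"" + jsonEscape(line) + "\"}");
                docs++;
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        List<String> out = new ArrayList<>();
        long n = partition(
                new StringReader("first record\n\nsecond \"quoted\" record\n"), out::add);
        System.out.println(n + " documents");
        out.forEach(System.out::println);
    }
}
```

For a real file, the StringReader would be replaced by a reader over the file, and the sink would collect documents into batches and send them to Elasticsearch. Note that a file consisting of one enormous line would still be read into memory in one piece, so the record boundary has to match the data.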


(David Pilato) #3

I have never tested fsriver with files this large.
What kind of file is it?

If it's not really indexable, I would exclude it.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


(coder.ajit) #4

Hi,

It is a plain .txt file that contains only text.

How can I verify whether a file is indexable?

Thanks,
Ajitpal


(David Pilato) #5

2.5 GB in a txt file?
What kind of content is it?

Is it readable by a human?

That said, I think I should add another option, filesize_max, defaulting to 10mb, to avoid OOM on nodes.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
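
The size limit proposed above could work along these lines. This is only a sketch of the idea, not the plugin's implementation: `filesize_max` is the option name suggested in this thread, while the class, the method names, and the reading of "10mb" as 10 × 1024 × 1024 bytes are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Optional;

class FileSizeGuard {
    // Assumed interpretation of the suggested "10mb" default.
    static final long DEFAULT_MAX_BYTES = 10L * 1024 * 1024;

    /** Returns true if the file is larger than the limit and should be skipped. */
    static boolean exceedsLimit(File file, long maxBytes) {
        return file.length() > maxBytes;
    }

    /**
     * Loads the whole file only when it is within the limit; otherwise
     * returns an empty Optional so the caller can skip or log the file.
     */
    static Optional<byte[]> readIfSmallEnough(File file, long maxBytes) throws IOException {
        if (exceedsLimit(file, maxBytes)) {
            return Optional.empty();
        }
        return Optional.of(Files.readAllBytes(file.toPath()));
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("guard-demo", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[16]);

        System.out.println(readIfSmallEnough(tmp, 8).isPresent());                 // prints false
        System.out.println(readIfSmallEnough(tmp, DEFAULT_MAX_BYTES).isPresent()); // prints true
    }
}
```

Checking the length before opening the file means an oversized file never reaches the in-memory buffer at all.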

