Index binary files (PDF, ...)

Hey everybody,

at the moment I'm testing ES for use as an indexing and storage
solution. I use the following general settings, with the analysis-icu and
mapper-attachments plugins enabled:

curl -XPUT 'http://localhost:9200/testindex/' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 2,
      "number_of_replicas" : 0
    }
  }
}'

and the following mapping for pdf indexing:

curl -XPUT localhost:9200/testindex/testmap/_mapping -d '{
  "testmap" : {
    "properties" : {
      "files" : { "type" : "attachment" }
    }
  }
}'

Afterwards, I curl a merged XML file to ES, in which the content is
created by "base64 SOURCE.pdf > TARGET":

{
  "files" : [
    {
      "_content_type" : "application/pdf",
      "_name" : " LeParfait_disclaimer_fr.pdf",
      "content" : " ...base64-encoded content.."
    }
  ]
}

Unfortunately, the content field does not contain any keyword from the
PDF (just the long string) and every search returns nothing. I think
I have misunderstood something; I'm quite new to JSON, Java and similar
technologies.

Thanks in advance

Regards,
Frifri

Hi,

Can you double-check that the attachment plugin is installed correctly?

  1. Make sure both the Tika and mapper-attachments jar files are in the
    <ES_HOME>/lib folder.

For example, if you use 0.17.0-SNAPSHOT you should see the following two
files in the lib folder:
elasticsearch-mapper-attachments-0.17.0-SNAPSHOT.jar
tika-app-0.9.jar

  2. Is the attachment plugin recognized when ES starts up?

You can see this in the log file. Navigate to <ES_HOME>/logs and check the
sequence of ES startup log records. You should see something like:

[2011-06-22 11:44:38,456][INFO ][node ] [Amber Hunt] {elasticsearch/0.17.0-SNAPSHOT/2011-06-21T13:56:26}[6144]: initializing ...
[2011-06-22 11:44:38,481][INFO ][plugins ] [Amber Hunt] loaded [mapper-attachments], sites
[2011-06-22 11:44:41,322][INFO ][node ] [Amber Hunt] {elasticsearch/0.17.0-SNAPSHOT/2011-06-21T13:56:26}[6144]: initialized
[2011-06-22 11:44:41,322][INFO ][node ] [Amber Hunt] {elasticsearch/0.17.0-SNAPSHOT/2011-06-21T13:56:26}[6144]: starting ...
...

Note the second row.

Regards,
Lukas
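
The log check described in this reply can also be scripted. What follows is only a sketch: `plugin_loaded` is a hypothetical helper name, and the log path in the usage comment assumes the default log file name, which depends on the cluster name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ES startup log reports a plugin as loaded.
# $1 = path to the ES log file, $2 = plugin name (e.g. mapper-attachments)
plugin_loaded() {
    # Startup records contain a line such as:
    #   [...][INFO ][plugins ] [test] loaded [mapper-attachments, analysis-icu]
    # Keep only the "loaded [" lines, then look for the plugin name in them.
    grep -F 'loaded [' "$1" | grep -q -F -- "$2"
}

# Usage (the path is an assumption, adjust it to your installation):
#   plugin_loaded "$ES_HOME/logs/elasticsearch.log" mapper-attachments \
#       && echo "plugin loaded" || echo "plugin NOT loaded"
```

A successful check only shows that the node loaded the plugin at startup; it says nothing about whether a given document was parsed correctly.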

Once we make sure that these two things are ok then we can look at mappings
and searching in detail :slight_smile:

Thank you very much for your support.

I use the precompiled ES 0.16.2 and the two files weren't there, but I
found them in ESHOME/plugins/mapper-attachments/ and just copied
them to lib/.
Perhaps that helps somehow?!

And both plugins seem to be properly installed:

[2011-07-07 06:06:09,682][INFO ][node ] [test] {elasticsearch/0.16.2}[2897]: initializing ...
[2011-07-07 06:06:09,711][INFO ][plugins ] [test] loaded [mapper-attachments, analysis-icu]
[2011-07-07 06:06:17,724][INFO ][node ] [test] {elasticsearch/0.16.2}[2897]: initialized

Hi,

You can either install plugins manually by copying these files to the lib
folder or you can use bin/plugin install command as explained here:

Once you have both plugins correctly installed and ES is loading them during
startup, you should be ready to use them. Did you try it again? What was
the result this time? Are there any exceptions in the log files when indexing
attachments or when searching them?

Regards,
Lukas

Hey,

1.) what I meant to say was that I had both plugins already installed
with the command you cited (bin/plugin install ....) and they seem to
be loaded as I've shown above. Nevertheless, the associated jar-files
are not in the lib/-directory, only in plugins/mapper-attachments.
Do you think I have to manually move or copy them?

2.) And you are right, in the logfiles I get an error if I'm trying to
index a pdf-file:

[2011-07-07 15:53:30,296][DEBUG][action.index ] [test] [testindex][0], node[YEPolMwkRhe_eJdNAyuz4Q], [P], s[STARTED]: Failed to execute [index {[testindex][testmap][1], source}]
org.elasticsearch.ElasticSearchParseException: Failed to derive xcontent from
    at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:181)
    at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:172)
    at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:401)
    at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:380)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:278)
[......]

So this looks as if the plugin for treating the content isn't working.

Thanks again!
Frieder
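
A note on the trace in this message: the exception is raised in XContentFactory while the request body itself is being parsed, and nothing is printed after "Failed to derive xcontent from". One possible reading, and it is only a guess from the trace, is that ES received an empty or non-JSON body, rather than that the attachment plugin failed. Under that assumption, a quick local check of the file before sending it may help. `looks_like_json_object` is a hypothetical helper name, and the check is a rough heuristic, not a JSON validator.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the file is non-empty and its
# first non-whitespace character is "{", i.e. it at least starts like a JSON object.
# $1 = path to the file that will be sent as the request body
looks_like_json_object() {
    # -s is true only for an existing file with a size greater than zero.
    [ -s "$1" ] || return 1
    # Drop whitespace and keep the first remaining character.
    first=$(tr -d ' \t\r\n' < "$1" | head -c 1)
    [ "$first" = "{" ]
}

# Usage (doc.json is an assumed file name):
#   looks_like_json_object doc.json \
#       && curl -XPUT 'http://localhost:9200/testindex/testmap/1' -d @doc.json
```

Note that curl only sends the contents of a file when its name is prefixed with `@` (as in `-d @doc.json`); without the prefix, the literal string is sent as the body.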

Hi,

On Thu, Jul 7, 2011 at 4:05 PM, Frifri henning.frieder@googlemail.com wrote:

Hey,

1.) what I meant to say was that I had both plugins already installed
with the command you cited (bin/plugin install ....) and they seem to
be loaded as I've shown above. Nevertheless, the associated jar-files
are not in the lib/-directory, only in plugins/mapper-attachments.
Do you think I have to manually move or copy them?

I normally do not use the plugin script for installation because I copy the
related jar files directly into the lib folder. However, using the plugin
script is probably the better way, and as long as you can see that the plugins
are loaded during node startup you did a good job (assuming you made sure you
installed the plugins on ALL nodes in the cluster).

2.) And you are right, in the logfiles I get an error if I'm trying to
index a pdf-file:

So we should probably try to find a simple working example, because from a
quick glance your mapping looks fine to me. Again, I assume that
you index your JSON document into /testindex/testmap/

Yeah, but at the moment I really have no clue. Isn't there any
way to verify whether the plugin is working or not? Or do you
have another idea what else to try?

Pay attention to the correct base64 encoding. I had some problems finding the correct base64 encoding when implementing attachment support in pyes.
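
One encoding detail that may be relevant here: GNU `base64` wraps its output at 76 columns by default, and raw line breaks are not valid inside a JSON string. The sketch below builds the document from the original post with the line breaks stripped. `make_attachment_doc` is a hypothetical helper name, the content type is hard-coded to PDF, and the name argument is assumed to contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON document for the "files" attachment field,
# in the same shape as the document shown in the original post.
# $1 = path to the file to index, $2 = name to store in "_name"
# Assumption: $2 contains no quotes or backslashes (no JSON escaping is done).
make_attachment_doc() {
    # Strip line breaks so the base64 text forms one unbroken JSON string.
    encoded=$(base64 < "$1" | tr -d '\r\n')
    printf '{ "files" : [ { "_content_type" : "application/pdf", "_name" : "%s", "content" : "%s" } ] }\n' "$2" "$encoded"
}

# Usage (file names are placeholders):
#   make_attachment_doc SOURCE.pdf SOURCE.pdf > doc.json
#   curl -XPUT 'http://localhost:9200/testindex/testmap/1' -d @doc.json
```

Whether wrapped base64 was the actual cause in this thread is not confirmed; the sketch only rules it out as a source of malformed JSON.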

Sent from my iPhone

On 07/lug/2011, at 22:04, Frifri henning.frieder@googlemail.com wrote:

Yeah, but at the moment I've really no clue. Isn't there any
possibility to verify whether the plugin is working or not? Or do you
have another idea what to try else?

On 7 Jul., 16:43, Lukáš Vlček lukas.vl...@gmail.com wrote:

Hi,

On Thu, Jul 7, 2011 at 4:05 PM, Frifri henning.frie...@googlemail.comwrote:

Hey,

1.) what I meant to say was that I had both plugins already installed
with the command you cited (bin/plugin install ....) and they seem to
be loaded as I've shown above. Nevertheless, the associated jar-files
are not in the lib/-directory, only in plugins/mapper-attachments.
Do you think I have to manually move or copy them?

I normally do not use plugin script for installation because I directly copy
related jar files into lib folder. However, using plugin script is probably
better way and as long as you can see those plugins are loaded during node
startup then they you did good job (assuming you made sure you installed
plugins on ALL nodes in the cluster).

2.) And you are right, in the logfiles I get an error if I'm trying to
index a pdf-file:

So probably we should try to find a simple working example because from the
quick glance at your mapping file it looks fine to me. Again I assume that
you index your json document into /testindex/testmap/

[2011-07-07 15:53:30,296][DEBUG][action.index ] [test]
[testindex][0], node[YEPolMwkRhe_eJdNAyuz4Q], [P], s[STARTED]: Failed
to execute [index {[testindex][testmap][1], source}]
org.elasticsearch.ElasticSearchParseException: Failed to derive
xcontent from
at

org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:
181)
at

org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:
172)
at

org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:
401)
at

org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:
380)
at

org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:
278)
[......]

So this looks as if the plugin for treating the content isn't working.

Thanks again!
Frieder

You mean one has to choose UTF-8? By the way, I can't find any
information on this issue in the base64 man page...
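
On the UTF-8 question: base64 output consists only of ASCII characters, so the choice of character set should not matter for the encoded string itself. A minimal sketch to check this, assuming a POSIX shell with the `base64` and `tr` utilities; the sample bytes are made up for illustration:

```shell
# Encode a few non-text bytes plus ASCII text (octal escapes are portable in printf).
encoded=$(printf '\377\376\000pdf-bytes' | base64 | tr -d '\n')
echo "encoded: $encoded"

# Delete every character of the base64 alphabet; anything left over would be
# a character that needs special charset handling.
leftover=$(printf '%s' "$encoded" | tr -d 'A-Za-z0-9+/=')
echo "characters outside the base64 alphabet: '$leftover'"
```

If `leftover` is empty, the encoded string is plain ASCII and can be embedded in a JSON string as it is.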

Hi,

I created a gist with a simple shell script for testing the attachment plugin. HTH. Note
that it requires Perl.

Test of attachments plugin · GitHub

Regards,
Lukas

Hey,

I tried your way and in general it works (that is, without errors), but
the search still returns nothing.

So I went through your script row by row and found that, even
though Perl is installed (sudo apt-get install perl), the row

coded=`cat fn6742.pdf | perl -MMIME::Base64 -e 'print encode_base64($_)'`

doesn't return any content stream. So I changed the row to:

coded=`cat fn6742.pdf | base64 $_`

and voilà, the variable "coded" contained the stream. The
indexing also went well (according to the logs), but there's still no result...

$ curl "${host}/_search?q=amplifier"
{"took":4,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Perhaps the base64 command isn't compatible with the encoding needed
for ES?!
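
One thing worth ruling out here, as an assumption rather than a confirmed cause: some `base64` implementations (GNU coreutils, for example) wrap their output into several lines by default, and a raw line break inside a JSON string value is not valid JSON. A minimal sketch, assuming a POSIX-like shell with `head -c` available; `sample.bin` and `sample.pdf` are hypothetical stand-ins for the real PDF:

```shell
# Create a 300-byte stand-in file (hypothetical; use the real PDF instead).
head -c 300 /dev/zero > sample.bin

# Encode and strip any line breaks so the value fits into a JSON string.
coded=$(base64 < sample.bin | tr -d '\n')

# Count line breaks in the result; printf '%s' adds none of its own.
breaks=$(printf '%s' "$coded" | wc -l | tr -d ' ')
echo "length: ${#coded}, line breaks: $breaks"

# Build the document body in the shape used earlier in this thread.
body=$(printf '{"files":[{"_content_type":"application/pdf","_name":"sample.pdf","content":"%s"}]}' "$coded")

rm -f sample.bin
```

The resulting `body` could then be passed to curl with `-d "$body"`; quoting the variable keeps the shell from splitting it.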

Hey,

sorry, there were some issues in the script. Fixed.

The first issue was in the curl command (line 13), which used "_mappings" instead of
"_mapping".
The second issue was in the perl command, where I was missing the -n option (line 29).
Then I added a refresh after indexing.
Finally, I enhanced the mapping and the query to pull only the document title.

Regards,
Lukas

Thank you so much for your effort. It's finally working now!!

There's only one little issue left, a warning rather than an error,
and I don't think it's important right now.

[2011-07-11 14:07:15,192][WARN ][org.apache.pdfbox.pdfparser.PDFParser] Parsing Error, Skipping Object
java.io.IOException: expected='endobj' firstReadAttempt='' secondReadAttempt='' org.apache.pdfbox.io.PushBackInputStream@2a807bbc
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:607)

On 11 Jul., 13:10, Lukáš Vlček lukas.vl...@gmail.com wrote:

Hey,

sorry, there were some issues in the script. Fixed.

First issue was on the curl command (line 13) using "_mappings" instead of
"_mapping"
Second issue in the perl command, I was missing -n option. (line 29)
Then I added a refresh after indexing.
Finally, I enhanced mapping and the query to pull only the document title.

Regards,
Lukas

On Mon, Jul 11, 2011 at 12:25 PM, Frifri henning.frie...@googlemail.comwrote:

Hey,

I tried your way and in general it works (means without error), but
the search returns still nothing.

So I verified row by row of your script and encountered that even
though perl is installed (sudo apt-get install perl), the row

coded=cat fn6742.pdf | perl -MMIME::Base64 -e 'print encode_base64($_)'

doesn't return any content stream. So I changed the row to:

coded=cat fn6742.pdf | base64 $_

and voilà, the variable "coded" contained the stream. Also the
indexing went well (according to logs) but there's still no result...

$ curl "${host}/_search?q=amplifier"
{"took":4,"timed_out":false,"_shards":{"total":3,"successful":
3,"failed":0},"hits":{"total":0,"max_score":null,"hits":}}

Perhaps the base64-command isn't compatible with the encoding needed
for ES?!

On 11 Jul., 00:59, Lukáš Vlček lukas.vl...@gmail.com wrote:

Hi,

I created a gist of a simple shell script for the attachment plugin. HTH.
Note that it requires Perl.

Test of attachments plugin · GitHub

Regards,
Lukas

On Sun, Jul 10, 2011 at 4:51 PM, Frifri <henning.frie...@googlemail.com> wrote:

You mean one has to choose UTF8? By the way I can't find any
information on this issue in the base64 man page...

On 9 Jul., 11:47, Alberto Paro alberto.p...@gmail.com wrote:

Pay attention to the correct base64 encoding. I had some problems finding
the correct base64 encoding when implementing attachment support in pyes.

Sent from my iPhone

On 07/Jul/2011, at 22:04, Frifri henning.frie...@googlemail.com wrote:

Yeah, but at the moment I really have no clue. Is there any way to
verify whether the plugin is working or not? Or do you have another
idea what else to try?

On 7 Jul., 16:43, Lukáš Vlček lukas.vl...@gmail.com wrote:

Hi,

On Thu, Jul 7, 2011 at 4:05 PM, Frifri <henning.frie...@googlemail.com> wrote:

Hey,

1.) What I meant to say was that I had already installed both plugins
with the command you cited (bin/plugin install ....), and they seem to
be loaded, as I've shown above. Nevertheless, the associated jar files
are not in the lib/ directory, only in plugins/mapper-attachments.
Do you think I have to move or copy them manually?

I normally do not use the plugin script for installation because I copy
the related jar files directly into the lib folder. However, using the
plugin script is probably the better way, and as long as you can see that
the plugins are loaded during node startup, you did a good job (assuming
you made sure you installed the plugins on ALL nodes in the cluster).

2.) And you are right, in the log files I get an error when I try to
index a pdf file:

So we should probably try to find a simple working example, because from
a quick glance your mapping file looks fine to me. Again, I assume that
you index your JSON document into /testindex/testmap/

[2011-07-07 15:53:30,296][DEBUG][action.index ] [test] [testindex][0], node[YEPolMwkRhe_eJdNAyuz4Q], [P], s[STARTED]: Failed to execute [index {[testindex][testmap][1], source}]
org.elasticsearch.ElasticSearchParseException: Failed to derive xcontent from
at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:...)
at org.elasticsearch.common.xcontent.XContentFactory.xContent(XContentFactory.java:...)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:...)
at org.elasticsearch.index.mapper.xcontent.XContentDocumentMapper.parse(XContentDocumentMapper.java:...)
at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:...)

[......]

So this looks as if the plugin for treating the content isn't working.

Thanks again!
Frieder

On 7 Jul., 15:36, Lukáš Vlček lukas.vl...@gmail.com wrote:

Hi,

You can either install plugins manually by copying these files to the lib
folder or you can use the bin/plugin install command as explained here:

Elasticsearch Platform — Find real-time answers at scale | Elastic.
..

Once you have both plugins correctly installed and ES is loading them during
startup, you should be ready to use them. Did you try it again? What was
the result now? Any exceptions in the log files when indexing attachments or
when searching them?

Regards,
Lukas

On Thu, Jul 7, 2011 at 3:09 PM, Frifri <henning.frie...@googlemail.com> wrote:

Thank you very much for your support.

I use the precompiled ES 0.16.2 and the two files weren't there, but I
found them in ESHOME/plugins/mapper-attachments/ and just copied
them to lib/.
Perhaps that helps somehow?!

And both plugins seem to be properly installed:

[2011-07-07 06:06:09,682][INFO ][node ] [test] {elasticsearch/0.16.2}[2897]: initializing ...
[2011-07-07 06:06:09,711][INFO ][plugins ] [test] loaded [mapper-attachments, analysis-icu]
[2011-07-07 06:06:17,724][INFO ][node ] [test] {elasticsearch/0.16.2}[2897]: initialized
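
The check against the startup log can also be scripted. This is a minimal sketch, assuming the log keeps the "loaded [...]" line format shown above. The function name plugin_loaded is hypothetical, and the log file path is an assumption that depends on the installation.

```shell
#!/bin/sh
# plugin_loaded NAME LOGFILE: exit 0 if a startup line of the form
# "loaded [a, b, ...]" in LOGFILE mentions NAME, non-zero otherwise.
plugin_loaded() {
  grep 'loaded \[' "$2" | grep -qF -- "$1"
}
```

For example: plugin_loaded mapper-attachments "$ES_HOME/logs/elasticsearch.log" && echo "plugin loaded". This only confirms that the node reported the plugin at startup, not that parsing a given document succeeds.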

On 7 Jul., 14:57, Lukáš Vlček lukas.vl...@gmail.com wrote:

Once we make sure that these two things are OK, we can look at mappings
and searching in detail :)

On Thu, Jul 7, 2011 at 2:54 PM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:

Hi,

can you double check that the attachment plugin is installed correctly?

  1. Make sure both the tika and mapper attachment jar files are in the
     <ES_HOME>/lib folder.

For example, if you use 0.17.0-SNAPSHOT you should see the following two
files in the lib folder:
elasticsearch-mapper-attachments-0.17.0-SNAPSHOT.jar
tika-app-0.9.jar

  2. Is the attachment plugin recognized when ES starts up?

You can see this in the log file. Navigate to <ES_HOME>/logs and check the
sequence of ES startup log records. You should see something like

[2011-06-22 11:44:38,456][INFO ][node ] [Amber Hunt] {elasticsearch/0.17.0-SNAPSHOT/2011-06-21T13:56:26}[6144]: initializing ...
[2011-06-22 11:44:38,481][INFO ][plugins ] [Amber Hunt] loaded [mapper-attachments], sites
[2011-06-22 11:44:41,322][INFO ][node ] [Amber Hunt] {elasticsearch/0.17.0-SNAPSHOT/2011-06-21T13:56:26}[6144]: initialized
[2011-06-22 11:44:41,322][INFO ][node ] [Amber Hunt] {elasticsearch/0.17.0-SNAPSHOT/2011-06-21T13:56:26}[6144]: starting ...
...

Note the second row.

Regards,
Lukas

On Thu, Jul 7, 2011 at 2:37 PM, Frifri <henning.frie...@googlemail.com> wrote:

Hey everybody,

at the moment I'm testing ES for use as an indexing and storage
solution. I use the following general settings with the analysis-icu and
mapper-attachments plugins enabled:

curl -XPUT 'http://localhost:9200/testindex/' -d '{
"settings" : {
"index" : {
"number_of_shards" : 2,
"number_of_replicas" : 0
}
}
}'

and the following mapping for pdf indexing:

curl -XPUT localhost:9200/testindex/testmap/_mapping -d '

    {
            "testmap": {
                    "properties" : {
                            "files" : { "type" : "attachment" }
                    }
            }
    }

'

Afterwards, I curl a merged XML-File to ES, in which the content is

...


This exception comes from Tika. Generally speaking, parsing PDF documents is
a pain for everybody, not only for Tika (or PDFBox, which Tika delegates to).
So chances are that Tika is hitting some issue with this particular
document. The question is whether it is serious or not. Note that if you are
seeing this when parsing the document from my example, that PDF document
contains a lot of complex charts and tables with text. Such documents
(highly technical product sheets) are typically generated automatically from
a product data store, so some formatting issues are probably to be expected
(but I am just guessing here).

Hi,

I am using ELK 5.4 and I am unable to index a PDF file. Is it because the ingest plugin installation showed the warnings below?

[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

  • java.lang.RuntimePermission getClassLoader
  • java.lang.reflect.ReflectPermission suppressAccessChecks
  • java.security.SecurityPermission createAccessControlContext
  • java.security.SecurityPermission insertProvider
  • java.security.SecurityPermission putProviderProperty.BC
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]

Has anyone else faced these warnings? Thanks.

Ramapriya

The warnings are expected, as the plugin requires some permissions to run.
But you had better open a new thread instead of replying to a thread that is outdated.

I thought so, but the problem is still the same.

Can you help me with indexing pdf/docx files in ES 5.4, or point me to a link where this has already been done? Thanks.

Can you open a new question please?