Mapper-attachments and highlights


(Shairon Toledo) #1

Hi guys,

I have ES running here with the mapper-attachments plugin, and I've used this mapping for the plugin:

{
  "doc" : {
    "properties" : {
      "attachment" : {
        "type" : "attachment",
        "store" : "yes",
        "term_vector" : "with_positions_offsets"
      }
    }
  }
}

The document:

{
'filename':'Redis manual.pdf',
'size':61952,
'folder':'/Users/shairon/References',
'updated_at':'2011-12-07T22:02:33Z',
'modified':'2011-12-07T22:02:33Z',
'attachment' : '...JVBERi0xLjMKJcTl8uXrp/Og...base64'
}
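For reference, here is a minimal Java sketch of how such a document can be built: read the file's bytes, base64-encode them with `java.util.Base64` (Java 8+), and put the result in the `attachment` field. The class and method names are hypothetical, the date fields are left out for brevity, and the JSON escaping only handles quotes and backslashes, an assumption that holds for ordinary file names. Unlike the single-quoted notation above, it emits standard double-quoted JSON.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

public class AttachmentDoc {

    // Minimal JSON string escaping (quotes and backslashes only); an assumption
    // that is enough for ordinary file names and folder paths.
    public static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the document with the raw file bytes base64-encoded into "attachment".
    // The basic encoder is used, so the encoded value contains no line breaks.
    public static String buildDoc(String filename, String folder, byte[] content) {
        String encoded = Base64.getEncoder().encodeToString(content);
        return "{"
                + "\"filename\":" + quote(filename) + ","
                + "\"size\":" + content.length + ","
                + "\"folder\":" + quote(folder) + ","
                + "\"attachment\":" + quote(encoded)
                + "}";
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Demo without a file: the nine bytes of a PDF header line stand in for real content.
            byte[] demo = "%PDF-1.3\n".getBytes(StandardCharsets.US_ASCII);
            System.out.println(buildDoc("demo.pdf", "/tmp", demo));
            return;
        }
        Path path = Paths.get(args[0]);
        byte[] bytes = Files.readAllBytes(path);
        Path parent = path.toAbsolutePath().getParent();
        System.out.println(buildDoc(path.getFileName().toString(),
                parent == null ? "" : parent.toString(), bytes));
    }
}
```

Run it with a file path as the only argument to print the document for that file. With no arguments it prints a demo document whose `attachment` value is `JVBERi0xLjMK`, the base64 form of the bytes `%PDF-1.3\n`.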

I search with:

{
  "query" : {
    "term" : { "attachment" : "redis" }
  },
  "highlight" : {
    "fields" : {
      "attachment" : {}
    }
  }
}

The document is returned correctly, but the highlight entry is:

"highlight":{"filename":["Redis manual.pdf"] }

I see that ES returns the doc's _source, so the attachment field still holds the encoded content; I was expecting the Tika-extracted content as text/plain. Is there any way to get highlights as decoded text?


(David Pilato) #2

Hi Shairon,

Highlighting attachments works fine for me.

It highlights the content of my base64-encoded files.

Which versions are you using (ES and the mapper-attachments plugin)?

(In reply to Shairon Toledo, Monday, January 23, 2012 22:56)


(Shairon Toledo) #3

Hi David,

My env is

elasticsearch-0.18.7
elasticsearch-mapper-attachments-0.18.7.jar

thx

(In reply to David Pilato, Mon, Jan 23, 2012 at 8:24 PM)

--
[ ]'s
Shairon Toledo


(David Pilato) #4

OK. I think you should consider using the new plugin repository, even though the content should be the same.

https://github.com/elasticsearch/elasticsearch-mapper-attachments

I use this version (1.0.0).

Could you gist a curl recreation? (http://www.elasticsearch.org/help/)

I'm not sure I can answer right now (time to sleep here ;-)), but I could have a look at it tomorrow (or perhaps someone else will answer first).

Some other questions:

When you index your doc, do you see “updating mapping” in the logs?

Can you also check that your mapping is the one you think it is?

I suspect that your mapping has not been taken into account and that ES thinks your attachment field is just a string.
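The mapping check can be scripted. The sketch below (Java 11+, `java.net.http`) fetches the mapping, assuming the `GET /{index}/{type}/_mapping` endpoint of this ES version, and reports the type recorded for the `attachment` field: `attachment` means the plugin's mapper is in use, while `string` would support the suspicion that the mapping was never applied. The index name `references`, the local URL, and the class name are hypothetical, and the type extraction is a crude regex heuristic that assumes `"type"` is the first key of the field's object and returns nothing otherwise.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MappingCheck {

    // Crude heuristic (assumption): expects "type" to be the first key inside the field's object.
    public static Optional<String> fieldType(String mappingJson, String field) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(field) + "\"\\s*:\\s*\\{\\s*\"type\"\\s*:\\s*\"([^\"]+)\"");
        Matcher m = p.matcher(mappingJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Fetches GET {baseUrl}/{index}/{type}/_mapping; all three values are placeholders.
    public static String fetchMapping(String baseUrl, String index, String type)
            throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create(baseUrl + "/" + index + "/" + type + "/_mapping")).GET().build();
        return HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults: local node, index "references", type "doc" (from the first post).
        String base = args.length > 0 ? args[0] : "http://localhost:9200";
        String json = fetchMapping(base, "references", "doc");
        System.out.println("attachment field type: "
                + fieldType(json, "attachment").orElse("unknown"));
    }
}
```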

David.

(In reply to Shairon Toledo, Monday, January 23, 2012 23:41)


(Shairon Toledo) #5

I removed /plugins/* and installed the new one with:

bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.0.0

No success.

I've put the artifacts together in a gist: https://gist.github.com/b457803a9358c6465578

Do you see any issue with my steps?

Thanks for your help o/

(In reply to David Pilato, Mon, Jan 23, 2012 at 8:53 PM)


(Maurício Linhares) #6

Hi Shairon,

I could not reproduce your steps from the curl calls, and I also could not regenerate your PDF file from the base64 in there, so I'm guessing something is wrong with that base64 string and the plugin is unable to decode the file.

You can open config/logging.yml and change the root logger to DEBUG so
you can see what the plugin says when it receives the file.
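The first hypothesis, that the base64 string does not decode to a valid PDF, can be checked before indexing. This Java sketch strips whitespace, decodes with the strict `Base64.getDecoder()` (which throws `IllegalArgumentException` on characters outside the base64 alphabet), and tests for the `%PDF` signature at the start of the decoded bytes. The class name is hypothetical. Whitespace is stripped first because the basic decoder rejects line breaks, which many encoders insert at fixed widths.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class Base64PdfCheck {

    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    // True if the string is valid base64 (after removing whitespace) and the
    // decoded bytes start with the PDF signature "%PDF".
    public static boolean looksLikeBase64Pdf(String encoded) {
        String compact = encoded.replaceAll("\\s+", "");
        byte[] decoded;
        try {
            decoded = Base64.getDecoder().decode(compact);
        } catch (IllegalArgumentException e) {
            return false; // illegal character or bad padding
        }
        return decoded.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(decoded, PDF_MAGIC.length), PDF_MAGIC);
    }

    public static void main(String[] args) {
        // "JVBERi0xLjMK" is the prefix visible in the thread; it decodes to "%PDF-1.3\n".
        System.out.println(looksLikeBase64Pdf("JVBERi0xLjMK")); // true
        System.out.println(looksLikeBase64Pdf("not base64!"));  // false: '!' is not base64
    }
}
```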

In any case, I built a very simple example of how to index and search for a document with the plugin enabled; you can check it out here: https://github.com/mauricio/elasticsearch-with-attachment

Maurício Linhares
http://techbot.me/ - http://twitter.com/#!/mauriciojr

(In reply to Shairon Toledo, Mon, Jan 23, 2012 at 9:10 PM)


(Shairon Toledo) #7

Indeed, curl wasn't good enough for testing. Even using your project I get the same behavior here: no highlights for the document content.
I've created a test case: https://github.com/shairontoledo/elasticsearch-attachment-tests

It succeeds for the filename field but is red for the content: https://github.com/shairontoledo/elasticsearch-attachment-tests/blob/master/src/test/java/net/hashcode/esattach/AttachmentTest.java

// line 97
assertThat(search.hits().getAt(0).highlightFields().get(attachFiled), notNullValue());

(In reply to Maurício Linhares, 2012/1/24)


(Maurício Linhares-2) #8

I see you are creating a Node; shouldn't you connect to the ES server where you installed the plugin?

This Node you are starting possibly does not have the plugin installed.

(In reply to Shairon Toledo, 24/01/2012 at 09:49)


(Shairon Toledo) #9

The plugin is registered properly; I can see it in the log: https://gist.github.com/42b8e0e8603670e16ab0

[2012-01-24 12:28:43,054][INFO ][org.elasticsearch.plugins] [Chondu the Mystic] loaded [mapper-attachments], sites []
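When a test and a standalone server are both in play, it can help to check this line programmatically. This Java sketch extracts the plugin names from a log line shaped like the one above; the regex is based only on that single line, so other versions may format it differently, and the class name is hypothetical. Such a line only shows what the process that wrote it loaded, not what a separately embedded Node in a test loaded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluginLogCheck {

    // Matches the "loaded [...]" part of the org.elasticsearch.plugins INFO line.
    private static final Pattern LOADED = Pattern.compile("loaded \\[([^\\]]*)\\]");

    // Returns the comma-separated plugin names inside "loaded [...]", or an empty list.
    public static List<String> loadedPlugins(String logLine) {
        List<String> plugins = new ArrayList<>();
        Matcher m = LOADED.matcher(logLine);
        if (m.find()) {
            for (String name : m.group(1).split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    plugins.add(trimmed);
                }
            }
        }
        return plugins;
    }

    public static void main(String[] args) {
        String line = "[2012-01-24 12:28:43,054][INFO ][org.elasticsearch.plugins] "
                + "[Chondu the Mystic] loaded [mapper-attachments], sites []";
        System.out.println(loadedPlugins(line)); // [mapper-attachments]
    }
}
```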

Thanks,

(In reply to Maurício Linhares, 2012/1/24)


(Maurício Linhares) #10

Not sure what it is, then. It works on my end.

(In reply to Shairon Toledo, Tue, Jan 24, 2012 at 11:31 AM)


(Shairon Toledo) #11

I got the attachment plugin working. The issue was related to the mapping.
I've ported Lukáš Vlček's script (http://www.elasticsearch.org/tutorials/2011/07/18/attachment-type-in-action.html) to Java, and I'm sharing it in the branch at https://github.com/shairontoledo/elasticsearch-attachment-tests.
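The thread does not say exactly what changed in the mapping. For comparison, the following Java sketch prints a mapping in the layout the linked tutorial uses; this is an assumption, not the confirmed fix. The field keeps `"type" : "attachment"`, and `store`/`term_vector` move into a `fields` entry with the same name as the field, assumed here to be where plugin versions of this era index the Tika-extracted text. The type name `doc` and field name `attachment` come from the first post; check the plugin README for your version before relying on this layout.

```java
public class AttachmentMapping {

    // Builds a mapping in the tutorial's style (assumed layout): store/term_vector are set
    // on the sub-field named after the attachment field, not on the outer field itself.
    public static String mapping(String type, String field) {
        return String.format(
                "{%n"
              + "  \"%1$s\" : {%n"
              + "    \"properties\" : {%n"
              + "      \"%2$s\" : {%n"
              + "        \"type\" : \"attachment\",%n"
              + "        \"fields\" : {%n"
              + "          \"%2$s\" : { \"store\" : \"yes\", \"term_vector\" : \"with_positions_offsets\" }%n"
              + "        }%n"
              + "      }%n"
              + "    }%n"
              + "  }%n"
              + "}",
                type, field);
    }

    public static void main(String[] args) {
        // "doc" and "attachment" are the type and field names from the original post.
        System.out.println(mapping("doc", "attachment"));
    }
}
```

The mapping would need to be in place before any document is indexed, since ES generally refuses to change a field's type once it has already mapped it (for example, as a string).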

Thank you for your attention.
