Mapper-attachment support for Arabic

Hi all

I want to use the mapper-attachment plugin to index Arabic documents.

But after indexing a document I can't search on it.

I thought this was caused by the Arabic analyzer, but when I define a plain
pattern analyzer the result is the same:

curl -XDELETE localhost:9200/text

curl -XPUT 'localhost:9200/text' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "khofes": {
          "type": "pattern",
          "pattern": "\\s+"
        }
      }
    }
  }
}'

curl -XPUT 'localhost:9200/text/extracted/_mapping' -d '
{
  "extracted": {
    "properties": {
      "file": {
        "type": "attachment",
        "fields": {
          "title": { "store": "yes" },
          "file": { "analyzer": "khofes", "term_vector": "with_positions_offsets", "store": "yes" }
        }
      }
    }
  }
}'

echo 'سلام' > tmp

coded=$(cat tmp | perl -MMIME::Base64 -ne 'print encode_base64($_)')
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file

curl -XPUT localhost:9200/text/extracted/0 -d @json.file
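
Before suspecting the plugin, it may be worth ruling out the encoding step itself. The following is only a sketch, assuming a POSIX shell with a `base64` utility that accepts `-d` (an assumption, since the post above uses perl instead). It rebuilds `json.file` and decodes the payload again to confirm that the Arabic bytes survive the round trip.

```shell
# Write the sample word as UTF-8, without a trailing newline.
printf '%s' 'سلام' > tmp

# Base64-encode the file; tr removes any line wrapping so the value
# fits on one line inside a JSON string.
coded=$(base64 < tmp | tr -d '\n')

# Build the request body with properly escaped quotes.
json="{\"file\":\"${coded}\"}"
printf '%s\n' "$json" > json.file

# Decode the payload again and compare it with the original text.
decoded=$(printf '%s' "$coded" | base64 -d)
if [ "$decoded" = 'سلام' ]; then
  echo "round trip ok: $coded"
else
  echo "round trip FAILED"
fi
```

If the round trip succeeds, the request body is probably fine and the problem is more likely on the extraction or indexing side.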

But when I retrieve the content, it is empty:

curl -XGET 'http://localhost:9200/text/_search?pretty=1' -d '
{
  "fields": "file",
  "highlight": {
    "pre_tags": [""],
    "post_tags": [""],
    "fields": {
      "file": {}
    }
  }
}'

So I think the problem is in the mapper-attachment plugin, because the analyzer
works on its own. I run this:

curl -XGET 'localhost:9200/text/_analyze?analyzer=khofes&pretty=true' -d "$(cat tmp)"

and I get this:

{
"tokens" : [ {
"token" : "سلام",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 1
} ]
}

So what can I do?

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

Maybe a silly question: you are trying to highlight above, but is querying the
documents not working either? (I just want to make sure it is not a
highlight issue.)

--Alex

On Wed, Jul 10, 2013 at 7:17 AM, ghonsor sajad22@gmail.com wrote:

[...]

No, it isn't a highlighting issue.
I tested it without highlighting too:

curl -XGET 'http://localhost:9200/text/_search?pretty=1' -d '
{
"fields": "file"
}'

One more point:
when I change 'سلام' to 'hello' it works fine, so I guess
mapper-attachment has a problem with Arabic. Is that right?
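
The request above only asks for the stored field and does not search for anything. To check querying itself, an explicit query on the extracted text could look like the sketch below. It assumes the `match` query is available in the Elasticsearch version in use and follows the request style used in this thread; `query.json` is a hypothetical file name, and the request is only sent if a node answers on localhost:9200.

```shell
# Write the search request to a file (query.json is a hypothetical name).
cat > query.json <<'EOF'
{
  "query": { "match": { "file": "سلام" } },
  "fields": ["file"]
}
EOF

# Send the request only if a node answers on localhost:9200,
# the address used throughout this thread.
if curl -s --max-time 2 -o /dev/null localhost:9200; then
  curl -s -XGET 'localhost:9200/text/_search?pretty=1' -d @query.json
else
  echo "no node on localhost:9200; request body written to query.json"
fi
```

If this returns no hits for 'سلام' while the same query with 'hello' finds the English document, that points at text extraction or indexing rather than at highlighting.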

On Wednesday, July 10, 2013 1:27:49 PM UTC+4:30, Alexander Reelsen wrote:

[...]

Hey,

Can you file a bug report in the attachment mapper plugin repo? Please
provide a full example; I'm not sure most of the European/US engineers
will be able to reproduce your problem otherwise :)

Thanks!

--Alex

On Wed, Jul 10, 2013 at 12:13 PM, ghonsor sajad22@gmail.com wrote:

[...]