Attachment arabic

Hi all

With the mapper-attachment plugin in ES, can we index Arabic PDFs and search them?

Can you give me an example (in Java or with curl)?

I tested this mapping:

curl -XPUT 'http://localhost:9200/test/arabic/_mapping' -d '
{
  "arabic" : {
    "properties" : {
      "content" : { "type" : "attachment", "analyzer" : "arabic" }
    }
  }
}
'

but I can't query the indexed PDF.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Could you post a gist with a full curl recreation of what you have done so far?
How do you index your document? How do you search in it?

Note that you should probably use this kind of mapping:

{
  "person" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "file" : { "analyzer" : "arabic" },
          "date" : { "store" : "yes" },
          "author" : { "analyzer" : "arabic" }
        }
      }
    }
  }
}

My 2 cents

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs


Hi David,

I used this mapping:
curl -XPUT 'http://localhost:9200/david/person/_mapping' -d '
{
  "person" : {
    "properties" : {
      "file" : {
        "type" : "attachment",
        "fields" : {
          "file" : { "analyzer" : "arabic" },
          "date" : { "store" : "yes" },
          "author" : { "analyzer" : "arabic" }
        }
      }
    }
  }
}'

and generated the JSON from the document with this script:

coded=`cat $1.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file

and indexed it with:

curl -XPUT localhost:9200/david/person/0 -d @json.file

and searched from the browser with:

http://localhost:9200/david/person/_search?q=file:محمود&pretty=true

but the total number of hits is zero.

Can you help me?
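
The payload step above can be sketched a little more defensively. The Perl one-liner encodes the PDF one input line at a time, which can leave base64 padding in the middle of the output, and the result also contains line breaks; whether either of these is behind the zero hits here is not established. A minimal sketch, assuming a base64 command-line utility is installed (make_attachment_json is a hypothetical helper name, not part of Elasticsearch or the plugin), that encodes the whole file in one pass and emits a single-line JSON document:

```shell
#!/bin/sh
# Hypothetical helper: wrap a file as {"file":"<base64>"}, the document shape
# used in this thread. Assumes a `base64` utility is available on the PATH.
make_attachment_json() {
    # Encode the whole file in one pass. base64 may wrap its output into
    # lines, so strip the newlines to keep the JSON string on a single line.
    encoded=$(base64 < "$1" | tr -d '\n')
    printf '{"file":"%s"}\n' "$encoded"
}

# Usage (index and type names are the ones from the post above):
#   make_attachment_json mydoc.pdf > json.file
#   curl -XPUT localhost:9200/david/person/0 -d @json.file
```

The output file can then be indexed exactly as in the post above; only the way the base64 string is produced differs.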


Could you use the _analyze API to see how your extracted text is broken into tokens by the arabic analyzer?

Something like:

curl -XGET 'localhost:9200/_analyze?analyzer=arabic' -d 'your text here'

It could give you some clues.
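
When repeating this check and the search from a shell instead of a browser, the Arabic term has to reach Elasticsearch as UTF-8, percent-encoded in the URL. A minimal sketch of that encoding step, assuming a POSIX shell with od, tr and sed and a UTF-8 terminal (urlencode_utf8 is a hypothetical helper name; it encodes every byte, ASCII included, which is valid but more verbose than strictly necessary):

```shell
#!/bin/sh
# Hypothetical helper: percent-encode every byte of the argument.
# Assumes the argument is already UTF-8 (i.e. a UTF-8 terminal or script file).
urlencode_utf8() {
    printf '%s' "$1" |
        od -An -v -tx1 |          # dump the bytes as hex, no offsets
        tr -d ' \n' |             # join the hex digits into one string
        sed 's/\(..\)/%\1/g' |    # prefix each byte with '%'
        tr 'a-f' 'A-F'            # upper-case the hex digits
}

# Usage (host, index and field names are the ones from this thread):
#   curl -XGET 'localhost:9200/_analyze?analyzer=arabic' -d 'محمود'
#   curl -XGET "localhost:9200/david/person/_search?q=file:$(urlencode_utf8 'محمود')&pretty=true"
```

Comparing the tokens returned by _analyze with the term sent in the query shows whether the two sides agree on what is being searched.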


I tested it, but the query still returns no results.

Does mapper-attachment support Arabic PDFs at all (at least without the arabic analyzer)?
