ES query on raw subfield


(valerio) #1

Hi there, I'm struggling with a raw not_analyzed subfield.

I have the following configuration:

PUT /mise/wo-patent-document/_mapping
{
  "wo-patent-document": {
    "properties": {
      "wo-patent-document": {
        "properties": {
          "claims_IT": {
            "type": "string",
            "analyzer": "italian",
            "term_vector": "with_positions_offsets",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "term_vector": "with_positions_offsets",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Now I'm trying to query the raw field with match_phrase:

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT.raw": "piezoelettriche"
    }
  }
}

It doesn't work. Can I use my "raw" subfield for a "not_analyzed" query?

The same query on "wo-patent-document.description_IT" works!

Thank you very much


(David Pilato) #2

Please format your code.

And provide a typical document which should match.


(valerio) #3

Hi David,

PUT /mise/wo-patent-document/_mapping
{
  "wo-patent-document": {
    "properties": {
      "wo-patent-document": {
        "properties": {
          "description_IT": {
            "type": "string",
            "analyzer": "italian",
            "term_vector": "with_positions_offsets",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "term_vector": "with_positions_offsets",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

I want to query both the analyzed and the not_analyzed versions of the "description_IT" field.
The query:

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT": "che impiegano ceramiche piezoelettriche"
    }
  },
  "highlight": {
    "fields": {
      "wo-patent-document.description_IT": {}
    },
    "fragment_size": "300",
    "number_of_fragments": "1"
  }
}
returns one document, but it doesn't highlight the word "che" because it is a stop word. This result is fine.
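
To illustrate why "che" goes unhighlighted while the phrase still matches, here is a minimal Python sketch of an analyzer that drops stop words but keeps token positions. It is only a simplified model, not the real `italian` analyzer (which also stems and handles elisions); the `STOP_WORDS` subset and the helper names `analyze` and `phrase_match` are assumptions made for illustration.

```python
import re

# Small illustrative subset of Italian stop words (an assumption,
# not the full list used by the real "italian" analyzer).
STOP_WORDS = {"che", "di", "delle", "con", "in"}


def analyze(text):
    """Lowercase, split on non-word characters, drop stop words, keep positions."""
    tokens = re.findall(r"\w+", text.lower())
    return [(pos, tok) for pos, tok in enumerate(tokens) if tok not in STOP_WORDS]


def phrase_match(doc_text, query_text):
    """True if the analyzed query tokens occur in the document with the
    same relative positions (a simplified phrase query)."""
    doc_positions = {}
    for pos, tok in analyze(doc_text):
        doc_positions.setdefault(tok, set()).add(pos)

    query = analyze(query_text)
    if not query:
        return False

    first_pos, first_tok = query[0]
    for start in doc_positions.get(first_tok, ()):
        if all(start + (pos - first_pos) in doc_positions.get(tok, ())
               for pos, tok in query):
            return True
    return False


doc = ("elettromagneti di azionamento delle punte con soluzioni "
       "che impiegano ceramiche piezoelettriche, riducendo in questo modo")
query = "che impiegano ceramiche piezoelettriche"

# Terms that survive analysis are the only ones that can be highlighted.
query_terms = [tok for _, tok in analyze(query)]
print(query_terms)
print(phrase_match(doc, query))
```

Because "che" is removed at analysis time, it never becomes a term of the query, so there is nothing for the highlighter to mark, yet the remaining three terms still line up as a phrase.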

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT.raw": "che impiegano ceramiche piezoelettriche"
    }
  },
  "highlight": {
    "fields": {
      "wo-patent-document.description_IT.raw": {}
    },
    "fragment_size": "300",
    "number_of_fragments": "1"
  }
}
returns 0 documents. I would expect to return 1 document.

It seems the raw subfield is not taken into account.

Content snippet of the description_IT field:

"Per aumentare la velocità di selezione ed incrementare \n25 \nla produzione si è sostituito l’utilizzo di tradizionali \n2\n \nelettromagneti di azionamento delle punte con soluzioni \nche impiegano ceramiche piezoelettriche, riducendo in \nquesto modo sia il tempo di attuazione che i consumi \nderivanti dalle bobine. \n5 \nQuesti attuatori sono dotati di lamine piezoelettriche \nche, a seconda della polarizzazione elettrica a cui sono \nsottoposte, si muovono in alto od in basso (considerando \nper semplicità espositiva le punte allineate in altezza \ne rivolte orizzontalmente),"

My ES version is 2.3.4.

Rgds valerio


(valerio) #4

Sorry, but in the preview it seems to be OK. If you copy/paste it into Notepad, it should display with the correct format.

PUT /mise/wo-patent-document/_mapping
{
  "wo-patent-document": {
    "properties": {
      "wo-patent-document": {
        "properties": {
          "claims_IT": {
            "type": "string",
            "analyzer": "italian",
            "term_vector": "with_positions_offsets",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "term_vector": "with_positions_offsets",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT": "che impiegano ceramiche piezoelettriche"
    }
  },
  "highlight": {
    "fields": {
      "wo-patent-document.description_IT": {}
    },
    "fragment_size": "300",
    "number_of_fragments": "1"
  }
}


(valerio) #5

Hi, any news?

Any query run over the raw subfield returns 0 documents. Even this one:

GET /mise/wo-patent-document/_search
{
  "query": {
    "wildcard": {
      "wo-patent-document.description_IT.raw": {
        "value": "*"
      }
    }
  }
}

why????
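
One plausible explanation, sketched in Python under stated assumptions: a not_analyzed string field indexes the whole value as a single term, and with "ignore_above": 256 a longer value is not indexed in that subfield at all. If so, a document whose description exceeds 256 characters has no term in description_IT.raw, and no query (not even a "*" wildcard) can match it. The helpers `index_not_analyzed`, `term_matches`, and `wildcard_matches` are hypothetical names that only model this behavior; they are not Elasticsearch APIs.

```python
from fnmatch import fnmatchcase


def index_not_analyzed(value, ignore_above=None):
    """Simplified model: index the whole value as one term, or nothing
    when the value is longer than ignore_above characters."""
    if ignore_above is not None and len(value) > ignore_above:
        return []       # value skipped: the subfield holds no term
    return [value]      # single, untokenized term


def term_matches(terms, text):
    """Exact match against whole terms, as on a not_analyzed field."""
    return text in terms


def wildcard_matches(terms, pattern):
    """Wildcard match against whole terms (fnmatch stands in for the
    * and ? operators of a wildcard query)."""
    return any(fnmatchcase(term, pattern) for term in terms)


# Excerpt of the description_IT content from the thread (line breaks removed).
description = (
    "Per aumentare la velocità di selezione ed incrementare la produzione "
    "si è sostituito l'utilizzo di tradizionali elettromagneti di "
    "azionamento delle punte con soluzioni che impiegano ceramiche "
    "piezoelettriche, riducendo in questo modo sia il tempo di attuazione "
    "che i consumi derivanti dalle bobine."
)

raw_terms = index_not_analyzed(description, ignore_above=256)
print(len(description) > 256)                  # the excerpt alone exceeds the limit
print(raw_terms)                               # no term indexed for this value
print(wildcard_matches(raw_terms, "*"))        # nothing for "*" to match

# A short value is indexed, but only the complete string matches.
short_terms = index_not_analyzed("ceramiche piezoelettriche", ignore_above=256)
print(term_matches(short_terms, "ceramiche piezoelettriche"))
print(term_matches(short_terms, "piezoelettriche"))
```

Under this model, even without ignore_above, a phrase taken from the middle of the text would not equal the single whole-value term, so a raw subfield suits short values such as identifiers or titles better than full document text.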


(David Pilato) #6

Should I write this again?

You have a typical example of what we would like to see when users ask for help here: About the Elasticsearch category.

Can you provide such a script?
And please format your code using the </> icon. It will make your post more readable.


(valerio) #7

No need.

I've fixed my issue.

Rgds valerio


(Mark Walkom) #8

How did you fix it? Sharing may help others in future :slight_smile:


(valerio) #9

With pleasure :smile:

I've saved the text content of a PDF file into my "description" field, so it is quite big.

I need to run multiple searches over this field. One of these searches is an "exact match phrase". Initially I thought of using a raw "not_analyzed" field for this kind of query (description.raw). For a relatively small description field it works fine, but when the content of the description field is big, the content of the description.raw field is big too, and a query over the raw field always returns 0 documents! Or at least this is what I saw in my tests (even with "ignore_above": 256).

I realized that for an exact phrase match I could simply use a normal string field with the "standard" tokenizer, and I decided to go for it!
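
A minimal sketch of what that could look like, written as Python dicts that serialize to the request bodies. The subfield name `exact` and the helper names are hypothetical; the sketch assumes the ES 2.x `string` type with the built-in `standard` analyzer, which does not remove stop words by default, so "che" stays searchable. Existing documents would presumably need to be reindexed before the new subfield is populated.

```python
import json

INDEX_TYPE = "wo-patent-document"  # type name taken from the thread


def exact_phrase_mapping(field="description_IT", subfield="exact"):
    """Mapping body: the italian-analyzed field plus a standard-analyzed
    subfield for exact phrase search ("exact" is a hypothetical name)."""
    return {
        INDEX_TYPE: {
            "properties": {
                INDEX_TYPE: {
                    "properties": {
                        field: {
                            "type": "string",
                            "analyzer": "italian",
                            "term_vector": "with_positions_offsets",
                            "fields": {
                                subfield: {
                                    "type": "string",
                                    "analyzer": "standard",
                                    "term_vector": "with_positions_offsets",
                                }
                            },
                        }
                    }
                }
            }
        }
    }


def exact_phrase_query(phrase, field="description_IT", subfield="exact"):
    """Search body: match_phrase plus highlighting on the standard-analyzed subfield."""
    path = "%s.%s.%s" % (INDEX_TYPE, field, subfield)
    return {
        "query": {"match_phrase": {path: phrase}},
        "highlight": {"fields": {path: {}}},
    }


mapping_body = exact_phrase_mapping()
search_body = exact_phrase_query("che impiegano ceramiche piezoelettriche")

# Bodies for PUT /mise/wo-patent-document/_mapping and
# GET /mise/wo-patent-document/_search respectively.
print(json.dumps(mapping_body, indent=2))
print(json.dumps(search_body, indent=2))
```

Note that the standard analyzer still lowercases and strips punctuation, so this gives an exact match at the word level rather than character for character.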

Any thoughts?

Rgds valerio

