ES query on raw subfield

Hi there, I'm struggling with a raw not_analyzed subfield.

I have the following configuration:

PUT /mise/wo-patent-document/_mapping
{
  "wo-patent-document": {
    "properties": {
      "wo-patent-document": {
        "properties": {
          "claims_IT": {
            "type": "string",
            "analyzer": "italian",
            "term_vector": "with_positions_offsets",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "term_vector": "with_positions_offsets",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

Now I'm trying to query the raw field with match_phrase:

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT.raw": "piezoelettriche"
    }
  }
}

It doesn't work. Can I use my "raw" subfield for a "not_analyzed" query?

The same query on "wo-patent-document.description_IT" works!

Thank you very much

Please format your code.

And provide a typical document which should match.

Hi David,

PUT /mise/wo-patent-document/_mapping
{
  "wo-patent-document": {
    "properties": {
      "wo-patent-document": {
        "properties": {
          "description_IT": {
            "type": "string",
            "analyzer": "italian",
            "term_vector": "with_positions_offsets",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "term_vector": "with_positions_offsets",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

I want to query the field "description_IT" in both its analyzed and not_analyzed forms.
The query:

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT": "che impiegano ceramiche piezoelettriche"
    }
  },
  "highlight": {
    "fields": {
      "wo-patent-document.description_IT": {}
    },
    "fragment_size": "300",
    "number_of_fragments": "1"
  }
}
returns one document, but it doesn't highlight the word "che" because it is a stop word. This result is fine.
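
To illustrate why "che" is not highlighted, here is a rough Python sketch of stop-word removal at analysis time. It is only an approximation: the stop-word set below is a small illustrative subset chosen for this example, and the real "italian" analyzer also handles elision and stemming, which this sketch leaves out. The function name `analyze_sketch` is made up for the illustration.

```python
import re

# Illustrative subset only (an assumption for this sketch); the real
# "italian" analyzer uses a longer list and also applies stemming.
ITALIAN_STOPWORDS_SUBSET = {"che", "di", "in", "la", "il", "e"}


def analyze_sketch(text):
    """Lowercase, split on non-word characters, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in ITALIAN_STOPWORDS_SUBSET]


query_tokens = analyze_sketch("che impiegano ceramiche piezoelettriche")
print(query_tokens)  # "che" is gone before matching and highlighting happen
```

Since "che" never reaches the index as a term of the analyzed field, there is nothing for the highlighter to mark for it.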

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT.raw": "che impiegano ceramiche piezoelettriche"
    }
  },
  "highlight": {
    "fields": {
      "wo-patent-document.description_IT.raw": {}
    },
    "fragment_size": "300",
    "number_of_fragments": "1"
  }
}
returns 0 documents. I would expect it to return 1 document.

It seems the raw subfield is not taken into account.

Content snippet of the description_IT field:

"Per aumentare la velocità di selezione ed incrementare \n25 \nla produzione si è sostituito l’utilizzo di tradizionali \n2\n \nelettromagneti di azionamento delle punte con soluzioni \nche impiegano ceramiche piezoelettriche, riducendo in \nquesto modo sia il tempo di attuazione che i consumi \nderivanti dalle bobine. \n5 \nQuesti attuatori sono dotati di lamine piezoelettriche \nche, a seconda della polarizzazione elettrica a cui sono \nsottoposte, si muovono in alto od in basso (considerando \nper semplicità espositiva le punte allineate in altezza \ne rivolte orizzontalmente),"

My ES version is 2.3.4.

Rgds valerio

Sorry, but in the preview it seems to be OK. If you copy/paste it into Notepad, it should display with the correct format.

PUT /mise/wo-patent-document/_mapping
{
  "wo-patent-document": {
    "properties": {
      "wo-patent-document": {
        "properties": {
          "claims_IT": {
            "type": "string",
            "analyzer": "italian",
            "term_vector": "with_positions_offsets",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed",
                "term_vector": "with_positions_offsets",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

GET /mise/wo-patent-document/_search
{
  "query": {
    "match_phrase": {
      "wo-patent-document.description_IT": "che impiegano ceramiche piezoelettriche"
    }
  },
  "highlight": {
    "fields": {
      "wo-patent-document.description_IT": {}
    },
    "fragment_size": "300",
    "number_of_fragments": "1"
  }
}

Hi, any news?

Any query run over the raw subfield returns 0 documents. Even this one:

GET /mise/wo-patent-document/_search
{
  "query": {
    "wildcard": {
      "wo-patent-document.description_IT.raw": {
        "value": "*"
      }
    }
  }
}

Why?

Should I write this again?

You have a typical example of what we would like to see when users ask for help in the "About the Elasticsearch category" post.

Can you provide such a script?
And please format your code using the </> icon. It will make your post more readable.

No need.

I've fixed my issue.

Rgds valerio

How did you fix it? Sharing may help others in future :slight_smile:

With pleasure :smile:

I've saved the text content of a PDF file into my "description" field, so it is quite big.

I need to run multiple searches over this field. One of these searches is an "exact match phrase". Initially I thought I would use a raw "not_analyzed" field for this kind of query (description.raw). For a relatively small description field it works fine, but when the content of the description field is big, the content of the description.raw field is big too, and a query over the raw field always returns 0 documents. Or at least this is what I observed in my tests (even with "ignore_above": 256).
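
A possible explanation for this behaviour, sketched in Python: a not_analyzed string field indexes the whole value as one single term, and with "ignore_above": 256 any value longer than 256 characters is not indexed in that subfield at all. The helper names below are invented for this illustration and the long value is a made-up stand-in for the PDF text; this is a toy model of the indexing rule, not Elasticsearch code.

```python
def index_not_analyzed(value, ignore_above=None):
    """Return the set of terms a not_analyzed string field would hold for one value."""
    if ignore_above is not None and len(value) > ignore_above:
        return set()   # value is skipped entirely, so nothing is searchable
    return {value}     # the whole value becomes one single term


def term_matches(indexed_terms, query_text):
    """Without analysis, the query text must equal an indexed term exactly."""
    return query_text in indexed_terms


short_value = "ceramiche piezoelettriche"
# Hypothetical stand-in for a long PDF text (well over 256 characters).
long_value = "Per aumentare la velocità di selezione " * 20

short_terms = index_not_analyzed(short_value, ignore_above=256)
long_terms = index_not_analyzed(long_value, ignore_above=256)

print(term_matches(short_terms, "ceramiche piezoelettriche"))  # whole value
print(term_matches(short_terms, "piezoelettriche"))            # only a part of the value
print(len(long_terms))                                         # terms indexed for the long value
```

Under this model a phrase taken from the middle of a long description can never match the raw subfield, and even a wildcard "*" finds nothing when no term was indexed for the document.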

I realized that for an exact phrase match I could simply use a normal string field with the "standard" tokenizer, and I decided to go for it!
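
A minimal sketch of what that could look like, written as Python dicts that serialize to the request bodies. The subfield name "std" is an assumption made up for this example (the thread does not say how the field was actually named), and adding it as a multi-field next to the "italian" analyzer is just one possible layout. The index, type and field path are the ones used earlier in the thread.

```python
import json

# Body for PUT /mise/wo-patent-document/_mapping.
# "std" is a hypothetical subfield name chosen for this sketch.
mapping = {
    "wo-patent-document": {
        "properties": {
            "wo-patent-document": {
                "properties": {
                    "description_IT": {
                        "type": "string",
                        "analyzer": "italian",
                        "term_vector": "with_positions_offsets",
                        "fields": {
                            "std": {
                                "type": "string",
                                "analyzer": "standard",
                                "term_vector": "with_positions_offsets",
                            }
                        },
                    }
                }
            }
        }
    }
}

# Body for GET /mise/wo-patent-document/_search against the new subfield.
query = {
    "query": {
        "match_phrase": {
            "wo-patent-document.description_IT.std": "che impiegano ceramiche piezoelettriche"
        }
    }
}

print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

Note that the standard analyzer still lowercases the text, so the phrase match is exact in word order but not case-sensitive.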

Any thoughts?

Rgds valerio