ES query on raw subfield

valerioorfano · October 20, 2016, 11:24am

Hi there, I'm struggling with a raw not_analyzed subfield.

I have the following configuration:

PUT /mise/wo-patent-document/_mapping
{
"wo-patent-document": {
"properties": {
"wo-patent-document": {
"properties": {
"claims_IT": {
"type": "string",
"analyzer": "italian",
"term_vector" : "with_positions_offsets",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"term_vector" : "with_positions_offsets",
"ignore_above" : 256
}
}
}
}
}
}
}
}

Now I'm trying to query the raw field with match_phrase:

GET /mise/wo-patent-document/_search
{
"query": {
"match_phrase": {
"wo-patent-document.description_IT.raw": "piezoelettriche"
}
}
}

It doesn't work. Can I use my "raw" subfield for a "not_analyzed" query?

The same query on "wo-patent-document.description_IT" works!

Thank you very much

dadoonet · October 20, 2016, 11:46am

Please format your code.

And provide a typical document which should match.

valerioorfano · October 20, 2016, 12:19pm

Hi David,

PUT /mise/wo-patent-document/_mapping
{
"wo-patent-document": {
"properties": {
"wo-patent-document": {
"properties": {
"description_IT": {
"type": "string",
"analyzer": "italian",
"term_vector" : "with_positions_offsets",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"term_vector" : "with_positions_offsets",
"ignore_above" : 256
}
}
}
}
}
}
}
}

I want to query both the analyzed and the not_analyzed versions of the "description_IT" field.
The query:

GET /mise/wo-patent-document/_search
{
"query": {
"match_phrase": {
"wo-patent-document.description_IT": "che impiegano ceramiche piezoelettriche"
}
},
"highlight" : {
"fields" : {
"wo-patent-document.description_IT" : {}
},
"fragment_size" : "300",
"number_of_fragments" : "1"
}
}
returns one document, but it doesn't highlight the word "che" because it is a stop word. This result is fine.

GET /mise/wo-patent-document/_search
{
"query": {
"match_phrase": {
"wo-patent-document.description_IT.raw": "che impiegano ceramiche piezoelettriche"
}
},
"highlight" : {
"fields" : {
"wo-patent-document.description_IT.raw" : {}
},
"fragment_size" : "300",
"number_of_fragments" : "1"
}
}
returns 0 documents. I would expect it to return 1 document.

It seems the raw subfield is not taken into account.

Content snippet of the description_IT field:

"Per aumentare la velocità di selezione ed incrementare \n25 \nla produzione si è sostituito l’utilizzo di tradizionali \n2\n \nelettromagneti di azionamento delle punte con soluzioni \nche impiegano ceramiche piezoelettriche, riducendo in \nquesto modo sia il tempo di attuazione che i consumi \nderivanti dalle bobine. \n5 \nQuesti attuatori sono dotati di lamine piezoelettriche \nche, a seconda della polarizzazione elettrica a cui sono \nsottoposte, si muovono in alto od in basso (considerando \nper semplicità espositiva le punte allineate in altezza \ne rivolte orizzontalmente),"

My ES version is 2.3.4.

Rgds valerio

valerioorfano · October 20, 2016, 12:22pm

Sorry, but in the preview it seems to be OK. If you copy/paste it into Notepad, it should display in the correct format.

PUT /mise/wo-patent-document/_mapping
{
"wo-patent-document": {
"properties": {
"wo-patent-document": {
"properties": {
"claims_IT": {
"type": "string",
"analyzer": "italian",
"term_vector" : "with_positions_offsets",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"term_vector" : "with_positions_offsets",
"ignore_above" : 256
}
}
}
}
}
}
}
}

GET /mise/wo-patent-document/_search
{
"query": {
"match_phrase": {
"wo-patent-document.description_IT": "che impiegano ceramiche piezoelettriche"
}
},
"highlight" : {
"fields" : {
"wo-patent-document.description_IT" : {}
},
"fragment_size" : "300",
"number_of_fragments" : "1"
}
}

valerioorfano · October 21, 2016, 8:01am

Hi, any news?

Any query run over the raw subfield returns 0 documents. Even:

GET /mise/wo-patent-document/_search
{
"query": {
"wildcard" : {
"wo-patent-document.description_IT.raw" : {
"value" : "*"
}
}
}
}

why????

dadoonet · October 21, 2016, 8:15am

Should I write this again?

You have here a typical example of what we would like to see when users are asking for help: About the Elasticsearch category.

Can you provide such a script?
And please format your code using the </> icon. It will make your post more readable.

valerioorfano · October 21, 2016, 1:24pm

No need.

I've fixed my issue.

Rgds valerio

warkolm · October 21, 2016, 10:26pm

How did you fix it? Sharing may help others in the future.

valerioorfano · October 27, 2016, 10:33am

With pleasure

I've saved the text content of a PDF file into my "description" field, so it is quite big.

I need to run multiple searches over this field. One of these searches is an "exact match phrase". Initially I thought of using a raw "not_analyzed" field for this kind of query (description.raw). For a relatively small description field it works fine, but when the content of the description field is big, the content of the description.raw field is big too, and a query over the raw field always returns 0 documents! Or at least this is what I saw in my tests (even with "ignore_above": 256).
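One possible explanation for the empty results, sketched here as a toy Python model rather than actual Elasticsearch code: a not_analyzed subfield indexes the whole value as a single term, and "ignore_above" skips any value longer than the limit, so a long PDF text would leave description.raw with nothing to search. The names `index_raw` and `raw_matches` are made up for illustration, and counting the limit with Python's `len` is a simplifying assumption.

```python
# Toy model (NOT Elasticsearch code) of a not_analyzed subfield with ignore_above.
from typing import Optional

IGNORE_ABOVE = 256  # value taken from the mapping in this thread


def index_raw(value: str, ignore_above: int = IGNORE_ABOVE) -> Optional[str]:
    """Return the single term indexed for the raw subfield, or None if skipped."""
    if len(value) > ignore_above:
        return None  # value too long: nothing is indexed for this subfield
    return value  # the whole value becomes one unmodified term


def raw_matches(value: str, query: str) -> bool:
    """Exact-term lookup: the query must equal the entire field value."""
    term = index_raw(value)
    return term is not None and term == query


short_doc = "ceramiche piezoelettriche"
# Hypothetical long value, well above 256 characters
long_doc = "soluzioni che impiegano ceramiche piezoelettriche, " * 20

print(raw_matches(short_doc, "ceramiche piezoelettriche"))  # True
print(raw_matches(short_doc, "piezoelettriche"))            # False: only part of the value
print(index_raw(long_doc))                                  # None: skipped by ignore_above
```

Under this model, a phrase taken from the middle of a long text can never match the raw subfield, and a wildcard "*" finds nothing either, because no term was indexed for that document.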

I realized that for an exact match phrase I could simply use a normal string field with the "standard" tokenizer. And I decided to go for it!

Any thoughts?
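The idea of phrase matching on a standard-tokenized field can be illustrated with a rough sketch, assuming an analyzer that lowercases and splits on word boundaries while keeping stop words such as "che". The helpers `tokenize` and `phrase_match` are hypothetical stand-ins, not Elasticsearch APIs, and the regex only approximates the real standard tokenizer.

```python
# Toy model (NOT Elasticsearch code) of phrase matching on tokenized text.
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Rough stand-in for a standard-style analyzer: lowercase words, stop words kept."""
    return re.findall(r"\w+", text.lower())


def phrase_match(text: str, phrase: str) -> bool:
    """True if the phrase tokens occur consecutively and in order in the text."""
    doc, query = tokenize(text), tokenize(phrase)
    if not query:
        return False
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


# Fragment of the description_IT snippet quoted earlier in the thread
snippet = (
    "elettromagneti di azionamento delle punte con soluzioni \n"
    "che impiegano ceramiche piezoelettriche, riducendo in \nquesto modo"
)

print(phrase_match(snippet, "che impiegano ceramiche piezoelettriche"))  # True
print(phrase_match(snippet, "impiegano che ceramiche"))                  # False: wrong order
```

Because the stop word is kept as a token, the whole phrase, including "che", has to appear in order, which is closer to an exact phrase search than the "italian" analyzer would give.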

Rgds valerio
