How to examine the _all field


(dvd) #1

Is there a way to dump the _all field associated with an indexed
document?

I'm trying to debug strange behavior when I query an index. This query
produces no results:

{"query": {"query_string": {"query": "stile"}}}

whereas this one returns the expected document:

{"query": {"query_string": {"query": "description:stile"}}}
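
A minimal sketch of the two requests and of one possible way to look at _all, under stated assumptions. A query_string query without a field prefix runs against the default field, which is _all unless configured otherwise. The _all field is indexed but not stored by default, so the sketch assumes that setting "store": "yes" on _all in the mapping and asking for it through "fields" would return its content. The type name "book" and both helper names are illustrative, and the code only builds request bodies; it does not contact a cluster.

```python
import json


def all_field_mapping(type_name):
    """Mapping fragment that asks for the _all field to be stored.

    Assumption: "store": "yes" on _all makes it retrievable via "fields".
    """
    return {type_name: {"_all": {"enabled": True, "store": "yes"}}}


def query_string_body(text, field=None, fields=None):
    """Build a query_string search body.

    Without a field prefix the query targets the default field (_all).
    """
    query = text if field is None else "%s:%s" % (field, text)
    body = {"query": {"query_string": {"query": query}}}
    if fields is not None:
        # Ask for specific stored fields to be returned with each hit.
        body["fields"] = list(fields)
    return body


# Mapping to apply before indexing (hypothetical type name "book").
print(json.dumps(all_field_mapping("book"), indent=2))
# The failing query, also requesting the stored _all content.
print(json.dumps(query_string_body("stile", fields=["_all"])))
# The working, field-specific query.
print(json.dumps(query_string_body("stile", field="description")))
```

Documents indexed before the mapping change would not have a stored _all, so this only helps for documents indexed afterwards.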


(Nick Hoffman) #2

What're your index settings, mapping, and document?


(dvd) #3

index settings:

{
  "number_of_shards" : 5,
  "analysis": {
    "analyzer": {
      "default": {
        "alias": ["italian"],
        "type": "italian"
      }
    }
  }
}

type mapping:

{
  'properties': {
    'ean': {
      'type': 'string',
      'index': 'not_analyzed',
    },
    'title': {
      'type': 'string',
      'index': 'analyzed',
      'boost': 5.0,
    },
    'authors': {
      'properties': {
        'name': {
          'type': 'string',
          'index': 'not_analyzed',
          'boost': 2.0,
        },
      },
    },
    'publisher': {
      'type': 'string',
      'index': 'not_analyzed',
    },
    'description': {
      'type': 'string',
      'index': 'analyzed',
    },
  },
}

and document structure:

{
  'authors': [{'name': ''}],
  'description': '',
  'ean': '',
  'publisher': {'code': '', 'name': ''},
  'title': ''
}


(darkyoung) #4

I have the same problem as you. At times the _all field is
unsearchable, so I have to build queries that search every field instead
of the _all field.


(darkyoung) #5

I have the same problem as you. At times the _all field is
unsearchable, so I have to build queries that search every field instead
of the _all field.


(medcl.net) #6

This is how I solved the problem.

First way: explicitly set the default analyzer in the config file
"elasticsearch.yml", like this:

index.analysis.analyzer.default.type : "your analyzer"

Another way: explicitly set the analyzer for "_all" in your mapping,
like this:

{
  "YourType": {
    "_source": {
      "enabled": false
    },
    "_all": {
      "indexAnalyzer": "your_index_analyzer",
      "searchAnalyzer": "your_search_analyzer",
      "term_vector": "no",
      "store": "false"
    },
    "properties": {
      "yourField": {
        "type": "long",
        "store": "yes"
      }
    }
  }
}

You can try it yourself.

yours sincerely,
medcl@github
http://log.medcl.net
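
Applied to the mapping posted earlier in this thread, the second approach could look like the sketch below. It assumes that _all accepts a single "analyzer" key as a shorthand for setting the index and search analyzers together; the helper name with_all_analyzer is hypothetical, and the type name "book" and the trimmed-down properties are only illustrative. The code builds the mapping body and prints it, nothing more.

```python
import json


def with_all_analyzer(mapping, analyzer):
    """Return a copy of a type mapping with an explicit analyzer on _all.

    Assumption: "analyzer" on _all sets both index and search analyzers.
    """
    updated = dict(mapping)
    all_config = dict(updated.get("_all", {}))
    all_config["analyzer"] = analyzer
    updated["_all"] = all_config
    return updated


# Trimmed-down version of the type mapping shown earlier in the thread.
book_mapping = {
    "properties": {
        "title": {"type": "string", "index": "analyzed", "boost": 5.0},
        "description": {"type": "string", "index": "analyzed"},
    }
}

print(json.dumps({"book": with_all_analyzer(book_mapping, "italian")}, indent=2))
```

The helper copies the mapping rather than modifying it, so the original dictionary can still be used to create a comparison index without the explicit _all analyzer.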


(Shay Banon) #7

Can you try explicitly setting the italian analyzer on the _all field as
well? There might be a problem where the _all field does not use the
default analyzer configured for an index.
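
To illustrate why such a mismatch would produce exactly these symptoms, here is a toy simulation, not Elasticsearch code. Both "analyzers" are simplified stand-ins written for this sketch: standard_like only lowercases and splits, and italian_like additionally strips one trailing vowel as a crude imitation of stemming (it is not the real Italian analyzer). The sample sentence is made up.

```python
def standard_like(text):
    """Toy non-stemming analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def italian_like(text):
    """Toy stemming analyzer: also strips one trailing vowel from longer terms."""
    return [
        term[:-1] if len(term) > 3 and term[-1] in "aeio" else term
        for term in standard_like(text)
    ]


def build_index(docs, analyzer):
    """Build an inverted index {term: set(doc_ids)} using the given analyzer."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, analyzer):
    """Analyze the query and return ids of documents matching any query term."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "Uno stile nuovo"}

# The description field is indexed and searched with the same analyzer.
description_index = build_index(docs, italian_like)
# Hypothesized bug: _all is indexed with a different analyzer than the one
# applied to the query at search time.
all_index = build_index(docs, standard_like)

print(search(description_index, "stile", italian_like))  # finds document 1
print(search(all_index, "stile", italian_like))          # finds nothing
```

The query term is stemmed to "stil" in both cases, but only the description index contains "stil"; the mismatched index still holds the unstemmed "stile".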


(Shay Banon) #8

I double-checked on 0.18.3, and I see the _all field using the correct
analyzer (the custom default one), though I suspect that another bug fix in
0.18.3 inadvertently fixed this one as well. In any case, I pushed a change
that will make sure this does not happen, regardless of the other fix.

If you still have problems with 0.18.3, gist a simple recreation with your
config, some sample curl requests that index data, and the search
requests that fail.

-shay.banon


(seanwalbran) #9

I had the same issue with 0.18.2, and am still seeing it in 0.18.4.
It doesn't appear to be always reproducible, so I don't have a clean,
simple repro script, but here's a gist with some (partially elided)
details:

https://gist.github.com/1374346


(dvd) #10

Hi Shay,

thank you for your time!

I've run a test with the new version (0.18.4) and, on a newly created
index, it seems to work fine!

Unfortunately, querying the old index doesn't work until I index
the document again.

My main index has more than 900k documents. Is there a way to rebuild
this index without submitting all the documents again?

david


(dvd) #11

Looks like I claimed victory too soon :(

After a night spent indexing all my documents I'm stuck with the same
problem:

curl -XPOST 'http://localhost:9200/alessandria/book/_search' -d '{"query": {"query_string": {"query": "cervello"}}}'
{"took":3,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

but the document, one of many containing the word "cervello", exists:

curl -XGET 'http://localhost:9200/alessandria/book/9788862204781'
{"_index":"alessandria","_type":"book","_id":"9788862204781","_version":4,"exists":true, "_source" : {... "title": "Il cervello e l'arte di imparare. Apprendimento e memoria nello sviluppo del bambino", ... }}

Following advice from this thread I added an explicit mapping for the
_all field, but with no luck.

After reindexing the 9788862204781 document and restarting the server, the
search finally returns one, and only one, result.

thank you for any advice
david


(Shay Banon) #12

Can you gist a proper recreation? One that has curl requests that create
the index with the relevant settings, index some sample data, and then
issue sample search requests that fail?

On Thu, Nov 17, 2011 at 10:15 PM, seanwalbran seanwalbran@gmail.com wrote:

I'd had the same issue with 0.18.2, and am still seeing it in 0.18.4.
It doesn't appear to be always reproducible, so I don't have a clean/
simple repro script, but here's a gist with some (partially elided)
details:

https://gist.github.com/1374346


(Shay Banon) #13

Gist a curl recreation, with sample indexing and index configuration. More
info here: http://www.elasticsearch.org/help.
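
One way to put such a recreation together is to generate the curl commands from a small script, so the settings, sample document, and failing search all live in one place. The sketch below is only an illustration: the helper names, the scratch index name all_field_test, and the trimmed mapping are assumptions made for this example, and it presumes a local node on port 9200 as in the curl commands earlier in the thread. It prints the commands and does not run them.

```python
import json
import shlex

BASE_URL = "http://localhost:9200"  # assumption: a local node


def curl(method, path, body=None):
    """Format one curl command; the JSON body is shell-quoted."""
    command = "curl -X%s %s" % (method, shlex.quote(BASE_URL + path))
    if body is not None:
        command += " -d %s" % shlex.quote(json.dumps(body))
    return command


def recreation(index, doc_type, settings, mapping, doc_id, doc, query):
    """Return the curl commands for a full recreation, in execution order."""
    search = {"query": {"query_string": {"query": query}}}
    create = {"settings": settings, "mappings": {doc_type: mapping}}
    return [
        curl("DELETE", "/%s" % index),                            # start clean
        curl("PUT", "/%s" % index, create),                       # settings + mapping
        curl("PUT", "/%s/%s/%s" % (index, doc_type, doc_id), doc),  # sample data
        curl("POST", "/%s/_refresh" % index),                     # make it searchable
        curl("POST", "/%s/%s/_search" % (index, doc_type), search),  # failing search
    ]


commands = recreation(
    index="all_field_test",  # scratch index name, since the first step deletes it
    doc_type="book",
    settings={"analysis": {"analyzer": {"default": {"type": "italian"}}}},
    mapping={"properties": {"title": {"type": "string", "index": "analyzed"}}},
    doc_id="1",
    doc={"title": "Il cervello e l'arte di imparare"},
    query="cervello",
)
print("\n".join(commands))
```

Because the first command deletes the index, the script should point at a scratch index rather than a production one.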


(dylanhay) #14

Searching across every field is going to yield lower performance than a properly structured query against the fields you actually want to check. The only way to achieve this is to use the GroupedOr statement, passing in each field and term you want to search against.
