Analyzer disappears after server restart (includes gist)

I have an analyzer:

"analyzer" : {
  "lowercase_keyword" : {
    "type" : "custom",
    "tokenizer" : "keyword",
    "filter" : ["lowercase", "trim"]
  }
}

that is referenced in a mapping:

"location_countries" : {
  "properties" : {
    "country" : {
      "type" : "string",
      "analyzer" : "lowercase_keyword"
    }
  }
}

And when I use the 'country' field in a filter or a facet, the field is
(correctly) treated as a keyword.

curl -XGET 'localhost:9200/clinical_trials/_search?pretty=true' -d '
{
  "query" : {
    "term" : { "brief_title" : "dermatitis" }
  },
  "filter" : {
    "term" : { "country" : "united states" }
  },
  "facets" : {
    "tag" : {
      "terms" : { "field" : "country" }
    }
  }
}
'

Facet results:

"facets" : {
  "tag" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 1,
    "other" : 0,
    "terms" : [ {
      "term" : "united states",
      "count" : 1
    } ]
  }
}

Everything works fine until the machine gets rebooted or the Elasticsearch
service gets restarted. After a restart, all my filters stop working, as if
the analyzer did not exist.

The same query against the same data results in:

"facets" : {
  "tag" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 2,
    "other" : 0,
    "terms" : [ {
      "term" : "united",
      "count" : 1
    }, {
      "term" : "states",
      "count" : 1
    } ]
  }
}

If I query the _settings/_mappings of my index, the analyzer and mappings
are still defined correctly but the analyzer seems to have no effect.
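
A quick way to run those checks is sketched below. It is only a sketch: the host and index name are taken from the query above, the helper names (index_url, show_settings, show_mapping, analyze_with) are made up for illustration, and it assumes the _analyze endpoint accepts an analyzer parameter in the version being used.

```shell
#!/bin/sh
# Assumed defaults, taken from the query above; override ES_URL if needed.
ES_URL="${ES_URL:-localhost:9200}"
INDEX="clinical_trials"

# Print the URL of an index-level endpoint such as _settings or _mapping.
index_url() {
  printf '%s/%s/%s\n' "$ES_URL" "$INDEX" "$1"
}

# Dump the index settings (the custom analyzer should be listed here).
show_settings() {
  curl -s -XGET "$(index_url _settings)?pretty=true"
}

# Dump the mappings (the country field should reference the analyzer).
show_mapping() {
  curl -s -XGET "$(index_url _mapping)?pretty=true"
}

# Run some text through a named analyzer and print the resulting tokens.
# Usage: analyze_with lowercase_keyword 'United States'
analyze_with() {
  curl -s -XGET "$(index_url _analyze)?analyzer=$1&pretty=true" -d "$2"
}
```

If analyze_with lowercase_keyword 'United States' comes back with a single token, that would suggest the analyzer definition itself is intact.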

What am I doing wrong?

Thanks in advance,

Kevin

git://gist.github.com/3442562.git

--

I get the same behavior in multiple versions of ES, including today's
0.19.9.

Kevin

--

Hi Kevin

As I said yesterday, please gist a COMPLETE curl recreation, i.e.
something that we can copy and paste to make it work, from index
creation onwards. Snippets of information like the ones you posted are
insufficient.
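
For illustration, a recreation of that kind might look roughly like the sketch below. The index name, analyzer and country mapping come from this thread; the type name "trial", the document id, the sample document and the choice to nest location_countries as an object inside that type are all assumptions, as are the helper names.

```shell
#!/bin/sh
# Sketch of a self-contained recreation. Everything marked hypothetical
# below is an assumption, not something taken from the original gist.
ES_URL="${ES_URL:-localhost:9200}"
INDEX="clinical_trials"

# Index creation body: analyzer settings plus the mapping that references it.
# The type name "trial" is hypothetical.
create_body() {
  cat <<'EOF'
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "lowercase_keyword" : {
          "type" : "custom",
          "tokenizer" : "keyword",
          "filter" : ["lowercase", "trim"]
        }
      }
    }
  },
  "mappings" : {
    "trial" : {
      "properties" : {
        "brief_title" : { "type" : "string" },
        "location_countries" : {
          "properties" : {
            "country" : { "type" : "string", "analyzer" : "lowercase_keyword" }
          }
        }
      }
    }
  }
}
EOF
}

# One sample document (hypothetical content).
sample_doc() {
  cat <<'EOF'
{
  "brief_title" : "dermatitis",
  "location_countries" : { "country" : "United States" }
}
EOF
}

# Run every step from a clean slate: delete, create, index, refresh.
recreate() {
  curl -s -XDELETE "$ES_URL/$INDEX"
  curl -s -XPUT "$ES_URL/$INDEX" -d "$(create_body)"
  curl -s -XPUT "$ES_URL/$INDEX/trial/1" -d "$(sample_doc)"
  curl -s -XPOST "$ES_URL/$INDEX/_refresh"
}
```

After calling recreate, the search from the original post can be run, the node restarted, and the same search run again to compare the facet output.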

Also, please gist your elasticsearch config, as that may have some
impact.

ta

clint

--

Thanks Clint,

I added my config file to the gist (ElasticSearch forgets my analyzer · GitHub),
but the only things changed from the default are path.data and cluster.name.

Kevin

--

It appears that the issue is that an analyzer that is created as part of a
create index request is not being saved. I am not familiar with the
index metadata, but it appears that analyzers are not persisted and
are re-read entirely from the yml file on startup. I haven't tried your
gist, but it seems correct.

The immediate workaround would be to define the analyzer in
elasticsearch.yml. Persisting analyzer information to the index
metadata should be supported IMHO. I would open an issue and see what
Shay thinks.

Ivan

--

Thanks Ivan.

I suspect it's a bit more complex than that. When I query the _settings for
my index after the restart, I still see my custom analyzer but the analyzer
is not used.

I plan to download the source for ES this afternoon to see if I can figure
it out.

Kevin

--

Hi Kevin

> I suspect it's a bit more complex than that. When I query the
> _settings for my index after the restart, I still see my custom
> analyzer but the analyzer is not used.

The problem is not that the analyzer is disappearing - it's that you
have two fields called 'country' with different mappings.

So after indexing, it is finding the keyword 'country' field first.
After restarting it is finding the analyzed 'country' field first.
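
To make the clash concrete, here is a minimal sketch of a mapping fragment in which two fields share the short name 'country'. Only location_countries.country comes from this thread; the "sponsor" object and the helper names are made up for illustration, so the real second field may look different.

```shell
#!/bin/sh
# Hypothetical mapping fragment with two fields whose short name is "country".
# location_countries.country uses the custom analyzer; sponsor.country
# (an invented field) falls back to the default analyzer.
conflicting_properties() {
  cat <<'EOF'
{
  "properties" : {
    "location_countries" : {
      "properties" : {
        "country" : { "type" : "string", "analyzer" : "lowercase_keyword" }
      }
    },
    "sponsor" : {
      "properties" : {
        "country" : { "type" : "string" }
      }
    }
  }
}
EOF
}

# Count how many field definitions share the short name "country".
count_country_fields() {
  conflicting_properties | grep -c '"country" :'
}
```

With a mapping like this, a query on plain 'country' can resolve to either definition, whereas the full path 'location_countries.country' names exactly one of them.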

If you change your query to the following, you will see that it works
correctly:

curl -XGET 'localhost:9200/clinical_trials/_search?pretty=true' -d '
{
  "query" : {
    "term" : { "brief_title" : "dermatitis" }
  },
  "filter" : {
    "term" : { "location_countries.country" : "united states" }
  },
  "facets" : {
    "tag" : {
      "terms" : { "field" : "location_countries.country" }
    }
  }
}
'

clint

--

Thanks Clint!

Trying that now.

Kevin

--

You are my hero, Clint! That worked.

Thanks for your patience with me.

Kevin

--