Analyzer randomly applied

hey all

when I create an index, I register an analyzer named 'csv' to use with a 'tags' field, as below.

settingsBuilder.put( "index.analysis.analyzer.csv.type", "pattern" );
settingsBuilder.put( "index.analysis.analyzer.csv.pattern", "," );

thus, stuffing "a,b,c" into a 'tags' field and making a facet query returns "a","b","c".

which is exactly what I want.

Except, if the values are "a-b,a-b,a-c", they are tokenized on both "," and "-", so a facet query returns "a", "b", "c", not "a-b", etc.

But not always!

If I run a test to stuff a single document and then run a facet query, sometimes the "-" is tokenized on and sometimes it isn't. I would say the "-" gets parsed out about 30% of the time.
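
For what it's worth, the two facet results described above look like the output of two different analyzers rather than one analyzer misbehaving. The following is a minimal Java sketch, not Elasticsearch code: splitOnComma mimics what the 'csv' pattern analyzer is meant to do, and splitOnNonWord is a rough stand-in for a word-based default analyzer. The class name, the method names, and the tiny stop word list are made up for illustration, and the idea that the fallback analyzer drops English stop words such as "a" is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

class TagTokenizerSketch {
    // Assumption: a tiny stand-in for an English stop word list; only "a" matters here.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Intended behaviour of the 'csv' analyzer: split on "," only, then lowercase.
    static List<String> splitOnComma(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : Pattern.compile(",").split(value)) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Rough approximation of a word-based analyzer: split on anything that is
    // not a letter or digit, lowercase, and drop stop words.
    static List<String> splitOnNonWord(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : Pattern.compile("[^\\p{L}\\p{N}]+").split(value)) {
            String token = part.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitOnComma("a-b,a-b,a-c"));   // [a-b, a-b, a-c]
        System.out.println(splitOnNonWord("a-b,a-b,a-c")); // [b, b, c]
    }
}
```

If the second variant is what runs when the 'csv' mapping is not in effect, the facet would show only "b" and "c", which lines up with the failing facet output posted later in this thread.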

I've tried the following as well, and get the same random results

settingsBuilder.put( "index.analysis.analyzer.csv.type", "custom" );
settingsBuilder.put( "index.analysis.analyzer.csv.tokenizer", "csvPattern" );
settingsBuilder.put( "index.analysis.analyzer.csv.filter", "lowercase" );

settingsBuilder.put( "index.analysis.tokenizer.csvPattern.type", "pattern" );
settingsBuilder.put( "index.analysis.tokenizer.csvPattern.pattern", "," );

FWIW, my mapping of 'tags' to 'csv' does work, just not consistently across invocations of the test. I'm using a dynamic template, defined here

{
  template_tags: {
    mapping: {
      store: yes
      analyzer: csv
      type: string
    }
    match: tags
  }
}

thoughts?

--
Chris K Wensel
chris@concurrentinc.com
http://concurrentinc.com

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Chris,

Not sure why this happens. Maybe your mapping isn't applied on all indices?
What you're doing should work: the value of the tags field should be
tokenised on ",". I created the following gist:

Can you check whether your issue still occurs if you perform the indexing /
searching the same way I do in this gist?
(I used ES version 0.20.4)

Martijn

--
Kind regards,

Martijn van Groningen

> Not sure why this happens. Maybe your mapping isn't applied on all indices?

we only have one index. three document types, two of which are reciprocating parents, the third is just a child (not nested of course). though all three have nested documents. the tags are not in those nested elements.

turns out this has been a persistent problem for the last 9-12 months of ES releases; we just stopped using a "-" in our tests to avoid the random failures.

I think I just need to go deep and see what's happening internally.

I suspect your gist will work without issues with such a simple document.

though it may show up if i wrap line 22 with a for loop, since the issue is that the analyzer is randomly not applied properly on a PUT. that is, some puts are properly parsed, some smaller % are not.

ckw

--
Chris K Wensel
chris@concurrentinc.com

ok, this does repro the problem

tag parser · GitHub

note, it failed both when using doc id 1 and when using a $RANDOM doc id (the last failure below is the latter)

Sometimes it works
{"took":44,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"1","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}}]},"facets":{"tags":{"_type":"terms","missing":0,"total":2,"other":0,"terms":[{"term":"a-c","count":1},{"term":"a-b","count":1}]}}}

query:
{"took":40,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"1","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}}]},"facets":{"tags":{"_type":"terms","missing":0,"total":2,"other":0,"terms":[{"term":"a-c","count":1},{"term":"a-b","count":1}]}}}

Less frequently it does not

{"took":42,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"1","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}}]},"facets":{"tags":{"_type":"terms","missing":0,"total":2,"other":0,"terms":[{"term":"c","count":1},{"term":"b","count":1}]}}}

{"took":57,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":96,"max_score":1.0,"hits":[{"_index":"tags","_type":"tag","_id":"19100","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"26775","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"6971","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"2070","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"17185","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"26016","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"1971","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"2657","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"19504","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}},{"_index":"tags","_type":"tag","_id":"19179","_score":1.0, "_source" : {
"tags" : "a-b,a-b,a-c"
}}]},"facets":{"tags":{"_type":"terms","missing":0,"total":192,"other":0,"terms":[{"term":"c","count":96},{"term":"b","count":96}]}}}

--
Chris K Wensel
chris@concurrentinc.com

Hey Chris,

this gist doesn't reproduce the problem for me on either current master or
0.20.4. What I think can happen here is that the template is not applied
when the tags index is created; this would explain what you see. Apparently
the "wrong" analyzer is consistently applied to all the documents. Can you
try to get this to fail again and, if it fails, pull the mapping from the ES
instance you run this against? -> curl -XGET 'http://localhost:9200/tag/_mapping'
I'd be interested to know whether the index gets created without the template
being applied to it. Maybe there is a race in the template creation code. I'll
try to come up with a test case for this and stress it a little next week.

simon
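
The check suggested above (pull the mapping and see whether the template was applied) could be automated roughly as follows. This is only a sketch under stated assumptions: the class and method names are hypothetical, the sample mapping strings in main are invented for illustration and are not real server output, and the regex is a crude text match rather than proper JSON parsing.

```java
import java.util.regex.Pattern;

class MappingCheckSketch {
    // Returns true if the mapping text contains a "tags" object that directly
    // carries "analyzer": "<analyzerName>". Crude text match, not a JSON parser.
    static boolean tagsUsesAnalyzer(String mappingJson, String analyzerName) {
        Pattern p = Pattern.compile(
            "\"tags\"\\s*:\\s*\\{[^{}]*\"analyzer\"\\s*:\\s*\"" + Pattern.quote(analyzerName) + "\"");
        return p.matcher(mappingJson).find();
    }

    public static void main(String[] args) {
        // Hypothetical mapping bodies, written by hand for this example.
        String applied = "{\"tags\":{\"tag\":{\"properties\":{\"tags\":"
            + "{\"type\":\"string\",\"store\":\"yes\",\"analyzer\":\"csv\"}}}}}";
        String notApplied = "{\"tags\":{\"tag\":{\"properties\":{\"tags\":"
            + "{\"type\":\"string\"}}}}}";

        System.out.println(tagsUsesAnalyzer(applied, "csv"));    // true
        System.out.println(tagsUsesAnalyzer(notApplied, "csv")); // false
    }
}
```

Running such a check right after each failing test run would show whether the 'tags' field was mapped with the 'csv' analyzer at that moment, which is the question the race hypothesis hinges on.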

from the data that produces the last case, it is indeed missing
curl -XGET 'http://localhost:9200/tag/_mapping'
{"error":"IndexMissingException[[tag] missing]","status":404}

that said, per my original email, it is not missing when I see the test failures; i've double-checked that the mapping exists. further, not all documents (tags) are mis-parsed.

i'll try and dig deeper into the es code at some point.

ckw

hmm, I think my link was broken; it should be 'tags' not 'tag' given the
gist, right?

simon

oops, crap, and I wiped the data for that.

i added set -e to make sure the server was fully up, and now i can't reproduce the problem via the bash script

i'll see if I can get a replay of the calls during our tests and try to reproduce independently of the test harness.

ckw
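
A note on the observation above: set -e makes a bash script exit as soon as a command fails; it does not wait for anything by itself. One possible reading, which is only a guess consistent with this thread, is that earlier runs carried on after a setup call failed because the node was not yet accepting requests, so the first document went into an index created without the intended settings. A test harness can guard against that by polling for readiness before doing any setup. The sketch below shows the polling part only; the class and method names are hypothetical, and the actual readiness check (for example an HTTP request to the node) is left to the caller as a BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

class ReadinessSketch {
    // Calls the readiness check up to maxAttempts times, sleeping delayMillis
    // between attempts. Returns true as soon as the check passes, false if it
    // never passes or the waiting thread is interrupted.
    static boolean waitUntilReady(BooleanSupplier ready, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in check that only succeeds on the third call.
        int[] calls = {0};
        boolean ok = waitUntilReady(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println("ready=" + ok + " after " + calls[0] + " checks");
        // prints "ready=true after 3 checks"
    }
}
```

Index creation, template registration, and the first PUT would then run only after waitUntilReady returns true.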
