Question on Analyzer Not Working

Scott_Decker · January 17, 2012, 8:39pm

Hey All,

I have a quick analyzer question.

We have an index analyzer setup like this:
index:
  analysis:
    analyzer:
      pipeDelim:
        tokenizer: pattern
        pattern: |
        filter: [lowercase]

We then index a document with a field that uses this analyzer:

entities: {
    analyzer: pipeDelim
    type: string
}

When we query, I would expect this query to work:
query: {
    bool: {
        must: [
            {
                term: {
                    entities: kobe bryant
                }
            },
            {
                match_all: { }
            }
        ]
    }
}

but it does not.

If I do this query, it works though:
query: {
    bool: {
        must: [
            {
                query_string: {
                    default_field: entities
                    query: kobe bryant
                }
            },
            {
                match_all: { }
            }
        ]
    }
}

Is something wrong in the index setup, perhaps? It seems like this should
match on a term query, since when the document was written, our field
may look like
Kobe Bryant|Kobe|Bryant
and Kobe Bryant would be one of the terms if the analyzer for the pipe
delimiter worked correctly.

Any help would be much appreciated.
Thanks,
Scott

Karussell1 · January 18, 2012, 8:55am

What does your indexing example look like? Normally the tokens are
split and a term query will match only exact terms ...

Peter.
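
To illustrate the point about exact terms: a term query is not analyzed, so its text has to equal one indexed token exactly, while query_string analyzes the query text first. The sketch below models both in plain Python. The helper names and the two tokenization cases are assumptions for illustration only, and Python's re module is only a rough stand-in for what Elasticsearch does internally.

```python
import re

def analyze(text, pattern):
    """Split text on a regex and lowercase the pieces (rough stand-in for a pattern analyzer)."""
    return [tok.lower() for tok in re.split(pattern, text) if tok]

def term_matches(indexed_tokens, term):
    # A term query compares the raw query text against the indexed tokens, with no analysis.
    return term in indexed_tokens

def analyzed_matches(indexed_tokens, query, pattern):
    # An analyzed query tokenizes the query text first; here any matching token counts (OR semantics).
    return any(tok in indexed_tokens for tok in analyze(query, pattern))

doc = "Kobe Bryant|Kobe|Bryant"

# Assumed case 1: the pipe pattern is not applied and the field is split on non-word characters.
split_on_nonword = analyze(doc, r"\W+")
# Assumed case 2: the field is split on a literal pipe.
split_on_pipe = analyze(doc, r"\|")

print(split_on_nonword)
print(term_matches(split_on_nonword, "kobe bryant"))
print(analyzed_matches(split_on_nonword, "kobe bryant", r"\W+"))
print(split_on_pipe)
print(term_matches(split_on_pipe, "kobe bryant"))
```

If the field ends up tokenized as in the first case, the term query for "kobe bryant" cannot match even though the analyzed query does, which is the behaviour described in the question.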

ppearcy · January 18, 2012, 4:56pm

Hey Scott,
We use a similar, if not identical, analyzer. Your issue is that the
pattern needs to be a valid regex and | is a special regex character,
so it has to be escaped. Here is what I have in my yaml file:

  pipe_tokenizer :
    type: pattern
    lowercase: true
    pattern: '\|'
    stopwords: _none_
    flags: DOTALL

Best Regards,
Paul
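
A small sketch of why the escape matters, using Python's re module (3.7 or later) as a stand-in; Elasticsearch itself uses Java regular expressions, so this only illustrates the general regex behaviour, and the tokenize helper is hypothetical. An unescaped | is an alternation between two empty patterns and matches between every character, while \| matches a literal pipe.

```python
import re

text = "Kobe Bryant|Kobe|Bryant"

def tokenize(pattern, value):
    # Split on the regex, drop empty pieces, lowercase the rest.
    return [piece.lower() for piece in re.split(pattern, value) if piece]

# Unescaped: alternation of two empty patterns, so every position is a split point.
unescaped = tokenize("|", text)
# Escaped: a literal pipe character is the only split point.
escaped = tokenize(r"\|", text)

print(unescaped[:4])
print(escaped)
```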

ppearcy · January 18, 2012, 4:57pm

Also, I recommend running a facet query after indexing a doc, to
confirm the terms are getting generated correctly.
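
A minimal sketch of such a check, assuming the terms facet request format used by Elasticsearch releases of this era (later versions replaced facets with aggregations). The helper name is hypothetical; it only builds the request body, which would then be sent to the index's _search endpoint.

```python
import json

def terms_facet_body(field, size=10):
    """Build a search body that asks for the most frequent indexed terms of one field."""
    return {
        "query": {"match_all": {}},
        "facets": {
            # The facet is named after the field purely for convenience.
            field: {"terms": {"field": field, "size": size}}
        },
    }

body = terms_facet_body("entities")
print(json.dumps(body, indent=2))
```

If the analyzer works as intended, the facet should list whole entries such as "kobe bryant" rather than single words or characters.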

Scott_Decker · January 18, 2012, 7:54pm

Thanks Paul.

I tried this out, but there seemed to be two errors.
First, I took my analyzer setup in elasticsearch.yml
and applied your format to it:

index:
  analysis:
    analyzer:
      pipeDelim:
        type: pattern
        lowercase: true
        pattern: '|'
        stopwords: none
        flags: DOTALL

Elasticsearch would not start, saying there was some error in
the .yml file.
Speaking of which, is there any nice place to see what the actual issue is
there? I just get some random message saying "expecting blockquote" or
something.

Second, if I just had this setup:
index:
  analysis:
    analyzer:
      pipeDelim:
        tokenizer: pattern
        pattern: '|'
        filter: [lowercase]

that allowed Elasticsearch to load and we reindexed the documents, but
the term query for "kobe bryant" still returned nothing. So it is
still not working correctly.

We are running version 0.17.7, so maybe something changed in the index
mapping setup between that version and the latest? Either way, ES fails
at startup when I put your format in the yaml file.

Clinton_Gormley · January 18, 2012, 8:44pm

index:
  analysis:
    analyzer:
      pipeDelim:
        type: pattern
        lowercase: true
        pattern: '|'
        stopwords: none
        flags: DOTALL

Elasticsearch would not start, saying there was some error in
the .yml file.
speaking of - is there any nice place to see what the actual issue is
there? I just get some random thing saying "expecting blockquote" or
something

That's a YAML parser issue. I copied and pasted what you have above, and
it seems legal to me, so I assume it has lost some formatting in the mail.
Perhaps a tab or something?
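
YAML does not allow tab characters for indentation, so a stray tab is a plausible cause of this kind of parser error. A quick way to check, sketched in plain Python; the function name and the sample text are hypothetical, and in practice the text would be read from elasticsearch.yml.

```python
def find_tab_indented_lines(yaml_text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # Leading whitespace is whatever lstrip removes from the front of the line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines

# Hypothetical sample: the third line is indented with a tab instead of spaces.
sample = "index:\n  analysis:\n\tanalyzer:\n      pipeDelim:\n"
print(find_tab_indented_lines(sample))
```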

Try this (changing 'foo' to the index of your choice):

curl -XPUT 'http://127.0.0.1:9200/foo/?pretty=1' -d '
{
   "settings" : {
      "analysis" : {
         "analyzer" : {
            "pipeDelim" : {
               "stopwords" : "_none_",
               "pattern" : "\\|",
               "flags" : "DOTALL",
               "lowercase" : true,
               "type" : "pattern"
            }
         }
      }
   }
}
'

clint
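
One detail worth checking when the pattern moves into JSON: a backslash has to be doubled inside a JSON string, so the regex \| is written as "\\|". The sketch below builds a settings body with Python's json module, which handles that escaping automatically. The values follow the pattern analyzer options quoted earlier in this thread; building the body in Python is an assumption for illustration, not something the thread itself did.

```python
import json

settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "pipeDelim": {
                    "type": "pattern",
                    "pattern": r"\|",  # regex for a literal pipe: one backslash plus the pipe
                    "lowercase": True,
                    "flags": "DOTALL",
                    "stopwords": "_none_",
                }
            }
        }
    }
}

# json.dumps doubles the backslash so the serialized text is valid JSON.
body = json.dumps(settings, indent=2)
print(body)

# Round trip: decoding the JSON yields the two-character regex again.
decoded = json.loads(body)
pattern = decoded["settings"]["analysis"]["analyzer"]["pipeDelim"]["pattern"]
print(pattern)
```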

Scott_Decker · January 19, 2012, 4:48pm

All right, I switched to JSON instead of YAML and things started to work.
I'm still not sure why there were differences, as the yml file validated
fine. Oh well, yay for JSON commas and quotes!

Thanks for your help on the analyzer config options!

Scott
