Hash tag field analyzer not being applied


(André Morais) #1

Hello!

I have configured an analyzer in my YML to exclude all words that do not
start with either # or @, in order to process hashtags and at-tags. This
analyzer works fine when I use the analyze API, but when I index data it is
not being applied.

I thought that when indexing, analyzers would replace the original contents
with the analysis result. Is that not so?

Thank you for your help!

      André

Here is my YML configuration:

index.analysis.analyzer.tags:
  type: custom
  tokenizer: whitespace
  filter: fntags, fnsize
index.analysis.filter.fntags:
  type: pattern_replace
  pattern: "^[^#@]+.*$"
  replacement: ""
index.analysis.filter.fnsize:
  type: length
  min: 2
  max: 200

Here is my type mapping for the field:

Hash_Tags: {
  analyzer: tags
  type: string
}

Here is the result when using the analyze API:

curl -XGET 'localhost:9200/catalog/_analyze?field=Hash_Tags&pretty=true' \
  -d 'NO a la violencia! Comparta esto en la medida de lo posible si espera verdaderamente un mejor mundo para Navidad... y despus! #lifeworthbetter'
{
"tokens" : [ {
"token" : "#lifeworthbetter",
"start_offset" : 126,
"end_offset" : 142,
"type" : "word",
"position" : 23
} ]
}
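
For readers who want to check the token list above without a running cluster, here is a minimal Python sketch that approximates the analyzer chain from the YML (whitespace tokenizer, then the fntags pattern replacement, then the fnsize length filter). It is only a simulation: the function name `analyze_tags` is hypothetical, Python's regex engine stands in for the Java one, and positions are counted from 1 only to match the output shown above.

```python
import re

# Same input as the curl call above, as a single line.
TEXT = ("NO a la violencia! Comparta esto en la medida de lo posible si espera "
        "verdaderamente un mejor mundo para Navidad... y despus! #lifeworthbetter")

FNTAGS = re.compile(r"^[^#@]+.*$")  # same pattern as the fntags filter
MIN_LEN, MAX_LEN = 2, 200           # same bounds as the fnsize filter


def analyze_tags(text):
    """Hypothetical helper: approximate whitespace -> fntags -> fnsize."""
    tokens = []
    # Whitespace tokenizer: each run of non-whitespace characters is a token.
    for position, match in enumerate(re.finditer(r"\S+", text), start=1):
        # pattern_replace: blank out tokens that do not start with '#' or '@'.
        token = FNTAGS.sub("", match.group())
        # length filter: drop tokens outside the bounds, including blanked ones.
        if MIN_LEN <= len(token) <= MAX_LEN:
            tokens.append({
                "token": token,
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            })
    return tokens


print(analyze_tags(TEXT))
```

Under these assumptions the sketch yields a single token, `#lifeworthbetter`, with offsets 126 to 142 at position 23, which agrees with the analyze API output.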

And here the result for a match_all query:

Hash_Tags: " NO a la violencia! Comparta esto en la medida de lo posible si
espera verdaderamente un mejor mundo para Navidad... y después!
#lifeworthbetter"

I was expecting:
Hash_Tags: "#lifeworthbetter"

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b719f38f-c8a0-439f-9cdc-0bf3f709429d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(André Morais) #2

Figured it out. The analyzer is correct but, contrary to what I thought, the
data I put in the index is stored there unchanged. The analyzer is only used
to produce the tokens that are indexed, and to analyze the query text at
search time. So retrieving my hashtag field returns all the data I put there,
not merely the tokens that my analyzer produces from it.

   Thanks anyway!
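
To illustrate the distinction between what is stored and what is searchable, here is a toy Python model. It is not how Elasticsearch is implemented; `ToyIndex` and `tags_analyzer` are hypothetical names, and the analyzer is only an approximation of the `tags` analyzer from the YML configuration.

```python
import re


def tags_analyzer(text):
    """Hypothetical stand-in for the 'tags' analyzer: keep #... and @... tokens."""
    tokens = (re.sub(r"^[^#@]+.*$", "", t) for t in text.split())
    return [t for t in tokens if 2 <= len(t) <= 200]


class ToyIndex:
    """Toy model: stores each value verbatim, indexes only the analyzed tokens."""

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.stored = {}    # doc id -> original field value, untouched
        self.inverted = {}  # token -> set of doc ids containing it

    def index(self, doc_id, value):
        self.stored[doc_id] = value
        for token in self.analyzer(value):
            self.inverted.setdefault(token, set()).add(doc_id)

    def search(self, query):
        # The query text goes through the same analyzer as the indexed data.
        ids = set()
        for token in self.analyzer(query):
            ids |= self.inverted.get(token, set())
        # Hits return the stored original, never the token list.
        return [self.stored[i] for i in sorted(ids)]


index = ToyIndex(tags_analyzer)
index.index(1, "NO a la violencia! #lifeworthbetter")
print(index.search("#lifeworthbetter"))  # the full original text comes back
print(index.search("violencia!"))        # no hit: this word was never indexed
```

In this model the analyzer decides which queries match a document, while the returned field is always the original text. If only the tags are wanted in the returned field, they would have to be extracted before indexing.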

In reply to post #1 by André Morais, Tuesday, 24 June 2014, 18:50:53 UTC+1.


