Length Token Filter

windoz · July 2, 2012, 12:38pm

I'm new to Elasticsearch and want to know how I can use the length token
filter. I'm trying to limit my search to exclude two-letter words.

drewr · July 2, 2012, 2:15pm

Can you give us an example of what you've tried based on the
documentation?

Note that there's a length token filter at the bottom of the sample
config:

myTokenFilter2 :
   type : length
   min : 0
   max : 2000
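
For the original goal of dropping two-letter words, the same filter type with a higher minimum is one possible approach. The sketch below is only an illustration: the names `min_three_chars` and `no_short_words` are hypothetical, and the pure-Python `length_filter` merely imitates the idea of keeping tokens whose length lies between `min` and `max`; it is not Elasticsearch code.

```python
import json

# Index settings sketch: a `length` token filter that keeps only tokens
# of 3 to 2000 characters. The filter and analyzer names are made up
# for illustration; `max` is copied from the sample config above.
settings = {
    "analysis": {
        "filter": {
            "min_three_chars": {"type": "length", "min": 3, "max": 2000}
        },
        "analyzer": {
            "no_short_words": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "min_three_chars"],
            }
        },
    }
}


def length_filter(tokens, min_len=0, max_len=2000):
    """Toy stand-in for the filter: keep tokens with min_len <= len <= max_len."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


if __name__ == "__main__":
    # Print the settings as JSON, then show the toy filter on a few tokens.
    print(json.dumps(settings, indent=2))
    print(length_filter(["it", "is", "a", "length", "filter"], min_len=3))
```

The analyzer would still have to be referenced from a field mapping before it affects what gets indexed.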

-Drew

windoz · July 3, 2012, 8:43am

I've been reading around and found that I can use a custom analyzer with
a custom stop filter, but it is not working. I posted the analyzer
(custom.txt, below) on Windows using
curl -X POST --data "@custom.txt" http://localhost:9200/sample/test/1
Is this the correct way to use analyzers?

Here is custom.txt:

{
  "analysis": {
    "analyzer": {
      "symphony_fulltext": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["stop", "asciifolding", "snowball", "lowercase",
                   "custom_synonyms", "custom_stop"]
      },
      "symphony_autocomplete": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["asciifolding", "lowercase"]
      }
    },
    "filter": {
      "custom_synonyms": {
        "type": "synonym",
        "ignore_case": "true",
        "synonyms": [
          "i-pod, i pod => ipod",
          "definately, definitly, definetly => definitely"
        ]
      },
      "custom_stop": {
        "type": "stop",
        "stopwords": ["a", "an", "and", "are", "as", "at", "be", "but", "by",
                      "into", "is", "it", "of", "on", "or", "such", "that", "the", "their",
                      "there", "these", "they", "this", "to", "was", "will"]
      }
    }
  }
}

drewr · July 3, 2012, 2:20pm

By sending that data to /sample/test/1, you're just indexing it as a
regular doc in ES. You need to store it as part of your index
settings. Try something like this:

curl -s -XPUT localhost:9200/test \
  -d @<(curl -s http://p.draines.com/13413250381814c87452d.txt)

Then you can check the settings with:

curl -s localhost:9200/test/_settings?pretty=1
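
The difference between the two requests can be sketched as follows. This builds, but does not send, both HTTP requests with Python's standard library. The host, the index name `test`, and the trimmed-down analysis body are assumptions for illustration; the contents of the linked file are not reproduced here.

```python
import json
import urllib.request

# Trimmed-down analysis config (illustrative, not the full custom.txt).
analysis = {
    "analyzer": {
        "symphony_autocomplete": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["asciifolding", "lowercase"],
        }
    }
}


def as_document(analysis_cfg):
    """POST to /index/type/id: the JSON is stored as an ordinary document."""
    return urllib.request.Request(
        "http://localhost:9200/sample/test/1",
        data=json.dumps({"analysis": analysis_cfg}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def as_index_settings(analysis_cfg):
    """PUT to /index at creation time: the JSON goes under "settings"."""
    return urllib.request.Request(
        "http://localhost:9200/test",
        data=json.dumps({"settings": {"analysis": analysis_cfg}}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # Nothing is sent; this only shows which method and URL each request uses.
    for req in (as_document(analysis), as_index_settings(analysis)):
        print(req.get_method(), req.full_url)
```

Only the second request registers the analyzers with the index; the first simply stores the JSON as document 1 of type `test`.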

-Drew

windoz · July 4, 2012, 10:37am

I tried what you said, but unfortunately when I do my search for the top
ten most used words in the documents I still get the stop words being
indexed. What could be the problem?

Ivan · July 5, 2012, 9:46pm

Are you correctly applying your analyzer in the mapping of your field?
Can you gist your mapping as well?

--
Ivan

windoz · July 6, 2012, 8:32am

I'm now trying a new way, shown below. If I use a query to search for the
top ten words in the message field of the docs index, I still get the words
[is, the, this, ...] that I have included in the stop words list in my
custom filter. The search_analyzer deals with the searching part and the
index_analyzer with the indexing part.

Here is the mapping, analyzers and filters.
{
  "mappings" : {
    "message" : {
      "properties" : {
        "title" : {
          "type" : "string",
          "search_analyzer" : "str_search_analyzer",
          "index_analyzer" : "str_index_analyzer"
        }
      }
    }
  },
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "str_search_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase","custom_stop"]
        },
        "str_index_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase", ]
        }
      },
      "filter" :
        "custom_stop": {
          "type": "stop",
          "stopwords": ["a", "an", "and", "are", "as", "at", "be", "but", "by",
                        "into", "is", "it", "of", "on", "or", "such", "that", "the", "their",
                        "there", "these", "they", "this", "to", "was", "will","we"]
        }
      }
    }
  }
}

drewr · July 6, 2012, 7:26pm

Can you provide a script that reproduces what you're seeing and what
you would like it to do instead? Something that sets up your index,
indexes something, queries, and then tell us how it differs from what
you expected.

-Drew

Igor_Motov · July 8, 2012, 4:19pm

Hi windoz,

There are a couple of syntax errors in your example. A curly brace is
missing here:

   "filter" :
   "custom_stop": {

And a filter is missing in the str_index_analyzer definition (note the
trailing comma):

   "str_index_analyzer" : {
      "tokenizer" : "keyword",
      "filter" : ["lowercase", ]
   }

I also don't think that "keyword" tokenizer is what you want in your case.
It emits content of the entire field as a single token, which doesn't allow
stop word filter to do its job unless your fields consist of single words.
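
To make the point concrete, here is a toy simulation in plain Python. It is not Elasticsearch code: `keyword_tokenize` and `simple_word_tokenize` are hypothetical stand-ins that only imitate "whole field as one token" versus "split into words", and the stop list is shortened.

```python
import re

# Shortened stop list, for illustration only.
STOPWORDS = {"is", "the", "this"}


def keyword_tokenize(text):
    """Imitates a keyword-style tokenizer: the whole field is one token."""
    return [text]


def simple_word_tokenize(text):
    """Rough stand-in for a word-splitting tokenizer."""
    return re.findall(r"\w+", text)


def analyze(text, tokenize):
    """Tokenize, lowercase, then drop tokens that exactly match a stop word."""
    tokens = [t.lower() for t in tokenize(text)]
    return [t for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    text = "This is the message"
    # Single token "this is the message" matches no stop word, so it survives.
    print(analyze(text, keyword_tokenize))
    # Word tokens "this", "is", "the" are removed; only "message" remains.
    print(analyze(text, simple_word_tokenize))
```

With the single-token version the stop filter can only ever match when the entire field value is itself a stop word.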
I think, it might be better to use standard tokenizer instead. With these
changes, this is how your example might look
like: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/C9lp8oHrg7k · GitHub
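
Since the linked example may not be reachable, the following is a hedged reconstruction of what the corrected request body could look like, built as a Python dict so that the JSON is guaranteed to be well formed. It is not a copy of the linked example. Adding `custom_stop` to `str_index_analyzer` is an assumption about which filter was meant; the `string` type and the `index_analyzer` key are kept as written in the thread.

```python
import json

# Stop words copied from the custom_stop filter posted in the thread.
STOPWORDS = [
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "into", "is", "it", "of", "on", "or", "such", "that", "the", "their",
    "there", "these", "they", "this", "to", "was", "will", "we",
]

body = {
    "mappings": {
        "message": {
            "properties": {
                "title": {
                    "type": "string",
                    "search_analyzer": "str_search_analyzer",
                    "index_analyzer": "str_index_analyzer",
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "analyzer": {
                "str_search_analyzer": {
                    "tokenizer": "standard",  # was "keyword"
                    "filter": ["lowercase", "custom_stop"],
                },
                "str_index_analyzer": {
                    "tokenizer": "standard",  # was "keyword"
                    # Assumption: custom_stop is the filter that was missing.
                    "filter": ["lowercase", "custom_stop"],
                },
            },
            # The opening brace after "filter" is present here.
            "filter": {
                "custom_stop": {"type": "stop", "stopwords": STOPWORDS}
            },
        }
    },
}

if __name__ == "__main__":
    # Emit the body as JSON, ready to be saved to a file and sent with curl.
    print(json.dumps(body, indent=2))
```
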

Igor

windoz · July 9, 2012, 8:25am

Thanks Motov !

Your code seems to be working fine so far.
