Was there any progress in adding the concatenation filter [1] to
Lucene (and ES) last summer? I can't find any evidence of built-in
support for this type of filter.
Thanks,
Cole
Hi Cole,
No, I got so busy right after my email last summer that I didn't follow up
and dropped the ball.
However, if you are interested, I can send you the code for the filter. Just
let me know.
Stephane
Hi Stephane,
Thanks for the reply. I'd very much appreciate seeing your
concatenation filter code. Do you have it up somewhere you can link
to?
Thanks,
Cole
Hello Cole,
Here is the code for the concatenate filter. As you can see, it's very
simple but does the job for me.
    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    /** Consumes every token from the input stream and emits them as one token. */
    public final class ConcatenateFilter extends TokenFilter {

        private final static String DEFAULT_TOKEN_SEPARATOR = " ";

        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private String tokenSeparator = null;
        private StringBuilder builder = new StringBuilder();

        // matchVersion is currently unused
        public ConcatenateFilter(Version matchVersion, TokenStream input, String tokenSeparator) {
            super(input);
            this.tokenSeparator = tokenSeparator != null ? tokenSeparator : DEFAULT_TOKEN_SEPARATOR;
        }

        @Override
        public boolean incrementToken() throws IOException {
            boolean result = false;
            builder.setLength(0);
            // drain the whole input stream into the builder
            while (input.incrementToken()) {
                if (builder.length() > 0) {
                    // append the token separator
                    builder.append(tokenSeparator);
                }
                // append the term of the current token
                builder.append(termAtt.buffer(), 0, termAtt.length());
            }
            if (builder.length() > 0) {
                // emit the concatenated text as a single token
                termAtt.setEmpty().append(builder);
                result = true;
            }
            return result;
        }
    }
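
To make the behavior of the loop in incrementToken() easier to see without a Lucene dependency, here is a minimal plain-Java sketch of the same joining logic. The class name ConcatenateSketch and the method concatenate are hypothetical and are not part of the filter above; returning null stands in for incrementToken() returning false.

```java
import java.util.Arrays;
import java.util.Iterator;

public class ConcatenateSketch {

    // Joins all upstream tokens with the separator, like the filter's loop.
    // Returns null when nothing was collected (the filter would return false).
    public static String concatenate(Iterator<String> tokens, String separator) {
        // same fallback as DEFAULT_TOKEN_SEPARATOR in the filter
        String sep = separator != null ? separator : " ";
        StringBuilder builder = new StringBuilder();
        while (tokens.hasNext()) {
            if (builder.length() > 0) {
                builder.append(sep);
            }
            builder.append(tokens.next());
        }
        return builder.length() > 0 ? builder.toString() : null;
    }

    public static void main(String[] args) {
        // prints "new york city"
        System.out.println(concatenate(Arrays.asList("new", "york", "city").iterator(), " "));
        // prints "new_york_city"
        System.out.println(concatenate(Arrays.asList("new", "york", "city").iterator(), "_"));
    }
}
```

However many tokens arrive, at most one token leaves the filter, so it only makes sense as the last step of a filter chain.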
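As you can see above, the code is pure Lucene (no ES code). In order to
use the filter in ES, you need to implement another class:

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.util.Version;
    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.inject.assistedinject.Assisted;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.index.Index;
    import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
    import org.elasticsearch.index.settings.IndexSettings;

    public class ConcatenateTokenFilterFactory extends AbstractTokenFilterFactory {

        private String tokenSeparator = null;

        @Inject
        public ConcatenateTokenFilterFactory(Index index, @IndexSettings Settings indexSettings,
                @Assisted String name, @Assisted Settings settings) {
            super(index, indexSettings, name, settings);
            // the token_separator is defined in the ES configuration file
            tokenSeparator = settings.get("token_separator");
        }

        @Override
        public TokenStream create(TokenStream tokenStream) {
            return new ConcatenateFilter(Version.LUCENE_CURRENT, tokenStream, tokenSeparator);
        }
    }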
To glue things together, you then need to declare the
ConcatenateTokenFilterFactory in the ES config file:

    "index": {
      "analysis": {
        "analyzer": {
          "myAnalyzer": {
            "tokenizer": "letter",
            "filter": ["lowercase", "asciifolding", "filter-concatenate"]
          }
        },
        "filter": {
          "filter-concatenate": {
            "type": "com.monpetitguide.elasticsearch.analysis.ConcatenateTokenFilterFactory",
            "token_separator": " "
          }
        }
      }
    }
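
As a rough illustration of what the analyzer configured above might produce end to end, the following plain-Java sketch imitates the chain: letter tokenizer, lowercase, asciifolding, then concatenation. Everything in it is an assumption for illustration only: the class AnalyzerChainSketch and its methods are hypothetical, the tokenizer is approximated by splitting on non-letters, and asciifolding is approximated by stripping combining marks, which covers far fewer characters than the real filter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerChainSketch {

    // Approximates the "letter" tokenizer: split on runs of non-letter characters.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Rough stand-in for asciifolding: decompose, then drop combining marks.
    static String fold(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Runs the approximated chain and joins the result with the separator.
    // Note: for input with no letters this returns "", whereas the real
    // filter would emit no token at all.
    public static String analyze(String text, String separator) {
        List<String> out = new ArrayList<>();
        for (String token : letterTokens(text)) {
            out.add(fold(token.toLowerCase(Locale.ROOT)));
        }
        return String.join(separator, out);
    }

    public static void main(String[] args) {
        // prints "cafe bar montreal"
        System.out.println(analyze("Caf\u00e9-Bar, Montr\u00e9al", " "));
    }
}
```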
Cole, feel free to use any part of the code above. I'm glad if it helps.
All the best,
Stephane Bastian
Thanks, Stephane! I appreciate you explaining how everything is glued
together. Very helpful!
Thanks,
Cole