Is there a concatenation filter?

Was there any progress in adding the concatenation filter [1] to
Lucene (and ES) last summer? I can't find any evidence of built-in
support for this type of filter.

Thanks,
Cole

[1] http://elasticsearch-users.115913.n3.nabble.com/Code-contribution-Concatenate-filter-td3137058.html#a3707818

Hi Cole,

No, I got so busy right after my email last summer that I didn't follow up
and dropped the ball.
However, if you are interested, I can send you the code for the filter. Just
let me know.

Stephane

Hi Stephane,

Thanks for the reply. I'd very much appreciate seeing your
concatenation filter code. Do you have it up somewhere you can link
to?

Thanks,
Cole

On Feb 3, 5:42 am, Stephane Bastian stephane.bastian....@gmail.com
wrote:

[...]

Hello Cole,

Here is the code for the concatenate filter. As you can see, it's very
simple but does the job for me.

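```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.Version;

/**
 * Concatenates all tokens of the input stream into a single token,
 * separated by tokenSeparator (a single space by default).
 */
public final class ConcatenateFilter extends TokenFilter {

    private static final String DEFAULT_TOKEN_SEPARATOR = " ";

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final OffsetAttribute offsetAtt = addAttribute(OffsetAttribute.class);
    private final String tokenSeparator;
    private final StringBuilder builder = new StringBuilder();
    // set once the input has been drained, so it is not read again after it returned false
    private boolean exhausted = false;

    // matchVersion is currently unused; it is kept so the signature matches the factory
    public ConcatenateFilter(Version matchVersion, TokenStream input, String tokenSeparator) {
        super(input);
        this.tokenSeparator = tokenSeparator != null ? tokenSeparator : DEFAULT_TOKEN_SEPARATOR;
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (exhausted) {
            return false;
        }
        exhausted = true;
        builder.setLength(0);
        boolean first = true;
        int startOffset = 0;
        int endOffset = 0;
        while (input.incrementToken()) {
            if (first) {
                startOffset = offsetAtt.startOffset();
                first = false;
            } else {
                // append the token separator between consecutive tokens
                builder.append(tokenSeparator);
            }
            // append the term of the current token
            builder.append(termAtt.buffer(), 0, termAtt.length());
            endOffset = offsetAtt.endOffset();
        }
        if (builder.length() == 0) {
            return false;
        }
        // emit one token spanning from the first to the last input token
        clearAttributes();
        termAtt.setEmpty().append(builder);
        offsetAtt.setOffset(startOffset, endOffset);
        return true;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        exhausted = false;
    }
}
```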
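To make the behaviour concrete without pulling in Lucene, here is a hypothetical, dependency-free sketch that mirrors the loop in incrementToken(). Plain strings stand in for the CharTermAttribute values, and the names ConcatenateSketch and concatenate are made up for illustration; it assumes the tokens arrive in order and that at most one joined token is emitted.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of what ConcatenateFilter.incrementToken() does:
 * drain every input token and emit at most one joined token.
 * Plain strings stand in for Lucene's CharTermAttribute.
 */
class ConcatenateSketch {

    static final String DEFAULT_TOKEN_SEPARATOR = " ";

    /** Returns the single concatenated token, or empty when there is nothing to emit. */
    static Optional<String> concatenate(List<String> tokens, String tokenSeparator) {
        String separator = tokenSeparator != null ? tokenSeparator : DEFAULT_TOKEN_SEPARATOR;
        StringBuilder builder = new StringBuilder();
        boolean first = true;
        for (String term : tokens) {          // mirrors: while (input.incrementToken())
            if (!first) {
                builder.append(separator);    // separator only between tokens
            }
            first = false;
            builder.append(term);
        }
        // like the filter, emit nothing when the joined text is empty
        return builder.length() > 0 ? Optional.of(builder.toString()) : Optional.empty();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("new", "york", "city");
        System.out.println(concatenate(tokens, null).orElse("<no token>"));      // new york city
        System.out.println(concatenate(tokens, "_").orElse("<no token>"));       // new_york_city
        System.out.println(concatenate(List.of(), "_").orElse("<no token>"));    // <no token>
    }
}
```

Because the filter drains the whole input in one call, the entire field value ends up as a single term.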

As you can see, ConcatenateFilter is pure Lucene (no ES code). To use the
filter in ES, you need to implement another class:
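```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.Version;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.inject.assistedinject.Assisted;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.Index;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
import org.elasticsearch.index.settings.IndexSettings;

public class ConcatenateTokenFilterFactory extends AbstractTokenFilterFactory {

    private final String tokenSeparator;

    @Inject
    public ConcatenateTokenFilterFactory(Index index, @IndexSettings Settings indexSettings,
                                         @Assisted String name, @Assisted Settings settings) {
        super(index, indexSettings, name, settings);
        // the token_separator is defined in the ES configuration file;
        // if it is absent, null is passed on and ConcatenateFilter uses a single space
        tokenSeparator = settings.get("token_separator");
    }

    @Override
    public TokenStream create(TokenStream tokenStream) {
        return new ConcatenateFilter(Version.LUCENE_CURRENT, tokenStream, tokenSeparator);
    }
}
```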
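The factory only reads the raw token_separator value; the default is applied inside ConcatenateFilter, which substitutes a single space only when the value is null. The sketch below traces that hand-off with a plain Map standing in for the Elasticsearch Settings object. That substitution, and the names SeparatorResolutionSketch, readSetting and effectiveSeparator, are hypothetical; it also assumes a missing key yields null and that an explicitly configured empty string reaches the filter unchanged.

```java
import java.util.Map;

/**
 * Hypothetical sketch of how "token_separator" flows from the filter
 * definition into ConcatenateFilter. A plain Map stands in for
 * Elasticsearch's Settings object (an assumption for illustration only).
 */
class SeparatorResolutionSketch {

    static final String DEFAULT_TOKEN_SEPARATOR = " ";

    /** Factory side: read the raw setting; null when the key is absent. */
    static String readSetting(Map<String, String> filterSettings) {
        return filterSettings.get("token_separator");
    }

    /** Filter side: fall back to the default only when the setting is null. */
    static String effectiveSeparator(String configured) {
        return configured != null ? configured : DEFAULT_TOKEN_SEPARATOR;
    }

    public static void main(String[] args) {
        // key absent: default single space
        System.out.println("[" + effectiveSeparator(readSetting(Map.of())) + "]");
        // explicit empty string is kept as-is, so tokens would be glued together
        System.out.println("[" + effectiveSeparator(readSetting(Map.of("token_separator", ""))) + "]");
        System.out.println("[" + effectiveSeparator(readSetting(Map.of("token_separator", "-"))) + "]");
    }
}
```

Keeping the default in the filter rather than the factory means the Lucene class behaves sensibly even when used outside ES.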

To glue things together, you then need to declare
ConcatenateTokenFilterFactory in the ES configuration file:

```json
"index": {
    "analysis": {
        "analyzer": {
            "myAnalyzer": {
                "tokenizer": "letter",
                "filter": ["lowercase", "asciifolding", "filter-concatenate"]
            }
        },
        "filter": {
            "filter-concatenate": {
                "type": "com.monpetitguide.elasticsearch.analysis.ConcatenateTokenFilterFactory",
                "token_separator": " "
            }
        }
    }
}
```
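With that configuration, text indexed through myAnalyzer ends up as a single, normalized term. The following dependency-free sketch approximates the chain (letter tokenizer, then lowercase, then asciifolding, then filter-concatenate) to show the expected shape of the result. It is only an approximation: the folding step strips combining marks after Unicode NFD decomposition, which is not the same as Lucene's ASCIIFoldingFilter, and MyAnalyzerSketch is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * Rough approximation of "myAnalyzer":
 * letter tokenizer -> lowercase -> asciifolding -> filter-concatenate.
 * Folding via NFD + stripping combining marks is an assumption and only
 * approximates Lucene's ASCIIFoldingFilter.
 */
class MyAnalyzerSketch {

    /** Letter tokenizer: a token is a maximal run of letters. */
    static List<String> letterTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Approximate ASCII folding: decompose accents, then drop the combining marks. */
    static String fold(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    /** Note: returns "" for input without letters, whereas the real filter emits no token. */
    static String analyze(String text, String tokenSeparator) {
        return letterTokenize(text).stream()
                .map(t -> t.toLowerCase(Locale.ROOT))          // "lowercase"
                .map(MyAnalyzerSketch::fold)                    // "asciifolding" (approximation)
                .collect(Collectors.joining(tokenSeparator));   // "filter-concatenate"
    }

    public static void main(String[] args) {
        // the accented a is written as an escape to avoid source-encoding issues
        System.out.println(analyze("S\u00e3o Paulo, Brazil", " "));  // sao paulo brazil
    }
}
```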

Cole, feel free to use any part of the code above. I'm glad if it helps.

All the best,

Stephane Bastian

On 02/03/2012 08:27 PM, cole wrote:

[...]

Thanks, Stephane! I appreciate you explaining how everything is glued
together. Very helpful!

Thanks,
Cole

On Feb 6, 2:25 am, Stephane Bastian stephane.bastian....@gmail.com
wrote:

[...]