Analyzer for web page name

Is there a way to apply the following logic to a web page field that I have?
I have this field in many types, so ideally I could add an analyzer and reindex the data.
Here is some Node.js code (the language is not important) that includes the rules I need:

exports.cleanPage = function (page) {

    // Leave empty or missing values untouched.
    if (!page || page.length === 0) {
        return page;
    }

    // The root page is already in its normalized form.
    if (page === "/") {
        return page;
    }

    if (!page.startsWith("/")) { // a page must start with "/"
        page = "/" + page;
    }

    if (page.endsWith("/")) { // prevent duplicates such as en, /en/, en/
        page = page.slice(0, -1);
    }

    if (page.indexOf("?") > -1) {
        page = page.split("?")[0]; // remove the query string from the page name
    }

    return page.toLowerCase();
};


I'm using Elasticsearch 1.7.


There is no built-in component that removes query parameters and normalizes parts of URLs. There is the uax_url_email tokenizer, which keeps a URL as a single token. You could, however, use the pattern_replace token filter with regular expressions to trim down your path.
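As a rough sketch of the pattern_replace approach, the settings below chain a keyword tokenizer (so the whole page value stays one token) with three pattern_replace token filters and the lowercase filter. The analyzer and filter names (page_name, page_strip_query and so on) are made up for illustration, and the regular expressions are only one possible translation of the rules in the question. Note that this sketch strips the query string first, so that "/en/?a=1" ends up as "/en". The small helper at the end only approximates the filter chain locally with JavaScript regular expressions; it does not talk to Elasticsearch.

```javascript
// Index settings sketch; all analyzer and filter names are hypothetical.
const pageAnalysis = {
  analysis: {
    filter: {
      // Drop everything from the first "?" onwards.
      page_strip_query: { type: "pattern_replace", pattern: "\\?.*$", replacement: "" },
      // Drop a trailing "/" unless the value is just "/".
      page_strip_trailing_slash: { type: "pattern_replace", pattern: "(.)/$", replacement: "$1" },
      // Prepend "/" when the value does not start with one.
      page_add_leading_slash: { type: "pattern_replace", pattern: "^([^/])", replacement: "/$1" }
    },
    analyzer: {
      page_name: {
        type: "custom",
        tokenizer: "keyword",
        filter: [
          "page_strip_query",
          "page_strip_trailing_slash",
          "page_add_leading_slash",
          "lowercase"
        ]
      }
    }
  }
};

// Local approximation of the filter chain, for trying out the patterns only.
function simulatePageAnalyzer(page) {
  let token = page;
  for (const name of pageAnalysis.analysis.analyzer.page_name.filter) {
    if (name === "lowercase") {
      token = token.toLowerCase();
      continue;
    }
    const f = pageAnalysis.analysis.filter[name];
    token = token.replace(new RegExp(f.pattern, "g"), f.replacement);
  }
  return token;
}

console.log(simulatePageAnalyzer("EN/?a=1"));
```

Keep in mind that an analyzer only changes the terms that are indexed and searched; the stored _source still contains the original value.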

I'd still recommend doing that before indexing the data. Plain application code is likely to be easier to understand than complex regular expressions.

With 5.0 you could take a look at the new ingest node feature, which can assist you with this by changing the field to your needs before indexing.
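For the ingest node route, a pipeline along the following lines might work. This is only a sketch: the field name "page" and the pipeline id "clean-page" are assumptions, and the patterns are one possible translation of the rules in the question using the gsub and lowercase processors.

```javascript
// Hypothetical body for: PUT _ingest/pipeline/clean-page (Elasticsearch 5.0+).
// Assumes the field to normalize is called "page".
const cleanPagePipeline = {
  description: "Normalize the page field before indexing",
  processors: [
    // Drop everything from the first "?" onwards.
    { gsub: { field: "page", pattern: "\\?.*$", replacement: "" } },
    // Drop a trailing "/" unless the value is just "/".
    { gsub: { field: "page", pattern: "(.)/$", replacement: "$1" } },
    // Prepend "/" when the value does not start with one.
    { gsub: { field: "page", pattern: "^([^/])", replacement: "/$1" } },
    { lowercase: { field: "page" } }
  ]
};

// Print the JSON that would be sent as the request body.
console.log(JSON.stringify(cleanPagePipeline, null, 2));
```

Documents would then be indexed with the pipeline parameter set to that id (for example ?pipeline=clean-page). Unlike an analyzer, this changes the value stored in _source.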

