Replace <script> tags at index time

Hello,

I want to strip <script>...</script> elements at index time.
I tried:

POST _analyze
 {
       "tokenizer": "icu_tokenizer",
       "filter": [
         {
           "type": "pattern_replace",
           "pattern": "(<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>)",
           "replacement": "",
           "flags": "CASE_INSENSITIVE|MULTILINE"
         }
       ],
       "text":"<ul class=wrap><li><a class=toggleTabContent href=#rendement title=Rendement>Rendement</a></li><li><a class=toggleTabContent href=#frais title=Frais>Frais</a></li><li><a class=toggleTabContent href=#comment-investir title=Comment investir ?>Comment investir ?</a></li><li><a class=toggleTabContent href=#supports title=Supports>Supports</a></li><li><a class=toggleTabContent href=#souscrire-documentation title=Souscrire / documentation>Souscrire / documentation</a></li></ul></div><script>function lazyLoadIframes(){jQuery('[data-loaded=false]').each(function() {var llazy = jQuery(this);if (llazy[0].getBoundingClientRect().top != 0 && llazy[0].getBoundingClientRect().top - window.innerHeight < 0) {jQuery(this).attr(data-loaded, true);jQuery(this).attr(src, jQuery(this).attr(data-src));}});}window.addEventListener(scroll, lazyLoadIframes);$('body').on('click', '.s-accordion-title', function(){setTimeout(lazyLoadIframes, 300);});</script><style>iframe[data-loaded]{}</style><script type=\"text/javascript\">if(typeof(wpDataCharts)=='undefined'){wpDataCharts = {};}; wpDataCharts[1] = {render_data: {\"columns\":[{\"type\":\"string\",\"label\":\"Support\",\"orig_header\":\"support\"},{\"type\":\"number\",\"label\":\"Pourcentage\",\"orig_header\":\"pourcentage\"}],\"rows\":[[\"Fonds en euros (Actif g\u00e9n\u00e9ral)\",70],[\"Swisslife Funds Defensive - FR0010308825\",7.5],[\"M&G (Lux) Optimal Income Fund - LU1670724373\",7.5],[\"Oddo Avenir CR-EUR - FR0000989899\",7.5],[\"Comgest Monde C 
FR0000284689\",7.5]],\"axes\":{\"major\":{\"type\":\"string\",\"label\":\"Support\"},\"minor\":{\"type\":\"number\",\"label\":\"\"}},\"options\":{\"title\":\"\",\"series\":[],\"height\":\"400\",\"responsive_width\":1,\"hAxis\":{\"title\":\"\",\"direction\":\"1\"},\"vAxis\":{\"title\":\"\",\"direction\":\"1\",\"viewWindow\":{\"min\":\"\",\"max\":\"\"}},\"backgroundColor\":{\"fill\":\"\",\"strokeWidth\":\"0\",\"stroke\":\"\",\"rx\":\"0\"},\"chartArea\":{\"backgroundColor\":{\"fill\":\"\",\"strokeWidth\":\"\",\"stroke\":\"\"}},\"fontSize\":\"\",\"fontName\":\"Arial\",\"crosshair\":{\"trigger\":\"\",\"orientation\":\"\"},\"orientation\":\"horizontal\",\"titlePosition\":\"out\",\"tooltip\":{\"trigger\":\"focus\"},\"legend\":{\"position\":\"right\",\"alignment\":\"center\"}},\"vAxis\":[],\"hAxis\":[],\"errors\":[],\"series\":[{\"label\":\"Pourcentage\",\"color\":\"\",\"orig_header\":\"pourcentage\"}],\"group_chart\":false,\"show_grid\":true,\"type\":\"google_donut_chart\"}, engine: \"google\", type: \"google_donut_chart\", title: \"Allocations stars - Profil défensif - Répartition par support\", container: \"wpDataChart_1\", follow_filtering: 0, wpdatatable_id: 1, group_chart: 0}</script>"
     }

But the <script> elements still remain. How can I do this?
Thanks

The original JSON of a document is not modified by a filter. What is modified is what gets stored in the inverted index, against which searches are executed.

You could use an ingest processor to make your modification before the JSON is stored, e.g. a script processor.
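
As a rough illustration of the ingest approach, here is a minimal Python sketch that builds a pipeline body and prints it as a console request. It uses the `gsub` ingest processor rather than the script processor mentioned above. The pipeline name `strip_scripts` and the simplified pattern are assumptions made for this example; the field name `text` is the one used elsewhere in this thread. Treat it as a starting point, not a tested configuration.

```python
import json

# Java-style regex, written as a Python raw string so the backslash stays literal.
# (?is) turns on case-insensitive matching and lets "." match newlines.
# This simplified pattern is an assumption, not the one from the question.
SCRIPT_PATTERN = r"(?is)<script\b.*?</script>"

pipeline = {
    "description": "Remove <script> elements before the document is stored",
    "processors": [
        {
            "gsub": {
                "field": "text",            # field name taken from the thread
                "pattern": SCRIPT_PATTERN,
                "replacement": " ",
            }
        }
    ],
}

# json.dumps doubles the backslash, which is what the JSON request body needs.
body = json.dumps(pipeline, indent=2)

print("PUT _ingest/pipeline/strip_scripts")  # hypothetical pipeline name
print(body)
```

Letting a JSON library serialize the pattern avoids having to escape backslashes by hand in the request body.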

Thanks for your help.
But what I don't understand is this: if I use a filter to replace a single word, it works. Why isn't the whole <script>...</script> element replaced by my regex?

Duh, I completely misread your question. Here is what happens: try running your sample without the filter part and you will see several separate terms being emitted. The tokenizer first splits the text into tokens, and only then is the token filter regex applied, to each token individually. Because the text has already been split at this stage, the regular expression never sees a whole <script>...</script> element and will never match.

Take a look at the pattern replace char filter instead. Char filters run on the whole text before it is tokenized.
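
The difference between the two stages can be sketched in a few lines of Python. This is only an analogy: `re.findall(r"\w+", ...)` is a crude stand-in for the `icu_tokenizer`, Python's `re` stands in for Java's regex engine, and the short sample text is made up for the example.

```python
import re

SCRIPT_RE = re.compile(
    r"<script\b[^<]*(?:(?!</script>)<[^<]*)*</script>", re.IGNORECASE
)

text = "<li>Frais</li><script>var a = 1;</script><li>Supports</li>"

# Crude stand-in for a word tokenizer: keep runs of word characters.
tokens = re.findall(r"\w+", text)

# "Token filter" stage: the regex is applied to each token on its own,
# so it never sees a complete <script>...</script> element.
filtered_tokens = [SCRIPT_RE.sub("", tok) for tok in tokens]

# "Char filter" stage: the regex is applied to the whole text first,
# and only the cleaned text is tokenized.
cleaned_tokens = re.findall(r"\w+", SCRIPT_RE.sub(" ", text))

print(filtered_tokens)
print(cleaned_tokens)
```

In the first list the script content (`script`, `var`, `a`, `1`) is still present; in the second it is gone because the replacement happened before tokenization.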

I tried that, but the <script>...</script> elements still remain:

    PUT test
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyzer": {
              "tokenizer": "icu_tokenizer",
              "char_filter": [
                "my_char_filter"
              ],
              "filter": [
                "lowercase"
              ]
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "pattern_replace",
              "pattern": "(<script\b[^<]*(?:(?!</script>)<[^<]*)*</script>)",
              "replacement": " "
            }
          }
        }
      },
      "mappings": {
        "ess": {
          "properties": {
            "text": {
              "type": "text",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }

POST test/_analyze
{
  "analyzer": "my_analyzer",
  "text": "<ul class=wrap><li><a class=toggleTabContent href=#rendement title=Rendement>Rendement</a></li><li><a class=toggleTabContent href=#frais title=Frais>Frais</a></li><li><a class=toggleTabContent href=#comment-investir title=Comment investir ?>Comment investir ?</a></li><li><a class=toggleTabContent href=#supports title=Supports>Supports</a></li><li><a class=toggleTabContent href=#souscrire-documentation title=Souscrire / documentation>Souscrire / documentation</a></li></ul></div><script>function lazyLoadIframes(){jQuery('[data-loaded=false]').each(function() {var llazy = jQuery(this);if (llazy[0].getBoundingClientRect().top != 0 && llazy[0].getBoundingClientRect().top - window.innerHeight < 0) {jQuery(this).attr(data-loaded, true);jQuery(this).attr(src, jQuery(this).attr(data-src));}});}window.addEventListener(scroll, lazyLoadIframes);$('body').on('click', '.s-accordion-title', function(){setTimeout(lazyLoadIframes, 300);});</script><style>iframe[data-loaded]{}</style><script type=\"text/javascript\">if(typeof(wpDataCharts)=='undefined'){wpDataCharts = {};}; wpDataCharts[1] = {render_data: {\"columns\":[{\"type\":\"string\",\"label\":\"Support\",\"orig_header\":\"support\"},{\"type\":\"number\",\"label\":\"Pourcentage\",\"orig_header\":\"pourcentage\"}],\"rows\":[[\"Fonds en euros (Actif g\u00e9n\u00e9ral)\",70],[\"Swisslife Funds Defensive - FR0010308825\",7.5],[\"M&G (Lux) Optimal Income Fund - LU1670724373\",7.5],[\"Oddo Avenir CR-EUR - FR0000989899\",7.5],[\"Comgest Monde C 
FR0000284689\",7.5]],\"axes\":{\"major\":{\"type\":\"string\",\"label\":\"Support\"},\"minor\":{\"type\":\"number\",\"label\":\"\"}},\"options\":{\"title\":\"\",\"series\":[],\"height\":\"400\",\"responsive_width\":1,\"hAxis\":{\"title\":\"\",\"direction\":\"1\"},\"vAxis\":{\"title\":\"\",\"direction\":\"1\",\"viewWindow\":{\"min\":\"\",\"max\":\"\"}},\"backgroundColor\":{\"fill\":\"\",\"strokeWidth\":\"0\",\"stroke\":\"\",\"rx\":\"0\"},\"chartArea\":{\"backgroundColor\":{\"fill\":\"\",\"strokeWidth\":\"\",\"stroke\":\"\"}},\"fontSize\":\"\",\"fontName\":\"Arial\",\"crosshair\":{\"trigger\":\"\",\"orientation\":\"\"},\"orientation\":\"horizontal\",\"titlePosition\":\"out\",\"tooltip\":{\"trigger\":\"focus\"},\"legend\":{\"position\":\"right\",\"alignment\":\"center\"}},\"vAxis\":[],\"hAxis\":[],\"errors\":[],\"series\":[{\"label\":\"Pourcentage\",\"color\":\"\",\"orig_header\":\"pourcentage\"}],\"group_chart\":false,\"show_grid\":true,\"type\":\"google_donut_chart\"}, engine: \"google\", type: \"google_donut_chart\", title: \"Allocations stars - Profil défensif - Répartition par support\", container: \"wpDataChart_1\", follow_filtering: 0, wpdatatable_id: 1, group_chart: 0}</script>"
}
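
One likely reason the char filter attempt above still leaves the scripts in place is the `\b` in the pattern. Inside a JSON string, `\b` is the escape for the backspace character (U+0008), so the regex engine receives `<script` followed by a literal backspace instead of a word boundary. The sketch below shows the effect with Python's `json` and `re` modules; Python's regex engine is used here as a stand-in for Java's, and the short sample text is made up for the example.

```python
import json
import re

# The pattern exactly as it appears in the settings above (JSON source text).
broken_json = r'"(<script\b[^<]*(?:(?!</script>)<[^<]*)*</script>)"'
# The same pattern with the backslash doubled so it survives JSON decoding.
fixed_json = r'"(<script\\b[^<]*(?:(?!</script>)<[^<]*)*</script>)"'

broken = json.loads(broken_json)
fixed = json.loads(fixed_json)

# JSON decodes \b to the backspace character, not to a regex word boundary.
print("\x08" in broken)
print("\\b" in fixed)

sample = (
    '<p>Rendement</p>'
    '<script type="text/javascript">if (a < 0) {b();}</script>'
    '<p>Frais</p>'
)

broken_result = re.sub(broken, " ", sample)  # pattern requires a backspace
fixed_result = re.sub(fixed, " ", sample)    # pattern uses a word boundary

print(broken_result == sample)
print(fixed_result)
```

If this is the cause, writing `"pattern": "(<script\\b[^<]*(?:(?!</script>)<[^<]*)*</script>)"` in the index settings should make the char filter match. The pattern in the first `_analyze` attempt has the same `\b` escape.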
