Show only the matched text

I'm using ES to store crawled web pages.
But when I search with a regex, it's hard to find which word matched, because the "html" field contains the entire HTML page.
Is there a way to print only the matched word, or to show an abbreviated excerpt around the matched word in that column?

Thanks for your help.

What are you using to index the data into Elasticsearch? Logstash? If the web pages you are crawling have a common structure you might be able to break up the data into different fields, but if every page is unique, I don't know how you could break up a single html field into different fields.

What is your end goal?

You might also want to move this question into the Logstash channel if it's specifically about how to improve the format your data is stored in.

I'm not using Logstash; I'm using a crawler program that works with ES.

My goals are:

  1. Search with a regex pattern and extract exactly the matched text (emails, credit card numbers, ...).
  2. Easily identify the context of a matched word by showing an abbreviated excerpt of the HTML page.

Maybe grep-style output (like the Linux command) would be nice.
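
To make the idea concrete, here is a minimal sketch of what I mean, done client-side in Python on the "html" string after a search returns it. The function name `grep_context`, the sample document, and the email pattern are all made up for illustration, and the pattern is deliberately simplistic. Note that Python's `re` syntax is not identical to the regexp syntax ES uses, so a pattern may need adjusting between the two.

```python
import re


def grep_context(text, pattern, width=30):
    """Return (match, snippet) pairs: the exact matched text plus up to
    `width` characters of surrounding context on each side."""
    results = []
    for m in re.finditer(pattern, text):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        # Collapse runs of whitespace so each snippet prints on one line
        snippet = " ".join(text[start:end].split())
        # Mark where the snippet was cut out of a longer document
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        results.append((m.group(0), prefix + snippet + suffix))
    return results


# Hypothetical crawled page standing in for the "html" field
html = "<html><body><p>Contact us at admin@example.com for details.</p></body></html>"

# Simplistic email pattern, for illustration only
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.-]+"

for match, snippet in grep_context(html, EMAIL, width=15):
    print(f"{match}: {snippet}")
```

The first element of each pair covers goal 1 (the exact match), and the snippet covers goal 2 (the abbreviated context). Ideally ES would return something like this directly instead of the whole field.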
