I'm using Elasticsearch to store crawled web pages.
But when I search with a regex, it's hard to see which word matched, because the "html" field contains the entire HTML page.
Is there a way to print only the matched word, or to show a shortened excerpt around the match in that column?
What are you using to index the data into Elasticsearch? Logstash? If the web pages you are crawling have a common structure you might be able to break up the data into different fields, but if every page is unique, I don't know how you could break up a single html field into different fields.
What is your end goal?
You might also want to move this question into the Logstash channel if it's specifically about how to improve the format your data is stored in.
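As a rough illustration of breaking one html field into several fields before indexing, here is a minimal sketch using only the Python standard library. The field names `title` and `text`, the class `PageFields`, and the helper `split_html` are all hypothetical, and the sketch assumes the crawler can run a preprocessing step before documents are sent to Elasticsearch; it is not part of Logstash or Elasticsearch.

```python
from html.parser import HTMLParser


class PageFields(HTMLParser):
    """Collect the <title> text and the visible page text separately."""

    # Content of these tags is not visible text, so it is ignored.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title_parts.append(text)
        else:
            self.text_parts.append(text)


def split_html(html):
    """Return a dict with hypothetical field names, ready to index."""
    parser = PageFields()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title_parts),
        "text": " ".join(parser.text_parts),
        "html": html,  # keep the raw page in case it is still needed
    }


page = (
    "<html><head><title>Hello</title><style>p{}</style></head>"
    "<body><p>Some <b>text</b></p><script>var x=1;</script></body></html>"
)
doc = split_html(page)
```

Searching a shorter `text` field instead of the raw `html` field should make matches easier to spot, though whether this split suits your pages depends on how they are structured.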