Sample code to find similar HTML Documents

Hi, I want to index the HTML of around 10 million web pages in Elasticsearch.
Then I want to pass HTML content as a search query and get back the documents most similar to that query (which is itself HTML).

So if I submit an HTML page that is already indexed, it should return that exact document; otherwise, it should return the nearest one.

How can I do this with Elasticsearch? Is there any sample code for this?

Best Regards

Welcome!

That's normally the default behavior. It's all about relevancy: hits are sorted by relevance score by default, so the closest matches come back first.

How can I do this with Elasticsearch? Is there any sample code for this?

Something like:

GET /indexname/_search
{
  "query": {
    "match": {
      "content": "YOUR HTML CONTENT HERE"
    }
  }
}

That might give you good results out of the box.
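
If the plain match query turns out to be too heavy or too noisy for whole pages, one alternative worth trying is the `more_like_this` query, which picks a limited set of representative terms from the input instead of using every term. The thread does not mention it, so treat the following as a minimal sketch rather than a recommended setup: the helper names `build_match_query` and `build_mlt_query` are hypothetical, the `content` field is taken from the example above, and the parameter values are untuned guesses. The script only builds and prints the request bodies; sending them to your cluster is left to whichever client you use.

```python
import json


def build_match_query(html, field="content"):
    """Request body for the plain match query shown above."""
    return {"query": {"match": {field: html}}}


def build_mlt_query(html, field="content", max_query_terms=50):
    """Request body for a more_like_this query (an assumed alternative).

    min_term_freq=1 keeps terms that occur only once in the input;
    max_query_terms caps how many selected terms go into the query.
    Both values are illustrative, not tuned.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": [field],
                "like": html,
                "min_term_freq": 1,
                "max_query_terms": max_query_terms,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical input page; replace with the HTML you want to look up.
    page = "<html><body><h1>Hello</h1><p>Sample page</p></body></html>"
    print(json.dumps(build_mlt_query(page), indent=2))
```

The printed JSON can be pasted as the body of `GET /indexname/_search`, in the same way as the match example.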
