API Endpoint to initiate crawl of single page

Hi all -

I am looking for a way to initiate the crawl of a single page on demand via an API call. While looking through the docs I was led to this page on the Swiftype site. This is exactly what I am looking for, but it does not seem to work on App Search version 7.13.1.

The endpoint I am trying to call is:

http://localhost:3002/api/as/v1/engines/[engine_name]/domains/[domain_id]/crawl_url.json

I get the error:

{"errors": "Routing Error. The path you have requested is invalid."}

Is there an Elastic specific version of this API endpoint?

Thanks as always.

Hey @tchocky! Always super exciting to see folks using our beta crawler. It looks like the API URL is a little different in Elastic vs. Swiftype, but you should still be able to do what you want using the /api/as/v0/engines/[engine_name]/crawler/crawl_requests API endpoint.
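
As a rough illustration of calling that endpoint, here is a minimal Python sketch that builds the request without sending it. The thread does not state the HTTP method, the request body, or the auth scheme, so POST, an empty body, and a Bearer private API key are all assumptions here; build_crawl_request is a hypothetical helper name, and the host and port are taken from the original question.

```python
import urllib.request

# Host and port as used in the original question.
BASE_URL = "http://localhost:3002"


def build_crawl_request(engine_name: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the crawler to start a crawl."""
    url = f"{BASE_URL}/api/as/v0/engines/{engine_name}/crawler/crawl_requests"
    return urllib.request.Request(
        url,
        data=b"",       # assumption: no request body is needed
        method="POST",  # assumption: the thread does not name the method
        headers={
            # assumption: App Search private API key sent as a Bearer token
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

To actually send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_crawl_request("my-engine", "private-xxxx")).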

Hope that helps!
Constance

Hey @constancecchen - thanks for the reply. My assumption about that endpoint was that it initiates a full crawl for an engine. Is there a way to pass in a single URL like the Swiftype version? That is the feature I am looking for.

Thanks again.

Hi @tchocky,

When you crawl the new domain, do you want the results to be ingested into a new App Search engine, or added to an existing engine? Are you changing the domain each time, or just the entry point within a domain?

Say I have an App Search engine with a crawl already configured for it. The crawl has 1 domain, example.com, and 1 entry point, /foo. I can change the entry point from /foo to /bar by fetching the crawler configuration to look up the domain and entry point IDs, then updating the entry point:

GET /api/as/v0/engines/$ENGINE/crawler
{
  "domains": [
    {
      "id": "$DOMAIN_UUID",
      "name": "https://example.com",
      "document_count": 0,
      "created_at": "Thu, 17 Jun 2021 19:19:02 +0000",
      "last_visited_at": "Thu, 17 Jun 2021 19:19:15 +0000",
      "entry_points": [
        {
          "id": "$ENTRY_POINT_UUID",
          "value": "/foo",
          "created_at": "Thu, 17 Jun 2021 19:19:02 +0000"
        }
      ],
      "crawl_rules": [],
      "default_crawl_rule": {
        "id": "-",
        "order": 0,
        "policy": "allow",
        "rule": "regex",
        "pattern": ".*",
        "created_at": "Thu, 17 Jun 2021 19:20:54 +0000"
      },
      "sitemaps": []
    }
  ],
  "onboarding_completed": true
}

PUT /api/as/v0/engines/$ENGINE/crawler/domains/$DOMAIN_UUID/entry_points/$ENTRY_POINT_UUID
{"value":"/bar"}

Then I can invoke the API @constancecchen mentioned to start the crawl.
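
The lookup-then-update step above could be scripted roughly as follows. This is only a sketch: find_entry_point and build_entry_point_update are hypothetical helper names, the Bearer auth header is an assumption, and the config shape is taken from the GET response shown above. The functions build the PUT request but do not send it.

```python
import json
import urllib.request

# Host and port as used in the original question.
BASE_URL = "http://localhost:3002"


def find_entry_point(crawler_config: dict, domain_name: str, path: str):
    """Return (domain_id, entry_point_id) for a matching entry point, else None.

    crawler_config is the parsed JSON from GET .../crawler.
    """
    for domain in crawler_config.get("domains", []):
        if domain.get("name") != domain_name:
            continue
        for entry_point in domain.get("entry_points", []):
            if entry_point.get("value") == path:
                return domain["id"], entry_point["id"]
    return None


def build_entry_point_update(engine, domain_id, entry_point_id, new_path, api_key):
    """Build (but do not send) the PUT request that changes an entry point."""
    url = (
        f"{BASE_URL}/api/as/v0/engines/{engine}/crawler"
        f"/domains/{domain_id}/entry_points/{entry_point_id}"
    )
    body = json.dumps({"value": new_path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            # assumption: App Search private API key sent as a Bearer token
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending the built request with urllib.request.urlopen and then calling the crawl_requests endpoint would complete the flow described in this reply.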

Hi @Rich_Kuzsma -

Yes, it is one domain within one engine. When a new page is created in our CMS, I am looking to crawl that page right away rather than waiting for a scheduled crawl. We have 10K+ pages to crawl, so it might take a while for the new page to be indexed if I don't manually trigger it. Your solution seems like it might work, and I appreciate your creative thinking.
