Checking the date of the last full site crawl, or the last index date of a document?

We use a Swiftype engine to manage search across our documentation site. There are over 1600 pages/documents in our engine right now.

We are trying to write a few checks on Swiftype for internal controls/QA. We want to ensure that Swiftype is seeing all of our pages, and to determine whether the last full crawl of our engine was within the past 7 days. This data is available via the user portal.

However, this is not the same as the updated_at value on the engine itself.
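As a rough sketch of the 7-day check, assuming the last full crawl date is copied from the user portal as an ISO 8601 string (the function name `isWithinDays` is hypothetical and not part of any Swiftype client):

```javascript
// Hypothetical helper: returns true if `isoTimestamp` falls within the
// last `days` days relative to `now`.
// Assumes an ISO 8601 string such as "2024-06-20T08:00:00Z".
function isWithinDays(isoTimestamp, days, now = new Date()) {
  const then = Date.parse(isoTimestamp);
  if (Number.isNaN(then)) {
    throw new Error(`Unparseable timestamp: ${isoTimestamp}`);
  }
  const ageMs = now.getTime() - then;
  const limitMs = days * 24 * 60 * 60 * 1000;
  // A timestamp in the future (negative age) is treated as recent.
  return ageMs <= limitMs;
}
```

For example, `isWithinDays(crawlDateFromPortal, 7)` would fail the QA check once the portal date is more than a week old.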

Is there a way to get the last full crawl date of the engine? Or is there a way to get the last index date of a document?

Any help would be greatly appreciated.

I'm also happy to hear if my question isn't valid or if checking the index date isn't possible. Any response would be great.

Hi @Andrew_Sepic, the updated_at value should be accurate if you look at a document rather than the engine (see Crawler Overview | Swiftype Documentation). A document's updated_at should be the date it was last indexed, i.e. its last crawl date.
Is this the value you need for the checks you're writing?
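Building on that answer, a QA check could read updated_at from each document (not the engine) and flag any document older than a threshold. This is a sketch that assumes the documents have already been fetched as plain objects with an `external_id` and an `updated_at` ISO string, as in the response posted later in this thread; `findStaleDocuments` is a hypothetical name.

```javascript
// Hypothetical QA helper: returns the external_ids of documents whose
// document-level `updated_at` (last index time) is older than `maxAgeDays`.
// Assumes each doc is a plain object like { external_id, updated_at }.
function findStaleDocuments(docs, maxAgeDays, now = new Date()) {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return docs
    .filter((doc) => {
      const indexedAt = Date.parse(doc.updated_at);
      // Missing or unparseable timestamps count as stale so they get reviewed.
      return Number.isNaN(indexedAt) || indexedAt < cutoff;
    })
    .map((doc) => doc.external_id);
}
```

Treating a missing timestamp as stale is a deliberate choice here: for an internal control, a document with no readable index date is worth a manual look.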

Hi @nfeekery, thanks for the reply. OK, great, so the updated_at date should be accurate. How often should I expect a document to be indexed?

If my engine shows that its most recent full crawl completed today and that a new full site crawl is set to start today, I'm assuming the updated_at date on each document should be similar, i.e. the current date or within 24 hours of it. Is that accurate?
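The expectation in that question (each document's updated_at lands within about 24 hours of the most recent full crawl) can be written as a tolerance check. This sketch assumes the crawl completion time is read by hand from the user portal, since the thread does not identify an API field for it; `indexedNearCrawl` and the 24-hour default are illustrative assumptions.

```javascript
// Hypothetical check: is a document's updated_at within `toleranceHours`
// of the crawl completion time (copied by hand from the user portal)?
function indexedNearCrawl(docUpdatedAt, crawlCompletedAt, toleranceHours = 24) {
  const docMs = Date.parse(docUpdatedAt);
  const crawlMs = Date.parse(crawlCompletedAt);
  if (Number.isNaN(docMs) || Number.isNaN(crawlMs)) {
    throw new Error("Both timestamps must be ISO 8601 strings");
  }
  // Use the absolute gap: a doc is typically indexed before the crawl finishes.
  const gapHours = Math.abs(crawlMs - docMs) / (60 * 60 * 1000);
  return gapHours <= toleranceHours;
}
```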

@Andrew_Sepic yes, that's correct: the updated_at value should update for the documents of any URL endpoints encountered during that crawl. It will differ by some seconds, minutes, or hours per document depending on how long the crawl takes, since the value records when that specific document was indexed.

If the Swiftype crawler doesn't encounter a URL for an existing doc during its full crawl, it should delete the doc. So I don't think there should be a situation where there are large discrepancies between updated_at values.
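Following that reasoning, the spread between the oldest and newest updated_at values in a sample of documents should be roughly the length of one crawl. A minimal sketch of that spread calculation, assuming plain document objects with an `updated_at` ISO string as in the response posted later in this thread (`updatedAtSpreadHours` is a hypothetical name):

```javascript
// Hypothetical helper: hours between the oldest and newest `updated_at`
// across a sample of documents. A value far larger than one crawl's duration
// would contradict the expectation that surviving docs are re-indexed together.
function updatedAtSpreadHours(docs) {
  const times = docs
    .map((doc) => Date.parse(doc.updated_at))
    .filter((ms) => !Number.isNaN(ms));
  if (times.length === 0) {
    return 0;
  }
  const spreadMs = Math.max(...times) - Math.min(...times);
  return spreadMs / (60 * 60 * 1000);
}
```

A spread near zero with an old date (every document sharing the same stale timestamp) would be a different signal from a large spread, which is why it can help to check both the spread and the age.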

@nfeekery Thanks for verifying that. Seems to make sense to me.

I don't understand why I'm not seeing that, though.

  1. I make a request via the node client @elastic/site-search-node and get 200 documents back from my engine.

  2. I randomly choose an externalId and request that document with a URL like this: https://api.swiftype.com/api/v1/engines/${engine}/document_types/page/documents/${externalId}?auth_token=${apiKey}
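Step 2 can be reproduced outside the Node client with a plain HTTP GET against the URL pattern quoted above. This sketch assumes Node.js 18+ (for the global `fetch`) and that the `auth_token` query parameter behaves exactly as in that URL; `buildDocumentUrl` and `fetchDocumentUpdatedAt` are hypothetical names, and the `page` document type is taken from the URL in the post.

```javascript
// Builds the single-document URL used in step 2 (document type "page").
// encodeURIComponent escapes characters that are unsafe in a path or query.
function buildDocumentUrl(engine, externalId, apiKey) {
  const base = "https://api.swiftype.com/api/v1/engines";
  return (
    `${base}/${encodeURIComponent(engine)}/document_types/page/documents/` +
    `${encodeURIComponent(externalId)}?auth_token=${encodeURIComponent(apiKey)}`
  );
}

// Fetches one document and returns its document-level updated_at.
// Requires Node.js 18+ for the global fetch; error handling is minimal.
async function fetchDocumentUpdatedAt(engine, externalId, apiKey) {
  const response = await fetch(buildDocumentUrl(engine, externalId, apiKey));
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  const doc = await response.json();
  return doc.updated_at;
}
```

For example, `fetchDocumentUpdatedAt(engine, externalId, apiKey).then(console.log)` would print the updated_at value from the raw response, which makes it easy to compare against what the Node client returns for the same document.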

The object I get back for that document (as of today, June 21, 2024) is below. There does seem to be some duplicated content in the response, but maybe I'm interpreting that wrong. Either way, the updated_at date is clearly March 21, 2024, suspiciously 3 months behind today. If I request other documents, I get the same date, 2024-03-21, on all of them.

Any idea what's going on?

{
  "external_id": "f8a46f9624ddee1152b9e56865bbf468e69a8a19",
  "engine_id": "5c6adb81d3b68758d0c5c15a",
  "document_type_id": "5c6adb82d3b68758d0c5c15b",
  "id": "65fb78a0196a678073395947",
  "updated_at": "2024-03-21T00:00:32Z",
  "title": "Directions API Playground",
  "excerpt": "Retrieve turn-by-turn instructions using four different Mapbox routing profiles.",
  "image": "https://static-assets.mapbox.com/branding/social/social-120x120.v2.png",
  "site": "Developer Playgrounds",
  "contentType": "playground",
  "sections": ["Directions API Playground"],
  "body": "...",
  "type": "",
  "published_at": "2024-06-21T12:19:01Z",
  "popularity": 1,
  "info": "",
  "url": "https://docs.mapbox.com/playground/directions/",
  "updated_at": "2024-03-21T00:00:32Z",
  "title": "Directions API Playground",
  "excerpt": "Retrieve turn-by-turn instructions using four different Mapbox routing profiles.",
  "image": "https://static-assets.mapbox.com/branding/social/social-120x120.v2.png",
  "site": "Developer Playgrounds",
  "contentType": "playground",
  "sections": ["Directions API Playground"],
  "body": "...",
  "type": "",
  "published_at": "2024-06-21T12:19:01Z",
  "popularity": 1,
  "info": "",
  "url": "https://docs.mapbox.com/playground/directions/"
}
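Two quick checks on that response. First, `JSON.parse` in JavaScript does not reject duplicate keys; the last occurrence wins, so the repeated fields do not change the parsed updated_at here because both copies are identical. Second, the gap the post describes can be measured directly. A sketch using a trimmed copy of the response and treating June 21, 2024 as "today" (an assumption matching the date in the post; `daysBetween` is a hypothetical helper):

```javascript
// Trimmed copy of the response above, keeping the duplicated updated_at key.
const raw = `{
  "external_id": "f8a46f9624ddee1152b9e56865bbf468e69a8a19",
  "updated_at": "2024-03-21T00:00:32Z",
  "published_at": "2024-06-21T12:19:01Z",
  "updated_at": "2024-03-21T00:00:32Z"
}`;

// JSON.parse accepts duplicate keys; the last occurrence wins.
const doc = JSON.parse(raw);

// Whole days between two ISO 8601 timestamps (rounded down).
function daysBetween(earlierIso, laterIso) {
  const ms = Date.parse(laterIso) - Date.parse(earlierIso);
  return Math.floor(ms / (24 * 60 * 60 * 1000));
}

// "Today" is assumed to be the date in the post, June 21, 2024.
const ageDays = daysBetween(doc.updated_at, "2024-06-21T12:00:00Z");
console.log(`updated_at ${doc.updated_at} is ${ageDays} days old`);
// prints "updated_at 2024-03-21T00:00:32Z is 92 days old"
```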

My engine shows this:
