App Search: how to best index a site that doesn't crawl well - Confluence wiki for example

I'm very new to search and Elastic, and I would like to close a large gap in my understanding. I've read your docs and done the tutorials, but I feel like I'm missing something basic.

So I'm working on a niche tech support site that has a search box that searches on more than a dozen sites of differing types.

So far I've set up a meta engine with Elastic Cloud App Search, set up webcrawls on everything, and used the Search UI, which all works beautifully. Very easy - thank you Elastic.

Now I'm running into the expected problem that webcrawl doesn't work everywhere. As stated in the docs, webcrawler won't work on single page-type websites. And we've got an Atlassian Confluence Wiki Space where searches don't seem to work.

So I see that there are many types of connectors for Workplace Search, including one for Confluence. Can I somehow use these with App Search? What do people do when they want to index something like their confluence wiki pages into an App Search engine?

Hey @tonyfam ,

Have you looked at Connecting Confluence Cloud | Workplace Search Guide [8.4] | Elastic already?

App Search and Workplace Search are both part of Enterprise Search. While the Confluence Connectors are currently part of Workplace Search, they do have access to a lot of the same features available in App Search. Is there something in particular in App Search that you're tied to?

Definitely I would not expect Crawler to work well on Confluence. Unfortunately, crawler shines best right now when you have only static sites, and are in full control of your HTML and sitemaps. That's just not the situation with Confluence.

Sean, thank you for the answer. Yes, I have looked at that, which is why I was wondering how to do the same in App Search.

After spending plenty of time with the elastic docs and presentations, I was under the impression Workplace Search is for someone's internal workplace, like within a company.

So we need a search bar inside a website for many sites, and the Search UI from App Search, as well as the webcrawling and the engines, seemed to work well for this. Have I made a big mistake?

Note, the site we are making will be public.

Sean, I had forgotten that Serena Chou explained something about the Confluence connector a month ago when I was getting started. So I will see if I can get that working.

But I'm still concerned that you are wondering why we don't use Workplace Search and whether I went the right direction for our case.

Hey @tonyfam

Like I'd said, Workplace Search and App Search share a lot under the hood. You're not alone in getting the impression that Workplace Search is ONLY for searching inside your own workplace, and App Search could ONLY be used for building search experiences for others. We're working to blur these lines in 8.3, 8.4, and future versions, but we haven't yet got a Confluence Cloud connector for App Search that is GA.

You've got a few options.

One is to use Workplace Search, and make use of the Search UI + Workplace Search integration. This could let your customers search over the two products from a single interface.

Another option, as Serena had suggested, is to use the 8.3 Confluence Cloud connector package. Note, this connector package was in Technical Preview in 8.3 and is NOT present in 8.4. We hope to bring our full suite of connectors outside of Workplace Search in a future release, but we just aren't there yet.

Another option could be to just look at the Confluence Cloud connector package codebase, and use that to write up your own integration to index data into App Search directly via the App Search Documents API.
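
As a rough illustration of that last option, here is a minimal Python sketch of pushing already-extracted Confluence pages to the App Search Documents API (`POST /api/as/v1/engines/<engine>/documents`) in batches of 100, the per-request document limit. The base URL, engine name, key, and document fields are all placeholders, and getting the pages out of Confluence is out of scope here.

```python
import json
import urllib.request
from typing import Dict, Iterator, List

BATCH_SIZE = 100  # App Search accepts at most 100 documents per indexing request


def batches(docs: List[Dict], size: int = BATCH_SIZE) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_index_requests(base_url: str, engine: str, private_key: str,
                         docs: List[Dict]) -> List[urllib.request.Request]:
    """Build one POST request per batch; nothing is sent here."""
    url = f"{base_url.rstrip('/')}/api/as/v1/engines/{engine}/documents"
    headers = {
        "Authorization": f"Bearer {private_key}",
        "Content-Type": "application/json",
    }
    return [
        urllib.request.Request(
            url,
            data=json.dumps(batch).encode("utf-8"),
            headers=headers,
            method="POST",
        )
        for batch in batches(docs)
    ]


def send_all(requests: List[urllib.request.Request]) -> None:
    """Send each request in turn; urlopen raises on HTTP errors."""
    for request in requests:
        with urllib.request.urlopen(request) as response:
            response.read()
```

Each document is a flat JSON object with an `id` and whatever fields you want searchable (for example `title`, `body`, `url`). Calling `send_all(build_index_requests(...))` with your deployment's values performs the actual upload.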

Good luck!

Thank you very much for that explanation. Sounds like the product is evolving!

So my problem: I've got the search all set up with a customized App Search Search UI, using url-host as facets so users can pick their source. This is for a website going online very soon, with the webcrawler working very well for most of the sites. But it sounds like I made a big mistake going with App Search, since we do have a very important Confluence Cloud source as well.

Thank you for laying out options.

For option 1) Set everything up in Workplace Search, with the Workplace Search UI (the doc says the connector is still in technical preview, though). Does WS also do web crawling? When you say "search over 2 products from single interface", does that mean I'd be able to show results blended from the Workplace Search connectors and also our existing App Search engines? Do you have a feel for whether I could easily move the App Search UI to the Workplace Search UI, or is it completely different?

For option 2) Confluence Cloud connector package that can be used outside of WS - do you have a feel for when this could be ready?

For option 3) if I did try to implement a Confluence connector into App Search, is it some of the same work that you are doing in 2? Is it a big job or something people do all the time?

I'm trying to weigh my next step so I can get the Confluence docs into our search the quickest way possible.
Thank you so much for any help!

PS Is "Site Search" still a part of Enterprise search going forward, and how would that fit in?

But it sounds like I made a big mistake going with App Search

I don't see it that way. There are tradeoffs between App Search and Workplace Search, and neither would perfectly cover 100% of what you're trying to do. Sounds like App Search was the right choice for all the sites that are working well with the App Search Crawler, and now we just need to figure out how to get your Confluence search results displayed alongside the crawl results. :slight_smile:

I'll try to answer individual questions below.

Does WS also do web crawling?

No.

When you say "search over 2 products from single interface", does that mean I'd be able to show results blended from the workplace search connectors and also our existing app search engines?

Yes. Our Search UI product is a front-end framework for helping you build search experiences. App Search can generate a Search UI experience for you, but you can modify the generated experience, or build your own from scratch. Search UI can work with App Search, Workplace Search, and/or Elasticsearch.

Confluence Cloud connector package that can be used outside of WS - do you have a feel for when this could be ready?

I do not. I can encourage you to watch our release notes, as it is a connector we know many of our customers care about. But we cannot make promises for when new features will be delivered, as we are a publicly traded company. However, if you have a support relationship with Elastic, I'd encourage you to file an Enhancement Request, to help us understand your use case and timeline, and that may help influence timings.

if I did try to implement confluence connector into app search, is it some of the same work that you are doing in 2? Is it a big job or something people do all the time?

It's very doable. In fact, in a free week, I built a logstash input plugin that could extract data from Confluence Cloud. You'd be welcome to fork this if you'd like. Do note - this repo is not an Elastic product, but my personal code, so it does not come with any support guarantees. But even if you don't use it directly, it should give you an idea of how you could take the connectors_sdk gem and use it as a library to get data out of Confluence.

Is "Site Search" still a part of Enterprise search going forward, and how would that fit in?

"Site Search" is going to live on swiftype.com for the foreseeable future. We're working to ensure that the Crawler is as full-serviced in Enterprise Search as it is in swiftype.com, but we do see them as separate use cases, and have not planned to migrate customers at this time.

Sean, I greatly appreciate your help! I'll comment back when we get something working. Thank you.

OK, this is great, we are well on our way with option 1: using an index from workplace search in app search.
Just upgraded to 8.4, and, for the benefit of anyone happening on this thread, this doc page is so helpful! Engines and content sources | Elastic Enterprise Search documentation [8.4] | Elastic

So close! Now a new problem, but hopefully this one is minor: we have our Confluence connected up to Workplace Search, and I can see the index when I view hidden indexes.
But when I go to App Search and create an engine from an Elasticsearch index, I only get one index in the dropdown, and it doesn't let me enter one. Am I missing a step?

Here is my new Workplace Search index

Ha, this is a creative approach that I did not anticipate. I'm not entirely sure it will work, but it just might.

The App Search UI is only going to let you pick Elasticsearch indices/aliases that start with search-. That's what the blue informational banner is about on the screen where you can't find the index you're looking for. You can get around this first hurdle by creating an alias to .ent-search-engine-documents-confluence-cloud-* that starts with search-.

The next bit is where things will be dicey. The mappings for the Workplace Search managed index may not be exactly what App Search is expecting. App Search will gracefully degrade in the areas where the mappings are not correct, but that may mean that elements you're hoping for (like precision tuning) may not be available for this engine.

Points for creativity though. The "Bring your own Elasticsearch Index" feature is pretty new, and I don't think I've heard of anyone else anticipating it being used like this. Let us know how it goes, and definitely share any feedback you have!

:sweat_smile: I should have clicked on the docs link you shared. I didn't realize this was something we actually have documented! In that case, it should work great, and the only thing in your way should be creating that alias.

I think that doc might be very new! Maybe even just a few days? At any rate, I hadn't found it before. It and other pages explain very well how App Search and Workplace Search use indexes across both products, which may be kind of new in 8.4, at least for dashboard support?

So I am still blocked, because the problem is not about creating the alias, the problem is that I can't select my index from the App Search Create Engine (with Index) form. Looks like the dropdown that lists available indexes is only showing the non-hidden indexes. As I understand it, you can create an alias there once you pick an index from the list (which I can't do because it is not there).
I'm not sure how to get around this-- but we are so close! As a workaround, is there a way to un-hide my confluence index, or otherwise get it into an app search engine?

What do you mean, that it's creative? I thought I was following that documentation to a T. The distinct possibility always exists that I misunderstood something along the way!

Steps:

  1. Using Workplace Search user interface, connected our Confluence site with the Confluence Cloud connector. Synchronized documents. This creates the index, I think.
  2. I confirmed my new confluence-docs index was there using Indexes page, which I think is new.
  3. Go to App Search and Add Engine. Pick Indexes (new as of 8.4)
    That is the screen where I have trouble. I see the note that the name needs to start with "search-", but I think that is what the "create alias" field is for.

I tested it using the kibana-sample** index (the only index available to select on this dialog box! It does not start with "search".) When I select the kibana-sample index, the "alias" field auto-fills with search-kibana-sample***. I press continue, and voila, a new App Search engine with an "Elasticsearch Index" is created. Like this:

I feel like if I could only get our index into this dialog box, then we too could have an engine in App Search made from an Elasticsearch index (made with Workplace Search). But how? Do I have to do something else to our index, perhaps some switch somewhere... something... to get it on the list in this dialog box?

@tonyfam I expect that we have a gap in this UI where it won't let you create search- prefixed aliases for hidden indices. I'll raise this with our team.

Can you try using the Alias API instead of this UI to make yourself an alias for your Workplace Search documents index?

Thanks for trying out this new feature! What you're trying is absolutely supported, and is one of the core use cases behind this feature. We want to provide more interoperability between Workplace Search content sources, App Search engines, and direct Elasticsearch functionality.

To create an App Search engine from a Workplace Search content source in 8.4, you'll need to do the following:

  1. Create an Elasticsearch alias with the prefix search- for the Workplace Search content source index.

This document explains how to find your Workplace Search content source index, e.g. .ent-search-engine-documents-custom-62fe4494b1720490773b75d7.

To create a search- prefixed alias for the Workplace Search content source index, use the Elasticsearch Dev Tools or Elasticsearch API directly to create an alias, e.g.:

PUT /.ent-search-engine-documents-custom-62fe4494b1720490773b75d7/_alias/search-my-content

  2. In App Search, create an Elasticsearch index-based engine from the search- alias created in step 1.

Now you should be able to use App Search engines with your Workplace Search content.
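
If you would rather script step 1 than paste it into Dev Tools, the same alias call can be sketched in Python. This is only an illustration: the Elasticsearch URL and API key are placeholders, and the helper name is made up for this example. The request itself is the standard `PUT /<index>/_alias/<alias>` shown above.

```python
import urllib.request
from urllib.parse import quote

SEARCH_PREFIX = "search-"  # prefix App Search requires for index-based engines


def build_alias_request(es_url: str, index: str, alias: str,
                        api_key: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates the alias."""
    if not alias.startswith(SEARCH_PREFIX):
        raise ValueError(f"alias must start with {SEARCH_PREFIX!r}")
    # Percent-encode each path segment; '.', '-', and '_' pass through unchanged.
    url = (f"{es_url.rstrip('/')}/{quote(index, safe='')}"
           f"/_alias/{quote(alias, safe='')}")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"ApiKey {api_key}"},
        method="PUT",
    )


def create_alias(request: urllib.request.Request) -> bytes:
    """Send the request; urlopen raises on a non-2xx response."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Rejecting aliases without the search- prefix up front avoids creating an alias that the "Create engine" screen would never list.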

The use case you're trying to achieve is complicated by these factors:

  • The index listing on the "Create engine" screen does not show hidden indices, and the .ent-search-engine-documents-* indices are hidden.
  • It's only possible to create an engine from:
    • A non-hidden index with the prefix search-, or
    • An alias with the prefix search- that points to a non-hidden index, or
    • An alias with the prefix search- that points to a hidden index that is itself prefixed with .ent-search-engine-documents. This is a special case to allow mixing Workplace Search content sources with App Search engines.
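
The eligibility rules listed above can be restated as a small Python predicate. This is only a reading aid, not how App Search implements the check. The `Index` type and function name are invented for the example, and the hidden-index prefix is an assumption taken from the example index name earlier in this thread.

```python
from dataclasses import dataclass

SEARCH_PREFIX = "search-"
# Assumption: prefix of Workplace Search content source indices,
# copied from the example index name in this thread.
WS_HIDDEN_PREFIX = ".ent-search-engine-documents"


@dataclass
class Index:
    name: str
    hidden: bool = False


def can_back_engine(name: str, target: Index, is_alias: bool) -> bool:
    """Return True if `name` (an index or an alias to `target`) is eligible."""
    if not name.startswith(SEARCH_PREFIX):
        return False  # every case requires the search- prefix
    if not is_alias:
        return not target.hidden  # a plain index must be non-hidden
    # An alias may point to a non-hidden index, or to a hidden
    # Workplace Search content source index (the special case).
    return (not target.hidden) or target.name.startswith(WS_HIDDEN_PREFIX)
```

For the situation in this thread, a search- alias pointing at the hidden Workplace Search index passes, while the hidden index on its own does not.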

Sorry this is not well explained in the documentation; we will definitely correct that. And we are actively working to improve some of these rough edges in the beta. Your feedback here is already helping us improve.

Thank you Rich and Sean!
We now have a working App Search engine from a Workplace Search connector index!

I followed Rich's instructions to make an alias for our hidden index, running the PUT command in the Dev Tools Console.
The alias of the hidden index now shows up in the list in the App Search "Create engine" form.

Thank you, elastic team, for your quick responses!
