Dec 7th, 2021: Seven tricks for building fast, reliable search data integrations

Building fast, reliable search data integrations can make or break search experiences. Why? Because developers, like you, have high aspirations for search experiences, and end-users have high expectations for the results their queries return. But building data integrations isn't as simple as it looks; there are many considerations to take into account. Whether your data powers customer search on your corporate website, support portal, or e-commerce site, or powers your organization's search across enterprise productivity tools, how that data is served through search impacts end-user success and satisfaction. In this post, we'll give you tips on how to bridge the gap between designing data connectors for the ideal search experience and actually getting it done.

1. Start with the end state

When building content integrations, define your desired goals and keep them in mind as you design. Picture the search experience you want so your vision guides decision-making along the way. Consider what page display elements are needed (fields to filter on, for example) and what metadata is most useful to end-users. Don't forget to consider your use case: the data end-users need varies depending on whether you're building e-commerce or document search, for example.
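To make this concrete, here is a minimal sketch of capturing those decisions in an index mapping up front. It assumes the 8.x elasticsearch Python client and a hypothetical product-search use case; the index name and fields are placeholders for whatever your experience needs.

```python
from elasticsearch import Elasticsearch

# Hypothetical product-search example: every field below is a placeholder
# for whatever display elements and filters your experience needs.
es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="products",
    mappings={
        "properties": {
            "title": {"type": "text"},        # full-text searchable
            "brand": {"type": "keyword"},     # exact-match filter / facet
            "category": {"type": "keyword"},  # filter end-users click on
            "price": {"type": "float"},       # range filters and sorting
            "updated": {"type": "date"},      # freshness metadata
        }
    },
)
```

Sketching the mapping this early forces the "what will end-users filter and sort on?" conversation before any data moves.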

2. Only index the data you need

Start small and exercise restraint as you begin to index data. Avoid data pollution and don’t start a project with an unwieldy scope. How much data to ingest should depend upon what stage your project is in. Limiting data sources and objects at the outset of ingestion will also help you identify what data is needed as you build. You can always add more later as you develop the right search experience for end-users!
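One simple way to exercise that restraint is an explicit allowlist, so no field reaches the index unless you consciously chose it. A minimal sketch, with hypothetical field names:

```python
# Keep only the fields the search experience actually uses; everything
# else in the source record is deliberately dropped, not indexed "just in case".
INDEXED_FIELDS = {"title", "brand", "category", "price", "updated"}

def to_search_doc(source_record: dict) -> dict:
    """Project a raw source record down to the allowlisted fields."""
    return {k: v for k, v in source_record.items() if k in INDEXED_FIELDS}
```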

3. Push or pull? It depends.

Whether to push data to a search index or pull it from a third-party platform is one of the most critical data integration decisions you'll make. Not coincidentally, it's also one of the most common customer questions at Elastic. Which way is best? There are pros and cons to both approaches, and each can impact performance and search relevance. Figure out what's necessary for your unique search experience.

The case for pulling (aka polling or crawling)
Advantageous if content is already publicly available. A minimal polling sketch follows the pros and cons below.

  • Pros: Requires less technical work. Developers can rely on existing tools like web crawlers, prebuilt connectors, or APIs, and can schedule synchronizations to retrieve data when there are fewer demands on infrastructure. When changes are needed down the line, less technical know-how is required.
  • Cons: Data is never ingested in real time; with this method, what you index is always a snapshot pulled at a single point in time.
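Here is that minimal polling sketch. It assumes a hypothetical JSON endpoint (SOURCE_URL), the requests library, and the 8.x elasticsearch Python client's bulk helper; in practice you would lean on an existing crawler or connector rather than hand-rolling this loop.

```python
import time

import requests
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
SOURCE_URL = "https://example.com/api/documents"  # hypothetical content source

def pull_and_index():
    # Pull a point-in-time snapshot of the source and bulk-index it.
    docs = requests.get(SOURCE_URL, timeout=30).json()
    actions = (
        {"_index": "content", "_id": d["id"], "_source": d} for d in docs
    )
    helpers.bulk(es, actions)

while True:
    pull_and_index()
    time.sleep(3600)  # sync hourly, when demands on infrastructure allow
```

Note how each run captures a snapshot at a single point in time, which is exactly the con described above.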

The case for pushing data from content sources to your search experience via API
Well-suited for applications built in-house or custom layers built on top of databases, and useful for applications with webhooks and/or real-time content. A minimal webhook sketch follows the pros and cons below.

  • Pros: This approach gives operators the most control: you dictate when data is sent and how it's sent, including the format. It keeps data fresh, and end-users get the benefit of searching over real-time information.
  • Cons: Requires more technical know-how. If the team wants changes to search output, engineering effort is necessary.
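And here is a comparable sketch of the push approach: a small webhook receiver that indexes each change the moment the content source announces it. Flask, the endpoint path, and the payload shape are all assumptions for illustration.

```python
from flask import Flask, request
from elasticsearch import Elasticsearch

app = Flask(__name__)
es = Elasticsearch("http://localhost:9200")

@app.route("/webhook", methods=["POST"])
def receive_change():
    # The content source calls this endpoint whenever a record changes,
    # so the index stays fresh in (near) real time.
    doc = request.get_json()
    es.index(index="content", id=doc["id"], document=doc)
    return {"status": "indexed"}, 200
```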

4. Real-time indexing is not always the answer

Repeat after us: real-time data is not always the be-all and end-all. Take a close look at the use case you're solving for, how often your data changes, and network traffic considerations. Be smart about when to run synchronization jobs based on data refresh patterns, and keep your users' expectations in mind.
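For instance, here is a sketch of matching sync frequency to how often each source actually changes, rather than defaulting everything to real time; the source names and intervals are made up for illustration.

```python
import time

# Hypothetical refresh patterns: sync each source only as often as its
# data actually changes, not as often as technically possible.
SYNC_INTERVALS = {
    "product_catalog": 15 * 60,       # prices change often: every 15 min
    "support_articles": 4 * 60 * 60,  # edited a few times a day: every 4 h
    "hr_policies": 24 * 60 * 60,      # rarely changes: once a day
}

def due_sources(last_run: dict) -> list[str]:
    """Return the sources whose sync interval has elapsed since their last run."""
    now = time.time()
    return [name for name, interval in SYNC_INTERVALS.items()
            if now - last_run.get(name, 0.0) >= interval]
```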

5. Index incrementally

Start small with the data you plan to ingest and iterate from there. Rather than planning to ingest many content sources at once, begin with one or two and add more later. Be sure your initial design allows for flexibility so you can add connector objects and sources over time. Think about what you actually need to build; don't build a framework unless you need one. Ingest data efficiently at first and leave room to augment it over time. And if you're using a pre-existing data connector, keep in mind that your content source may also add objects. Build with simplicity and flexibility in mind.
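One way to preserve that flexibility is to drive ingestion from a small registry of sources, so adding a connector later is a config change rather than a rewrite. A minimal sketch, with a hypothetical connector function:

```python
def sync_website():
    """Placeholder connector: fetch the website's content and index it."""
    ...

# Start with one or two sources; adding another later is one new entry.
SOURCES = [
    {"name": "website", "sync": sync_website},
    # {"name": "wiki", "sync": sync_wiki},  # hypothetical: add when needed
]

def run_ingestion():
    for source in SOURCES:
        source["sync"]()  # each connector knows how to fetch and index its data
```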

6. Don’t reinvent the wheel–use existing code

Use pre-existing connectors, data libraries, open source software, API clients, and code examples to help you build the best experience for end-users. It can be tempting to build your own tools, but this is a great time to rely on the specialists who build and maintain integrations. Before deciding what code to leverage, do due diligence to determine whether the project is actively maintained. Be prepared to make tradeoffs: even if pre-existing code doesn't meet 100% of your requirements, it may still be worth using because you don't have to maintain it yourself. If a pre-built tool gets you 80% of the way there, why not consider it?

7. Test and automate as early as possible

Catching bugs early, before users complain or you get an influx of support tickets, will save you money and headaches. Test integrations frequently as you move data from one repository to the next so you catch issues along the way. Writing automated tests is a bigger up-front investment, but they pay off over time and let you test your code at scale.
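For example, here is a minimal automated test, assuming pytest and the hypothetical to_search_doc helper from tip 2, that catches field-mapping regressions before your users do:

```python
# test_transform.py -- run with `pytest`
from transform import to_search_doc  # hypothetical module holding the tip-2 helper

def test_only_allowlisted_fields_are_indexed():
    raw = {"title": "Blue mug", "price": 9.99, "internal_cost": 4.10}
    doc = to_search_doc(raw)
    assert doc == {"title": "Blue mug", "price": 9.99}
    assert "internal_cost" not in doc  # internal data must never leak into search
```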

This holiday season, we hope these tips save you time and keep heartburn at bay as you build the next great search experience for your users. Now we want to hear from you! Which tips do you agree with? Which ones do you disagree with? Which tips did we miss?
