Indexing Sitecore Content Hub with Solr
Intro
Content Hub maintains its own search index internally. This enables a number of standardized search capabilities via its interfaces and APIs, which may be sufficient in some cases, but it is not nearly as flexible and customizable as, say, a Solr index for Sitecore CMS. What if we need to manage company assets, products, and content in Content Hub, but in addition to simply showing these on the website, we need to make them searchable via a rich custom UI, where results can be faceted, sorted, boosted, etc.?
Why bother indexing Content Hub content with Solr?
Sitecore Content Hub is a unified platform that brings together and centralizes all content and assets in one system. Here's one example of such a rich search interface. It follows a typical pattern where company offerings can be searched, filtered, paged, and sorted based on somewhat complex business requirements. For requirements of this kind, Content Hub's out-of-the-box search is not enough.
One solution is to use Content Hub as the source of truth for assets, products, and content, and then synchronize them to, say, Sitecore CMS via Sitecore Connect for CMP. In that case everything is synchronized to Sitecore CMS, which in turn makes it available on the web and to third-party apps and services, like so:
This is exciting, as there's no more need for all this constant syncing to Sitecore CMS. The only challenge is search: the GraphQL API in its current form does not provide search capabilities rich enough to power a custom search UI like this one. The solution? Create a custom index for Content Hub. This may sound like a lot of effort, but Content Hub's OOTB API hooks and Azure Logic Apps, where custom and default Azure Functions can be combined, make things much easier than writing everything from scratch.
Creating Solr Indexer for Content Hub with Azure Logic Apps and Functions
High-level architecture
I fell in love with Azure Logic Apps and Functions, so I chose them as the hosting mechanism. I believe it's a cloud-native and very cost-effective option where deployment and hosting overhead is low. I've built a few custom Azure Functions for Content Hub and JSON processing tasks and leveraged those provided by the platform where possible.
Here's a high-level architecture diagram describing how these main components work together
TBD
Azure Logic App to index Content Hub content
The diagram below is taken from the Azure Logic App designer for my sample indexer. I like this very visual way of coding apps: it's easy to understand and easy to maintain. This app is a POC to illustrate the idea and is not quite production ready, but it works. It can easily be improved with Azure queues, retry and poison message handling logic, enhanced logging with Application Insights, and so on, but that would make this blog post very long :)
Here's how it works: once triggered, the app reads the affected entity from Content Hub via its REST API using "Get Content Hub Entity Data", then generates the Solr payload with "Render JSON Template" and POSTs it to the Solr server.
Details on Azure Functions
When an HTTP request is received
The first item is the entry point of the entire Azure Logic App. It's an abstraction that defines the entry point URL and the payload JSON schema, and it is what Content Hub's "API call" action invokes each time a target entity is created or deleted (more on this in the sections below). The ID of the target entity in Content Hub is passed in the payload of this request and is used to retrieve that entity in the functions/steps below.
Get Content Hub Entity Data
The next function is a custom one: it calls the Content Hub REST API to read the target entity, as well as all related entities listed in the EntityRelations header parameter. The TargetEntityIdJsonPath parameter is a JSON Path expression pointing to where the actual ID of the target entity can be found in the payload JSON. For the purpose of this POC, I simply passed Content Hub credentials via function header parameters. In real-life production scenarios such information should be stored in configs or, even better, Azure Key Vault.
Internally, this function reads all properties of the target entity, then reads and adds all fields from the specified related entities, and finally appends the renditions of the target entity. The output is serialized into JSON, which looks like this:
{
  "Properties": { /* target entity properties */ },
  "Relations": { /* a collection of elements holding properties of the related entities specified in the EntityRelations parameter above */ },
  "Renditions": { /* a collection of rendition names and their URLs in Content Hub */ }
}
See the additional sections below for more details on the function code and a link to the code project on GitHub.
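To make the TargetEntityIdJsonPath idea more concrete, here is a minimal Python sketch of resolving a simple JSON path against the incoming payload. It is an illustration only and not the code from the GitHub project: the function name resolve_json_path and the sample payload shape are my assumptions, and only plain dotted paths are supported rather than full JSON Path.

```python
from typing import Any


def resolve_json_path(document: Any, path: str) -> Any:
    """Resolve a simple dotted path such as '$.message.targetId'.

    Hypothetical helper: supports only '$' followed by dot-separated
    object keys or numeric list indexes, not the full JSON Path syntax.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    for part in path[1:].split("."):
        if not part:
            continue  # skip the empty piece right after '$'
        if isinstance(current, list):
            current = current[int(part)]  # numeric index into a list
        else:
            current = current[part]  # key lookup in an object
    return current


# Hypothetical payload shape; the real one is defined by the trigger's JSON schema.
payload = {"message": {"targetId": 12345}}
entity_id = resolve_json_path(payload, "$.message.targetId")
```

With the ID in hand, the function can then request that entity and its relations from the Content Hub REST API.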
Render JSON Template
This is another custom function: a very simple templating engine. The template is expected to have two kinds of tokens:
- Tokens enclosed in double curly brackets hold a JSON path to a value, which is looked up in the entity data from the previous call and injected in place of the token. Here's what it may look like: {{$.Properties.Id}}
- Tokens enclosed in double square brackets hold literal values to be injected in place of the token. Those can look like this: [[ Hello world ]]
I used a simple regex to find and replace all tokens in the template. See the additional sections below for more details on the function code and a link to the code project on GitHub.
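The token replacement described above can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not the function from the GitHub project: the names render_template and lookup are hypothetical, paths are limited to plain dotted keys, literal tokens are trimmed of surrounding whitespace, and injected values are not escaped for JSON.

```python
import json
import re
from typing import Any

# One pattern for both token kinds: {{ json.path }} and [[ literal value ]].
TOKEN = re.compile(
    r"\{\{\s*(?P<path>.+?)\s*\}\}|\[\[\s*(?P<literal>.*?)\s*\]\]"
)


def lookup(document: Any, path: str) -> Any:
    """Follow a dotted path like '$.Properties.Id' through nested dicts."""
    current = document
    for part in path.lstrip("$").split("."):
        if part:
            current = current[part]
    return current


def render_template(template: str, entity: dict) -> str:
    """Replace every token in the template in a single regex pass."""

    def replace(match: re.Match) -> str:
        path = match.group("path")
        if path is not None:
            value = lookup(entity, path)
            # Strings are injected as-is; other values use their JSON form.
            return value if isinstance(value, str) else json.dumps(value)
        return match.group("literal")

    return TOKEN.sub(replace, template)


entity = {"Properties": {"Id": 42, "Title": "Logo"}}
template = (
    '{"id": "{{$.Properties.Id}}", '
    '"title_t": "{{ $.Properties.Title }}", '
    '"source_s": "[[ Hello world ]]"}'
)
rendered = render_template(template, entity)
```

The field names id, title_t, and source_s are made up for the example. A production version would also need to escape quotes and other special characters in injected values so the result stays valid JSON.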
Update Solr Index
This is an out-of-the-box HTTP POST call to the Solr server to update an index. It takes the output of the above "Render JSON Template" function and forwards it to Solr. Note the Authorization header: Solr servers are often protected with basic authentication in production environments. The value of the Authorization header is the word Basic, followed by a space and the username:password string encoded as base64.
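For readers who want to reproduce this step outside of a Logic App, here is a hedged Python sketch of the same request. The core URL, the credentials, and the function names are assumptions for illustration. The /update endpoint with commit=true is Solr's standard JSON update handler, but check it against your own Solr setup.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_solr_update_request(
    solr_core_url: str, documents_json: str, username: str, password: str
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the core's JSON update handler."""
    return urllib.request.Request(
        url=f"{solr_core_url.rstrip('/')}/update?commit=true",
        data=documents_json.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
        method="POST",
    )


# Hypothetical core URL and credentials, for illustration only.
request = build_solr_update_request(
    "http://localhost:8983/solr/contenthub",
    '[{"id": "42", "title_t": "Logo"}]',
    "user",
    "pass",
)
# To actually send it: urllib.request.urlopen(request)
```

Committing on every update keeps the sketch simple. A busier indexer would more likely rely on Solr's autoCommit or commitWithin settings instead.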
Response
The last element formats and sends a response to the caller. I simply forward the response body from the above call to Solr, since the calling action in Content Hub doesn't really care about the response.
Using Content Hub Actions and Triggers to call the above Logic App
Please refer to Sitecore documentation for more details.
Appendix
Source code: https://github.com/sergyatsenko/SY.ContentHub.AzureFunctions