Insite Elasticsearch-Only Product Search

January 23rd, 2018

Introduction

Insite's decision to switch to Elasticsearch in version 4.2 was incredible! Insite continues to amaze me with each new version. Elasticsearch indices are fast. Out of the box, Insite leverages Elasticsearch to identify products for a search query: Elasticsearch is responsible for identifying, sorting, and paging product result sets. Once the products are identified, Insite consults the database for the latest information about them. This ensures that search results don't contradict the product detail page. However, this database lookup degrades search performance. So, let's explore the possibility of storing everything needed for search results in Elasticsearch.

When evaluating search performance using RTT (round trip time) as the metric, two factors have a significant impact on timing:

  1. Product information retrieval - The amount of time required before results start being sent back to the browser. We'll improve this by storing enough product information in Elasticsearch to cover what the search results need.
  2. Product information size - Insite's expand parameter gives us some control over the amount of information returned, but we're looking to optimize the result set further. So, in our custom API, we'll create a result model that only includes what is needed for the search results page.
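To make the second point concrete, here is a minimal sketch of a slim result model. Everything in it is an assumption for illustration: FullProduct, SearchResultProduct, and SearchResultProjection are hypothetical names, not Insite types, and System.Text.Json is used only to compare the serialized size of the two shapes.

```csharp
using System.Text.Json;

// Hypothetical "everything" product shape; property names are illustrative only.
public class FullProduct
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string ShortDescription { get; set; }
    public string HtmlContent { get; set; }
    public string SmallImagePath { get; set; }
    public string MetaKeywords { get; set; }
}

// Hypothetical slim model: only what a search results page would render.
public class SearchResultProduct
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string SmallImagePath { get; set; }
}

public static class SearchResultProjection
{
    // Copies only the fields the results page needs.
    public static SearchResultProduct ToSearchResult(FullProduct product) =>
        new SearchResultProduct
        {
            Id = product.Id,
            Name = product.Name,
            SmallImagePath = product.SmallImagePath
        };

    // Length, in characters, of the JSON produced for a model.
    public static int JsonLength(object model) =>
        JsonSerializer.Serialize(model, model.GetType()).Length;
}
```

Calling SearchResultProjection.JsonLength on a FullProduct and on its projection shows how much of the payload the results page never uses.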

Before beginning, let's first take a look at the pros and cons of using this approach over the out-of-the-box Insite solution.

Advantages

  • Speed - The primary motivation for implementing a custom search API. My local development environment was producing a thousand products in under half a second.
  • Reducing SQL workload - By removing the database dependency for search, we reduce the load on the database, allowing its strengths to shine: managing customer orders, aggregating order totals, relating orders to order lines, and reporting order histories. Elasticsearch isn't built to do aggregate math, manage permissions and ownership, or relate documents to one another.

Disadvantages

  • Not for cloud - This solution requires the Enterprise edition of Insite 4.2 or 4.3. Insite's cloud architecture is becoming more customizable, but it doesn't currently allow this level of customization.
  • Upgrades - By implementing a custom product search API, we're wandering away from how Insite operates out of the box. Customization complicates upgrades, and the more complex the customization, the more complicated the upgrade. If you go this route, expect to revisit this code in future upgrades. For this reason, I highly recommend taking the optional step, mentioned below, where I suggest duplicating the product search chain of responsibility to create a custom endpoint.
  • Duplication of data / Stale data - Insite's product source of truth is the Insite database. This approach duplicates the data in Elasticsearch. So, if your products are as volatile as the stock market, this may not be the right approach for your needs. It's possible to update Elasticsearch when each product change occurs, but that's out of scope for this article.
  • Increased indexing time - The approach outlined below requires adding content to the Elasticsearch index. The information needs to be collected and potentially pre-processed to streamline query performance.

Implementation Steps

The implementation requires touching quite a few classes. It may seem complicated when starting out. Don't fret; Insite designed their classes well. Search is implemented in two process flows:

  • Indexing - The process flow responsible for creating and updating the search index, based on the products in the Insite database. This process flow is triggered by using one of the rebuild tasks in the Marketing / Indexing portion of the Insite administration console. The classes involved in this process are in the Insite.Search.Elasticsearch.DocumentTypes.Product.Index namespace.

    Insite already has a great [support article](https://developer.insitesoft.com/hc/en-us/articles/115003321543-Search-Extensions "Search Extensions") covering how to add information to the Elasticsearch index. However, here's an overview of the steps:

    1. Identify the product properties required for your implementation of the search results page. The support article shows how to add value properties, but more complex types are possible. For my purposes, I didn't need to search against the new fields. So, I serialized the complex properties (like default pricing, and variant child information) as JSON to a string property in the IndexableProduct. Elasticsearch allows for complex properties, but you'll have to figure out how to set up the mapping, and Elasticsearch analysis for them.
    2. Extend IndexableProduct with the new properties. This class represents the result of the SQL query returned by IndexableProductRepository.
    3. Extend ElasticsearchProduct with the new properties. This class represents the document added to Elasticsearch.
    4. Extend ElasticsearchProductMapping to identify how the new properties are mapped from IndexableProduct to ElasticsearchProduct. This implementation can include any transformations needed for differences between the IndexableProduct (SQL result) and the ElasticsearchProduct (Elasticsearch document).
    5. Extend ProductSearchIndexerElasticsearch to use our custom models.
    6. Extend IndexableProductRepository to add your custom fields with SQL queries. The results from the SQL query need to match the properties in your IndexableProduct extension. Note: the AllProductQueryCustomFields override is consumed in the AllProductQuery(). So, if you need performance gains, I'd recommend adding temporary table queries to your custom AllProductsSql, and then consuming the temp tables in ProductQueryCustomFields. In my case, where I needed to add default pricing (for a client who provides break pricing) and variant children information, these performance gains cut the indexing time from a couple of hours to 12 minutes.
    7. Update the application setting ProductSearchProvider to use our custom indexer. You can do this through the admin console or, if you want to propagate the configuration change to all environments, with a migration script (see the "Altering the Database Schema" section of Insite's support article entitled [Working with Models/ORM (EF)](https://developer.insitesoft.com/hc/en-us/articles/115003321883-Working-with-Models-ORM-EF- "Working with Models/ORM (EF)")).

    Insite Product Indexing Class Diagram

  • Querying - The process flow responsible for searching the Elasticsearch collections. It retrieves Elasticsearch documents based on a search query. The classes involved in this process are in the Insite.Search.Elasticsearch.DocumentTypes.Product.Query namespace.

    Steps to retrieve extension information from the Elasticsearch Product collection:

    1. Extend ProductDto with the properties you wish to expose through the product collection endpoint.
    2. Extend ProductSearchResultDto as an abstract generic class to allow flexibility in the result model.
    3. Create a concrete implementation of the new ProductSearchResultDto base class to use your ProductDto extension.
    4. Extend ProductSearchProvider as an abstract generic class to designate a custom ElasticsearchProduct implementation that the new process will extract from the Elasticsearch product collection. This base class turns the ElasticsearchProduct type into a generic type parameter, so the custom extended version is available in the appropriate methods. See the example base class, below.
    5. Implement a concrete version of the abstract ProductSearchProvider class, using the custom ElasticsearchProduct. Primarily, this class provides an implementation of the base class that uses your specific extension of the ElasticsearchProduct, and the extended product search dto. However, it can also provide any customizations you desire based on the new document model.
    6. Optional, but recommended: Copy the existing GetProductCollection chain of responsibility into a custom API. Alter it to return your ProductSearchResultDto extension, so the properties will be reflected in Swagger. This also reduces the risk of upgrades: worst case, you can go back to using the standard API by changing the ProductSearchProvider back to what it was originally.
    7. Update the GetProductCollectionHandler to only use Elasticsearch information in the FindProductsWithSearch() execution path. I'd recommend leaving the FindProductsWithLookup functionality alone, so it can still pull fresh product information from the database for use in the product detail page.

    Insite Product Query Customization Class Diagram
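As a rough illustration of the "serialize complex properties as JSON to a string property" idea from the indexing steps, the sketch below round-trips a list of break prices through a plain string. It is not Insite code: BreakPrice and BreakPriceJson are hypothetical names, and System.Text.Json is an assumption (Json.NET or any other serializer would serve the same purpose). The indexing side would store the string in the document, and the query side would turn it back into objects when building the result DTO.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical complex property; not an Insite type.
public class BreakPrice
{
    public decimal BreakQty { get; set; }
    public decimal Price { get; set; }
}

// Hypothetical helper that converts between the complex property
// and the plain string stored in the search document.
public static class BreakPriceJson
{
    // Indexing side: flatten the list into a string field value.
    public static string Serialize(IReadOnlyList<BreakPrice> prices) =>
        JsonSerializer.Serialize(prices);

    // Query side: rebuild the list; an empty or missing value yields an empty list.
    public static List<BreakPrice> Deserialize(string json) =>
        string.IsNullOrWhiteSpace(json)
            ? new List<BreakPrice>()
            : JsonSerializer.Deserialize<List<BreakPrice>>(json) ?? new List<BreakPrice>();
}
```

Because the field is an opaque string, Elasticsearch needs no special mapping or analysis for it. The trade-off is that you cannot search or filter on the values inside it.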

Abstract ProductSearchProvider

public abstract class ProductSearchProviderElasticsearchBase<TProductResult, TProductSearchResultDto> : ProductSearchProviderElasticsearch 
	where TProductResult : ElasticsearchProduct
	where TProductSearchResultDto : ProductSearchResultDto
{
	protected ProductSearchProviderElasticsearchBase(IElasticsearchIndex index, ICacheManager cacheManager, ICatalogCacheKeyProvider catalogCacheKeyProvider, IPerRequestCacheManager perRequestCacheManager, IUnitOfWorkFactory unitOfWorkFactory, IProductSearchFacetProcessor facetProcessor, IElasticsearchQueryBuilder queryBuilder, IPhraseSuggestConfiguration phraseSuggestConfiguration, IBoostHelper boostHelper, IApplicationSettingProvider applicationSettingProvider) 
		: base(index, cacheManager, catalogCacheKeyProvider, perRequestCacheManager, unitOfWorkFactory, facetProcessor, queryBuilder, phraseSuggestConfiguration, boostHelper, applicationSettingProvider)
	{
	}

	protected override IProductSearchResult RunQuery(IProductSearchParameter parameter, bool isAutoComplete = false)
	{
		if (typeof(TProductResult) == typeof(ElasticsearchProduct))
		{
			return base.RunQuery(parameter, isAutoComplete);
		}

		var searchCriteria = parameter.SearchCriteria;
		var useBasicPricing = ApplicationSettingProvider.GetOrCreateByName<bool>("UseBasicPricing");
		var filterWebsiteId = EnableWebsiteSpecificFilters
			? SiteContext.Current.Website.Id.ToString().ToUpper()
			: string.Empty;
		var query = GenerateQueries(parameter.SearchCriteria, parameter.SearchWithin, isAutoComplete);
		FilterContainer filterNoCategory;
		FilterContainer filterNoPriceFilter;
		var filter = GenerateFilters(parameter, out filterNoCategory, out filterNoPriceFilter);
		var searchDescriptor = new SearchDescriptor<TProductResult>().Filter(filter);
		if (query != null)
		{
			searchDescriptor = !EnableProductBoost
				? searchDescriptor.Query(query)
				: searchDescriptor
					.Query(aa =>
						aa.FunctionScore(fs => fs.Query(q => query)
							.Functions(f => f.FieldValueFactor(fv => fv.Field(p => p.Boost)))
							.BoostMode(FunctionBoostMode.Multiply)));
			if (PhraseSuggestConfiguration.Enabled && parameter.IncludeSuggestions)
			{
				var didYouMeanThreshold =
					Math.Max(
						Math.Min(
							UnitOfWork.GetTypedRepository<IApplicationSettingRepository>()
								.GetOrCreateByName<decimal>("Search_Suggestions_DidYouMean_Threshold"), new decimal(5)), decimal.Zero);
				var autoCorrectThreshold =
					Math.Max(
						Math.Min(
							UnitOfWork.GetTypedRepository<IApplicationSettingRepository>()
								.GetOrCreateByName<decimal>("Search_Suggestions_AutoCorrect_Threshold"), new decimal(5)), decimal.Zero);
				searchDescriptor =
					searchDescriptor.SuggestPhrase("didyoumean",
						o => PhraseSuggestConfiguration.Configure(parameter.SearchCriteria, didYouMeanThreshold, o))
						.SuggestPhrase("correction",
							o => PhraseSuggestConfiguration.Configure(parameter.SearchCriteria, autoCorrectThreshold, o));
			}
		}
		var sortBy = parameter.SortBy;
		bool priceSort;
		bool manualSort;
		var sortOrder = GetSortOrder(ref sortBy, parameter, out priceSort, out manualSort);
		if (parameter.DoFacetedSearches)
			searchDescriptor = AddAggregations(searchDescriptor, !parameter.SearchCriteria.IsBlank(), query, filterNoCategory,
				filterNoPriceFilter, filter);
		if (priceSort && !useBasicPricing)
		{
			parameter.SortBy = "1";
			sortOrder = GetSortOrder(ref sortBy, parameter, out priceSort, out manualSort);
			parameter.PageSize = ActualPriceSortMaximum;
			parameter.StartRow = 0;
		}
		searchDescriptor = AddSortOrder(searchDescriptor, sortOrder).From(parameter.StartRow).Size(parameter.PageSize);
		var productDtos = new List<TProductSearchResultDto>();
		var sponsoredDocumentCount = 0L;
		ISearchResponse<TProductResult> result;
		long totalDocumentCount;
		try
		{
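			// True when a faceted search has been narrowed by category, price, attribute, or search-within.
			var isFilteredFacetSearch = parameter.DoFacetedSearches &&
					   (parameter.CategoryId.HasValue || parameter.PriceFilters.Any() || parameter.AttributeValueIds.Any() ||
						!parameter.SearchWithin.IsBlank());
			// Sponsored results only apply to unfiltered keyword searches (not autocomplete) using sort option "1".
			if (!isAutoComplete && !isFilteredFacetSearch && (sortBy == "1" && !parameter.SearchCriteria.IsBlank()) &&
				ApplicationSettingProvider.GetOrCreateByName<bool>("Search_SponsoredSearch_Enabled"))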
			{
				var sponsoredResults = GetSponsoredResults(query, filter);
				var sponsoredDocumentIds = sponsoredResults.Documents.Select(o => o.Id).ToList();
				sponsoredDocumentCount += sponsoredDocumentIds.Count;
				searchDescriptor = searchDescriptor.Filter(o => o.And(p => filter, p => p.Not(q => q.Ids(sponsoredDocumentIds))));
				if (parameter.StartRow == 0)
				{
					var sponsoredProductDtos = ConvertSearchProducts(sponsoredResults.Hits);
					sponsoredProductDtos.Each(o => o.IsSponsored = true);
					productDtos.AddRange(sponsoredProductDtos);
					searchDescriptor = searchDescriptor.Size(parameter.PageSize - sponsoredDocumentIds.Count);
				}
			}
			result = Index.Client.Search<TProductResult>(searchDescriptor);
			productDtos.AddRange(ConvertSearchProducts(result.Hits));
			totalDocumentCount = sponsoredDocumentCount + result.Total;
		}
		catch (Exception ex)
		{
			AddApplicationLog($"Elasticsearch search failed on: '{searchCriteria}'\r\n{ex}");
			return new ProductSearchResult()
			{
				Products = new List<ProductSearchResultDto>()
			};
		}
		if (ApplicationSettingProvider.GetOrCreateByName<bool>("Search_LogQueries"))
			AddApplicationLog($"Elasticsearch product search ({result.Total} hits): {result.ConnectionStatus}");
		var results = new ProductSearchResultXc<TProductSearchResultDto>()
		{
			Count = (int)totalDocumentCount,
			SortOptions = SortOptions,
			SortOrder = sortBy,
			CustomProducts = productDtos,
			AttributeTypeDtos = ConvertAggregationToAttributeTypeFacets(parameter.CategoryId, result.Aggs, filterWebsiteId),
			CategoryDtos = ConvertAggregationToCategoryFacets(result.Aggs, parameter.CategoryId),
			PriceRangeDto = ConvertAggregationToPriceRangeFacets(result.Aggs)
		};
		if (parameter.IncludeSuggestions && result.Suggest != null)
			SetSuggestions(parameter.SearchCriteria, result, results);
		SetResultSortOptions(results, useBasicPricing, ref priceSort);
		return results;
	}

	protected virtual SearchDescriptor<TProductResult> AddSortOrder(SearchDescriptor<TProductResult> searchDescriptor, SortOrderField[] sortOrder)
	{
		throw new NotSupportedException($"To allow extensions of ElasticsearchProduct (like '{typeof(TProductResult).Name}'), the concrete class needs to override the default implementation.");
	}

	protected virtual SearchDescriptor<TProductResult> AddAggregations(
		SearchDescriptor<TProductResult> searchDescriptor, bool isSearch, QueryContainer queryContainer,
		FilterContainer filterNoCategory, FilterContainer filterNoPrice, FilterContainer filter)
	{
		throw new NotSupportedException($"To allow extensions of ElasticsearchProduct (like '{typeof(TProductResult).Name}'), the concrete class needs to override the default implementation.");
	}

	protected new virtual ISearchResponse<TProductResult> GetSponsoredResults(QueryContainer query, FilterContainer filter)
	{
		throw new NotSupportedException($"To allow extensions of ElasticsearchProduct (like '{typeof(TProductResult).Name}'), the concrete class needs to override the default implementation.");
	}

	protected virtual void SetSuggestions(string searchCriteria, ISearchResponse<TProductResult> result, IProductSearchResult productSearchResult)
	{
		throw new NotSupportedException($"To allow extensions of ElasticsearchProduct (like '{typeof(TProductResult).Name}'), the concrete class needs to override the default implementation.");
	}

	protected virtual List<TProductSearchResultDto> ConvertSearchProducts(IEnumerable<IHit<TProductResult>> hits)
	{
		throw new NotSupportedException($"To allow extensions of ElasticsearchProduct (like '{typeof(TProductResult).Name}'), the concrete class needs to override the default implementation.");
	}
}
		
public class ProductSearchResultXc<TProductSearchResultDto> : ProductSearchResult where TProductSearchResultDto : ProductSearchResultDto
{
	public override List<ProductSearchResultDto> Products
	{
		get { return CustomProducts.Cast<ProductSearchResultDto>().ToList(); }
		set { CustomProducts = value.Cast<TProductSearchResultDto>().ToList(); }
	}

	public virtual List<TProductSearchResultDto> CustomProducts { get; set; }
}
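
The result wrapper above relies on a typed list being exposed through the base class's untyped property. The following self-contained sketch mirrors that shape with stand-in types so it compiles on its own. ResultDto, CustomResultDto, SearchResult, SearchResultXc, and SearchResultDemo are hypothetical names and are not the Insite classes. It also highlights two behaviours of this pattern that are worth knowing about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the base DTO and result types.
public class ResultDto { public string Name { get; set; } }
public class CustomResultDto : ResultDto { public string DefaultPricingJson { get; set; } }

public class SearchResult
{
    public virtual List<ResultDto> Products { get; set; } = new List<ResultDto>();
}

// Same shape as ProductSearchResultXc: the typed list is the real storage.
public class SearchResultXc<TDto> : SearchResult where TDto : ResultDto
{
    public override List<ResultDto> Products
    {
        get { return CustomProducts.Cast<ResultDto>().ToList(); }
        set { CustomProducts = value.Cast<TDto>().ToList(); }
    }

    public virtual List<TDto> CustomProducts { get; set; } = new List<TDto>();
}

public static class SearchResultDemo
{
    // Code that only knows the base type still sees the custom products.
    public static string FirstNameViaBase()
    {
        SearchResult result = new SearchResultXc<CustomResultDto>
        {
            CustomProducts = new List<CustomResultDto> { new CustomResultDto { Name = "Widget" } }
        };
        return result.Products[0].Name;
    }

    // Assigning plain base DTOs through the base property fails the cast in the setter.
    public static bool SetterRejectsBaseDtos()
    {
        SearchResult result = new SearchResultXc<CustomResultDto>();
        try
        {
            result.Products = new List<ResultDto> { new ResultDto { Name = "Plain" } };
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

Two things follow from this shape. First, the Products getter builds a new list on every call, so adding to result.Products does not change CustomProducts. Second, the setter only accepts items that are already of the custom DTO type, so code that assigns plain base DTOs needs to go through CustomProducts or convert them first.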