Thursday, July 16, 2009

You May Be Interested In The Following Products…

In my last blog entry, we came up with an inline table-valued function in AdventureWorks (AW) that accepts a ProductID and CustomerID and will return 3 “suggested” ProductID’s. The idea was that a Customer would go to AW’s website and look at a Product, and the 3 suggested products would be presented under the title “Customers who bought this item also bought…”

We took a random customer from the AW database named Zachary Hughes (CustomerID 12203) in order to build and test our function. We’ll continue to use our pal Zack in this blog entry also.

We left off with looking at the products suggested to Zack when he’s looking at ProductID 777:

select * from dbo.ufn_GetSuggProducts(777,12203)
/*
SuggProdID
-----------
872
934
878
*/
So it occurred to me that Zack has bought several products in the past. Just because I’m a data nerd, I was wondering what kind of suggested products come up if we were to “feed” those previously-purchased products to our function for Zack?

First we have to find out what Zack has purchased. Let’s use a Common Table Expression (CTE) called CustProds to get all the distinct products that Zack has bought, and we’ll JOIN in the Product table so we can see the names of those products.

declare @CustomerID int
set
@CustomerID=12203 --Our pal Zack!
;with CustProds as
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
)
select CustProdID=cp.ProductID
,CustProdName=p1.Name
from
CustProds cp
join Production.Product p1 on cp.ProductID=p1.ProductID
/*
CustProdID CustProdName
----------- --------------------------
711 Sport-100 Helmet, Blue
712 AWC Logo Cap
713 Long-Sleeve Logo Jersey, S
870 Water Bottle - 30 oz.
871 Mountain Bottle Cage
873 Patch Kit/8 Patches
874 Racing Socks, M
921 Mountain Tire Tube
929 ML Mountain Tire
930 HL Mountain Tire
*/
Now for each of those products, we’ll “feed” that product into our function via a CROSS APPLY operator. And we’ll JOIN in the Product table once again so we can also see the names of those suggested products:

declare @CustomerID int
set
@CustomerID=12203 --Our pal Zack!
;with CustProds as
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
)
select CustProdID=cp.ProductID
,CustProdName=p1.Name
,SuggProdID
,SuggProdName=p2.Name
from
CustProds cp
join Production.Product p1 on cp.ProductID=p1.ProductID
cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp
join Production.Product p2 on sp.SuggProdID=p2.ProductID
/*
CustProdID CustProdName SuggProdID SuggProdName
----------- -------------------------- ----------- -----------------------
711 Sport-100 Helmet, Blue 922 Road Tire Tube
711 Sport-100 Helmet, Blue 923 Touring Tire Tube
711 Sport-100 Helmet, Blue 878 Fender Set - Mountain
712 AWC Logo Cap 707 Sport-100 Helmet, Red
712 AWC Logo Cap 922 Road Tire Tube
712 AWC Logo Cap 878 Fender Set - Mountain
713 Long-Sleeve Logo Jersey, S 922 Road Tire Tube
713 Long-Sleeve Logo Jersey, S 878 Fender Set - Mountain
713 Long-Sleeve Logo Jersey, S 707 Sport-100 Helmet, Red
870 Water Bottle - 30 oz. 707 Sport-100 Helmet, Red
870 Water Bottle - 30 oz. 708 Sport-100 Helmet, Black
870 Water Bottle - 30 oz. 878 Fender Set - Mountain
871 Mountain Bottle Cage 878 Fender Set - Mountain
871 Mountain Bottle Cage 707 Sport-100 Helmet, Red
871 Mountain Bottle Cage 708 Sport-100 Helmet, Black
873 Patch Kit/8 Patches 877 Bike Wash - Dissolver
873 Patch Kit/8 Patches 878 Fender Set - Mountain
873 Patch Kit/8 Patches 707 Sport-100 Helmet, Red
874 Racing Socks, M 878 Fender Set - Mountain
874 Racing Socks, M 922 Road Tire Tube
874 Racing Socks, M 707 Sport-100 Helmet, Red
921 Mountain Tire Tube 708 Sport-100 Helmet, Black
921 Mountain Tire Tube 707 Sport-100 Helmet, Red
921 Mountain Tire Tube 878 Fender Set - Mountain
929 ML Mountain Tire 708 Sport-100 Helmet, Black
929 ML Mountain Tire 707 Sport-100 Helmet, Red
929 ML Mountain Tire 878 Fender Set - Mountain
930 HL Mountain Tire 708 Sport-100 Helmet, Black
930 HL Mountain Tire 707 Sport-100 Helmet, Red
930 HL Mountain Tire 878 Fender Set - Mountain
*/
Each of Zack’s original products produced its own list of 3 suggested products. So, for example, looking at the Racing Socks (ProductID 874) that Zack had purchased, it looks like Customers who bought that item also bought a Mountain Fender Set and/or a Road Tire Tube, and/or a Red Sport Helmet. (Remember that the suggested products that our function produces are items that Zack himself has never purchased before).

Note that there are many repeated suggestions here. That makes me think of something. If we take a Customer’s purchasing history, we can come up with a list of (distinct) suggested products based on that and promote them to the customer when he logs into the AW website under a heading “You may be interested in the following products…” We’ve got to let the customer know that he’s sort of “missing out” on all this wonderful stuff that other people have bought but he hasn’t.

So let’s come up with yet another table function that will accept a CustomerID and will come up with some items to promote to the customer based on his purchasing history.

Continuing on with Zack as our guinea pig test case, let’s GROUP BY the suggested products we produced in our last query and see how many times they occur, showing the most-occurring products at the top. Note that I created a second CTE (called SuggProds) that produces our Suggested Products and we do the GROUP BY query off of that CTE:

declare @CustomerID int
set
@CustomerID=12203 --Our pal Zack!
;with CustProds as
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
)
,
SuggProds as
(
select SuggProdID
from CustProds cp
cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp
)
select SuggProdID
,SuggProdName=p.Name
,NumOccurs=count(*)
from SuggProds sp
join Production.Product p on sp.SuggProdID=p.ProductID
group by SuggProdID
,p.Name
order
by count(*) desc
/*
SuggProdID SuggProdName NumOccurs
----------- ----------------------- -----------
878 Fender Set - Mountain 10
707 Sport-100 Helmet, Red 9
708 Sport-100 Helmet, Black 5
922 Road Tire Tube 4
923 Touring Tire Tube 1
877 Bike Wash - Dissolver 1
*/
Looks like that Mountain Fender Set is something that Zack really needs, huh?

But notice that the list came up with 2 different Helmets… one red and one black. There are also two kinds of Tire Tubes. We don’t really want our list of promo products to be repetitive like that. In general, I’m guessing that we don’t really want to repeat items of the same Product SubCategory.

Let’s add the Product SubCategory to our query so we can analyze this further:

declare @CustomerID int
set
@CustomerID=12203 --Our pal Zack!
;with CustProds as
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
)
,
SuggProds as
(
select SuggProdID
from CustProds cp
cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp
)
select SuggProdID
,SuggProdName=p.Name
,SuggSubCatg=p.ProductSubCategoryID
,NumOccurs=count(*)
from SuggProds sp
join Production.Product p on sp.SuggProdID=p.ProductID
group by SuggProdID
,p.Name
,p.ProductSubCategoryID
order by count(*) desc
/*
SuggProdID SuggProdName SuggSubCatg NumOccurs
----------- ----------------------- ----------- -----------
878 Fender Set - Mountain 30 10
707 Sport-100 Helmet, Red 31 9
708 Sport-100 Helmet, Black 31 5
922 Road Tire Tube 37 4
923 Touring Tire Tube 37 1
877 Bike Wash - Dissolver 29 1
*/
Yep, just as we thought… the two helmets are from SubCategory #31 and the Tire Tubes are from SubCategory #37. I think we should pull out the top product (based on the highest number of occurrences) from each SubCategory.

We can easily do that using the ROW_NUMBER() function. Let’s just see what happens when we add that window function to our query, doing a PARTITION BY the SubCategoryID and ORDERing BY the Number of Occurrences (which is just COUNT(*)) in descending order:

declare @CustomerID int
set
@CustomerID=12203 --Our pal Zack!
;with CustProds as
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
)
,
SuggProds as
(
select SuggProdID
from CustProds cp
cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp
)
select SuggProdID
,SuggProdName=p.Name
,SuggSubCatg=p.ProductSubCategoryID
,NumOccurs=count(*)
,RowNum=row_number() over (partition by p.ProductSubCategoryID
order by count(*) desc)
from SuggProds sp
join Production.Product p on sp.SuggProdID=p.ProductID
group by SuggProdID
,p.Name
,p.ProductSubCategoryID
order by count(*) desc
/*
SuggProdID SuggProdName SuggSubCatg NumOccurs RowNum
----------- ----------------------- ----------- ----------- ------
878 Fender Set - Mountain 30 10 1
707 Sport-100 Helmet, Red 31 9 1
708 Sport-100 Helmet, Black 31 5 2
922 Road Tire Tube 37 4 1
923 Touring Tire Tube 37 1 2
877 Bike Wash - Dissolver 29 1 1
*/
All the items that got assigned RowNum=1 are the ones we should present as our Promo Products. So let’s put our main query into its own CTE (which we’ll call SuggSummary), and we’ll pull out all rows with RowNum=1 from that:

declare @CustomerID int
set
@CustomerID=12203 --Our pal Zack!
;with CustProds as
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
)
,
SuggProds as
(
select SuggProdID
from CustProds cp
cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp
)
,
SuggSummary as
(
select SuggProdID
,SuggProdName=p.Name
,SuggSubCatg=p.ProductSubCategoryID
,NumOccurs=count(*)
,RowNum=row_number() over (partition by p.ProductSubCategoryID
order by count(*) desc)
from SuggProds sp
join Production.Product p on sp.SuggProdID=p.ProductID
group by SuggProdID
,p.Name
,p.ProductSubCategoryID
)
select SuggProdID
,SuggProdName=SuggProdName
,SuggSubCatg
from SuggSummary
where RowNum=1
order by NumOccurs desc
/*
SuggProdID SuggProdName SuggSubCatg
----------- ----------------------- -----------
878 Fender Set - Mountain 30
707 Sport-100 Helmet, Red 31
922 Road Tire Tube 37
877 Bike Wash - Dissolver 29
*/
Looks good! We’ve got all of our Promo Products from different SubCategories, so there’s a good variety of items here.

I think we can go ahead and create our function now. Just so we don’t overwhelm the customer once he signs on to the AW website, we’ll limit our Promo Products to 5 items. So that means adding a TOP (5) to our main query. Since our function is only going to return a single column of the Promo ProductID’s, we don’t need the Product.Name column in our SuggSummary CTE anymore… it was just there so we could see the descriptions as we were building our solution. Here’s the final function:

if object_id('ufn_GetPromoProducts') is not null drop function dbo.ufn_GetPromoProducts
go
create function dbo.ufn_GetPromoProducts
(
@CustomerID int
)
returns table
as
return
with
CustProds as
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
)
,
SuggProds as
(
select SuggProdID
from CustProds cp
cross apply dbo.ufn_GetSuggProducts(cp.ProductID,@CustomerID) sp
)
,
SuggSummary as
(
select SuggProdID
,SuggSubCatg=p.ProductSubCategoryID
,NumOccurs=count(*)
,RowNum=row_number() over (partition by p.ProductSubCategoryID
order by count(*) desc)
from SuggProds sp
join Production.Product p on sp.SuggProdID=p.ProductID
group by SuggProdID
,p.ProductSubCategoryID
)
select top (5) PromoProdID=SuggProdID
from SuggSummary
where RowNum=1
order by NumOccurs desc
So now let’s test out the function with Zack’s CustomerID:

select * from dbo.ufn_GetPromoProducts(12203)
/*
PromoProdID
-----------
878
707
922
877
*/
Yep, those are the ones. And what if we take another customer… any customer… how about Elizabeth Johnson (Customer 11004)? What would we promote to her when she logs on to the AW website?

select * from dbo.ufn_GetPromoProducts(11004)
/*
PromoProdID
-----------
870
921
711
*/
So we built a pretty sophisticated and complicated function here in a step-by-step manner, building from the ground up. Our new function involves multiple CTEs and multiple JOINs and a CROSS APPLY function call to yet another function that employs many JOINs and WHERE predicates.

If we were to expand everything we did into one giant query (converting the CTEs to derived tables), this is what it looks like:

select top (5) PromoProdID=SuggProdID
from
(
select SuggProdID
,SuggSubCatg=p.ProductSubCategoryID
,NumOccurs=count(*)
,RowNum=row_number()
over (partition by p.ProductSubCategoryID
order by count(*) desc)
from
(
select SuggProdID
from
(
select distinct ProductID
from Sales.SalesOrderHeader soh
join Sales.SalesOrderDetail sod
on soh.SalesOrderID=sod.SalesOrderID
where CustomerID=@CustomerID
) cp
cross apply
(
select top (3) SuggProdID=d2.ProductID
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1
on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c
on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2
on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2
on h2.SalesOrderID=d2.SalesOrderID
join Production.Product p
on d2.ProductID=p.ProductID
join Production.ProductSubCategory sc
on p.ProductSubCategoryID=sc.ProductSubCategoryID
where d1.ProductID=cp.ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
and p.ProductSubCategoryID<>(select ProductSubCategoryID
from Production.Product
where ProductID=cp.ProductID)
and sc.ProductCategoryID in (3,4)
and d2.ProductID not in (select ProductID
from Sales.SalesOrderDetail sod
join Sales.SalesOrderHeader soh
on sod.SalesOrderID=soh.SalesOrderID
where CustomerID=@CustomerID)
group by d2.ProductID
having count(distinct h2.SalesOrderID)>5
order by count(distinct h2.SalesOrderID) desc
) ufn_GetSuggProds
) sp
join Production.Product p on sp.SuggProdID=p.ProductID
group by SuggProdID
,p.ProductSubCategoryID
) SuggSummary
where RowNum=1
order by NumOccurs desc
That looks pretty intimidating, but the path to getting there was quite easy, taking it one step at a time.

Wednesday, July 15, 2009

Customers Who Bought This Item Also Bought…

Last week, Peter Myers of Solid Quality Mentors gave a terrific presentation on Data Mining to our San Francisco SQL Server User Group. And I happened to have “mining” on my mind since I had just written my silly blog entry about the Seven Dwarfs, so I guess it was a sign of some kind.

In giving an overview of data mining, Peter brought up one example of it that all of us have seen many times: When you go to Amazon and many other websites and you are looking at an item, they give you suggestions for other products under the heading “Customers who bought this item also bought…”. This is a great marketing tool. I know I’ve clicked on those items many times… in fact, just after I read a good book, I look it up on Amazon to get ideas for others.

By the way, on a side note, there is a data mining anecdote (perhaps an urban legend?) that’s been going around for years that a surprising correlation was discovered in that people who bought diapers also tended to by beer as well. You can read about it in more detail here and here.
Anyway, this got me to thinking how I would approach this “Customers Who Bought..” scenario. Even though the Data Mining Tool in Analysis Services can help you with this, I just thought it would be a fun little challenge to solve from scratch using good old-fashioned SQL.

So let’s roll up our sleeves and fool around with the database that we’re all really sick of… AdventureWorks.

AdventureWorks sells bicycle equipment (bikes and their components, and related clothing and accessories), and their customers are both stores and individuals. Here’s our scenario. The marketing department of AdventureWorks really wants to push products from the Clothing and Accessories categories. When an individual customer visits the AdventureWorks website and looks at a particular Product, we should make suggestions of clothing and/or accessory items to them based on the product they’re looking at. In other words, customers who bought this item also bought… these wonderful clothing and accessory items! We will show 3 suggestions.

Our ultimate goal is to come up with an in-line table function that will accept a ProductID and a CustomerID and will return 3 suggested ProductID’s.

So let’s approach this one step at a time, building as we go.

I’m just going to pull an individual (i.e. non-store) customer out of of the database at random to play with… and the winner is… Customer 12203, Zachary Hughes. Congrats, Zack! And I’m going to pull a product at random… and that is Product 777, Mountain-100 Black, 44.

So Zack is looking at that Black Mountain bike. Which three clothing and/or accessory items will we beat him over the head with suggest to him?

First of all, let’s find all the other customers (i.e. not Zack himself) who have bought Product 777. That involves JOINing the SalesOrderDetail (where the ProductID lives) to the SalesOrderHeader (where we find the CustomerID):

declare @ProductID int, @CustomerID int
set
@ProductID=777
set @CustomerID=12203
select OrigProdID=d1.ProductID
,OrigSalesOrder=h1.SalesOrderID
,OrigCustomer=h1.CustomerID
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
/*
OrigProdID OrigSalesOrder OrigCustomer
----------- -------------- ------------
777 43853 11
777 45056 17
777 45796 17
777 43843 18
777 45266 18
777 46026 18
777 44126 20
777 44792 20
777 45570 20
And so on... (242 rows total)
*/
However, remember these other customers must be individuals as opposed to stores, so we must JOIN in the Customer table in order to filter only those with CustomerType of ‘I’:

declare @ProductID int, @CustomerID int
set
@ProductID=777
set @CustomerID=12203
select OrigProdID=d1.ProductID
,OrigSalesOrder=h1.SalesOrderID
,OrigCustomer=h1.CustomerID
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
/*
OrigProdID OrigSalesOrder OrigCustomer
----------- -------------- ------------
777 43767 11001
777 43837 11009
777 43757 11017
777 43732 11025
777 44029 11050
777 44058 11056
777 44657 11120
777 45631 11237
777 45765 11245
777 44016 11247
And so on... (60 rows total)
*/
Now that we have all the individuals who buy the product, let’s pull in all the orders that those individuals have ever made. That means JOINing in the SalesOrderHeader table once again, linking it ON CustomerID:

declare @ProductID int, @CustomerID int
set
@ProductID=777
set @CustomerID=12203
select OrigProdID=d1.ProductID
,OrigSalesOrder=h1.SalesOrderID
,OrigCustomer=h1.CustomerID
,OtherSalesOrder=h2.SalesOrderID
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
/*
OrigProdID OrigSalesOrder OrigCustomer OtherSalesOrder
----------- -------------- ------------ ---------------
777 43767 11001 43767
777 43767 11001 51493
777 43767 11001 72773
777 43837 11009 43837
777 43837 11009 51562
777 43837 11009 57736
777 43757 11017 43757
777 43757 11017 51256
777 43757 11017 68396
And so on... (148 rows total)
*/
And now that we have all those orders, we need to find out what products were in those orders. So JOIN the SalesOrderDetail table again:

declare @ProductID int, @CustomerID int
set
@ProductID=777
set @CustomerID=12203
select OrigProdID=d1.ProductID
,OrigSalesOrder=h1.SalesOrderID
,OrigCustomer=h1.CustomerID
,OtherSalesOrder=h2.SalesOrderID
,SuggProdID=d2.ProductID
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2 on h2.SalesOrderID=d2.SalesOrderID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
/*
OrigProdID OrigSalesOrder OrigCustomer OtherSalesOrder SuggProdID
----------- -------------- ------------ --------------- -----------
777 43767 11001 43767 777
777 43767 11001 51493 779
777 43767 11001 51493 878
777 43767 11001 51493 870
777 43767 11001 51493 871
777 43767 11001 51493 884
777 43767 11001 51493 712
777 43767 11001 72773 997
777 43767 11001 72773 870
777 43767 11001 72773 872
777 43767 11001 72773 708
And so on... (311 rows total)
*/
But remember, we are only interested in those other products being part of the Clothing or Accessories categories. Products in the database are associated with a SubCategory (like Helmets, Locks, Caps, Gloves, etc) and those are, in turn, associated with a main Category (Bikes, Components, Clothing, Accessories), so we must JOIN in the Product and ProductSubCategory tables in order to filter the ProductCategoryID of 3 (Clothing) or 4 (Accessories). Since we’re pulling in the Product table, let’s spit out the name of the product also:

declare @ProductID int, @CustomerID int
set
@ProductID=777
set @CustomerID=12203
select OrigProdID=d1.ProductID
,OrigSalesOrder=h1.SalesOrderID
,OrigCustomer=h1.CustomerID
,OtherSalesOrder=h2.SalesOrderID
,SuggProdID=d2.ProductID
,SuggProdName=p.Name
from
Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2 on h2.SalesOrderID=d2.SalesOrderID
join Production.Product p on d2.ProductID=p.ProductID
join Production.ProductSubCategory sc on p.ProductSubCategoryID=sc.ProductSubCategoryID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
and sc.ProductCategoryID in (3,4)
/*
OrigProdID OrigSalesOrder OrigCustomer OtherSalesOrder SuggProdID SuggProdName
----------- -------------- ------------ --------------- ----------- -------------------------------
777 43767 11001 51493 878 Fender Set - Mountain
777 43767 11001 51493 870 Water Bottle - 30 oz.
777 43767 11001 51493 871 Mountain Bottle Cage
777 43767 11001 51493 884 Short-Sleeve Classic Jersey, XL
777 43767 11001 51493 712 AWC Logo Cap
777 43767 11001 72773 870 Water Bottle - 30 oz.
777 43767 11001 72773 872 Road Bottle Cage
777 43767 11001 72773 708 Sport-100 Helmet, Black
And so on... (163 rows total)
*/
Great, that looks good. Now all we need to do is find out which products were ordered the most. That means we will GROUP BY the SuggProdID and sort the list in descending order of the number of orders in which it appeared (which is just the COUNT(DISTINCT SalesOrderID)). Note that I’m also outputting the Product Name, just so we can see it and satisfy our curiousity. In our final in-line function we create, we will just return a list of ProductID’s only.

declare @ProductID int, @CustomerID int
set
@ProductID=777
set @CustomerID=12203
select SuggProdID=d2.ProductID
,SuggProdName=p.Name
,NumOrds=count(distinct h2.SalesOrderID)
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2 on h2.SalesOrderID=d2.SalesOrderID
join Production.Product p on d2.ProductID=p.ProductID
join Production.ProductSubCategory sc on p.ProductSubCategoryID=sc.ProductSubCategoryID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
and sc.ProductCategoryID in (3,4)
group by d2.ProductID
,p.Name
order
by count(distinct h2.SalesOrderID) desc
/*
SuggProdID SuggProdName NumOrds
----------- --------------------------- -----------
870 Water Bottle - 30 oz. 23
872 Road Bottle Cage 15
711 Sport-100 Helmet, Blue 13
712 AWC Logo Cap 10
871 Mountain Bottle Cage 10
934 Touring Tire 10
878 Fender Set - Mountain 9
873 Patch Kit/8 Patches 8
708 Sport-100 Helmet, Black 8
923 Touring Tire Tube 8
930 HL Mountain Tire 6
877 Bike Wash - Dissolver 5
707 Sport-100 Helmet, Red 5
880 Hydration Pack - 70 oz. 5
921 Mountain Tire Tube 4
882 Short-Sleeve Classic Jersey 4
883 Short-Sleeve Classic Jersey 3
884 Short-Sleeve Classic Jersey 3
860 Half-Finger Gloves, L 3
881 Short-Sleeve Classic Jersey 2
874 Racing Socks, M 1
865 Classic Vest, M 1
866 Classic Vest, L 1
869 Women's Mountain Shorts, L 1
715 Long-Sleeve Logo Jersey, L 1
716 Long-Sleeve Logo Jersey, XL 1
858 Half-Finger Gloves, S 1
859 Half-Finger Gloves, M 1
929 ML Mountain Tire 1
*/
That’s great, except for one thing. It’s only meaningful to suggest another product if it’s been on at least a few orders. For example, it may not be helpful to suggest an item that’s been ordered with Product 777 only 1 time in the past. So we will introduce a HAVING COUNT(DISTINCT SalesOrderID)>5 to make sure we only pay attention to those items that have been on more than 5 orders.

Now we’re ready to make our recommendations of 3 other items, so we’ll just show the TOP (3) items:

declare @ProductID int, @CustomerID int
set
@ProductID=777
set @CustomerID=12203
select top (3)
SuggProdID
=d2.ProductID
,SuggProdName=p.Name
from
Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2 on h2.SalesOrderID=d2.SalesOrderID
join Production.Product p on d2.ProductID=p.ProductID
join Production.ProductSubCategory sc on p.ProductSubCategoryID=sc.ProductSubCategoryID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
and sc.ProductCategoryID in (3,4)
group by d2.ProductID
,p.Name
having
count(distinct h2.SalesOrderID)>5
order by count(distinct h2.SalesOrderID) desc
/*
SuggProdID SuggProdName
----------- ----------------------
870 Water Bottle - 30 oz.
872 Road Bottle Cage
711 Sport-100 Helmet, Blue
*/
Looks great. Let’s say Zack is looking at ProductID 711 instead of 777. What comes out then?

/*
SuggProdID SuggProdName
----------- ----------------------
711 Sport-100 Helmet, Blue
921 Mountain Tire Tube
870 Water Bottle - 30 oz.
*/
Well that’s strange. People who bought ProductID 711 tended most to buy ProductID 711 again? Oh, wait a minute. ProductID 711 is a Helmet, and therefore already part of the Accessory category. We’ll have to introduce a predicate to the WHERE clause to eliminate products from the suggestion list that match the product we’re initially inquiring on. In fact, maybe we should widen that restriction. For example, if a person is looking at a Blue Helmet, we don’t want to suggest a Red one. So let’s just make sure the SubCategoryID of the suggested products is different from the original product's SubCategoryID. So we’ll add that to the WHERE clause:

declare @ProductID int, @CustomerID int
set
@ProductID=711
set @CustomerID=12203
select top (3)
SuggProdID
=d2.ProductID
,SuggProdName=p.Name
from
Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2 on h2.SalesOrderID=d2.SalesOrderID
join Production.Product p on d2.ProductID=p.ProductID
join Production.ProductSubCategory sc on p.ProductSubCategoryID=sc.ProductSubCategoryID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
and p.ProductSubCategoryID<>(select ProductSubCategoryID
from Production.Product
where ProductID=@ProductID)
and sc.ProductCategoryID in (3,4)
group by d2.ProductID
,p.Name
having
count(distinct h2.SalesOrderID)>5
order by count(distinct h2.SalesOrderID) desc
/*
SuggProdID SuggProdName
----------- ---------------------
921 Mountain Tire Tube
870 Water Bottle - 30 oz.
922 Road Tire Tube
*/
There, that’s better. I think we’ve got something here. So if Zack is looking at ProductID 711, we can suggest ProductID’s 921, 870, and 922. And if he's is looking at ProductID 777, we can suggest 870, 872, and 711.

But if Zack is looking at ProductID 777, we don’t want to just blindly recommend 870, 872, and 711. Our buddy Zack may have already bought one of those in the past, and we want to promote something he’s never bought before. So we’ll add a predicate to the WHERE clause so that we don’t recommend any products to the customer that he has bought in the past.

Let’s go ahead and create the in-line function out of what we have… we’re only going to return ProductID’s and not the names:

if object_id('ufn_GetSuggProducts') is not null drop function dbo.ufn_GetSuggProducts
go
create function dbo.ufn_GetSuggProducts
(
@ProductID int
,@CustomerID int
)
returns table
as
return
select
top (3) SuggProdID=d2.ProductID
from Sales.SalesOrderDetail d1
join Sales.SalesOrderHeader h1 on d1.SalesOrderID=h1.SalesOrderID
join Sales.Customer c on h1.CustomerID=c.CustomerID
join Sales.SalesOrderHeader h2 on h1.CustomerID=h2.CustomerID
join Sales.SalesOrderDetail d2 on h2.SalesOrderID=d2.SalesOrderID
join Production.Product p on d2.ProductID=p.ProductID
join Production.ProductSubCategory sc on p.ProductSubCategoryID=sc.ProductSubCategoryID
where d1.ProductID=@ProductID
and h1.CustomerID<>@CustomerID
and c.CustomerType='I'
and p.ProductSubCategoryID<>(select ProductSubCategoryID
from Production.Product
where ProductID=@ProductID)
and sc.ProductCategoryID in (3,4)
and d2.ProductID not in (select ProductID
from Sales.SalesOrderDetail sod
join Sales.SalesOrderHeader soh
on sod.SalesOrderID=soh.SalesOrderID
where CustomerID=@CustomerID)
group by d2.ProductID
having count(distinct h2.SalesOrderID)>5
order by count(distinct h2.SalesOrderID) desc
And let’s give it a try:

select * from dbo.ufn_GetSuggProducts(777,12203)
/*
SuggProdID
-----------
872
934
878
*/
Finally!

In my next blog entry, we’ll use this function as a basis to do some further badgering marketing to the AdventureWorks victims customers.