## Summary

Our online marketing team ran an A/B test and a statistical analysis of six distinct hypotheses, covering not only differences in lead generation but also differences in average transaction size, total revenue, and closure rate. This case study illustrates why focusing on a single conversion metric is insufficient.

## Part 1: Improving PPC Performance

In the interest of continuous improvement, we had our digital strategy team propose a new landing page for Plumbing Masters’ online marketing. The new landing page went through our standard development process consisting of research, wireframing, mockups and development. Both the new page (Page B) and the existing page (Page A) used responsive design to ensure that tablet and smartphone users had experiences tailored for those devices.

While many landing page tests focus solely on conversion as a percentage of visitors, we instead focused on a variety of business outcomes in order to gain a comprehensive picture of how the new landing page was impacting the client’s business. The items evaluated consisted of the following:

- Number of Phone Calls
- Number of Appointments Set
- Closure Rate
- Number of Contact Forms
- Total Leads
- Average Transaction Size

An explanation of these business outcomes and their place in the client’s business model is as follows. Potential customers perform a search for a relevant keyword. They then see the client’s paid search ad. Some of them then click through to the landing page. Once on the landing page, they do one of three things. One, they do nothing at all. Two, they call the client via the tracking phone number that is on the landing page. Three, they fill in a contact form on the landing page. The client then sends a plumbing technician out to the home or business of each customer that has an appointment. Those appointments then turn into a sale or they don’t. Definitions of each of the six items evaluated are as follows:

- Item A is the Number of Phone Calls that the client’s call center receives.
- Item B is the Number of Appointments Set from the phone calls in Item A.
- Item C is the Closure Rate for the phone calls in Item A—i.e. the percentage of the time that a phone call turns into an appointment.
- Item D is the Number of Contact Forms that the client receives.
- Item E is the Total Leads, defined as the total number of all of the phone calls in Item A (i.e. not just the ones that turned into appointments) and the number of contact forms in Item D.
- Item F is the Average Transaction Size—i.e. the average of all of the sale dollar amounts for every appointment that the client’s technicians go out on, including the appointments that result in zero dollars.
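As a rough illustration of how the six items relate to each other, the sketch below derives them from a list of per-visit records. The `Visit` record layout and the `summarize` helper are hypothetical and are not taken from the client's tracking system; they only mirror the definitions above.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One landing-page visit (hypothetical record layout)."""
    called: bool = False        # visitor phoned the tracking number
    appointment: bool = False   # the phone call turned into an appointment
    form: bool = False          # visitor submitted a contact form
    sale: float = 0.0           # sale amount for the appointment (0.0 if no sale)


def summarize(visits):
    """Compute Items A through F for one page from its visit records."""
    calls = sum(v.called for v in visits)                     # Item A
    appts = sum(v.called and v.appointment for v in visits)   # Item B
    closure = appts / calls if calls else 0.0                 # Item C
    forms = sum(v.form for v in visits)                       # Item D
    leads = calls + forms                                     # Item E
    # Item F averages every appointment, including zero-dollar ones.
    tickets = [v.sale for v in visits if v.appointment]
    avg_ticket = sum(tickets) / len(tickets) if tickets else 0.0
    return {
        "phone_calls": calls,
        "appointments_set": appts,
        "closure_rate": closure,
        "contact_forms": forms,
        "total_leads": leads,
        "avg_transaction": avg_ticket,
    }
```

Running `summarize` once per page gives the paired figures that the tests compare.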

Chi-square tests were performed for Items A through E and a t-test was performed for Item F. In each case the hypothesis was that the new landing page (Page B) would perform better than the existing page (Page A). Because the hypotheses are directional, one-sided statistical tests were appropriate. The hypotheses are phrased as follows:

- *H*_{A}: Page B will generate a greater proportion of Phone Calls as a percentage of visitors to the page than Page A.
- *H*_{B}: Page B will generate a greater proportion of Appointments Set as a percentage of visitors to the page than Page A.
- *H*_{C}: Page B will have a higher Closure Rate for the phone calls that Page B generates than Page A will have for the phone calls that Page A generates (i.e. Page B will be better than Page A at generating phone calls that wind up turning into appointments).
- *H*_{D}: Page B will generate a greater proportion of Contact Forms as a percentage of visitors to the page than Page A.
- *H*_{E}: Page B will generate a greater proportion of Total Leads as a percentage of visitors to the page than Page A.
- *H*_{F}: Page B will generate a higher Average Transaction Size than Page A.
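To make the one-sided logic concrete, here is a minimal sketch of how a chi-square statistic with one degree of freedom can be turned into a one-sided decision. It assumes the one-sided p-value is obtained by halving the two-sided chi-square p-value, which is numerically consistent with the values reported in this write-up but is not confirmed as the exact SPSS procedure. The function names are illustrative.

```python
import math


def chi2_sf_1df(stat: float) -> float:
    """Upper-tail (two-sided) p-value for a chi-square statistic with 1 df."""
    # With 1 df the statistic is a squared standard normal, so
    # P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def one_sided_result(stat: float, b_rate_higher: bool, alpha: float = 0.10):
    """Return (halved p-value, significant in favor of Page B?)."""
    halved_p = chi2_sf_1df(stat) / 2.0
    # A small halved p-value only supports the hypothesis when the observed
    # difference points in the hypothesized direction (Page B ahead).
    significant = b_rate_higher and halved_p < alpha
    return halved_p, significant


# Item A: chi-square of 2.32, but Page A had the higher rate.
print(one_sided_result(2.32, b_rate_higher=False))
# Item B: chi-square of 2.02, with Page B ahead.
print(one_sided_result(2.02, b_rate_higher=True))
```

The direction check is what keeps a small p-value from counting as a win when the observed difference favors Page A.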

A detailed statistical analysis can be found farther down in this write-up. A summary of the results is as follows.

- Page A converted 12% better than Page B with a confidence level of 94% that Page A would improve on the number of phone calls made over Page B. This result was not statistically significant since the hypothesis was phrased in favor of B and this was a one-sided test.
- Page B converted 17% better than Page A with a confidence level of 92% that Page B would improve on the number of appointments set over Page A. This was statistically significant (*p* < .10).
- Page B converted 30% better than Page A with a confidence level above 99.9% that Page B would improve on the proportion of appointments set from phone calls over Page A. This was statistically significant (*p* < .10).
- Page B converted 7% better than Page A with a confidence level of 61% that Page B would improve on the number of contact forms over Page A. This was not statistically significant.
- Page A converted 9% better than Page B with a confidence level of 92% that Page A would improve on the number of leads over Page B. This result was not statistically significant since the hypothesis was phrased in favor of B and this was a one-sided test.
- Although Page B had a higher mean transaction amount than Page A, i.e. $609.72 for Page B vs. $583.59 for Page A, the difference was not statistically significant.
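The two kinds of figures quoted in the summary above can be sketched as follows. This assumes that "converted X% better" means relative lift in conversion rate, and that the confidence level is 1 minus the one-sided p-value. Both readings are consistent with the numbers in this write-up but are assumptions, and the visit counts in the example are hypothetical.

```python
def relative_lift(conv_a: int, visits_a: int, conv_b: int, visits_b: int) -> float:
    """Relative difference in conversion rate of Page B over Page A.

    A return value of 0.17 means Page B converted 17% better than Page A.
    """
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    return (rate_b - rate_a) / rate_a


def confidence_pct(p_value: float) -> int:
    """Confidence level in whole percent, taken as 1 minus the p-value."""
    return round((1.0 - p_value) * 100)


# Hypothetical counts: 100 of 1,000 visitors convert on A, 117 of 1,000 on B.
print(f"lift: {relative_lift(100, 1000, 117, 1000):.0%}")
# Reported one-sided p-values for Items A, B, D and E.
for p in (0.064, 0.078, 0.392, 0.084):
    print(p, "->", confidence_pct(p), "%")
```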

Since the primary goal of the new landing page is to maximize ROI, the landing page test can be declared a success. Both Page A and Page B received the same amount of traffic and incurred the same cost to generate that traffic, so the fact that Page B produced a greater proportion of appointments set (with statistical significance) makes Page B the appointment-setting winner. Page B also used the time of phone answering personnel more efficiently, since a greater percentage of incoming calls were converted to appointments when the calls came from Page B than when they came from Page A. Page B also had a higher average transaction size, although this difference was not statistically significant. Even if the two pages are treated as having the same average transaction size from a statistical standpoint, Page B is still the winner from a revenue perspective because it produced more appointments from the same traffic.

Note that Appointments Set only includes the appointments that were able to be matched to calls coming through the call tracking system by doing a match on the caller ID in the call tracking system and the phone number that the caller verbally provided to the customer service rep that was setting up the appointment. This causes a loss of data, but the loss should exhibit no exclusion bias since this occurred randomly in the case of both Page A and Page B as both pages were rotated in real-time.
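The matching step described above might look something like the following sketch. It assumes US-style ten-digit numbers and that both sources are available as plain strings. The function names are hypothetical and do not reflect the actual call tracking system.

```python
import re


def normalize_phone(raw: str) -> str:
    """Reduce a phone number to its digits so different formats compare equal."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: US numbers. Drop a leading country code "1" if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def match_appointments(caller_ids, appointment_phones):
    """Return the appointment phone numbers that match a tracked caller ID."""
    tracked = {normalize_phone(c) for c in caller_ids}
    return [p for p in appointment_phones if normalize_phone(p) in tracked]


matched = match_appointments(
    ["+1 (555) 010-2000", "555-010-3000"],   # caller IDs from call tracking
    ["555.010.2000", "555 010 9999"],        # numbers given to the rep
)
print(matched)
```

Appointments whose numbers do not match any tracked caller ID drop out of the dataset, which is the data loss the paragraph above refers to.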

## Part 2: Statistical Analysis Methods and Assumptions for Analysis

Statistical analyses included a series of chi-square tests of independence and an independent samples t-test.

SPSS v.22 was used for the analyses. A significance level of .10 (i.e. 90% confidence; *p* < .10) was set for all inferential tests. Each test was one-sided, with the research hypotheses favoring Page B over Page A. Results are presented according to each analysis performed.

A series of chi-square tests of independence was used to test Items A, B, C, D, and E. Assumptions for the chi-square test of independence are that the records are independent (each record is counted in only one cell), that every cell in the table has an expected count of at least 1, and that at least 80% of the cells have an expected count of 5 or more. These assumptions were met.
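For readers who want to see the mechanics, the sketch below computes a Pearson chi-square statistic for a 2x2 table (page by yes/no) and checks the expected cell counts against a common rule of thumb: every expected count at least 1, and at least 80% of cells with an expected count of 5 or more. The counts are hypothetical, since the study's cross-tabulations are not reproduced here, and no continuity correction is applied.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic and expected counts for a 2x2 table.

    table = [[yes_a, no_a], [yes_b, no_b]], one row per page.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    # Expected count under independence: row total * column total / n.
    expected = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return stat, expected


def expected_counts_ok(expected) -> bool:
    """Rule-of-thumb check on the expected counts."""
    cells = [e for r in expected for e in r]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_at_least_5 >= 0.8


# Hypothetical counts: 10 of 100 visitors convert on A, 20 of 100 on B.
stat, expected = chi_square_2x2([[10, 90], [20, 80]])
print(round(stat, 2), expected, expected_counts_ok(expected))
```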

An independent samples t-test was performed to test Item F. Assumptions for the independent samples t-test are the absence of missing data, the absence of outliers, adequate sample size, equality of variances, and univariate normality.

There was no missing data in the dataset. Therefore, this assumption was met.

Outliers in a dataset have the potential to distort results of an inferential analysis. A visual inspection of a boxplot for the dependent variable of Ticket Amount was performed to check for outliers. A total of 12 outliers were found in the upper range of the Ticket Amount variable. The values of each outlier were checked for accuracy, and all 12 outliers appeared to be reasonable dollar amounts. Since the outliers were valid data (not a data entry error, or other anomaly) they were retained for analysis.
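A boxplot flags outliers using the interquartile range (IQR). The sketch below applies the usual 1.5 x IQR fences to a hypothetical list of ticket amounts. Note that `statistics.quantiles` may compute quartiles slightly differently from the hinges SPSS uses for its boxplots, so borderline points could be classified differently.

```python
import statistics


def boxplot_outliers(values, k: float = 1.5):
    """Return values lying beyond k * IQR from the first or third quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical ticket amounts in dollars, with one large job in the upper tail.
tickets = [100, 120, 130, 150, 160, 170, 180, 200, 5000]
print(boxplot_outliers(tickets))
```

Flagged values still need the manual accuracy check described above before deciding whether to keep them.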

Normality for the dependent variable of Ticket Amount was investigated with SPSS Explore. The Kolmogorov-Smirnov Test (K-S) for normality indicated a non-normal distribution (*p* < .01). However, the K-S Test is sensitive to larger sample sizes, with significant findings returned when sample sizes are larger (*n* > 50; Pallant, 2007). A visual check of histograms and Normal Q-Q plots for the Ticket Amount variable indicated distributions close to normal, with a slight skew to the right. The right skew was a result of the outliers in the upper tail of the distribution. The assumptions of outliers and normality can be relaxed when the assumption of equal variances is met (Tabachnick & Fidell, 2007). The variances of the Ticket Amount variable were equal between the web page types (see the next paragraph). Therefore, no transformations or other corrective actions were taken to adjust the dollar amounts of the Ticket Amount variable, and the raw data for the Ticket Amount variable was used for the t-test analysis.

Levene’s Test of Equality of Variances was performed to investigate violations of the equal variance assumption for the independent samples t-test. The assumption of equal variances was not violated for the analysis involving the independent variable on the Ticket Amount variable (*p* = .762). Therefore, the assumption of equality of variances was met.

## Part 3: Descriptive Findings

Table 1 presents the frequency counts and percentages of the dependent variables of study including: phone calls, appointments set, contact forms, and total leads generated according to the dataset entitled “Appointments and Leads”. Table 2 presents the measures of central tendency for the dependent variable of Ticket Amount according to the different web pages. This information was taken from the dataset entitled “Transactions”.

*Note. M* = Mean; *SD* = Standard Deviation; *Mdn* = Median

## Part 4: Inferential Analysis Findings

### Chi-Square for Item A – Number of Phone Calls

An A/B test was performed via a chi-square test of independence with the independent variable of Web Page, with two categories of Page A vs. Page B, and the dependent variable of Phone Calls, with two categories of yes (phone call made) vs. no (no phone call made). Table 3 presents the cross-tabulation table for the analysis.

Results were not statistically significant [χ^{2}(1) = 2.32, *p* = .064]. Although the p-value was lower than .10, the difference ran opposite to the hypothesized direction: Page A, not Page B, had the greater proportion of phone calls made.

Although the results were not statistically significant, the results of the A/B test indicated that Page A converted 12% better than Page B with a confidence level of 94% that Page A would improve on the number of phone calls made over Page B.

Adj. std. residual = Adjusted Standardized Residual.
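For context, an adjusted standardized residual measures how far an observed cell count sits from its expected count, in roughly standard-normal units. The sketch below computes it for every cell of a cross-tabulation using hypothetical counts. In a 2x2 table, each squared residual equals the Pearson chi-square statistic.

```python
import math


def adjusted_residuals(table):
    """Adjusted standardized residual for each cell of a contingency table."""
    n = sum(sum(r) for r in table)
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    result = []
    for i, r in enumerate(table):
        out_row = []
        for j, observed in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n
            # Standard error accounts for the row and column proportions.
            se = math.sqrt(
                expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            )
            out_row.append((observed - expected) / se)
        result.append(out_row)
    return result


# Hypothetical counts: rows are Page A and Page B, columns are yes and no.
for row in adjusted_residuals([[10, 90], [20, 80]]):
    print([round(x, 2) for x in row])
```

A positive residual in a page's "yes" cell means that page produced more of the outcome than independence would predict.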

### Chi-Square for Item B – Number of Appointments Set

A chi-square test of independence was performed with the independent variable of Web Page, with two categories of Page A vs. Page B, and the dependent variable of Appointments Set, with two categories of yes vs. no. Table 4 presents the cross-tabulation table for the analysis.

Results were statistically significant [χ^{2}(1) = 2.02, *p* = .078], indicating that Page B had a significantly greater proportion of appointments set than Page A.

The results of the A/B test indicated that Page B converted 17% better than Page A with a confidence level of 92% that Page B would improve on the number of appointments set over Page A.

### Chi-Square for Item C – Closure Rate

A chi-square test of independence was performed with the independent variable of Web Page, with two categories of Page A vs. Page B, and the dependent variable of Closure Rate, with two categories of yes (appointment made through phone call) vs. no (no appointment made through phone call). Table 5 presents the cross-tabulation table for the analysis.

Results were statistically significant [χ^{2}(1) = 10.39, *p* < .0005], indicating statistically significant differences in the proportions of closure rates for the two web pages, with a greater closure rate for Page B over Page A.

The results of the A/B test indicated that Page B converted 30% better than Page A with a confidence level above 99.9% that Page B would improve on the proportion of appointments set from phone calls over Page A.

### Chi-Square for Item D – Number of Contact Forms

A chi-square test of independence was performed with the independent variable of Web Page, with two categories of Page A vs. Page B, and the dependent variable of Contact Forms, with two categories of yes vs. no. Table 6 presents the cross-tabulation table for the analysis.

Results were not statistically significant [χ^{2}(1) = 0.08, *p* = .392], indicating no statistically significant differences in the proportions of contact forms for the two web pages.

Although the results were not statistically significant, the results of the A/B test indicated that Page B converted 7% better than Page A with a confidence level of 61% that Page B would improve on the number of contact forms over Page A.

### Chi-Square for Item E – Total Leads

A chi-square test of independence was performed with the independent variable of Web Page, with two categories of Page A vs. Page B, and the dependent variable of Total Leads, with two categories of yes (records with phone calls made + records with contact forms) vs. no (records with no phone calls made + records without contact forms). Table 7 presents the cross-tabulation table for the analysis.

Results were not statistically significant [χ^{2}(1) = 1.91, *p* = .084]. Although the p-value was lower than .10, the difference ran opposite to the hypothesized direction: Page A, not Page B, generated the larger proportion of leads.

The results of the A/B test indicated that Page A converted 9% better than Page B with a confidence level of 92% that Page A would improve on the number of leads over Page B.

### Independent Samples T-Test for Item F – Average Transaction Size

An independent samples t-test was performed to test Item F. The t-test compared the dependent variable of Ticket Amount (expressed in dollars) between the two levels of the independent variable, Page A (*M* = 583.59, *SD* = 1219.60) and Page B (*M* = 609.72, *SD* = 1725.35). Although Page B had a higher mean transaction amount than Page A, a significant mean difference was not found; *t*(166) = -0.11, *p* = .457.
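The t statistic above can be approximated from the summary statistics alone. The sketch below uses the pooled-variance (equal variances assumed) formula. The per-page sample sizes are not stated in this write-up, so equal groups of 84 are assumed purely because they sum to the 168 transactions implied by 166 degrees of freedom.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Independent samples t statistic with pooled variance, plus its df."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df


# Means and SDs are from the text; n = 84 per page is an assumption.
t, df = pooled_t(583.59, 1219.60, 84, 609.72, 1725.35, 84)
print(f"t({df}) = {t:.2f}")
```

With the large standard deviations relative to the $26 difference in means, the statistic stays close to zero for any plausible split of the 168 transactions.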

## Part 5: Summary of Findings

It is important to identify the most important metrics in a landing page test, as we have done here, and not simply to focus on lead volume. In this case, the analysis indicates that the new page, Page B, sets more appointments in total, is more efficient at turning calls into appointments, and is the better revenue generator overall.