
Chapter 5
Parameter Estimation

5.1  Calibrating Attraction Models

In Chapter 3 we presented market-share attraction models in detail. As we tried to describe the market and competitive structures realistically, more and more complex models had to be introduced, ending with a cross-effects model in which each piece of information (e.g., each price or each feature) has a unique role both for the brand to which it refers and for every other competitor. From the practical point of view, however, these complex models are not useful unless one can calibrate them from the actual market performance of brands. Calibration establishes the value or importance of each of these roles in determining the market performance of each brand. In this chapter we review the techniques for estimating the parameters of attraction models. We begin with the most basic models, i.e., the simple-effects form of the MCI and MNL models, and then proceed to more complex models such as differential-effects and cross-effects models. To remind the reader, the general specification of simple-effects attraction models is given below.

$$A_i = \exp(a_i + e_i) \prod_{k=1}^{K} f_k(X_{ki})^{b_k} \qquad (5.1)$$

$$s_i = A_i \Big/ \sum_{j=1}^{m} A_j$$

where:

si = the market share of brand i
Ai = the attraction of brand i
m = the number of brands
Xki = the value of the kth explanatory variable (Xk) for brand i (e.g., prices, product attributes, expenditures for advertising, distribution, sales force)
K = the number of explanatory variables
fk = a positive, monotone transformation of Xk
ei = the specification-error term
ai, bk (i = 1, 2, …, m; k = 1, 2, …, K) = parameters to be estimated.

We may choose either MCI or MNL models, depending on whether fk is an identity transformation or an exponential transformation. We will often use the MNL model below in order to simplify our presentation, but the corresponding derivations for the MCI model would be straightforward. Before presenting the use of regression analysis, we will first discuss other estimation techniques applicable to model (5.1).

5.1.1  Maximum-Likelihood Estimation

The maximum-likelihood approach to parameter estimation assumes that the data are obtained from a random sample (of size n) of individuals who are asked to choose one brand from a set of brands (i.e., a choice set). (See Haines, George H., Jr., Leonard S. Simon & Marcus Alexis [1972], ``Maximum Likelihood Estimation of Central-City Food Trading Areas,'' Journal of Marketing Research, IX (May), 154-59; also see McFadden [1974].) The resultant data consist of the number of individuals who selected object i, ni (i = 1, 2, …, m). This describes a typical multinomial choice process. In order for us to use this type of data, we must modify the definition of model (5.1) slightly. We assume that the probability pi that an individual chooses brand i, rather than the market share si, is specified as follows. (See sections 2.8 and 4.1 for discussions of when market shares and choice probabilities are interchangeable.)

$$p_i = A_i \Big/ \sum_{j=1}^{m} A_j \; .$$

Clearly pi is a function of the parameters of the model, that is, the a's and b's. We may write the likelihood for a set of observed choices n1, n2, …, nm as

$$L(a_1, a_2, \ldots, a_m;\, b_1, b_2, \ldots, b_K) = \prod_{i=1}^{m} p_i^{\,n_i} \qquad (5.2)$$

and the logarithm of the likelihood function as

$$\log L(a_1, a_2, \ldots, a_m;\, b_1, b_2, \ldots, b_K) = \sum_{i=1}^{m} n_i \log p_i \; .$$

By maximizing L or logL with respect to the parameters of the model, we obtain the maximum-likelihood estimates of them. The maximum-likelihood technique may be extended to the cases where observations are taken at more than one choice situation (multiple time periods, locations, customer groups, etc.) provided that an independent sample of individuals is drawn at each choice situation. For example, if a series of independent samples is drawn over time, the log-likelihood function may be written as

$$\log L(a_1, a_2, \ldots, a_m;\, b_1, b_2, \ldots, b_K) = \sum_{t=1}^{T} \sum_{i=1}^{m} n_{it} \log p_{it}$$

where nit and pit are the number of individuals who chose brand i in period t and the probability that an individual chooses brand i in period t , respectively, and T is the number of periods under observation.
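For concreteness, here is a minimal sketch of how the parameters of this multinomial-logit setup could be estimated by maximizing the log-likelihood numerically. The counts, the prices, and the use of a general-purpose optimizer are illustrative assumptions; this is not the estimation route followed in the rest of the chapter.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data: m = 3 brands observed over T = 4 independent choice situations,
# with one explanatory variable (say, price) and the observed choice counts n_it.
price = np.array([[1.9, 1.4, 1.6],
                  [1.9, 1.4, 1.7],
                  [1.8, 1.5, 1.6],
                  [1.7, 1.5, 1.5]])        # price[t, i]
counts = np.array([[10, 70, 20],
                   [12, 65, 23],
                   [20, 50, 30],
                   [30, 40, 30]])          # counts[t, i]: individuals choosing brand i in period t
T, m = price.shape

def neg_log_likelihood(theta):
    # theta = (a_2, ..., a_m, b); a_1 is fixed at 0 for identification.
    a = np.concatenate(([0.0], theta[:m - 1]))
    b = theta[-1]
    attraction = np.exp(a + b * price)                 # A_it = exp(a_i + b X_it)
    p = attraction / attraction.sum(axis=1, keepdims=True)
    return -np.sum(counts * np.log(p))                 # -log L = -sum_t sum_i n_it log p_it

result = minimize(neg_log_likelihood, np.zeros(m), method="BFGS")
print("ML estimates (a_2, a_3, b):", result.x)
```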

The maximum-likelihood procedure is a useful technique for parameter estimation in that the properties of estimated parameters are well known (see Haines, et al. [1972]), but we choose not to use it in this book for several reasons. First, since the likelihood and log-likelihood functions are nonlinear in the parameters a's and b's, the maximum-likelihood procedure requires a nonlinear mathematical-programming algorithm to obtain parameter estimates. Besides being cumbersome to use, such an algorithm does not ensure that the global maximum of the likelihood function is always found. Second, we will be using POS data primarily in calibrating the model. Since POS data generated at a store include multiple purchases in a period made by the same customers, the observed ni's may not follow the assumptions of a multinomial distribution which underlie the likelihood function. Third, we will in most cases be using observed market shares, that is, the proportions of purchases of brand i based on an unknown but large total number of purchases. (Neither multiple purchases in a single shopping trip, nor purchases of a brand on each of multiple shopping trips within a single reporting period (e.g., a week), fit well with the multinomial-sampling assumptions. Yet both such occurrences can be common in POS data. When analyzing POS data at the store-week level the market shares are not subject to the sampling variation with which maximum-likelihood procedures deal so well. Only the specification error requires special treatment. Section 5.4 presents generalized least-squares (GLS) procedures to cope with these issues.) The regression techniques developed in the next section are more easily adaptable to this type of data than the maximum-likelihood procedure.

5.1.2  Log-Linear Estimation

We will be presenting estimation procedures based on regression analysis in the next section, but the fact that logit models could be estimated by first applying a log-linear transformation and then applying a regression procedure has been known for a long time. We will review some of these procedures before we turn to the approach which we believe is the most convenient.

Over thirty-five years ago Berkson (Berkson, Joseph [1953], ``A Statistically Precise and Relatively Simple Method of Estimating the Bioassay with Quantal Response, Based on the Logistic Function,'' Journal of the American Statistical Association, 48 (September), 565-99) showed that a logistic model of binary choice becomes linear in its parameters through the so-called logit transformation. Suppose that each individual in a sample (of size n) independently chooses object 1 with probability p1, given by

$$p_1 = \frac{1}{1 + b_0 \exp\!\left(-\sum_{k=1}^{K} b_k X_{k1}\right)}$$

where:

p1 = the probability that object 1 is chosen in a binary choice
Xk1 = the kth characteristic of object 1
b0, b1, …, bK = the parameters to be estimated.

If the logit transformation is applied to the above model, we have

$$\log\!\left(\frac{p_1}{1 - p_1}\right) = -\log b_0 + \sum_{k=1}^{K} b_k X_{k1} \qquad (5.3)$$

That equation (5.3) is linear in the parameters log b0 and bk (k = 1, 2, …, K) suggests the use of regression analysis. But, since the probability p1 is unobservable, it must be replaced in the left-hand side of (5.3) by p̂1, the proportion of individuals in the sample who selected object 1. The final estimating equation takes the following form.

$$\log\!\left(\frac{\hat p_{1t}}{1 - \hat p_{1t}}\right) = -\log b_0 + \sum_{k=1}^{K} b_k X_{k1t} + e_t \; . \qquad (5.4)$$

The subscript t indicates the tth subgroup from which the p̂1's are calculated. The error term et is the difference between the logit transforms of p̂1 and p1, and is known to be a function of p1 and the sample size of the subgroup from which p̂1 is calculated. To examine the properties of the error term, expand the left-hand side of (5.4) in a Taylor series around p1, keep the first two terms, and apply the mean-value theorem to obtain

$$e = \log\!\left(\frac{\hat p_1}{1 - \hat p_1}\right) - \log\!\left(\frac{p_1}{1 - p_1}\right)
  = \frac{1}{p_1^*(1 - p_1^*)}\,(\hat p_1 - p_1)$$

where p1* is a value between p̂1 and p1. The error term is clearly a function of p1 and therefore heteroscedastic (i.e., of unequal variance). If we assume a simple binomial process for each individual selecting object 1 and a reasonably large sample size (n > 100, say), the variance of e is approximately equal to 1/[n p1(1 - p1)]. The use of a generalized least-squares procedure is called for.
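As a concrete illustration of Berkson's procedure, the sketch below applies the logit transformation to observed subgroup proportions and runs a weighted least-squares regression with weights n p̂(1 - p̂), the reciprocals of the approximate error variances derived above. The data values and the use of the statsmodels package are assumptions made for the example.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: proportions choosing object 1 in several subgroups,
# each based on a subgroup of size n_t, with one characteristic X.
n_t   = np.array([200, 250, 180, 220, 300])        # subgroup sample sizes
p_hat = np.array([0.62, 0.48, 0.35, 0.55, 0.70])   # observed proportions
X     = np.array([1.0, 2.0, 3.5, 1.5, 0.5])        # characteristic of object 1 in each subgroup

# Logit transformation (equation 5.4): the dependent variable.
y = np.log(p_hat / (1.0 - p_hat))

# Weights proportional to the reciprocal of Var(e) ~ 1 / (n p (1 - p)).
weights = n_t * p_hat * (1.0 - p_hat)

design = sm.add_constant(X)                         # the intercept plays the role of -log b0
fit = sm.WLS(y, design, weights=weights).fit()
print(fit.params)                                   # [-log b0, b1]
```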

Berkson's method has been extended to the estimation of parameters of multinomial logit (MNL) models by Theil (Theil, Henri [1969], ``A Multinomial Extension of the Linear Logit Model,'' International Economic Review, 10 (October), 251-59). Assume a multinomial choice process in which each individual independently selects object i with probability pi from a set of m objects in a single trial, and let pi be specified by an MNL model:

$$A_i = \exp\!\left(a + \sum_{k=1}^{K} b_k X_{ki} + e_i\right), \qquad
  p_i = A_i \Big/ \sum_{j=1}^{m} A_j \; .$$

This model differs from (5.1) in that a single parameter a is specified instead of m parameters, a1, a2, …, am. Theil noted that

$$\log\!\left(\frac{p_i}{p_1}\right) = \log\!\left(\frac{A_i}{A_1}\right)
  = \sum_{k=1}^{K} b_k (X_{ki} - X_{k1}) + (e_i - e_1)$$

where object 1 is an arbitrarily chosen object, and suggested the following estimation equation, which is linear in the parameters b1, b2, …, bK.

$$\log\!\left(\frac{\hat p_{it}}{\hat p_{1t}}\right)
  = \sum_{k=1}^{K} b_k (X_{kit} - X_{k1t}) + e_{it}^* \qquad (5.5)$$

where p̂it is the proportion of individuals who chose object i in the subsample, and eit* is the combined error term. Subscript t indicates the tth subsample. It is obvious that equation (5.4) is a special case of (5.5) in which the number of objects in the choice set, m, equals 2. The total degrees of freedom for this estimation equation is (m - 1)T, where T is the number of subsamples. It is known that the variances of the eit*'s are unequal, and McFadden [1974] studied a method for correcting this problem. The estimation technique which we propose in the next section is a variant of Theil's method. Both Theil's method and our method yield identical parameter estimates with identical properties, but we believe that our method has an advantage in its ease of interpretation.
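A minimal sketch of Theil's estimating equation (5.5) follows. The proportions and the single characteristic are made-up numbers, and plain least squares is used even though, as just noted, the variances of the eit*'s are unequal.

```python
import numpy as np

# Illustrative data: m = 3 objects, T = 4 subsamples, one characteristic X.
p_hat = np.array([[0.20, 0.50, 0.30],
                  [0.30, 0.40, 0.30],
                  [0.25, 0.45, 0.30],
                  [0.15, 0.55, 0.30]])    # p_hat[t, i]: observed choice proportions
X = np.array([[1.9, 1.4, 1.6],
              [1.9, 1.4, 1.7],
              [1.8, 1.5, 1.6],
              [1.7, 1.5, 1.5]])           # X[t, i]

# Equation (5.5): regress log(p_it / p_1t) on (X_it - X_1t) for i = 2, ..., m;
# there is no intercept because the single parameter a cancels in the ratio.
y = np.log(p_hat[:, 1:] / p_hat[:, :1]).ravel()
x = (X[:, 1:] - X[:, :1]).ravel()

b_hat, *_ = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)
print("Estimate of b:", b_hat)
```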

5.2  Log-Linear Regression Techniques

As we have noted in Chapter 2, model (5.1) becomes linear in its parameters by applying the log-centering transformation. Take the MNL model, for example. First, take the logarithm of both sides of (5.1).

 
$$\log s_i = a_i + \sum_{k=1}^{K} b_k X_{ki} + e_i
  - \log\!\left[\sum_{j=1}^{m} \exp\!\left(a_j + \sum_{k=1}^{K} b_k X_{kj} + e_j\right)\right] \; .$$

If we sum the above equation over i (i = 1, 2, …, m) and divide by m, we have

 
$$\log \tilde s = \bar a + \sum_{k=1}^{K} b_k \bar X_k + \bar e
  - \log\!\left[\sum_{j=1}^{m} \exp\!\left(a_j + \sum_{k=1}^{K} b_k X_{kj} + e_j\right)\right]$$
where s̃ is the geometric mean of si and ā, X̄k and ē are the arithmetic means of ai, Xki and ei, respectively, over i. Subtracting the above equation from the preceding one, we obtain the following form, which is linear in its parameters.

$$\log\!\left(\frac{s_i}{\tilde s}\right)
  = (a_i - \bar a) + \sum_{k=1}^{K} b_k (X_{ki} - \bar X_k) + (e_i - \bar e) \; .$$

Similarly, the application of the log-centering transformation to the MCI model results in

$$\log\!\left(\frac{s_i}{\tilde s}\right)
  = (a_i - \bar a) + \sum_{k=1}^{K} b_k \log(X_{ki}/\tilde X_k) + (e_i - \bar e)$$

where X̃k is the geometric mean of Xki. Since those two equations are linear in the parameters ai* = (ai - ā) (i = 1, 2, …, m) and bk (k = 1, 2, …, K), one may estimate those parameters by regression analysis.

Suppose that we obtain market-share data for T choice situations . In the following, we often let subscript t indicate the observations in period t , but this is simply an example. Needless to say, the data do not have to be limited to time-series data, and choice situations may be stores, areas, customer groups, or combinations such as store-weeks. Applying the log-centering transformation to the market shares and the marketing variables for each situation t creates the following variables:

 
sit* = log(sit / s̃t)   (i = 1, 2, …, m)
s̃t = the geometric mean of sit
Xkit* = log(Xkit / X̃kt)   (i = 1, 2, …, m; k = 1, 2, …, K)
X̃kt = the geometric mean of Xkit.

Using the above notation, the regression models actually used to estimate the parameters are specified as follows.

MNL Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k (X_{kit} - \bar X_{kt}) + e_{it}^* \qquad (5.6)$$

MCI Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k X_{kit}^* + e_{it}^* \qquad (5.7)$$

where eit* = (eit - ēt) and ēt is the arithmetic mean of eit over i in period t. Variable dj is a dummy (binary-valued) variable which takes the value 1 if j = i and 0 otherwise. Note that the estimated values of ai′ (i = 2, 3, …, m) from (5.6 - 5.7) are not estimates of the original parameters ai, but estimates of the differences (ai - a1), where brand 1 is an arbitrarily chosen brand. Thus we have shown that the parameters of attraction model (5.1) are estimable by the simple log-linear regression models (5.6 - 5.7). However, as was surmised from the discussion of Berkson's and Theil's methods, the error term eit* in those regression models may not have an equal variance for all i and t. We will turn to this problem in a later section.
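To make the mechanics concrete, the following sketch simulates data from the simple-effects MCI model, applies the log-centering transformation, and estimates (5.7) by ordinary least squares. The simulated values and parameter settings are assumptions for illustration only, and the unequal error variances just mentioned are ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from the simple-effects MCI model: m = 4 brands, T = 50 periods.
m, T, b_true = 4, 50, -3.0
a_true = np.array([0.5, 0.0, -0.3, 0.2])
price = rng.uniform(1.0, 2.0, size=(T, m))
attraction = np.exp(a_true + rng.normal(scale=0.1, size=(T, m))) * price ** b_true
share = attraction / attraction.sum(axis=1, keepdims=True)

# Log-centering transformation: subtract the within-period mean of the logs.
s_star = np.log(share) - np.log(share).mean(axis=1, keepdims=True)
x_star = np.log(price) - np.log(price).mean(axis=1, keepdims=True)   # MCI form (5.7)

# Design: intercept, brand dummies d_2 ... d_m, and the log-centered price variable.
brand = np.tile(np.arange(m), T)
D = np.column_stack([np.ones(T * m)] +
                    [(brand == j).astype(float) for j in range(1, m)] +
                    [x_star.ravel()])
coef, *_ = np.linalg.lstsq(D, s_star.ravel(), rcond=None)
print("a1, a2', a3', a4', b:", np.round(coef, 2))   # b should come out close to -3.0
```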

In earlier work (Nakanishi, Masao & Lee G. Cooper [1982], ``Simplified Estimation Procedures for MCI Models,'' Marketing Science, 1, 3 (Summer), 314-22) we showed that the regression models (5.6 - 5.7) are in turn equivalent to the following regression models.

MNL Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} b_k X_{kit} + e_{it} \qquad (5.8)$$

MCI Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} b_k \log X_{kit} + e_{it} \qquad (5.9)$$

Variable Du is another dummy variable which takes the value 1 if u = t and 0 otherwise. The corresponding models (5.6 - 5.7) and (5.8 - 5.9) yield an identical set of estimates of the ai′'s and bk's, and in this sense they are redundant. But one advantage of (5.8 - 5.9) is that it is not necessary to apply the log-centering transformation to market shares and marketing variables before regression analysis can be performed, which reduces the need for pre-processing of data. If the number of choice situations, T, is reasonably small, it is perhaps easier to use (5.8 - 5.9). If T is so large that the specification of dummy variables Du (u = 2, 3, …, T) becomes cumbersome, then the use of (5.6 - 5.7) is recommended. In addition, the properties of the error term eit in (5.8 - 5.9) are easier to analyze than those of eit* in (5.6 - 5.7).

5.2.1  Organization of Data for Estimation

Leaving theoretical issues aside for a while, let us look at the actual procedures one must follow for parameter estimation. Given a standardized statistical-program package, such as SAS(R), the first thing one must do is to arrange the data so that the regression analysis program in such a package may handle regression models (5.6 - 5.7) and (5.8 - 5.9).

Suppose that we have market-share data for m brands in T choice situations (periods, areas, customer groups, etc.), and accompanying data on marketing activities. Market-share data may be in the form of percentages (or proportions) or in absolute units. If one ignores for the moment the heteroscedasticity problems (i.e., unequal variances and nonzero covariances) associated with the error terms in regression models (5.6 - 5.9), whether the market-share data are in absolute units or in percentages is immaterial, because the log-centering transformation yields identical parameter estimates regardless of whether it is applied to proportions or to the actual numbers of units sold. (This property of log-centering is called the homogeneity of the 0th degree. The estimated value of a1 in (5.8 - 5.9) is the only term affected by the choice between proportions and actual numbers, but it does not influence the values of market shares estimated by the inverse log-centering transformation.) Table 5.1 is an example of market-share data generated by a POS system.

These data were actually obtained at a single store in 14 weeks (i.e., T = 14 ). There are five national and two regional brands of margarine ( m = 7 ). Brand 2 is the same as brand 1 and brand 4 is the same as brand 3, but in larger packages. All brands are half-pound (225g) packages except brands 2 and 4 which are one-pound (450g) packages. The market shares do not sum to one presumably due to private-label brands not listed here. Market shares are volume shares computed by first converting the numbers of units sold to weight volumes and then computing the weight-volume share of each brand. Inspection of the table will show that the market is obviously very price-sensitive.

We will now try to estimate the price elasticity of market shares based on attraction model (5.1). Also given are average daily sales volumes of margarine in this store, expressed in units of half-pound package equivalents. The first step in estimation is to create a data set which includes dummy variables dj (j = 2, 3, …, m) and Du (u = 2, 3, …, T) so that regression models (5.8 - 5.9) may be used. We chose (5.8 - 5.9) because the number of periods T is reasonably small (= 14). Table 5.2 shows a partial listing of the data set arranged for estimation with the REG procedure in the SAS(R) statistical package.

Market share and price data are taken from Table 5.1, and the logarithms of shares and prices are added. In addition two sets of dummy variables - week dummies and brand dummies - are put in the data set. The dummy variables (D1-D5) for only the first five weeks are reported to save space. If the reader examines the pattern of two sets of dummy variables, their meaning should be self-explanatory. The dummy

Table 5.1: POS Data Example (Margarine)

 
                         Brands                                   Ave. Daily
Weeks            1       2       3       4       5       6       7      Vol.
1 share 4 51 3 3 0 1 9 83
price 192 139.5 158 146 163 128 148  
2 share 2 75 2 1 0 0 5 103
price 192 140 158 170 163 128 148  
3 share 3 48 1 1 21 0 13 98
price 192 138.5 158 170 100 138 133  
4 share 4 44 24 - 0 0 11 72
price 192 139 139 170 163 148 128  
5 share 5 23 10 1 - 26 7 84
price 192 139 141 170 163 128 128  
6 share 6 6 3 2 0 36 13 61
price 192 176 158 170 163 128 128  
7 share 4 5 5 3 - 12 20 74
price 192 179 163 170 163 128 128  
8 share 3 2 2 2 41 8 11 107
price 192 169 185 161 100 134 128  
9 share 8 5 3 10 - 21 17 57
price 192 168 188 129.5 163 138 128  
10 share 19 3 1 47 - 5 8 77
price 178 179 188 120 163 138 128  
11 share 12 2 2 19 0 18 15 65
price 178 179 188 136.5 163 138 128  
12 share 6 47 1 5 0 10 9 87
price 180 139.5 188 149 163 141 128  
13 share 2 23 1 13 26 6 5 120
price 192 139 188 137 100 138 128  
14 share 28 15 10 19 3 3 6 107
price 132 139 144 134 109 143 128  
Notes: Brand 2 is the 1 lb. package of brand 1. Brand 4 is the 1 lb. package of brand 3. Market shares are in %. Prices are per 1/2 pound, in yen.

Table 5.2: Data Set for Estimation

 
  B S   P   Week Brand
W r h Log r Log Dummies Dummies
e a a   i                          
e n r Share c Price D D D D D d d d d d d d
k d e   e   1 2 3 4 5 1 2 3 4 5 6 7
 
1 1 4 1.38629 192 5.25750 1 0 0 0 0 1 0 0 0 0 0 0
1 2 51 3.93183 139 4.93806 1 0 0 0 0 0 1 0 0 0 0 0
1 3 3 1.09861 158 5.06260 1 0 0 0 0 0 0 1 0 0 0 0
1 4 3 1.09861 146 4.98361 1 0 0 0 0 0 0 0 1 0 0 0
1 5 0 . 163 5.09375 1 0 0 0 0 0 0 0 0 1 0 0
1 6 1 0.00000 128 4.85203 1 0 0 0 0 0 0 0 0 0 1 0
1 7 9 2.19722 148 4.99721 1 0 0 0 0 0 0 0 0 0 0 1
2 1 2 0.69315 192 5.25750 0 1 0 0 0 1 0 0 0 0 0 0
2 2 75 4.31749 140 4.94164 0 1 0 0 0 0 1 0 0 0 0 0
2 3 2 0.69315 158 5.06260 0 1 0 0 0 0 0 1 0 0 0 0
2 4 1 0.00000 170 5.13580 0 1 0 0 0 0 0 0 1 0 0 0
2 5 0 . 163 5.09375 0 1 0 0 0 0 0 0 0 1 0 0
2 6 0 . 128 4.85203 0 1 0 0 0 0 0 0 0 0 1 0
2 7 5 1.60944 148 4.99721 0 1 0 0 0 0 0 0 0 0 0 1
3 1 3 1.09861 192 5.25750 0 0 1 0 0 1 0 0 0 0 0 0
3 2 48 3.87120 138 4.93087 0 0 1 0 0 0 1 0 0 0 0 0
3 3 1 0.00000 158 5.06260 0 0 1 0 0 0 0 1 0 0 0 0
3 4 1 0.00000 170 5.13580 0 0 1 0 0 0 0 0 1 0 0 0
3 5 21 3.04452 100 4.60517 0 0 1 0 0 0 0 0 0 1 0 0
3 6 0 . 138 4.92725 0 0 1 0 0 0 0 0 0 0 1 0
3 7 13 2.56495 133 4.89035 0 0 1 0 0 0 0 0 0 0 0 1
4 1 4 1.38629 192 5.25750 0 0 0 1 0 1 0 0 0 0 0 0
4 2 44 3.78419 139 4.93447 0 0 0 1 0 0 1 0 0 0 0 0
4 3 24 3.17805 139 4.93447 0 0 0 1 0 0 0 1 0 0 0 0
4 4 . . 170 5.13580 0 0 0 1 0 0 0 0 1 0 0 0
4 5 0 . 163 5.09375 0 0 0 1 0 0 0 0 0 1 0 0
4 6 0 . 148 4.99721 0 0 0 1 0 0 0 0 0 0 1 0
4 7 11 2.39790 128 4.85203 0 0 0 1 0 0 0 0 0 0 0 1
5 1 5 1.60944 192 5.25750 0 0 0 0 1 1 0 0 0 0 0 0
5 2 23 3.13549 139 4.93447 0 0 0 0 1 0 1 0 0 0 0 0
5 3 10 2.30259 141 4.94876 0 0 0 0 1 0 0 1 0 0 0 0
5 4 1 0.00000 170 5.13580 0 0 0 0 1 0 0 0 1 0 0 0
5 5 . . 163 5.09375 0 0 0 0 1 0 0 0 0 1 0 0
5 6 26 3.25810 128 4.85203 0 0 0 0 1 0 0 0 0 0 1 0
5 7 7 1.94591 128 4.85203 0 0 0 0 1 0 0 0 0 0 0 1

variables for weeks graphically reflect that the influence of a particular week is constant over brands. The dummy variables for brands graphically reflect that the baseline level of attraction for each brand is constant over weeks, and thus independent of variations in market conditions.
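For readers who prefer to prepare such a data set outside SAS, a sketch of the same arrangement in Python/pandas is shown below. The small data frame and its column names are hypothetical stand-ins for Table 5.2; the point is only that the week and brand dummies can be generated mechanically rather than typed in.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format POS data with one row per (week, brand), as in Table 5.2.
df = pd.DataFrame({
    "week":  [1, 1, 1, 2, 2, 2],
    "brand": [1, 2, 3, 1, 2, 3],
    "share": [4, 51, 3, 2, 75, 2],               # market share in %
    "price": [192, 139.5, 158, 192, 140, 158],
})

df["log_share"] = np.log(df["share"].where(df["share"] > 0))   # zero shares become missing
df["log_price"] = np.log(df["price"])

# Week dummies D2, ..., DT and brand dummies d2, ..., dm (first level dropped, as in 5.8 - 5.9).
week_dummies  = pd.get_dummies(df["week"],  prefix="D", drop_first=True)
brand_dummies = pd.get_dummies(df["brand"], prefix="d", drop_first=True)
data = pd.concat([df, week_dummies, brand_dummies], axis=1)
print(data.head())
```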

5.2.2  Reading Regression-Analysis Outputs

Now we are in a position to estimate the parameters of attraction model (5.1), in which the only marketing variable is price. Letting Pit be the price of brand i in week t , there is only one attraction component for the MCI version of (5.1) which may be written as

$$A_{it} = \exp(a_i + e_{it})\, P_{it}^{\,b_p}$$

which in turn shows that regression model (5.9) is applicable here.

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + b_p \log P_{it} + e_{it} \; .$$

Table 5.3 gives the estimation results from the SAS(R) REG procedure.

The dependent variable is, of course, the logarithm of market share. The first part of the output gives the analysis-of-variance results. The most important summary statistic for us is the R2 figure of 0.735 (or the adjusted R2 value of 0.65), which indicates that almost 75% of the total variance in the dependent variable (log of share) has been explained by the independent (explanatory) variables (log of price, in this case) and the dummy variables d2 through d7 and D2 through D14. The F-test with the ``Prob > F'' figure of 0.0001 shows that the R2 value is high enough for us to put our reliance on the regression results. (This test is really against a null hypothesis that all the parameters are zero. There is less than a one-in-ten-thousand chance that this null hypothesis is true. So we can be confident that something systematic is going on, but it takes a much closer look to understand the sources and meaning of these systematic influences.) Note that the total degrees of freedom (i.e., the available number of observations minus 1) is not 97 but 83. This is because there are observations in the data set (see Table 5.2) for which the market share is zero. Since one cannot take the logarithm of zero, the program treats those observations as missing, decreasing the total degrees of freedom. The problems associated with zero market shares will be discussed in section 5.11.

The second part of the output gives the parameter estimates; the intercept gives the estimate of a1 ; D2 through D7 give estimates of

Table 5.3: Regression Results for MCI Equation (5.9)

 
Model: MODEL1      
Dep Variable: LSHARE      
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 20 77.33391 3.86670 8.765 0.0001
Error 63 27.79373 0.44117    
C Total 83 105.12764      
           
Root MSE 0.66421 R-Square 0.7356  
Dep Mean 1.92529 Adj R-Sq 0.6517  
C.V. 34.49902      
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
INTRCPT 1 44.798271 4.25812533 10.521 0.0001
D2 1 -0.623847 0.29148977 -2.140 0.0362
D3 1 -1.485840 0.26424009 -5.623 0.0001
D4 1 -1.866469 0.30893368 -6.042 0.0001
D5 1 -3.550847 0.61502980 -5.773 0.0001
D6 1 -1.971343 0.36375236 -5.419 0.0001
D7 1 -2.253214 0.37405428 -6.024 0.0001
DD2 1 0.254732 0.40530020 0.629 0.5319
DD3 1 0.117670 0.38957828 0.302 0.7636
DD4 1 0.620464 0.43444539 1.428 0.1582
DD5 1 0.269731 0.38377375 0.703 0.4847
DD6 1 0.634560 0.38485999 1.649 0.1042
DD7 1 0.644783 0.38546807 1.673 0.0993
DD8 1 0.243568 0.37504599 0.649 0.5184
DD9 1 0.778571 0.38417509 2.027 0.0469
DD10 1 0.424670 0.38363952 1.107 0.2725
DD11 1 0.742352 0.38454418 1.930 0.0581
DD12 1 0.547800 0.38402005 1.426 0.1587
DD13 1 0.274498 0.37351312 0.735 0.4651
DD14 1 -0.214251 0.37808396 -0.567 0.5729
LPRICE 1 -8.337254 0.81605692 -10.217 0.0001

a2′, …, a7′; DD2 through DD14 give the estimates of g2, g3, …, g14; the value next to LPRICE gives the estimate of bp, and so forth. From this table several important facts concerning the competitive structure of margarine in this store are learned.

First, the estimated price parameter is a large negative value, -8.34, indicating that the customers of this store are highly price-sensitive. The statistical significance of the estimate is shown by the T-value and the ``Prob > |T|'' column, both of which show that the estimate is highly significant. (To be precise, it is significantly different from zero. It should also be noted that the reported probability levels are for two-tailed tests. While nondirectional hypotheses are appropriate for time-period and brand dummy variables, we often have directional hypotheses about the influences of prices or other marketing instruments. The reported probabilities should be cut in half to assess the level of significance of one-sided tests.) Recall from Chapter 2 that the parameter value is not the same as the share elasticity for a specific brand. In the case of an MCI model, the latter is given by bp(1 - sit). For example, if a brand has a 20% share, its share elasticity with respect to price is approximately -8.34 × (1 - 0.2) = -6.67, indicating that a 10% price cut should lead to a 66.7% increase in share (from 20% to about 33%).

Second, the estimates of the brand-specific parameters, a2′, …, am′, are all negative and statistically significant. The true values of a2, …, am are estimated by adding the corresponding regression estimates to the estimated value of a1. Since a1 is estimated at 44.8, we know that brand 1 has the strongest attraction, other things being equal. Brand 5 has the weakest attraction with a1 + a5′ = (44.8 - 3.55) = 41.25. This implies that, other things being equal, brand 1 is 35 times (= exp 3.55) as attractive as brand 5. It is rather interesting to note that brand 2 (which is the one-pound package of brand 1) has approximately one-half the attraction (exp(-0.62) ≈ 0.54) of brand 1. Even within a brand, a weaker size has to resort to lower unit prices than the stronger size to gain a larger share.

Third, the estimates of g2, g3, …, gT are, with few exceptions (weeks 6, 7, and 11), statistically insignificant. This normally suggests that dummy variables D2, D3, …, DT may be deleted from the regression model, which in turn suggests that a multiplicative model of market share (discussed in Chapter 2) probably would have done as well as the attraction (MCI) model in analyzing the data in Table 5.1. However, we chose an attraction model not only because of how well it fits the data but because it represents a more logically consistent view of the market

Table 5.4: Regression Results for MNL Equation (5.8)

 
Model: MODEL1      
Dep Variable: LSHARE      
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 20 77.22749 3.86137 8.719 0.0001
Error 63 27.90015 0.44286    
C Total 83 105.12764      
           
Root MSE 0.66548 R-Square 0.7346  
Dep Mean 1.92529 Adj R-Sq 0.6504  
C.V. 34.56501      
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
INTRCPT 1 11.250720 1.01638598 11.069 0.0001
D2 1 -0.743850 0.29829963 -2.494 0.0153
D3 1 -1.582301 0.26788475 -5.907 0.0001
D4 1 -1.980421 0.31598491 -6.267 0.0001
D5 1 -3.087742 0.58245966 -5.301 0.0001
D6 1 -2.074613 0.37148467 -5.585 0.0001
D7 1 -2.309865 0.37915203 -6.092 0.0001
DD2 1 0.240284 0.40596127 0.592 0.5560
DD3 1 0.133747 0.39036655 0.343 0.7330
DD4 1 0.648161 0.43501731 1.490 0.1412
DD5 1 0.301956 0.38439750 0.786 0.4351
DD6 1 0.665472 0.38586824 1.725 0.0895
DD7 1 0.680742 0.38658443 1.761 0.0831
DD8 1 0.282518 0.37617724 0.751 0.4554
DD9 1 0.829837 0.38524729 2.154 0.0351
DD10 1 0.486656 0.38459756 1.265 0.2104
DD11 1 0.773457 0.38552149 2.006 0.0491
DD12 1 0.555236 0.38479525 1.443 0.1540
DD13 1 0.315302 0.37443996 0.842 0.4029
DD14 1 -0.236656 0.37918145 -0.624 0.5348
PRICE 1 -0.053868 0.00528884 -10.185 0.0001

and competition. (The parameters for the time periods merely serve the role of insuring that the other parameters are identical to those of the original nonlinear model. This structure guarantees that the model will produce market-share estimates which are always non-negative and always sum to one over all alternatives in a choice situation.) Since our purpose is to estimate the parameters of an attraction model correctly, it is not justified for us to drop those dummy variables from the regression equation.

Table 5.4 gives the estimation results from equation (5.8) for the MNL version of attraction model (5.1). The independent variables are the same as those of (5.9), except that price itself is used instead of the logarithm of price. The overall pattern of estimated parameters is very similar to that from (5.9). The estimated value of the price parameter, bp, is -0.054. Recall that for the MNL model the share elasticity with respect to a marketing variable (price in this case) is given by bp Pit(1 - sit). If sit is 0.2 and price is 150 yen for a brand, the price elasticity is approximately -6.5, which agrees well with the estimated elasticity value from equation (5.9).
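The two elasticity formulas can be verified with a few lines of arithmetic, using the share and price values assumed in the text and the parameter estimates from Tables 5.3 and 5.4.

```python
# Share elasticity with respect to price (cf. Chapter 2):
#   MCI model:  b_p * (1 - s)
#   MNL model:  b_p * P * (1 - s)
share = 0.20          # market share of the brand
price = 150.0         # price in yen

mci_elasticity = -8.337254 * (1 - share)            # estimate from Table 5.3
mnl_elasticity = -0.053868 * price * (1 - share)    # estimate from Table 5.4

print(round(mci_elasticity, 2))   # about -6.67
print(round(mnl_elasticity, 2))   # about -6.46
```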

5.2.3  The Analysis-of-Covariance Representation

It may be added that regression models (5.8 - 5.9) are equivalent to an analysis-of-covariance (ANCOVA) model of the following form.

MNL Model:

$$\log(s_{it}) = \mu + \mu_i + \mu_t + \sum_{k=1}^{K} b_k X_{kit} + e_{it}$$

or

MCI Model:

$$\log(s_{it}) = \mu + \mu_i + \mu_t + \sum_{k=1}^{K} b_k \log(X_{kit}) + e_{it}$$

where:

μ = the grand mean
μi = the brand main effects (i = 1, 2, …, m)
μt = the period main effects (t = 1, 2, …, T).

There is no brand-by-period interaction term because there is one observation per brand-period combination. The ANCOVA models yield parameter estimates that are identical to those obtained from models (5.8 - 5.9). This ANCOVA representation clarifies the characteristics of (5.8); an attraction model requires that the period main effects be taken out before the parameters of marketing variables are to be estimated. If we ignore the properties of the error term (discussed in the next section), the ANCOVA model may be convenient to use in practice since it does not require cumbersome specification of brand and period dummy variables.
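A sketch of this ANCOVA formulation in a standard statistical package is given below. The simulated data frame and its column names are assumptions for illustration; the categorical terms generate the brand and period main effects, and the covariate log(price) gives the MCI version (using price itself would give the MNL version).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per (week, brand) with share and price.
rng = np.random.default_rng(1)
weeks, brands = 8, 4
data = pd.DataFrame({
    "week":  np.repeat(np.arange(1, weeks + 1), brands),
    "brand": np.tile(np.arange(1, brands + 1), weeks),
    "price": rng.uniform(100, 200, size=weeks * brands),
})
data["share"] = rng.dirichlet(np.ones(brands), size=weeks).ravel()
data["log_share"] = np.log(data["share"])

# ANCOVA form of the MCI model: grand mean + brand main effects + period main effects
# + the covariate log(price).
fit = smf.ols("log_share ~ C(brand) + C(week) + np.log(price)", data=data).fit()
print(fit.params)
```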

5.3  Properties of the Error Term

We have deferred the discussion of the analysis of the error term up to this point, though it has been suggested that the error terms in regression models (5.6 - 5.7) and (5.8 - 5.9) are known to have unequal variances and non-zero covariances in some cases and may require special care in estimation. Before we show this, we will have to make some assumptions as to the composition of the error term with respect to the sources of error.

It is important to recognize two sources of errors inherent in the estimation of market-share models. The variability due to sampling is clearly one source of error, but there is another source of error we must consider. Recall that attraction model (5.1) includes an error term, ei , which arises due to the omission of some relatively minor factors from its specification of explanatory variables, the Xkit 's, in (5.1). We will call this source of error the specification error . Considering those sources of error, the error terms in regression models (5.8 - 5.9) may be expressed as

eit = e1it + e2it

where e1it is the specification-error term and e2it is the sampling-error term. (To be precise, the error term in attraction model (5.1) should be written as e1it, but we will not change the notation at this point for reasons that will become apparent later.) The error term in regression model (5.6) is given by subtracting ēt, the mean of eit over i in period t, from eit. Hence we may write

$$e_{it}^* = e_{1it}^* + e_{2it}^* = (e_{1it} - \bar e_{1t}) + (e_{2it} - \bar e_{2t})$$

where ē1t and ē2t are the respective means of e1it and e2it over i in period t.

5.3.1  Assumptions on the Specification-Error Term

We will make the following assumptions regarding the specification-error term, e1it , throughout the remainder of this book.

  1. e1it is normally distributed with mean 0 and variance σi2,
  2. the covariance between e1it and e1jt is σij for all t,
  3. there is no correlation between e1it and e1ju if u ≠ t,
  4. e1it is uncorrelated with the sampling-error term, e2it.

We have so far made no assumption about the sampling-error term (except that it is uncorrelated with the specification-error term) because the method of data collection greatly affects the properties of sampling errors. Two basic methods of data collection will be distinguished.

One is the survey method in which a sample is randomly drawn from a universe of consumers/buyers. In this case the unit of analysis is the individuals in the sample. One may ask the respondent which brand he/she selected or how many times he/she purchased each brand in a period. Individual selections or purchases are then aggregated over the sample to yield market-share estimates. It may be noted that the so-called consumer panels - diary or optical-scanner - share essentially the same characteristics as the survey method as a data collection technique because the unit of analysis is an individual consumer or household.

Another basic method concerns data gathered from POS systems. It should be emphasized that POS-generated market-share data are based on all purchases made in a store in a period, and not on the responses obtained from a sample of customers of the store. This means that we need not be concerned with the normal sources of sampling variation (i.e., sampling variations among customers within a store). Our only concern is with sampling variations between stores, since POS data currently available to syndicated users are usually based on a sample of stores. We will deal with each type of data-collection method in turn.

5.3.2  Survey Data

Let us assume that a series of samples of consumers or buyers is obtained by simple random sampling. We assume that an independent sample is drawn for each period (or choice situation). Since the following analysis is limited to a single period, the time subscript t is dropped for simplicity. As noted above, one may ask the respondent either which brand he/she chose or how many times he/she bought each brand in a period. We will have to treat those two questioning techniques separately.

First consider the case in which each respondent is asked which single brand he/she chose from a set of available brands (the choice set). In this case we may assume that the aggregated responses to the question follow a multinomial choice process. Formally stated, given a sample of size n and the probability pi (i = 1, 2, …, m) that a respondent chose brand i (m is the number of available brands), the joint probability that brand i is chosen by ni individuals (i = 1, 2, …, m) is given by

$$P(n_1, n_2, \ldots, n_m) = \frac{n!}{n_1!\, n_2! \cdots n_m!} \prod_{i=1}^{m} p_i^{\,n_i} \; .$$

The market-share estimates are p̂i = ni/n (i = 1, 2, …, m). These estimates are subject to sampling variations.

Let us now turn to the properties of the sampling-error term

$$e_{2i} = \log \hat p_i - \log p_i \qquad (i = 1, 2, \ldots, m)$$

when the market-share estimates, the p̂i's, are generated by the multinomial process described above. It is well known that for a reasonably large sample size (n > 30, say), p̂i is approximately normally distributed with mean pi and variance pi(1 - pi)/n. Given this approximate distribution, we want to know how e2i is distributed. We will use the same technique as that used by Berkson. First, expand log p̂i in a Taylor series around log pi and retain only the first two terms. Then apply the mean-value theorem to obtain

$$\log \hat p_i = \log p_i + \frac{\hat p_i - p_i}{p_i^*}$$

where pi* is a value between p̂i and pi. This shows that for a reasonably large sample size, log p̂i is approximately normally distributed with mean log pi and variance pi(1 - pi)/(n pi*2). The approximation improves as the sample size n increases. Thus the sampling error is also approximately normally distributed with mean zero and variance pi(1 - pi)/(n pi*2). Furthermore, due to the nature of a multinomial process, it is known that e2i and e2j (j ≠ i) in the same period are correlated and have an approximate covariance -pi pj/(n pi* pj*), where pj* is a value between p̂j and pj. For a reasonably large sample size, we may take

 
$$\mathrm{Var}(e_{2i}) = (1 - p_i)/(n p_i) \qquad (i = 1, 2, \ldots, m)$$
$$\mathrm{Cov}(e_{2i}, e_{2j}) = -1/n \qquad\qquad (j \neq i) \; . \qquad (5.10)$$

Clearly the variance of the error term is a function of pi; it equals 1/n when pi = 0.5 and becomes large for very small values of pi. For example, if pi = 0.01, the variance of e2i is approximately equal to 99/n. This phenomenon is called heteroscedasticity in the variance of e2i. But we must also be concerned with the covariance between e2i and e2j to the extent that 1/n is not negligible.
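A quick simulation can confirm these approximations. The choice probabilities, sample size, and number of replications below are arbitrary; the simulated variance of log p̂i should be close to (1 - pi)/(n pi) and the covariances close to -1/n.

```python
import numpy as np

rng = np.random.default_rng(42)

p = np.array([0.05, 0.15, 0.30, 0.50])   # true choice probabilities (sum to one)
n = 1000                                  # respondents per sample
reps = 20000                              # number of simulated samples

counts = rng.multinomial(n, p, size=reps)
log_phat = np.log(counts / n)             # log of estimated shares (zero counts are very unlikely here)

print("simulated Var(log p_hat):", log_phat.var(axis=0))
print("approximation (1-p)/(n p):", (1 - p) / (n * p))
print("simulated Cov(1,2):", np.cov(log_phat[:, 0], log_phat[:, 1])[0, 1])
print("approximation -1/n:", -1 / n)
```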

The above properties of the error term are based on the assumption that each respondent is asked which brand he/she chose in a given choice situation. The properties change considerably if the respondent is asked how many units of each brand he/she purchased in a period. The individual responses are aggregated over the sample to yield the number of units of brand i bought by the entire sample, xi (i = 1, 2, …, m). The estimate of the market share of brand i is given by ŝi = xi/x, where x is the sum of the xi's over i. What are the properties of the error term when the logarithm of ŝi is used as the dependent variable in regression models (5.8 - 5.9), or the log-centered value of ŝi is used in (5.6 - 5.7)? The answer depends on the assumption we make about the process which generates the xi's. In general the derivation of the properties of the error term is a complicated task, since ŝi is a ratio of two random variables, xi and x, the latter including the former as a part of it. Luckily for us, however, the estimated values of the parameters of (5.6 - 5.7) will not change if we use the log-centered value of x̄i, the mean of xi, in place of the log-centered value of ŝi in (5.6 - 5.7), since

$$\log\!\left(\frac{\bar x_i}{\tilde x}\right) = \log\!\left(\frac{\hat s_i}{\tilde{\hat s}}\right)$$

where x̃ is the geometric mean of x̄i and the tilde over ŝ denotes the geometric mean of ŝi over i in a given period. This in turn suggests that in regression models (5.8 - 5.9) we may use log(xi) as the dependent variable without changing the estimated values of parameters other than a1. This reduces our task in analyzing the properties of the error term considerably.

Suppose that the xi's are generated by an arbitrary multivariate process with means μ1, μ2, …, μm and covariance matrix Q with elements {qij}. Note that the true market share is given by si = μi/μ, where μ is the sum of the μi's over i. The sample mean of xi, x̄i, is an estimate of μi. We obtain the linear approximation of log x̄i by the usual method, that is,

$$\log \bar x_i = \log \mu_i + \frac{1}{x_i^*}(\bar x_i - \mu_i)$$

where xi* is a value between x̄i and μi. If we replace log(ŝi) in the equations leading to (5.8 - 5.9) by log x̄i, the sampling-error term becomes

$$e_{2i} = \log \bar x_i - \log \mu_i \; .$$

When the sample size is reasonably large, the approximate variances and covariances among the e2i 's are given by

 
$$\mathrm{Var}(e_{2i}) = q_{ii}/(n \mu_i^2) \qquad (i = 1, 2, \ldots, m)$$
$$\mathrm{Cov}(e_{2i}, e_{2j}) = q_{ij}/(n \mu_i \mu_j) \qquad (j \neq i) \; .$$

These results agree with those for the multinomial process, if we note that in the latter process μi = pi and x̄i = p̂i. The variances and covariances of the sampling-error term are clearly functions of the μi and may take large values if μi or μj is near zero. The existence of heteroscedasticity is obvious.

We now combine the above results with our assumptions on the specification-error term. Under the assumptions of a multinomial choice process and a single choice per individual, the approximate variances and covariances among the ei's in the same period are given by

 
$$\mathrm{Var}(e_i) = \sigma_i^2 + \mathrm{Var}(e_{2i}) \qquad (i = 1, 2, \ldots, m)$$
$$\mathrm{Cov}(e_i, e_j) = \sigma_{ij} + \mathrm{Cov}(e_{2i}, e_{2j}) \qquad (j \neq i)$$

where Var(e2i) and Cov(e2i, e2j) are given by (5.10). Because of the heteroscedasticity of the error term, it is known that the estimated parameters of regression models (5.6 - 5.9) based on the ordinary least-squares (OLS) procedure do not have the smallest variance in the class of linear regression estimators. Nakanishi and Cooper [1974] suggested the use of a two-stage generalized least-squares (GLS) procedure in the case of a multinomial choice situation to reduce the estimation errors associated with regression models (5.6 - 5.9). The interested reader is referred to Appendix 5.14 for more details of this GLS procedure.

5.3.3  POS Data

When the market-share estimates are obtained from POS systems, it is not necessary for us to consider the sampling errors within a store; but, if our market-share data are obtained by aggregating market-share figures for a number of stores, we should expect that there are variations between stores. This presents us with a heteroscedasticity problem similar to the one we encountered with survey data. But there are additional problems as well. Each store tends to offer its customers a uniquely packaged set of marketing activities. If we aggregate market-share figures from several stores, we will somehow have to aggregate marketing variables over the stores. As discussed in Chapter 4, aggregation is safe if the causal conditions (i.e., promotional variables) are homogeneous over the stores, as might be the case when stores within a grocery chain are combined. One should avoid the ambiguity which results from aggregation, either by explicitly recognizing each individual store or by aggregating only over stores (within grocery chains) with relatively homogeneous promotion policies. We will take this approach in the remainder of this book.

Stated more formally, let siht be the market share of brand i in store h in period t, and Xkiht be the value of the kth marketing variable for brand i in store h in period t. Regression models (5.6 - 5.7) may be rewritten with the new notation as

MNL Model:

$$s_{iht}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k (X_{kiht} - \bar X_{kht}) + e_{iht}^* \qquad (5.11)$$

MCI Model:

$$s_{iht}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k \log(X_{kiht}/\tilde X_{kht}) + e_{iht}^* \qquad (5.12)$$

where s*iht is the log-centered value of siht in store h in period t, and X̄kht and X̃kht are the arithmetic mean and geometric mean of Xkiht over i in store h in period t.

The main advantage of a disaggregated model such as (5.11 - 5.12) is that we do not have to deal with sampling errors in estimation. Similar expressions may be obtained for (5.8 - 5.9), but in actual applications there will be too many dummy variables which have to be included in the model. It will be necessary to specify (H × T - 1) dummy variables, where H is the number of stores, in place of the (T - 1) period dummy variables in (5.8 - 5.9). With even a moderate number of stores and periods it may become impractical to try to include all the necessary dummy variables for estimation, in which case the use of models (5.11 - 5.12) is recommended.

5.4  *Generalized Least-Squares Estimation

In the preceding section we noted that the error terms in the regression models for estimating the parameters of market-share models tend to be heteroscedastic, i.e., have unequal variances and nonzero covariances. If market-share figures are computed from POS data, the error terms in regression models (5.6 - 5.12) involve only what we call specification errors. Let Σ be the variance-covariance matrix of specification errors, with variances σi2 (i = 1, 2, …, m) on the main diagonal and covariances σij (j ≠ i) as off-diagonal elements. Because of this heteroscedasticity, Bultez and Naert (Bultez, Alain V. & Philippe A. Naert [1975], ``Consistent Sum-Constrained Models,'' Journal of the American Statistical Association, 70, 351 (September), 529-35) proposed an iterative GLS procedure. The steps of the iterative GLS procedure are as follows.

  1. The OLS procedure is used to estimate the parameters in one of the regression models (5.6 - 5.12), and Σ is estimated from the residual errors. (One can simply sort the OLS residuals by brand and time period, compute the variance of each brand's residuals, and compute the covariance between the ordered residuals for each pair of brands.)
  2. The data for each period are re-weighted by the estimated Σ̂-1/2.
  3. The first two steps are repeated until the estimated values of the regression parameters converge.
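A sketch of these steps in code is given below. The data are simulated, the estimated Σ̂ is assumed to be invertible (for the log-centered models it is not, as discussed next), and the inverse square root is obtained from an eigendecomposition.

```python
import numpy as np

def inv_sqrt(S):
    # Symmetric inverse square root via eigendecomposition (assumes S is positive definite).
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def iterative_gls(X_t, y_t, n_iter=10):
    """X_t: list of (m x p) design matrices, one per period; y_t: list of m-vectors."""
    X = np.vstack(X_t)
    y = np.concatenate(y_t)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]                            # step 1: OLS
    for _ in range(n_iter):
        resid = np.array([yt - Xt @ beta for Xt, yt in zip(X_t, y_t)])     # T x m residual matrix
        Sigma = np.cov(resid, rowvar=False)                                # estimated error covariance
        W = inv_sqrt(Sigma)                                                # step 2: re-weight by Sigma^(-1/2)
        Xw = np.vstack([W @ Xt for Xt in X_t])
        yw = np.concatenate([W @ yt for yt in y_t])
        beta_new = np.linalg.lstsq(Xw, yw, rcond=None)[0]
        if np.allclose(beta_new, beta, atol=1e-8):                         # step 3: iterate to convergence
            break
        beta = beta_new
    return beta

# Tiny illustration with made-up data: T = 30 periods, m = 5 brands, p = 3 regressors.
rng = np.random.default_rng(0)
X_t = [rng.normal(size=(5, 3)) for _ in range(30)]
y_t = [Xt @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=5) for Xt in X_t]
print(iterative_gls(X_t, y_t))
```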

There is one minor problem in applying this iterative procedure. It may be remembered that in regression model (5.6) the log-centering transformation is applied to the dependent variable; as a result, the variance-covariance matrix of the e*it's is given by

$$\Sigma^* = (I - J/m)\,\Sigma\,(I - J/m)$$

where I is an identity matrix and J is a matrix all of whose elements are equal to 1. The dimensions of both I and J are m × m, where m is the number of available brands. Σ̂* computed from OLS residuals is therefore singular and not invertible. Since regression models (5.8 - 5.9) are equivalent to (5.6 - 5.7), the residuals estimated from the former are identical to those estimated from the latter, and hence the estimated covariance matrices are also identical. In general, if both brand-dummy variables and period- (or store-) dummy variables are inserted in a regression model, the estimated residual covariance matrix becomes singular. This certainly is an impediment to the GLS estimation procedure, which requires the inverses of estimated covariance matrices.

There are three methods of circumventing this problem. One is to delete one row and the corresponding column from Σ̂* and invert the result. One observation (corresponding to the deleted row/column of Σ̂*) per period is deleted and the parameters are estimated on the remaining data. The drawback of this technique is that the estimated parameters are transformations of the original parameters, and hence have to be transformed back to the original ones, a process which is rather cumbersome. A second method is to set to zero those off-diagonal elements of the estimated residual covariance matrix which are nearly zero. Though theoretically less justifiable, it has its merit in simplicity. Usually it is sufficient to set just a few elements to zero before the inverse may be obtained. (If one wishes to be more formal about this method, one may set to zero those elements which are not statistically significantly different from zero. On the other hand, by setting all off-diagonal elements to zero we obtain an easily implemented weighted least-squares (WLS) procedure which compensates only for differences in the variance of specification errors between brands.) The third method is to find a generalized inverse of Σ̂*.

5.4.1  Application of GLS to the Margarine Data

As an illustration of the GLS technique, consider the data set given in Table 5.1. The OLS estimation technique applied to regression model (5.9) yielded the parameter estimates in Table 5.3. Residual errors were then computed from the above OLS results and Σ was estimated. The estimated Σ and its inverse are shown below. Those elements of the estimated Σ which were less than 0.3 were set to zero before the matrix was inverted.

Covariance Matrix

 
B 1 2 3 4 5 6 7
 
1 0.183164 -.050222 -.009048 -.063581 0.102546 -.142236 0.020543
2 -.050222 0.386247 -.233128 -.057280 -.156411 -.261190 0.186987
3 -.009048 -.233128 0.302828 0.062024 -.066156 -.020426 -.086928
4 -.063581 -.057280 0.062024 0.234230 -.304044 -.024887 -.078644
5 0.102546 -.156411 -.066156 -.304044 0.359436 0.074360 0.020021
6 -.142236 -.261190 -.020426 -.024887 0.074360 0.880167 -.444807
7 0.020543 0.186987 -.086928 -.078644 0.020021 -.444807 0.289530

Inverse Covariance Matrix

 
B 1 2 3 4 5 6 7
 
1 -62.656 13.7604 -13.3493 14.4918 47.1276 -65.075 -108.934
2 13.760 -0.4766 3.5862 -8.6329 -13.2922 12.164 17.728
3 -13.349 3.5862 1.5801 -2.0956 6.5368 -12.807 -22.087
4 14.492 -8.6329 -2.0956 -4.2420 -14.5748 13.098 23.917
5 47.128 -13.2922 6.5368 -14.5748 -36.9378 45.266 76.131
6 -65.075 12.1642 -12.8072 13.0983 45.2664 -61.315 -102.342
7 -108.934 17.7275 -22.0868 23.9170 76.1315 -102.342 -165.360

The data matrix for each week was pre-multiplied by the square root of the above inverse matrix, and estimates of the following form were obtained:

$$(a_1, a_2, \ldots, a_m, b_p)' =
  \left[\sum_{t=1}^{T} X_t' \hat\Sigma^{-1} X_t\right]^{-1}
  \left[\sum_{t=1}^{T} X_t' \hat\Sigma^{-1} y_t\right]$$

where Xt is the independent variable matrix and yt is the vector of the dependent variable for period t . The re-estimated parameter values are shown in Table 5.5.

Table 5.5: GLS Estimates for Table 5.3

 
  Parameter   Parameter
Variable Estimate Variable Estimate
 
Intercept 45.4977    
D2 -0.6529 DD6 0.4764
D3 -1.505 DD7 0.4892
D4 -1.8942 DD8 0.0709
D5 -3.4476 DD9 0.6449
D6 -2.0313 DD10 0.5546
D7 -2.2964 DD11 0.6610
DD2 -0.1283 DD12 0.2626
DD3 -0.1412 DD13 0.0022
DD4 0.4260 DD14 -0.3082
DD5 0.1464 LOG(PRICE) -8.4395

Table 5.5 gives the so-called two-stage GLS estimates. If necessary, residual errors and Σ may be computed again from the above results and another set of GLS estimates may be obtained. But, since the parameter estimates in Table 5.3 are extremely close to those in Table 5.5, further iterations seem unnecessary. In fact it has been our experience that OLS and GLS estimates are very similar in many cases. The OLS procedure appears satisfactory in many applications.

So far we have reviewed estimation techniques applicable to relatively simple attraction models (5.1). We have shown in Chapter 3 that attraction models may be extended to include differential effects and cross effects between brands. In the following sections we will discuss more advanced issues related to the parameter estimation of differential-effects and cross-effects (fully extended) models.

5.5  Estimation of Differential-Effects Models

The differential-effects version of attraction model (5.1) is expressed as follows.

 
$$A_i = \exp(a_i + e_i) \prod_{k=1}^{K} f_k(X_{ki})^{b_{ki}}, \qquad
  s_i = A_i \Big/ \sum_{j=1}^{m} A_j \qquad (5.13)$$

where either an identity or exponential transformation may be chosen for fk , depending on whether an MCI or MNL model is desired. The chief difference between (5.1) and (5.13) is the fact that parameter bki has an additional subscript i , suggesting that the effectiveness (and hence the elasticity) of a marketing variable may differ from one brand to the next. This is certainly a plausible model in some situations and worth calibrating.

The estimation of the parameters bki (i = 1, 2, …, m) is not extremely complicated. Only a slight modification of regression models (5.6 - 5.9) achieves the result. Using the previous definitions of the dummy variables dj and Du, the differential-effects versions of regression models (5.6 - 5.7) are given by

MNL Model:

$$s_{it}^* = \sum_{j=2}^{m} a_j \left(d_j - \frac{1}{m}\right)
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj} \left(d_j - \frac{1}{m}\right) X_{kjt}
  + e_{it}^* \qquad (5.14)$$

MCI Model:

$$s_{it}^* = \sum_{j=2}^{m} a_j \left(d_j - \frac{1}{m}\right)
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj} \left(d_j - \frac{1}{m}\right) \log X_{kjt}
  + e_{it}^* \; . \qquad (5.15)$$

In regression models (5.14 - 5.15) the independent variables are replaced by each variable multiplied by (dj - 1/m), which equals (1 - 1/m) if j = i, and -1/m otherwise. Thus the number of independent variables is (m × K) + m - 1. Note that regression models (5.14 - 5.15) have to be estimated without the intercept term; most regression programs provide this option. (If an intercept term is included, its estimated value will be zero.) We cannot obtain the estimate of a1 from (5.14) or (5.15), but this poses no problem in computing market shares since the estimated value of ai is actually the difference between the true ai and a1. (Rather than automatically assigning a1 as the brand intercept to drop, one can run the regression with all brand intercepts (which will be a singular model) and find the intercept closest to zero as the one to drop.) Similarly, regression models (5.8) and (5.9) may be modified as follows for their respective differential-effects versions.

MNL Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j X_{kjt} + e_{it} \qquad (5.16)$$

MCI Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j \log X_{kjt} + e_{it} \qquad (5.17)$$

Regression models (5.14 - 5.15) and (5.16 - 5.17) yield identical estimates of the parameters a's (except a1) and b's. If the number of periods (or choice situations) is large, (5.14 - 5.15) will be preferred.
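The sketch below shows how the differential-effects regressors of (5.16 - 5.17) can be generated by crossing the (logged) marketing variable with the brand dummies, which is exactly the LPD1-LPD7 arrangement used in Table 5.6 below. The small data frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (week, brand).
df = pd.DataFrame({
    "week":  [1, 1, 1, 2, 2, 2],
    "brand": [1, 2, 3, 1, 2, 3],
    "price": [192.0, 139.5, 158.0, 192.0, 140.0, 158.0],
})

log_price = np.log(df["price"])
brand_dummies = pd.get_dummies(df["brand"], prefix="d")          # d_1, ..., d_m (all kept)

# Differential-effects regressors for the MCI form (5.17): log(price) x brand dummy,
# giving one price parameter per brand (the LPD columns of Table 5.6).
lpd = brand_dummies.mul(log_price, axis=0)
lpd.columns = [c.replace("d_", "LPD") for c in brand_dummies.columns]
print(pd.concat([df, lpd], axis=1))
```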

The reader may feel that the following regression models are more straightforward modifications of (5.6 - 5.7), but this is not the case.

MNL Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j (X_{kjt} - \bar X_{kt})
  + e_{it}^* \qquad (5.18)$$

MCI Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j X_{kjt}^*
  + e_{it}^* \qquad (5.19)$$

Models (5.18 - 5.19) do not represent an attraction model, but a log-linear market-share model in which the share of brand i is specified as

$$s_i = \exp(a_i + e_i) \prod_{k=1}^{K} f_k(X_{ki}^*)^{b_{ki}}$$

where X*ki is a centered value of Xki, that is, (Xkit - X̄kt) if fk is an exponential transformation and (Xkit/X̃kt) if fk is an identity transformation. While these models themselves may have desirable features as market-share models, models (5.18 - 5.19) are not the estimating equations for (5.13). (The difference is that (5.14 - 5.15) log-center the differential-effect variable, while (5.18 - 5.19) log-center the simple-effect variable and then multiply these log-centered variables by the brand-specific dummy variables.)

Let us see what those modifications mean using the illustrative data of Table 5.1. The independent variable in this case is price. In order to estimate regression model (5.17) (the MCI version), the data must be arranged as in Table 5.6. Only the dependent variable and part of the explanatory variables (log(price) × brand dummy variables) are shown. The week and brand dummy variables follow the same pattern as in Table 5.2.

The estimation results are shown in Table 5.7. The fit of the model, as measured by R2, improved from 0.736 to 0.826. The gain from adding six more independent variables (LPD1 through LPD7 instead of LOG(PRICE)) may be measured by the incremental F-ratio 4.9386 (= (86.8406 - 77.3339)/(6 × 0.32083)), which is significant at the .99 level (df = 6, 57). This shows that the differential-effects model is a significant improvement over the explanatory power of the simple-effects model. The estimated parameter values are markedly different from one brand to the next. Looking at the price-parameter estimates, we note that a larger size tends to be more price sensitive than a smaller size even within a brand: brands 2 and 4 have greater values (in absolute terms) than brands 1 and 3. Brand 5 is the most price sensitive, with an estimated value of -24.08, but this may reflect the fact that this brand's share was zero, and hence not available for estimation, in 10 weeks out of 14. We shall discuss this issue in a later section. Two brands, 6 and 7, are not price sensitive; their price parameters are not statistically different from zero, as indicated by their respective ``Prob > |T|'' values. As to the estimates of the a's, we may note that they are negatively correlated with the price-parameter estimates over brands, but we will not attempt to make generalizations on the basis of this single example.
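The incremental F-ratio quoted above can be reproduced directly from the two analysis-of-variance tables; the only assumption in the snippet below is the availability of scipy for the reference distribution.

```python
from scipy.stats import f

ssr_simple = 77.33391       # model sum of squares, simple-effects model (Table 5.3)
ssr_diff   = 86.84061       # model sum of squares, differential-effects model (Table 5.7)
mse_diff   = 0.32083        # error mean square of the differential-effects model
q, df_error = 6, 57         # six added parameters; error degrees of freedom of the full model

F = (ssr_diff - ssr_simple) / (q * mse_diff)
print(round(F, 4))                          # about 4.94
print(f.ppf(0.99, q, df_error))             # critical value at the .99 level
```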

The arrangement of data for estimating model (5.15) is given in Table 5.8. Only the dependent variable and the price × brand dummy

Table 5.6: Data Set for Differential-Effects Model

 
                         Log(Price) × Brand Dummy Variables
Week Brand Log Share    LPD1    LPD2    LPD3    LPD4    LPD5    LPD6    LPD7
 
1 1 1.38629 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 2 3.93183 0.0000 4.9381 0.0000 0.0000 0.0000 0.0000 0.0000
1 3 1.09861 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
1 4 1.09861 0.0000 0.0000 0.0000 4.9836 0.0000 0.0000 0.0000
1 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
1 6 0.00000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
1 7 2.19722 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.9972
2 1 0.69315 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
2 2 4.31749 0.0000 4.9416 0.0000 0.0000 0.0000 0.0000 0.0000
2 3 0.69315 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
2 4 0.00000 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
2 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
2 6 . 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
2 7 1.60944 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.9972
3 1 1.09861 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
3 2 3.87120 0.0000 4.9309 0.0000 0.0000 0.0000 0.0000 0.0000
3 3 0.00000 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
3 4 0.00000 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
3 5 3.04452 0.0000 0.0000 0.0000 0.0000 4.6052 0.0000 0.0000
3 6 . 0.0000 0.0000 0.0000 0.0000 0.0000 4.9273 0.0000
3 7 2.56495 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8904
4 1 1.38629 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
4 2 3.78419 0.0000 4.9345 0.0000 0.0000 0.0000 0.0000 0.0000
4 3 3.17805 0.0000 0.0000 4.9345 0.0000 0.0000 0.0000 0.0000
4 4 . 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
4 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
4 6 . 0.0000 0.0000 0.0000 0.0000 0.0000 4.9972 0.0000
4 7 2.39790 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520
5 1 1.60944 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
5 2 3.13549 0.0000 4.9345 0.0000 0.0000 0.0000 0.0000 0.0000
5 3 2.30259 0.0000 0.0000 4.9488 0.0000 0.0000 0.0000 0.0000
5 4 0.00000 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
5 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
5 6 3.25810 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
5 7 1.94591 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520
6 1 1.79176 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
6 2 1.79176 0.0000 5.1705 0.0000 0.0000 0.0000 0.0000 0.0000
6 3 1.09861 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
6 4 0.69315 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
6 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
6 6 3.58352 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
6 7 2.56495 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520

Table 5.7: Regression Results for Differential-Effects Model (MCI)

 
Model: MODEL1      
Dep Variable: LSHARE      
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 26 86.84061 3.34002 10.411 0.0001
Error 57 18.28703 0.32083    
C Total 83 105.12764      
Root MSE 0.56641 R-Square 0.8260  
Dep Mean 1.92529 Adj R-Sq 0.7467  
C.V. 29.41967      
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
INTRCPT 1 36.797056 9.03858643 4.071 0.0001
D2 1 28.212012 11.56852482 2.439 0.0179
D3 1 1.325003 11.79902587 0.112 0.9110
D4 1 12.426886 11.09247984 1.120 0.2673
D5 1 77.155688 41.01572380 1.881 0.0651
D6 1 -32.861595 16.87525706 -1.947 0.0564
D7 1 -43.161568 18.20494075 -2.371 0.0211
DD2 1 0.144666 0.34886522 0.415 0.6799
DD3 1 0.160885 0.34227101 0.470 0.6401
DD4 1 0.783674 0.38370743 2.042 0.0458
DD5 1 0.560437 0.33938645 1.651 0.1042
DD6 1 1.070890 0.34384160 3.114 0.0029
DD7 1 1.087786 0.34488085 3.154 0.0026
DD8 1 0.479316 0.33693519 1.423 0.1603
DD9 1 0.997026 0.34999923 2.849 0.0061
DD10 1 0.689770 0.35708659 1.932 0.0584
DD11 1 1.035196 0.35245369 2.937 0.0048
DD12 1 0.565334 0.35135768 1.609 0.1131
DD13 1 0.176690 0.34733555 0.509 0.6129
DD14 1 0.107222 0.36223872 0.296 0.7683
LPD1 1 -6.837585 1.72929552 -3.954 0.0002
LPD2 1 -12.511968 1.47178224 -8.501 0.0001
LPD3 1 -7.357565 1.51269846 -4.864 0.0001
LPD4 1 -9.629287 1.46960177 -6.552 0.0001
LPD5 1 -24.078656 8.34529863 -2.885 0.0055
LPD6 1 -0.478779 2.78016380 -0.172 0.8639
LPD7 1 1.657518 3.33624420 0.497 0.6212

Table 5.8: Log-Centered Differential-Effects Data

 
Week  Brand  Log-Centered Share  LPD1  LPD2  LPD3  LPD4  LPD5  LPD6  LPD7
(LPDj = centered Log(Price) × Brand Dummy Variable for brand j)
 
1 1 -0.232 4.381 -0.823 -0.844 -0.831 0.000 -0.809 -0.833
1 2 2.313 -0.876 4.115 -0.844 -0.831 0.000 -0.809 -0.833
1 3 -0.520 -0.876 -0.823 4.219 -0.831 0.000 -0.809 -0.833
1 4 -0.520 -0.876 -0.823 -0.844 4.153 0.000 -0.809 -0.833
1 6 -1.619 -0.876 -0.823 -0.844 -0.831 0.000 4.043 -0.833
1 7 0.578 -0.876 -0.823 -0.844 -0.831 0.000 -0.809 4.164
2 1 -0.769 4.206 -0.988 -1.013 -1.027 0.000 0.000 -0.999
2 2 2.855 -1.052 3.953 -1.013 -1.027 0.000 0.000 -0.999
2 3 -0.769 -1.052 -0.988 4.050 -1.027 0.000 0.000 -0.999
2 4 -1.463 -1.052 -0.988 -1.013 4.109 0.000 0.000 -0.999
2 7 0.147 -1.052 -0.988 -1.013 -1.027 0.000 0.000 3.998
3 1 -0.665 4.381 -0.822 -0.844 -0.856 -0.768 0.000 -0.815
3 2 2.108 -0.876 4.109 -0.844 -0.856 -0.768 0.000 -0.815
3 3 -1.763 -0.876 -0.822 4.219 -0.856 -0.768 0.000 -0.815
3 4 -1.763 -0.876 -0.822 -0.844 4.280 -0.768 0.000 -0.815
3 5 1.281 -0.876 -0.822 -0.844 -0.856 3.838 0.000 -0.815
3 7 0.802 -0.876 -0.822 -0.844 -0.856 -0.768 0.000 4.075
4 1 -1.300 3.943 -1.234 -1.234 0.000 0.000 0.000 -1.213
4 2 1.098 -1.314 3.701 -1.234 0.000 0.000 0.000 -1.213
4 3 0.491 -1.314 -1.234 3.701 0.000 0.000 0.000 -1.213
4 7 -0.289 -1.314 -1.234 -1.234 0.000 0.000 0.000 3.639
5 1 -0.432 4.381 -0.822 -0.825 -0.856 0.000 -0.809 -0.809
5 2 1.094 -0.876 4.112 -0.825 -0.856 0.000 -0.809 -0.809
5 3 0.261 -0.876 -0.822 4.124 -0.856 0.000 -0.809 -0.809
5 4 -2.042 -0.876 -0.822 -0.825 4.280 0.000 -0.809 -0.809
5 6 1.216 -0.876 -0.822 -0.825 -0.856 0.000 4.043 -0.809
5 7 -0.096 -0.876 -0.822 -0.825 -0.856 0.000 -0.809 4.043
6 1 -0.129 4.381 -0.862 -0.844 -0.856 0.000 -0.809 -0.809
6 2 -0.129 -0.876 4.309 -0.844 -0.856 0.000 -0.809 -0.809
6 3 -0.822 -0.876 -0.862 4.219 -0.856 0.000 -0.809 -0.809
6 4 -1.227 -0.876 -0.862 -0.844 4.280 0.000 -0.809 -0.809
6 6 1.663 -0.876 -0.862 -0.844 -0.856 0.000 4.043 -0.809
6 7 0.644 -0.876 -0.862 -0.844 -0.856 0.000 -0.809 4.043

variables are shown. In addition, we need (dj - 1/m), where dj is the usual brand dummy variable for each brand. Note that all variables sum to zero within each week. Note also that those observations for which log(share) is missing are deleted prior to centering. The estimated values of α2, α3, …, αm, βp1, βp2, …, βpm based on the data in Table 5.8 are identical to those given in Table 5.7.
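
A sketch of the log-centering step that produces Table 5.8, again assuming the same long-format data frame as before; rows with missing (zero) shares are dropped first, and then the dependent variable and every differential-effect column are centered within each week.

import numpy as np
import pandas as pd

def log_centered_design(df, m=7):
    d = df.dropna(subset=["share"]).copy()        # delete zero-share rows before centering
    d["LSHARE"] = np.log(d["share"])
    for b in range(1, m + 1):                     # raw log(price) x brand-dummy columns
        d[f"LPD{b}"] = np.where(d["brand"] == b, np.log(d["price"]), 0.0)
    cols = ["LSHARE"] + [f"LPD{b}" for b in range(1, m + 1)]
    # subtract the within-week mean from each column (Table 5.8 layout);
    # the centered brand dummies (d_j - 1/m) would be built the same way
    d[cols] = d[cols] - d.groupby("week")[cols].transform("mean")
    return d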

5.6  Collinearity in Differential-Effects Models

Bultez and Naert [1975] reported that estimating the parameters of a differential-effects model by equations (5.14) and (5.15) was greatly hampered by the existence of model-induced collinearity. To see their point, consider the data set shown in Table 5.9.

Table 5.9: Hypothetical Data for Differential-Effects Model

 
Week  Brand  log share  X1D1  X1D2  X1D3  X2D1  X2D2  X2D3
(XkDj = Xk × Brand Dummy Variable for brand j)
 
1 1 log(s11) X111 0 0 X211 0 0
1 2 log(s21) 0 X121 0 0 X221 0
1 3 log(s31) 0 0 X131 0 0 X231
2 1 log(s12) X112 0 0 X212 0 0
2 2 log(s22) 0 X122 0 0 X222 0
2 3 log(s32) 0 0 X132 0 0 X232
3 1 log(s13) X113 0 0 X213 0 0
3 2 log(s23) 0 X123 0 0 X223 0
3 3 log(s33) 0 0 X133 0 0 X233
. . . . . . . . .
. . . . . . . . .

This data set is for the estimation of regression model (5.16) in which three brands and two independent variables are assumed. (In actual estimation we will need brand and week dummy variables in addition to the variables above.) Collinearity (i.e., high correlations between two or more independent variables) is observed between independent variables for the same brand, e.g., between X1D1 and X2D1, between X1D2 and X2D2, between X1D3 and X2D3, and so forth. The reason for this phenomenon is demonstrated mathematically later in this section, but is easy to understand. Take the variables X1D1 and X2D1, for example. Those two variables have many zeroes in common for the same observations (weeks). When one computes the correlation between the two variables, those common zeroes artificially inflate the correlation coefficient.
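
A tiny numerical illustration of this artifact, using made-up values for one brand's two variables: the raw series are nearly uncorrelated, yet the reduced-form columns, padded with their common zeroes, are highly correlated.

import numpy as np

rng = np.random.default_rng(1)
T, m = 20, 3                         # 20 periods, 3 brands
x1 = rng.normal(5.0, 0.2, T)         # brand 1's values of X1 (e.g., log price)
x2 = rng.normal(4.0, 0.3, T)         # brand 1's values of X2 (e.g., log advertising)
print(np.corrcoef(x1, x2)[0, 1])     # original correlation: small

# reduced form: brand 1's rows carry the values, the other brands' rows are zero
X1D1 = np.concatenate([x1, np.zeros(T * (m - 1))])
X2D1 = np.concatenate([x2, np.zeros(T * (m - 1))])
print(np.corrcoef(X1D1, X2D1)[0, 1])  # inflated by the shared zeroes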

Because of the potential for artificially inflated correlations, Bultez and Naert warned against careless use of differential-effects models. Their warning was, however, somewhat premature. There are two aspects to the problem - the first concerning numerical analysis, and the second concerning the stability of parameter estimates.

Problems arise in numerical analysis when the crossproducts matrix for a regression model becomes singular or so nearly so that it cannot be inverted accurately. But, the crossproducts matrix for regression model (5.15) has a unique structure which is robust against high correlations induced by the model structure. (This is not to say that it is robust against any high correlations.) To simplify the discussion, assume that observations are taken only for three weeks. Then the number of independent variables in regression will be 11 (the intercept term, two week dummy variables, two brand dummy variables, and six variables X1D1 through X2D3). The crossproduct matrix for this set of variables will look as follows.

\begin{pmatrix}
9 & 3 & 3 & 3 & 3 & \sum X_{11t} & \sum X_{12t} & \sum X_{13t} & \sum X_{21t} & \sum X_{22t} & \sum X_{23t} \\
3 & 3 & 0 & 1 & 1 & X_{112} & X_{122} & X_{132} & X_{212} & X_{222} & X_{232} \\
3 & 0 & 3 & 1 & 1 & X_{113} & X_{123} & X_{133} & X_{213} & X_{223} & X_{233} \\
3 & 1 & 1 & 3 & 0 & 0 & \sum X_{12t} & 0 & 0 & \sum X_{22t} & 0 \\
3 & 1 & 1 & 0 & 3 & 0 & 0 & \sum X_{13t} & 0 & 0 & \sum X_{23t} \\
\sum X_{11t} & X_{112} & X_{113} & 0 & 0 & \sum X_{11t}^2 & 0 & 0 & \sum X_{11t}X_{21t} & 0 & 0 \\
\sum X_{12t} & X_{122} & X_{123} & \sum X_{12t} & 0 & 0 & \sum X_{12t}^2 & 0 & 0 & \sum X_{12t}X_{22t} & 0 \\
\sum X_{13t} & X_{132} & X_{133} & 0 & \sum X_{13t} & 0 & 0 & \sum X_{13t}^2 & 0 & 0 & \sum X_{13t}X_{23t} \\
\sum X_{21t} & X_{212} & X_{213} & 0 & 0 & \sum X_{11t}X_{21t} & 0 & 0 & \sum X_{21t}^2 & 0 & 0 \\
\sum X_{22t} & X_{222} & X_{223} & \sum X_{22t} & 0 & 0 & \sum X_{12t}X_{22t} & 0 & 0 & \sum X_{22t}^2 & 0 \\
\sum X_{23t} & X_{232} & X_{233} & 0 & \sum X_{23t} & 0 & 0 & \sum X_{13t}X_{23t} & 0 & 0 & \sum X_{23t}^2
\end{pmatrix}

In the above matrix summation is always over t (in this case over three weeks).

Collinearity in regression becomes a numerical-analysis problem when a crossproducts matrix such as the one above is nearly singular and thus its determinant is near zero. Since this matrix is in block form, the critical issue is whether the sub-matrix

\begin{pmatrix}
\sum X_{11t}^2 & 0 & 0 & \sum X_{11t}X_{21t} & 0 & 0 \\
0 & \sum X_{12t}^2 & 0 & 0 & \sum X_{12t}X_{22t} & 0 \\
0 & 0 & \sum X_{13t}^2 & 0 & 0 & \sum X_{13t}X_{23t} \\
\sum X_{11t}X_{21t} & 0 & 0 & \sum X_{21t}^2 & 0 & 0 \\
0 & \sum X_{12t}X_{22t} & 0 & 0 & \sum X_{22t}^2 & 0 \\
0 & 0 & \sum X_{13t}X_{23t} & 0 & 0 & \sum X_{23t}^2
\end{pmatrix}

is invertible. This matrix may be put in the form of a block-diagonal matrix by simple row-column operations and thus is invertible, if each of the following three matrices is invertible.

\begin{pmatrix} \sum X_{11t}^2 & \sum X_{11t}X_{21t} \\ \sum X_{11t}X_{21t} & \sum X_{21t}^2 \end{pmatrix} \qquad
\begin{pmatrix} \sum X_{12t}^2 & \sum X_{12t}X_{22t} \\ \sum X_{12t}X_{22t} & \sum X_{22t}^2 \end{pmatrix} \qquad
\begin{pmatrix} \sum X_{13t}^2 & \sum X_{13t}X_{23t} \\ \sum X_{13t}X_{23t} & \sum X_{23t}^2 \end{pmatrix}

That is, these submatrices are invertible as long as the original correlations between X11 and X21, X12 and X22, and X13 and X23 over t are not high. This is true even if the apparent (model-induced) correlations between them are high. The important condition for the invertibility of the crossproducts matrix as a whole is that the correlations between the original variables Xkit and Xhit (h ≠ k) over t are not too high to begin with. (If the correlations between original variables are high, composite measures, such as those based on principal components, will have to be used for any differential-effects market-share model to be effective!) This conclusion does not change if the independent variables are the logarithms of the original variables Xki. Thus the numerical-analysis problems created by collinearity in the usual sense are not the real issue in this case.

Even though the matrix will usually be invertible, collinearity can still harm the regression estimates. A further look at the source and remedies for collinearity in these models is helpful. Since Bultez and Naert's [1975] discussion of the problem, their warning about collinearity in differential-effects attraction models has been echoed by Naert and Weverbergh and others.Naert, Philippe A. & Marcel Weverbergh [1981], ``On the Prediction Power of Market Share Attraction Models,'' Journal of Marketing Research, 18 (May), 146-153. Naert, Philippe A. & Marcel Weverbergh [1985], ``Market Share Specification, Estimation and Validation: Toward Reconciling Seemingly Divergent Views,'' Journal of Marketing Research, 22 (November), 453-61. Brodie, Roderick & Cornelius A. de Kluyver [1984], ``Attraction Versus Linear and Multiplicative Market Share Models: An Empirical Evaluation,'' Journal of Marketing Research, 21 (May), 194-201. Ghosh, Avijit, Scott Neslin & Robert Shoemaker [1984], ``A Comparison of Market Share Models and Estimation Procedures,'' Journal of Marketing Research, 21 (May), 202-210. Leeflang, Peter S. H. & Jan C. Reuyl [1984a], ``On the Predictive Power of Market Share Attraction Models,'' Journal of Marketing Research, 21 (May), 211-215. Leeflang, Peter S. H. & Jan C. Reuyl [1984b], ``Estimators of the Disturbances in Consistent Sum-Constrained Market Share Models,'' Working Paper, Faculty of Economics, University of Groningen, P.O. Box 9700 AV Groningen, The Netherlands.While most of these articles also investigated differential-effects versions of multiplicative and linear-additive market-share models, no mention has been made in the marketing literature of possible collinearities in these model forms.

This section shows that the linear-additive and multiplicative versions of differential-effects market-share models suffer from the same sources of collinearities as the MCI and MNL versions. It is shown that the structural sources of collinearity are largely eliminated by two standardizing transformations - zeta-scores or the exponential transform of a standard z-score - discussed in section 3.8.

5.6.1  Three Differential-Effects Models

The three basic specifications of the differential-effects market-share models - linear-additive (LIN), multiplicative (MULT), and multiplicative competitive-interaction (MCI) or attraction versions - are given in equations (5.20 - 5.22) parallel to the definitions in Naert & Weverbergh's [1984] equations:

LIN

s_{it} = \alpha_i + \sum_{k=1}^{K} \beta_{ki}\, f_t(X_{kit}) + \varepsilon_{it}     (5.20)

MULT

s_{it} = A_{it}     (5.21)

and MCI

s_{it} = A_{it} \Big/ \sum_{j=1}^{m} A_{jt}     (5.22)

where:Note here we are focusing on ft rather than fk . We will assume we have agreed on the model type (MCI in this case, that is, fk is the identity transformation) and our interest here is in the possible influence of transformations within a choice situation on collinearity.

A_{it} = (\alpha_i + \varepsilon_{it}) \prod_{k=1}^{K} \left[\, f_t(X_{kit}) \,\right]^{\beta_{ki}}\, .

All of these models are reduced to their corresponding simple-effects versions by assuming:

\beta_{ki} = \beta_{kj} = \beta_k \quad \forall\; i, j\, .

The reduced formThe reduced form is simply the variables after they are transformed to be ready for input into a multiple-regression routine.resulting from this simplified estimation procedure allows us to see the similarities among all three specifications of the differential-effects model, as seen in Tables 5.2 and 5.9. Note in Table 5.9 that each differential effect has only one nonzero entry in each time period. The difference between LIN and MULT models is just that the MULT model uses the log of the variable as the nonzero entry and the LIN model uses the raw variable. The difference between the MULT and MCI models is basically that the MCI form incorporates a series of time-period dummy variables from Table 5.2 which ensure that the estimated parameters are those of the original nonlinear model in equation (3.1). Another difference, of course, is that the estimates of market share in the MCI model come from inverse log-centering,Nakanishi & Cooper [1982].while in the MULT model the exponential transformation of the estimated dependent variable serves as the market-share estimate. Inverse log-centering and the time-period dummy variables guarantee that the MCI model will provide logically consistent market-share estimates (all estimates being between zero and one, and summing to one over all brands in each time period), while neither LIN nor MULT provides logically consistent estimates.

The problem of collinearity can be traced to within-brand effects. There is zero correlation between a time-period dummy variable and a brand-specific dummy variable. Since the time-period dummy variables cannot be a major source of collinearity, the MULT and MCI models do not differ substantially in their sources of collinearity. Nor do the correlations between effects for different brands contribute substantially to collinearity. For m brands the correlation between brand-specific dummy variables for different brands is -1/(m-1). With even ten brands there is only about 1% overlap in variance between intercepts for different brands. An analogous result holds for the correlations between dummy variables for different time periods. The within-brand effects are analyzed in the next section.

5.6.2  Within-Brand Effects

The special problems of jointly longitudinal and cross-sectional analysis have been discussed in psychometrics, econometrics, as well as the quantitative-analysis areas in education, sociology, and geography. The earliest reference is to Robinson'sRobinson, W. S. [1950], ``Ecological Correlation and the Behavior of Individuals,'' American Sociological Review, 15, 351-357.covariance theorem, which was presented by AlkerAlker, Hayward R. Jr. [1969], ``A Typology of Ecological Fallacies,'' in Mattei Dogan & Stein Rokkan (editors), Quantitative Ecological Analysis in the Social Sciences, Cambridge, MA: The M.I.T. Press, 69-86.as:

r_{XY} = {}_{W}R_{XY}\, \sqrt{1 - E_{YR}^2}\, \sqrt{1 - E_{XR}^2} + {}_{E}R_{XY}\, E_{YR}\, E_{XR}     (5.23)

where:

rXY is the correlation between column X and column Y in the reduced form of the differential-effects model. In this application X and Y represent within-brand effects such as price and advertising for one brand.
WRXY is defined to be the pooled within-period correlation of X and Y. In our case this simplifies to a congruence coefficient, giving very high values under certain conditions discussed below.
ERXY is the between-period or ecological correlation. In our case this is the simple correlation between, say, the log of price and the log of advertising values for a single brand.
EYR and EXR are the correlation ratios (i.e., the proportions of variation in X and Y, respectively, that are attributable to between-period differences). In our case these values control how much weight is given to the congruence coefficient versus the simple correlation.

Looking again at Table 5.9 shows that for differential effects within a brand, all the nonzero entries are aligned and all the zero entries are aligned in the reduced form, and there is only one nonzero entry in each time period. This results in very simplified forms for the components of Robinson's covariance theorem. If we let xt and yt be the single nonzero entries in period t for column X and Y , respectively, then for our special case:

{}_{W}R_{XY} = \frac{\sum_{t=1}^{T} x_t\, y_t}{\sqrt{\sum_{t=1}^{T} x_t^2 \; \sum_{t=1}^{T} y_t^2}}\, .

This is a congruence coefficient, often used for assessing the agreement between ratio-scaled measures.Tucker, Ledyard R [1951], ``A Method of Synthesis of Factor Analysis Studies,'' Personnel Research Section Report, No. 984, Washington, D.C., Department of the Army. Also see Korth, Bruce & Ledyard R Tucker [1975], ``The Distribution of Chance Coefficients from Simulated Data,'' Psychometrika, 40, 3 (September), 361-372. Because the mean levels of the variables influence the congruence, x and y of the same sign push WRXY toward 1.0 much faster than the simple correlation. For prices (greater than $1.00) and advertising expenditures the reduced form would have a series of positive log-values which might well have a very large value for WRXY. For these same variables in share form (price-share or advertising-share), the reduced form would have matched negative numbers, which still could lead to large values for WRXY. For variables of consistently opposite signs, WRXY could push toward -1.0 even in cases of modest simple correlations.
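
A minimal sketch of the contrast between the congruence coefficient and the ordinary correlation for the nonzero entries of two within-brand columns; the numbers are invented for illustration.

import numpy as np

def congruence(x, y):
    # Tucker's congruence coefficient: no mean-centering, so entries of a
    # common sign and level push the value toward +1
    return np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

x = np.log([2.33, 2.41, 2.29, 2.37, 2.45])   # a brand's prices over five periods (made up)
y = np.log([1.10, 1.60, 1.20, 1.45, 1.30])   # the same brand's advertising index (made up)
print(congruence(x, y))                       # pushed toward 1.0 by the common positive sign
print(np.corrcoef(x, y)[0, 1])                # the simple correlation is smaller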

For both raw variables (e.g., price and advertising) and for marketing variables in their share form (e.g., relative price and advertising share) the correlation ratios EXR2 and EYR2 have a maximum value of 1/m .

E_{XR}^2 = \frac{\dfrac{1}{m^2}\left[\, \sum_{t=1}^{T} \dfrac{X_{jt}^2}{T} - \left( \sum_{t=1}^{T} \dfrac{X_{jt}}{T} \right)^{2} \right]}{\dfrac{1}{m}\left[\, \sum_{t=1}^{T} \dfrac{X_{jt}^2}{T} - \dfrac{1}{m}\left( \sum_{t=1}^{T} \dfrac{X_{jt}}{T} \right)^{2} \right]} \;\le\; \frac{1}{m}\, .     (5.24)

So when correlating two effects within a brand we have at best:

r_{XY} = \left( \frac{m-1}{m} \right) {}_{W}R_{XY} + \left( \frac{1}{m} \right) {}_{E}R_{XY}\, .     (5.25)

Thus the correlation rXY is composed of two parts. A small part, at most 1/m, is due to the simple correlation of the X and Y values for brand j over time periods. A very large part, at least (m-1)/m, is due to the congruence coefficient WRXY. Thus, for raw-score or share-form marketing variables, pairwise collinearity is likely for any two effects within a brand in differential-effects models. But collinearity is not merely a pairwise problem in these models.For further discussion see Mahajan, Vijay, Arun K. Jain & Michel Bergier [1977], ``Parameter Estimation in Marketing Models in the Presence of Multicollinearity: An Application of Ridge Regression,'' Journal of Marketing Research, 14 (November), 586-591.Collective collinearity for all the within-brand effects is very likely indeed. This is true for the differential-effects versions of the linear-additive model and the multiplicative model, as well as the MCI model. Fortunately there exist simple remedies, which are the topic of the next section.

5.6.3  Remedies

The remedies for collinearity were hinted at in the Bultez and Naert [1975] article which first discussed the problem. They said, ``... if the variables have zero means'' the correlations in the extended model would be the same as the correlation in the simple model (p. 532). More precisely, it can be said that if the reduced form of the values for brand i for two different variables each have a mean of zero over time periods, then WRXY is equal to ERXY, and thus rXY would be equal to the simple correlation of the reduced forms of the brand i values. This remedy is not a general solution for all variables in a differential-effects model because forming deviation scores within a brand over time ignores competitive effects. One case where this remedy might be appropriate, however, is for a variable reflecting the promotion price of a brand. This variable would reflect current price as a deviation from a brand's historic average price.

As potential remedies, consider zeta-scores and the exponential transformation of standard scores discussed in Chapter 3 (section 3.8). Both transformations standardize the explanatory variables, making the information relative to the competitive context in each time period. There are several advantages to standardizing measures of marketing instruments in each time period. First, one should remember that the dependent measures (share or choice probability) are expressed in a metric which, while normalized rather than standardized, is still focused on representing within time-period relations. Representations of the explanatory variables which have a similar within time-period focus have the advantage of a compatible metric. In this respect, variables expressed in share form have as much of an advantage as zeta-scores or exp(z-scores). Any of the three would be superior to raw scores in reflecting the explanatory information in a way which aligns with the dependent variable. While raw prices might have a stronger relation with category volume or primary demand, relative prices could have more to do with how the total volume is shared among the competitors.
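
A sketch of both standardizations applied within each time period. The exp(z-score) is simply the exponential of the within-period standard score; the zeta-score formula below is our reading of the section 3.8 definition and should be checked against that section.

import numpy as np
import pandas as pd

def exp_z(x):
    z = (x - x.mean()) / x.std(ddof=0)
    return np.exp(z)

def zeta(x):
    # assumed form: (1 + z^2)^(1/2) when z >= 0, (1 + z^2)^(-1/2) when z < 0
    z = (x - x.mean()) / x.std(ddof=0)
    return np.where(z >= 0, np.sqrt(1 + z**2), 1.0 / np.sqrt(1 + z**2))

# applied within each competitive set (store-week), e.g. for price:
# df["price_expz"] = df.groupby("week")["price"].transform(exp_z)
# df["price_zeta"] = df.groupby("week")["price"].transform(zeta)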

A second advantage applies to standardizations, rather than normalizations. In the reduced form, the means (of a brand over time periods) of a zeta-score or exp(z-score) are more likely to be close to zero than the corresponding means of the reduced form of a normalized variable. Thus WRXY for a zeta-score or exp(z-score) would be less inflated (closer to the value of the simple correlation ERXY) than would be the congruence coefficient for two within-brand effects represented in share form.

Table 5.10 provides an empirical demonstration of the effects on collinearity of zeta-scores and exp(z-scores), compared with the raw scores or the share scores. The data concern price and advertising measures representing competition among 11 brands in an Australian household-products category.Carpenter, Cooper, Hanssens, and Midgley [1988].There are 11 differential-price effects, 10 differential-advertising effects, and 10 brand-specific intercepts in a differential-effects market-share model for this category. The tabled values are condition indices reflecting the extent of collinearity or near dependencies among the explanatory variables. A condition index is the ratio of the largest singular value (square root of the eigenvalue) to the smallest singular value of the reduced form of the explanatory variables in the market-share model.Belsley, David A., Edwin Kuh & Roy E. Welsch [1980], Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: John Wiley & Sons, 103-4.The higher the condition index the worse the collinearity in the system of equations. Belsley, Kuh, and Welsch [1980] develop empirical evidence that weak dependencies are associated with condition indices between 5 and 10, moderate to strong relations are associated with indices of 30 to 100, and indices of 100 or more ``appear to be large indeed, causing substantial variance inflation and great potential harm to regression estimates''(p. 153). Note in Table 5.10 for raw scores, Xkit, all three models (LIN, MULT, and MCI) reflect potential problems. These problems are not remedied when marketing instruments are expressed in share form. As a market-share model, which uses the share form of marketing instruments, becomes more comprehensive, by including more brands, the problems would worsen. This is because the price shares and advertising shares would, in general, become smaller, thus making the log of the shares negative numbers of larger and larger absolute value. This would press WRXY closer to +1.0.

Table 5.10: Condition Indices Australian Household-Products Example

 
  Transformation of Raw Scores
Model Raw Scores Share Form Zeta-Scores Exp(Z-Scores)
 
LIN 3065 313 61 75
MULT 484 3320 22 17
MCI 627 3562 24 23
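
The condition indices in Table 5.10 can be computed as the ratio of the extreme singular values of the reduced-form explanatory-variable matrix; a minimal sketch, with the Belsley-Kuh-Welsch column scaling included as an option:

import numpy as np

def condition_index(X, scale=True):
    """X: two-dimensional array of reduced-form explanatory variables."""
    if scale:
        # Belsley, Kuh & Welsch scale each column to unit length first
        X = X / np.sqrt((X ** 2).sum(axis=0))
    sv = np.linalg.svd(X, compute_uv=False)   # singular values
    return sv.max() / sv.min()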

Standardizing within each competitive set using zeta-scores or exp(z-scores) has a dramatically favorable impact on the collinearity of the system of equations. The condition indices for the MULT and MCI models are less than 25. This is below the level indicating moderate collinearity, and far below the danger point.The absolute standards given by Belsley, Kuh, and Welsch [1980] for condition indices are probably too conservative. As the number of variables and observations increases we can expect the ratio of the largest and smallest singular values to grow larger. Further study is needed to see what boundaries are acceptable for large data sets.Linear or nonlinear trends in the mean level of the raw variables are major contributors to collinearity. By removing the mean level of the raw variables in each time period, the two remedies illustrated in Table 5.10 both eliminate one major source contributing to high (positive or negative) values in WRXY. By standardizing the variance over competitors in each time period, both remedies help keep the mean values for each brand over time nearer to zero.

These basic results mean that, if one standardizes variables in a manner appropriate for these multiplicative models, it is practical to use differential-effects market-share models.

5.7  Estimation of Cross-Effects Models

We now come to the estimation problems associated with the fully extended attraction (or cross-effects) model discussed in Chapter 3.

A_i = \exp(\alpha_i + \varepsilon_i) \prod_{k=1}^{K} \prod_{j=1}^{m} f_k(X_{kj})^{\beta_{kij}}     (5.26)

s_i = A_i \Big/ \sum_{j=1}^{m} A_j

As before, the fk in the above equation may be an identity (for an MCI model) or an exponential (for an MNL model) transformation. The most important property of the above model is, of course, the existence of the cross-effect parameters, βkij (i, j = 1, 2, …, m; k = 1, 2, …, K). We are now faced with the seemingly insurmountable problem of estimating (m × m × K) + m parameters.

Surprisingly, estimating parameters of a cross-effects model is not very difficult, and in some sense easier than estimating parameters of a differential-effects model. McGuire, Weiss, and HoustonMcGuire, Timothy W., Doyle L. Weiss & Frank S. Houston [1977], ``Consistent Multiplicative Market Share Models,'' in Barnett A. Greenberg & Danny N. Bellenger (editors), Contemporary Marketing Thought. 1977 Educators Proceedings (Series # 41), Chicago: American Marketing Association.showed that the following regression models estimate the parameters of (5.26).

MNL Model:

s^*_{it} = \alpha_1 + \sum_{j=2}^{m} \alpha'_j d_j + \sum_{j=1}^{m} \sum_{k=1}^{K} \sum_{h=1}^{m} \beta^*_{kij}\, d_h\, X_{kjt} + \varepsilon_{it}     (5.27)

MCI Model:

s^*_{it} = \alpha_1 + \sum_{j=2}^{m} \alpha'_j d_j + \sum_{j=1}^{m} \sum_{k=1}^{K} \sum_{h=1}^{m} \beta^*_{kij}\, d_h \log X_{kjt} + \varepsilon_{it}     (5.28)

where s*it is the log-centered value of sit, the share of brand i in period t. Variable dj is the usual brand dummy variable, but its value changes depending on where it is used in the above equation. In the first summation, dj = 1 if j = i, and dj = 0 otherwise; in the second summation, dh = 1 if h = j, and dh = 0 otherwise. It must be pointed out that the β*kij in models (5.27 - 5.28) are not the same as the parameters βkij in model (5.26), but are deviations of the form

\beta^*_{kij} = \beta_{kij} - \bar{\beta}_{k \cdot j}

where \bar{\beta}_{k \cdot j} is the arithmetic mean of βkij over all brands (i = 1, 2, …, m). But it may be shown that the estimated values of the β*kij are sufficient for computing the cross elasticities. Recall from Chapter 3 that the elasticity or cross elasticity of brand i's share with respect to a change in the kth variable for brand j is given by

MCI Model:

e_{s_i \cdot j} = \beta_{kij} - \sum_{h=1}^{m} s_h\, \beta_{khj}

MNL Model:

e_{s_i \cdot j} = \left( \beta_{kij} - \sum_{h=1}^{m} s_h\, \beta_{khj} \right) X_{kj}\, .

Take the MCI version, for example. Substitute β*kij for βkij in the above equation.

 
\beta^*_{kij} - \sum_{h=1}^{m} s_h\, \beta^*_{khj}
    = \left( \beta_{kij} - \bar{\beta}_{k \cdot j} \right) - \sum_{h=1}^{m} s_h \left( \beta_{khj} - \bar{\beta}_{k \cdot j} \right)
    = \beta_{kij} - \sum_{h=1}^{m} s_h\, \beta_{khj} - \bar{\beta}_{k \cdot j} + \bar{\beta}_{k \cdot j}
    = e_{s_i \cdot j}
 

since the sum of sh over all brands is one. Thus the knowledge of the b*kij 's is sufficient to estimate esi.j for both the MCI-type and MNL-type cross-effects models.
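
A sketch of this computation for the MCI version: given the m × m matrix of estimated β*kij for one variable k (rows indexed by i, columns by j) and the vector of shares, the point elasticities follow directly. The array layout is an assumption.

import numpy as np

def mci_cross_elasticities(beta_star, shares):
    """beta_star[i, j] = estimated b*_kij; shares is a length-m vector summing to one.
    Returns the m x m matrix of e_si.j = b*_kij - sum_h s_h b*_khj."""
    col_weighted = shares @ beta_star            # sum over h of s_h * b*_khj, per column j
    return beta_star - col_weighted[np.newaxis, :]

# for the MNL version the result would be multiplied by X_kj, per the formula above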

Let us apply the regression model proposed by McGuire et al. to the illustrative data in Table 5.1. Since the data necessary for estimation involve 56 variables (including the intercept term), no table of data set-up is shown. Only the estimation results are given in Table 5.11. The model was estimated without the intercept. The notation for independent variables, LPiDj, where i and j are appropriate numbers, indicates the effect of log(price) of the ith brand on brand j's market share. There is a warning that the model is not full rank, because there are only four observations for brand 5 with a positive market share. Direct-effect parameters, LPiDi's, for brands 1 through 4 are negative and statistically significant, while the others are non-significant. Cross-effect parameters are mostly positive and/or statistically non-significant, but one of them, LP7D6, is negative and significant. Although we should refrain from making generalizations from this one set of data, it is perhaps justified to say that, as we move toward more complex models, the limitations of the test data set have become obvious. The number of observations is too small to provide one with stable parameter estimates. Furthermore, there seem to be factors other than price which affect market shares of margarine in this store. It is desirable, then, to obtain more data, especially from more than one store, along with information on marketing variables other than price.

Table 5.11: Regression Results for Cross-Effects Model (MCI)

 
Model: MODEL1
Note : no intercept in model. R-square is redefined.
Dep Variable: LSHARE
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 52 92.58365 1.78045 8.451 0.0001
Error 32 6.74182 0.21068    
U Total 84 99.32547      
Root MSE 0.45900 R-Square 0.9321  
Dep Mean -0.00000 Adj R-Sq 0.8218  
C.V. -7.89278E+17      
NOTE : Model is not full rank. Least-squares solutions for the
  parameters are not unique. Some statistics will be
  misleading. A reported DF of 0 or B means that the
  estimate is biased. The following parameters have been
  set to 0, since the variables are a linear combination
  of other variables as shown.
LP4D5 = +2.9223*D5 + 0.9531*LP1D5 + 0.7226*LP2D5 - 0.2564*LP3D5
LP5D5 = +5.8144*D5 - 0.2300*LP1D5
LP6D5 = +6.3121*D5 - 0.2992*LP1D5 - 0.7777*LP2D5 + 0.7946*LP3D5
LP7D5 = +5.2704*D5 + 0.1566*LP1D5 - 0.0181*LP2D5 - 0.2200*LP3D5
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
D1 1 73.924694 46.10070856 1.604 0.1186
D2 1 -38.091548 46.10070856 -0.826 0.4148
D3 1 38.942488 46.10070856 0.845 0.4045
D4 1 -79.689368 64.73512217 -1.231 0.2273
D5 B -52.022706 14.10815828 -3.687 0.0008
D6 1 59.130760 69.49275660 0.851 0.4012
D7 1 -17.793812 46.10070856 -0.386 0.7021
LP1D1 1 -5.096306 1.97465203 -2.581 0.0146
LP2D1 1 -0.365029 2.43911085 -0.150 0.8820
LP3D1 1 1.507052 2.27537629 0.662 0.5125
LP4D1 1 -2.353595 1.89200824 -1.244 0.2225
LP5D1 1 0.503063 0.70624068 0.712 0.4814
LP6D1 1 -6.100657 3.98554609 -1.531 0.1357
LP7D1 1 -2.894448 4.25472197 -0.680 0.5012
LP1D2 1 -0.252472 1.97465203 -0.128 0.8991

 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
LP2D2 1 -8.625451 2.43911085 -3.536 0.0013
LP3D2 1 2.107563 2.27537629 0.926 0.3613
LP4D2 1 3.041118 1.89200824 1.607 0.1178
LP5D2 1 0.800421 0.70624068 1.133 0.2655
LP6D2 1 1.336924 3.98554609 0.335 0.7395
LP7D2 1 9.615896 4.25472197 2.260 0.0308
LP1D3 1 -0.128008 1.97465203 -0.065 0.9487
LP2D3 1 1.150772 2.43911085 0.472 0.6403
LP3D3 1 -6.671369 2.27537629 -2.932 0.0062
LP4D3 1 -0.446255 1.89200824 -0.236 0.8150
LP5D3 1 0.378551 0.70624068 0.536 0.5957
LP6D3 1 -0.622859 3.98554609 -0.156 0.8768
LP7D3 1 -1.518813 4.25472197 -0.357 0.7235
LP1D4 1 -1.081137 2.35763232 -0.459 0.6496
LP2D4 1 6.997517 3.37627497 2.073 0.0463
LP3D4 1 -3.559763 3.99419070 -0.891 0.3795
LP4D4 1 -6.089339 1.89510791 -3.213 0.0030
LP5D4 1 0.194514 0.74768568 0.260 0.7964
LP6D4 1 12.210535 7.30809600 1.671 0.1045
LP7D4 1 7.680597 4.90866219 1.565 0.1275
LP1D5 B 6.205448 2.34288476 2.649 0.0124
LP2D5 B 3.608572 3.17368514 1.137 0.2640
LP3D5 B 0.569965 3.76236952 0.151 0.8805
LP4D5 0 0 0.00000000 . .
LP5D5 0 0 0.00000000 . .
LP6D5 0 0 0.00000000 . .
LP7D5 0 0 0.00000000 . .
LP1D6 1 3.523658 2.50575576 1.406 0.1693
LP2D6 1 0.112065 3.39656633 0.033 0.9739
LP3D6 1 -0.322265 4.07062686 -0.079 0.9374
LP4D6 1 1.837399 2.12068866 0.866 0.3927
LP5D6 1 1.098908 0.83529373 1.316 0.1977
LP6D6 1 -1.221249 7.39266721 -0.165 0.8698
LP7D6 1 -17.414894 5.82079344 -2.992 0.0053
LP1D7 1 0.104280 1.97465203 0.053 0.9582
LP2D7 1 1.630654 2.43911085 0.669 0.5086
LP3D7 1 2.086093 2.27537629 0.917 0.3661
LP4D7 1 1.615467 1.89200824 0.854 0.3995
LP5D7 1 -0.313301 0.70624068 -0.444 0.6603
LP6D7 1 -2.643566 3.98554609 -0.663 0.5119
LP7D7 1 1.060157 4.25472197 0.249 0.8048

5.8  A Multivariate MCI Regression Model

It should be pointed out that the parameter estimates of Table 5.11 may be obtained by applying a simple regression model of the following form to the data for each brand separately.

\log(s^*_{it}) = \alpha_i + \sum_{j=1}^{m} \beta_{pij} \log(P_{jt}) + \varepsilon_{it} \qquad (i = 1, 2, \ldots, m)     (5.29)

In the above equation, ai is simply the intercept term for brand i . The parameters thus estimated are identical to those in Table 5.11, although the significance level of each parameter is usually different from the one in Table 5.11, because the t-statistic and associated degrees of freedom are not the same. If one wishes only parameter estimates, model (5.26) is simpler to calibrate than model (5.13).If we replace log(Pjt) with Pjt , the corresponding MNL model can be estimated.
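
A minimal sketch of the brand-by-brand estimation in (5.29): every brand's dependent measure is regressed on an intercept and the logs of all brands' prices, so the same design matrix serves for all m regressions. Array names and shapes are assumptions.

import numpy as np

def cross_effects_by_brand(S_star, P):
    """S_star: (T, m) matrix of the dependent measure in (5.29), one column per brand.
    P: (T, m) matrix of prices. Returns a (1 + m, m) matrix whose column i holds
    brand i's intercept followed by b_pi1 ... b_pim."""
    T, m = P.shape
    X = np.column_stack([np.ones(T), np.log(P)])     # identical regressors for every brand
    B, *_ = np.linalg.lstsq(X, S_star, rcond=None)   # OLS, column by column
    return B

Because the regressor matrix is identical for every brand, running OLS column by column in this way amounts to the multivariate regression in (5.30) below.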

The fact that (5.29) may be used to estimate the parameters of (5.26) has an extremely important implication. Note that, in estimating (5.29), the data for every brand involve the same set of independent variables, log(P1t),log(P2t), ¼, log(Pmt) , plus an intercept term. One may summarize model (5.29) for m brands in the following multivariate regression model.

Y = X B + E
(5.30)

where:

Y = the T × m matrix with elements {log(s*it)} (t = 1, 2, …, T; i = 1, 2, …, m)
X = the T × (1 + m × K) matrix (J | X1 | X2 | … | XK)
J = the T × 1 vector (1 1 1 … 1)′
Xk = the T × m matrix with elements {log(Xkit)} (t = 1, 2, …, T; i = 1, 2, …, m)
B = the (1 + m × K) × m matrix (B1 | B2 | … | Bm)
Bi = (αi | β1i1 … β1im | β2i1 … β2im | … | βKi1 … βKim)′
E = the T × m matrix of elements {εit} (t = 1, 2, …, T; i = 1, 2, …, m).

Recall our assumptions for the specification-error term are still applicable to the error term, eit , in the above model. It is well known that under our assumptions on the error term, the OLS procedure, applied to each column of Y in (5.30) separately, yields the best linear-unbiased estimates (BLUE) of the parameters of B.See, for example, Finn, Jeremy D. [1974], A General Model for Multivariate Analysis , New York: Holt, Rinehart & Winston.In other words, it is not necessary to resort to the GLS procedure to obtain minimum-variance estimates of a cross-effects model such as (5.27) or (5.28).

This fact, combined with the availability of equation (5.29) for brand-by-brand estimation, reduces the task of estimating the parameters of a cross-effects model and increases its usefulness as a market-diagnostic tool. When one has a sufficient number of observations (that is, T > 1 + m ×K), it is perhaps best to estimate a cross-effects model first, and then, after examining the pattern of estimated coefficients, determine if a simpler model, such as the simple attraction model or a differential-effects model, is adequate. When the number of observations is barely sufficient for a cross-effects model, one may decide to adopt a strategy to estimate a full cross-effects model first, and then decide to restrict some elements of the B matrix (the parameter matrix) to be zero (cf. Carpenter, Cooper, Hanssens, & Midgley [1988]). In this case, however, the OLS procedure is not applicable and a GLS procedure will have to be used.

5.9  Estimation of Category-Volume Models

So far we have considered the various techniques which may be used to estimate the parameters of market-share models, but the forecasting of brand sales volumes requires more than the knowledge of market shares. Because the sales volume of a given brand in a period is a product of the brand's share and (total) category sales volume for the period, one needs the forecast of category sales volumes.Hereafter we will use category volume instead of industry sales volume , since the former fits better in the context of stores and market shares.

In this section we deal with the estimation of the parameters of category-volume models. Compared with the market-share estimation, the modeling for category sales volumes is a more straightforward application of econometric techniques. The illustrative data in Table 5.1 include the average daily sales volumes of margarine for this store. We will use these data to show some examples of category-volume models.

In this particular data set, brand price is the only marketing variable. We hypothesize that if the overall price level is low, the total volume will be high. We also hypothesize that if sales are extremely high in one week, the sales in the following weeks should be low because the store customers have not used up their stock. In order to represent those two hypotheses, we propose the following model.

Q_t = a + b\, Q_{t-1} + c \log \tilde{P}_t + u_t     (5.31)

where:

Qt = the category volume (in equivalent units) in period t
\tilde{P}_t = the average price level in period t
ut = an error term
a, b, c = parameters to be estimated.

We let \tilde{P}_t be the geometric mean of prices in period t. The following is the estimation result.

 
Q_t = 508.8 - 0.4652\, Q_{t-1} - 2.5116 \log \tilde{P}_t
       (4.172)     (-1.681)        (-3.620)

R-Square = 0.5764

T-values are in the parentheses directly below the corresponding parameter estimates. The fit of the model is acceptable, judging from the R2-value of 0.58 . The estimated parameters and their t-values bear out our initial guess that the average price level in the week and the sales volume in the preceding week are influential in determining the category volume.

There is another line of thought concerning the effect of price on category volume: the prices of different brands may have differential effects on category volume. One brand's price reduction may increase its share but leave category volume unaffected, while another brand's price reduction may increase both its share and category volume. To incorporate differential effects of brand prices, we propose the following model.

Q_t = a + b\, Q_{t-1} + \sum_{i=1}^{m} c_i \log P_{it} + u_t     (5.32)

where the ci 's are the differential price-effect parameters. The estimation results for this model are given below.

Q_t = 252.62 - 0.4947\, Q_{t-1} - 0.1646 \log P_{1t} - 0.4799 \log P_{2t} - 0.06799 \log P_{3t} - 0.1881 \log P_{4t} - 0.5646 \log P_{5t} - 0.2727 \log P_{6t} + 1.0631 \log P_{7t}

R-Square = 0.9581

The fit of the model is much improved. Brands 2 and 5 have significant effects on category volume, indicating that when those brands cut prices the customers of this store purchase more than their usual amounts, and that the following week's total volume suffers as a consequence. Note that the brand sales elasticity with respect to price, which measures the overall impact of brand i's price on its sales volume, is decomposed into two components:

eQi.Pi = Category-Volume Elasticity + Share Elasticity.

For example, if we assume the differential-effects model, then

eQi.Pi = ci + bpi (1 - si)

where ci is the parameter from model (5.32) and βpi is estimated by one of the models (5.14 - 5.17).

With the R2 -value of 0.96, equation (5.32) should give reasonably good estimates of category volumes. The positive sign of the estimated parameter for logP7t poses a theoretical problem, but it probably reflects the effects of some marketing activities within the store which are not included in the model. As a forecasting model for category volume, this model should be used as it is.

Model (5.32) is in the form of a distributed-lag model. It is known that the ordinary least-squares procedure applied to (5.32) yields biased estimates of the model parameters. If an adequate number of observations is available, it is recommended that time-series analysis procedures be used for parameter estimation. Weekly data produce a sufficient number of observations in two years for a time-series analysis model. If the number of observations is less than 50, however, it is perhaps best to use the OLS procedure.
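
A sketch of the OLS fit of (5.32), with the caveat above about distributed-lag bias kept in mind; the first week is lost to the lagged term, and the variable names are assumptions.

import numpy as np

def fit_category_volume(Q, P):
    """Q: length-T array of category volumes; P: (T, m) array of brand prices.
    Fits Q_t = a + b Q_{t-1} + sum_i c_i log(P_it) + u_t by OLS."""
    y = Q[1:]                                                    # weeks 2..T
    X = np.column_stack([np.ones(len(y)), Q[:-1], np.log(P[1:])])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                                                  # [a, b, c_1, ..., c_m]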

These simplest category-volume models are linear in the effects of previous category volume while being linear in the logs of prices. As we incorporate marketing variables other than price, it is advisable to postulate more general, fully interactive models such as:

Q_t = \exp(a + u_t)\, Q_{t-1}^{\,b} \prod_{j=1}^{m} P_{jt}^{\,c_j} \prod_{k=2}^{K} \exp(b_{kj} X_{kjt})\, .     (5.33)

The reduced form of such a model may be characterized as being a log-log model in the effects of price and previous category volume, and log-linear in the other marketing variables (such as newspaper features, in-store displays and other marketing instruments which may be binary variables). This general form will be used with the coffee-market example developed in section 5.12.

5.10  Estimation of Share-Elasticities

In Chapter 6 we deal with the market-structure analysis based on the factor analysis of market-share elasticities. The reader may recall that there are two types of market-share elasticities, namely, point- and arc-share elasticities. Since the elasticities obtainable in practice are arc elasticities, one may think of factor-analyzing arc elasticities to investigate the structure of the market and competition. Unfortunately, this is not at all feasible.

Recall the definition of an arc elasticity for variable Xk .

e_{s_i} = \frac{\Delta s_i}{\Delta X_{ki}} \cdot \frac{X_{ki}}{s_i}

Δsi in the above definition is not the total change in si, but the change corresponding to the change in Xki, ΔXki. We have no means of separating the effects of various marketing variables on market shares unless, of course, we apply some model to observed market shares. Indeed it is the main purpose of the models discussed in this book to identify the effects of marketing variables. Thus, in order to estimate share elasticities specific to a marketing variable, we propose first to estimate the parameters of a market-share model from a data set (i.e., brand shares and marketing variables), and then use the theoretical expressions for point elasticities (see Chapter 3) for the relevant model to obtain elasticity estimates.

A numerical example may clarify this procedure. When we applied the raw-score attraction model to the margarine data in Table 5.1, we obtained a price-parameter estimate of -8.337. If a brand's share is 0.2, then the point-elasticity estimate is given by -8.337 × (1 - 0.2) = -6.67. Although we are unable to estimate arc elasticities in this manner, point-elasticity estimates will serve as approximations for arc elasticities.

5.11  Problems with Zero Market Shares

Since the dependent variable in log-linear regression is the logarithm of either market shares or the numbers of units sold, it is impossible to compute the value of the dependent variable if observed market shares or numbers of units are zero. In any data collection procedure one may observe a zero market share or number of units sold for some brand-period combination. There are two procedures for handling those data sets which contain zero market shares.

The first is to assign some arbitrarily small values (0.001, say) to zero market shares. But this procedure amounts to assigning a large negative value to log 0, and tends to bias the estimated parameter values. (The smaller the assigned value, the greater the absolute values of estimated parameters.)

The second procedure is to delete from the data set those brand-period combinations for which observed market-shares are zero. Young, Kan H. & Linds Y. Young [1975], ``Estimation of Regressions Involving Logarithmic Transformations of Zero Values in the Dependent Variables,'' The American Statistician, 29 (August), 118-20.Though this procedure may seem arbitrary at first glance, it has some logic of its own. First, if a brand were not bought in a certain period, that would be sufficient basis to infer that the brand was not in the consumers' choice set. Second, since one is usually more interested in estimating accurately the behavior of those brands which command large shares, it may be argued that one need not bother with those brands which often take zero market shares. Third, that zero market shares are not usable for estimation is not a problem limited to log-linear regression procedures. Consider, for example, the case in which the share estimate for brand i in period t is based on the number of consumers who purchased that brand, nit (i = 1, 2, …, m). Assuming that the numbers {n1t, n2t, …, nmt} are generated by a multinomial process (see section 5.1.1 on maximum-likelihood estimation), one may wish to use a maximum-likelihood procedure for estimating parameters of attraction models. Note, however, that those observations for which nit = 0 do not contribute at all to the likelihood function (5.2). In a sense, the maximum-likelihood procedure ignores all brand-period combinations for which nit = 0.

There are two drawbacks to the deletion of zero market shares. One is the reduction in degrees of freedom due to the deletion. But this drawback may be compensated for by a proper research design in that, if the number of brands per period is reduced by the deletion, the number of periods (or areas) may be increased to obtain adequate degrees of freedom. The second drawback is that the estimated parameters are somewhat biased (in the direction of smaller absolute values). But we believe that the biases which are introduced by this procedure are far less than those which are introduced by replacing zero shares by an arbitrarily small constant. It may be added that we found in our simulation studies that the true parameter values lie between those estimated after deleting zero-share observations and those estimated after replacing zero shares by an arbitrary constant. This finding leads us to consider another somewhat arbitrary, and so far untested, procedure, which adds a small constant to all brand-period combinations, regardless of whether they are zero-share or not. In other words, we suggest that the dependent variable, log sit, be replaced by log(sit + c), where sit is the share of brand i in period t and c is the arbitrary constant. We found that, if one selects the value of c properly, the estimated parameters are free of the biases which the other two procedures tend to create. The appropriate value of c seems to vary from one data set to the next. So far we have been unable to find a logic for determining the correct value of c that is applicable to a particular data set. Here we only indicate that a fruitful course of research may lie in the direction of this estimation procedure.
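
A sketch of the three treatments of zero shares discussed here; the choice of the constant c in the third option is left to the analyst, as the text emphasizes.

import numpy as np
import pandas as pd

def log_share(s, method="delete", c=0.01):
    """s: Series of market shares (zeros possible)."""
    if method == "delete":
        return np.log(s[s > 0])              # drop zero-share observations
    if method == "small_value":
        return np.log(s.replace(0, 0.001))   # arbitrary small value in place of zero
    if method == "add_constant":
        return np.log(s + c)                 # add c to every share, zero or not
    raise ValueError(method)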

Zero market shares create particularly difficult problems for the multivariate regression in (5.30). The missing market share for one brand may cause the observation to be deleted from all the regressions. In cases such as this, when it is particularly important to have all the dependent measures present, the EM algorithm discussed by MalhotraMalhotra, Naresh [1987], ``Analyzing Market Research Data with Incomplete Information on the Dependent Variable,'' Journal of Marketing Research, XXIV (February), 74-84. could be useful.

When imputing values which are missing in the data, one should always ask why the data are missing. The imputation literatureFor an excellent recent treatment see Little, Roderick J. A. & Donald B. Rubin [1987], Statistical Analysis with Missing Data. New York: John Wiley & Sons, Inc.treats data missing-at-random (MAR), missing-completely-at-random (MCAR), and missing-by-unknown-mechanisms (MBUM), but rarely do these conditions fit the zero market shares in POS data. If a brand simply is not distributed in one or more of the retail outlets, neither MAR, MCAR, nor MBUM assumptions are appropriate. Even if the brand is distributed, it is not always possible to tell if the zero market share results from an out-of-stock condition or simply from no sales. But, in either case, these conditions are neither random nor due to unknown mechanisms. One clue comes from the other data associated with a brand. If price and promotional variables are present for the zero-market-share brand, one can assume the brand is distributed, but nothing more. The problem concerns only imputing the value of the dependent measure. If price and promotional measures are also missing, the imputation problem is more severe. Widely differing patterns of distribution would greatly complicate the multivariate regression in (5.30). In such cases it is probably simpler to delete the missing observations in the market-share model, and use the method discussed in section 5.12 for estimating cross effects.

While simply deleting the observation is an acceptable solution to the problem of differing patterns of distribution in market-share models, it is not an acceptable approach to this problem in category-volume models. Zero market share isn't the issue, since the dependent measure is the (log of) total sales volume. But missing values for prices are particularly worrisome, since we cannot take the log of a missing value. In the market-share model for POS data, there is an observation for each brand in each store in each week. For the corresponding category-volume model there is just an observation for each store in each week. The measures in an observation reflect the influence of each brand's prices and promotional activity on total volume. If we were to delete the whole observation whenever a single brand was not in distribution, widely differing distribution patterns over stores could result in the deletion of all observations. We wish to minimize the influence that the missing value has on the parameter corresponding to that measure, but allow the other measures in the observation to have their normal influence in parameter estimation.

While developing an algorithm to minimize the influence of missing prices is a worthwhile topic for future research, there is a simple approach for achieving a reasonable result in the interim. We merely need to create brand-absence dummy variables, which take a value of one when the brand is absent and a value of zero when it is present. If we then replace the missing (log) price with a zero, the parameter of the brand-absence measure shows the penalty uniquely associated with not distributing the brand. This approach will be illustrated in the next section.
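
A sketch of the brand-absence device for the category-volume model: a dummy that equals one when a brand's price is missing, with the missing log price set to zero so that the brand-absence parameter absorbs the penalty. The data layout is assumed.

import numpy as np
import pandas as pd

def absence_adjusted_log_prices(prices):
    """prices: (T, m) DataFrame of brand prices, NaN where a brand is not distributed.
    Returns (log_prices, absence): missing log prices are set to zero, and
    absence is 1 where the brand is absent, 0 where it is present."""
    absence = prices.isna().astype(float)
    log_prices = np.log(prices).fillna(0.0)
    return log_prices, absence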

5.12  The Coffee-Market Example

To illustrate the use of these estimation techniques on POS data, consider the ground, caffeinated coffee market. Data, provided by Information Resources, Inc., from BehaviorScan stores in two cities, report price, newspaper feature, in-store display and store-coupon activity for all brands. The small-volume, premium brands were aggregated into an ``All Other Branded'' (AOB) category, and the small ``Private Label'' (PL) brands were aggregated into an ``All Other Private Label'' (AOPL) category. Consequently, twelve brands of coffee were analyzed: Folgers, Regular Maxwell House, Maxwell House Master Blend, Hills Bros., Chock Full O'Nuts, Yuban, Chase & Sanborne, AOB, PL 1, PL 2, PL 3, and AOPL. For eighteen months, each week's data for a brand were aggregated over package weights, and over stores-within-grocery chains in the two cities. These are aggregate data from stores, not discrete-choice data from BehaviorScan consumer panels. Price for each brand was aggregated into average price per pound, net of coupons redeemed. Feature, display and coupon were represented as percent of volume sold on promotions of each type to allow for aggregation over stores with slightly differing promotional environments. The data were divided into a year for calibration of the market-share model, and six months for cross-validation. The average price and market share of each brand appear in Table 5.12.

Table 5.12: Coffee Data - Average Prices and Market Shares

 
  Average Average
Brand Price/lb. Share
 
Folgers $2.33 28.5
Maxwell House $2.22 24.2
Master Blend $2.72 7.8
Hills Bros. $2.13 4.3
Chock Full O Nuts $2.02 15.3
Yuban $3.11 0.2
Chase & Sanborne $2.34 0.3
All Other Branded $2.64 2.4
Private Label 1 $1.99 3.9
Private Label 2 $1.95 3.6
Private Label 3 $1.93 3.7
All Other Private Labels $1.95 5.7

5.12.1  The Market-Share Model

With four marketing instruments per brand the full cross-effects model would have 587 parameters (4 × 12 × 12 + 11). To avoid estimating so many parameters an asymmetric market-share model was estimated by procedures similar to those discussed in Carpenter, Cooper, Hanssens, and Midgley [1988].Carpenter et al. suggest forming dynamically weighted, attraction components to deal with the lagged effects of marketing instruments. Chapter 3 discusses alternative methods for specifying the dynamic components, but neither of these approaches was used in this illustration. Store-week data are sufficiently disaggregate that they rarely have the complex time-series properties dealt with in Carpenter et al., so that no dynamically weighted, attraction components were needed.The distinctiveness of marketing efforts was incorporated by using exp(z-scores) for each marketing instrument. A differential-effects model was estimated with a unique parameter for each brand's price, feature, display, and store coupons, and a brand-specific intercept for the qualitative features of each brand, using OLS procedures. The brand-specific intercept which was closest to zero (PL 2) was set to zero to avoid singularity. The residuals from this differential-effects model were cross-correlated brand by brand with the transformed contemporaneous explanatory variables for all other brands. The cross-competitive effects which were significant in the residual analysis were entered into the model.The criteria for inclusion of a cross effect were that it had to be based on more than 52 observations and the correlation had to be significant beyond the .05 level.

This specification approach leads to a generalized attraction model:

A_{it} = \exp(\alpha_i + \varepsilon_{1i}) \prod_{k=1}^{K} \left[\, \exp(z_{kit}) \,\right]^{\beta_{ki}} \prod_{(k^* j^*) \,\in\, C_i} \left[\, \exp(z_{k^* j^* t}) \,\right]^{\beta_{k^* i j^*}}

where $a_i$ is brand $i$'s constant component of attraction, $e_{1i}$ is specification error, $b_{ki}$ is brand $i$'s market-response parameter for the $k$th marketing-mix element, $\exp(z_{kit})$ is brand $i$'s attraction component for the $k$th marketing-mix element (standardized over brands within a store-week), $C_i$ is the set of cross-competitive effects $(k^*, j^*)$ on brand $i$, $\exp(z_{k^*j^*t})$ is the standardized attraction component of the cross-competitive influence of brand $j^*$'s marketing-mix element $k^*$ on brand $i$, and $b_{k^*ij^*}$ is the cross-effect parameter for the influence of brand $j^*$'s attraction component $k^*$ on brand $i$'s market share.
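
As a quick numerical illustration of how this specification behaves, the sketch below evaluates the attraction equation above for a single store-week. The number of brands, the z-scores, the parameter values, and the single cross effect are all invented for the example.

```python
# Sketch: evaluating the generalized attraction model for one store-week.
import numpy as np

m, K = 3, 2                        # brands, instruments (illustrative)
z = np.array([[-1.0,  0.5],        # z[i, k]: brand i's z-score on instrument k
              [ 0.2, -0.3],
              [ 0.8, -0.2]])
alpha = np.array([1.0, 0.4, 0.0])  # brand-specific intercepts
beta = np.array([[-0.9, 0.2],      # beta[i, k]: own (differential) effects
                 [-0.6, 0.1],
                 [-0.4, 0.3]])
# One hypothetical cross effect: brand 1's instrument 0 (price) acting on brand 0.
cross = {0: [((1, 0), -0.25)]}     # target brand -> [((source brand, instrument), b)]

A = np.exp(alpha + (beta * z).sum(axis=1))       # own attraction components
for i, terms in cross.items():
    for (j, k), b in terms:
        A[i] *= np.exp(z[j, k]) ** b             # cross-competitive components
shares = A / A.sum()                             # attractions normalized to shares
print(shares)
```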

For the final model the residuals from the OLS estimation were used to estimate the error variances for each brand. The weights for a regression were formed as

\[
w_i \;=\; \frac{1}{\left(1 - \dfrac{1}{m}\right)\hat{\sigma}_i}
\]

where $\hat{\sigma}_i$ is the estimated standard deviation of the OLS residuals for brand $i$.

These weights compensate for heteroscedasticity of error variances over brands, but do not treat the possibility of nonzero error covariances. The results for the calibration period of 52 weeks appear in Table 5.13.
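
A compact sketch of this reweighting step follows. It is not the authors' procedure; the argument names (`X`, `y`, `resid_ols`, `brand_idx`) are hypothetical labels for the design matrix, the log-centered shares, the OLS residuals, and an integer brand index per observation.

```python
# Sketch of the reweighting step: brand-specific residual standard deviations
# define the weights, and the model is re-estimated by scaled least squares.
import numpy as np

def wls_fit(X, y, resid_ols, brand_idx, m):
    """Reweight by 1 / [(1 - 1/m) * sigma_hat_i] and re-estimate."""
    sigma = np.array([resid_ols[brand_idx == b].std(ddof=1) for b in range(m)])
    w = 1.0 / ((1.0 - 1.0 / m) * sigma[brand_idx])   # one weight per observation
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```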

The resulting model has an R2 of .93 with 140 parameters estimated and 2,051 residual degrees of freedom (F(140, 2051) = 181). Since the model is estimated without an intercept, R2 is redefined as is noted on the regression output. In models estimated without an intercept R2 is like the congruence coefficient discussed in section 5.6. If the mean of the dependent measure is equal to zero, the lack of an intercept doesn't matter, and R2 has the normal interpretation as the proportion of linearly accountable variation in the reduced form of the dependent measure. The dependent measure in the OLS-estimation phase does have a mean of zero (and an R2 of .92), but rescaling by the weights affects the mean of the dependent measure. So while it is obvious that the cross-effects model fits extremely well, it is not strictly proper to interpret .93 as the proportion of explained variation. Because reweighting changes the interpretation of R2, the simplest way to assess the incremental contribution of the cross effects is to compare the OLS differential-effects model to the OLS cross-effects model. In this case the OLS differential-effects model has an R2 of .82, so the cross effects represent a substantial improvement over an already good-fitting differential-effects model.

We cross-validate these models by combining the parameter values in Table 5.13 with fresh data to form a single composite prediction variable, and then correlating the predicted dependent measure with the actual dependent measure for the new observations; 26 weeks of fresh data were used in cross-validation. The squared cross-validity correlation is .85 using the parameters in Table 5.13. This is an excellent result for a relationship that uses just one composite variable to predict over 1,000 observations (F(1, 1012) = 5808). The OLS differential-effects model has a squared cross-validity correlation of .79, indicating that the cross effects enhance the model in a stable manner.
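
The cross-validation calculation itself is simple. A minimal sketch, with hypothetical array names for the fresh-data design matrix, the fresh dependent measure, and the calibration-period parameter estimates, is:

```python
# Squared cross-validity correlation: one composite predictor (X_new @ beta_hat)
# correlated with the actual dependent measure in the holdout weeks.
import numpy as np

def squared_cross_validity(X_new, y_new, beta_hat):
    composite = X_new @ beta_hat                 # single composite prediction variable
    r = np.corrcoef(composite, y_new)[0, 1]
    return r ** 2
```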

Table 5.13: Regression Results for Cross-Effects Model (MCI)

 
Coffee Data Base For Pittsfield And Marion Markets
Ground-Caffeinated Coffee Brands Only
MCI Regression
Model: Coffee
Dep Variable: LCSHARE Log-Centered Share
Analysis Of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 140 11556.72 82.55 181.54 0.01
Error 2051 932.62 0.45    
U Total 2191 12489.34      
Root MSE 0.67 R-Square 0.93  
Dep Mean 0.18 Adj R-Sq 0.92  
C.V. 383.34      
Note: No intercept term is used. R-Square is redefined.
 
Parameter Estimates
    Parm Std T For H0: Prob >
Variable DF Est Err Parm=0 |T|
Folg Intercept 1 2.54 0.17 15.07 0.01
Folg Price Z-Score 1 -0.96 0.07 -13.07 0.01
Folg Featv Z-Score 1 0.06 0.04 1.52 0.13
Folg Dispv Z-Score 1 0.16 0.05 3.56 0.01
Folg Coupv Z-Score 1 -0.13 0.05 -2.53 0.01
RMH Intercept 1 1.92 0.12 15.50 0.01
RMH Price Z-Score 1 -0.58 0.06 -10.02 0.01
RMH Featv Z-Score 1 0.00 0.03 0.12 0.91
RMH Dispv Z-Score 1 0.06 0.03 1.74 0.08
RMH Coupv Z-Score 1 0.11 0.04 2.92 0.01
MHMB Intercept 1 1.79 0.17 10.27 0.01
MHMB Price Z-Score 1 -0.24 0.07 -3.21 0.01
MHMB Featv Z-Score 1 0.19 0.04 5.33 0.01
MHMB Dispv Z-Score 1 0.22 0.05 4.79 0.01
MHMB Coupv Z-Score 1 -0.08 0.06 -1.43 0.15
HlBr Intercept 1 -0.50 0.11 -4.49 0.01
HlBr Price Z-Score 1 0.04 0.07 0.57 0.57
HlBr Featv Z-Score 1 0.48 0.05 8.96 0.01
HlBr Dispv Z-Score 1 0.23 0.05 4.57 0.01
HlBr Coupv Z-Score 1 1.52 0.19 7.97 0.01
CFON Intercept 1 0.61 0.11 5.37 0.01
CFON Price Z-Score 1 -1.33 0.09 -14.50 0.01
CFON Featv Z-Score 1 0.12 0.05 2.27 0.02
CFON Dispv Z-Score 1 -0.04 0.04 -0.94 0.35
CFON Coupv Z-Score 1 -0.22 0.07 -3.35 0.01
Yub Intercept 1 -0.15 0.21 -0.71 0.48
Yub Price Z-Score 1 -0.77 0.09 -8.70 0.01
Yub Featv Z-Score 1 0.21 0.21 0.98 0.33
Yub Dispv Z-Score 1 0.70 0.25 2.82 0.01
Yub Coupv Z-Score 1 0.15 0.22 0.70 0.49
C_S Intercept 1 -0.42 0.17 -2.48 0.01
C_S Price Z-Score 1 -0.27 0.14 -2.01 0.05
C_S Featv Z-Score 1 -0.07 0.31 -0.22 0.83
C_S Dispv Z-Score 1 1.19 0.33 3.65 0.01
C_S Coupv Z-Score 1 0.78 0.24 3.21 0.01

 
Parameter Estimates, Continued
    Parm Std T For H0: Prob >
Variable DF Est Err Parm=0 |T|
AOB Intercept 1 0.50 0.12 4.00 0.01
AOB Price Z-Score 1 -0.49 0.06 -8.28 0.01
AOB Featv Z-Score 1 -0.24 0.06 -3.75 0.01
AOB Dispv Z-Score 1 0.13 0.04 2.87 0.01
AOB Coupv Z-Score 1 0.16 0.09 1.84 0.07
PL1 Intercept 1 0.28 0.16 1.75 0.08
PL1 Price Z-Score 1 -1.07 0.09 -11.64 0.01
PL1 Featv Z-Score 1 -0.06 0.04 -1.47 0.14
PL1 Dispv Z-Score 1 -0.06 0.04 -1.62 0.10
PL1 Coupv Z-Score 1 0.03 0.03 0.79 0.43
PL2 Price Z-Score 1 -1.11 0.17 -6.68 0.01
PL2 Featv Z-Score 1 0.06 0.14 0.43 0.67
PL2 Dispv Z-Score 1 0.12 0.13 0.91 0.36
PL2 Coupv Z-Score 1 0.41 0.42 0.97 0.33
PL3 Intercept 1 -0.30 0.22 -1.36 0.17
PL3 Price Z-Score 1 -1.00 0.15 -6.53 0.01
PL3 Featv Z-Score 1 0.02 0.06 0.28 0.78
PL3 Dispv Z-Score 1 0.35 0.41 0.84 0.40
PL3 Coupv Z-Score 1 0.05 0.05 0.95 0.34
AOPL Intercept 1 0.25 0.15 1.68 0.09
AOPL Price Z-Score 1 -0.21 0.06 -3.47 0.01
AOPL Featv Z-Score 1 0.07 0.03 2.62 0.01
AOPL Dispv Z-Score 1 -0.04 0.05 -0.68 0.50
AOPL Coupv Z-Score 1 0.02 0.04 0.43 0.67
Crs Of RMH Price Effect On Folg 1 -0.27 0.07 -3.96 0.01
Crs Of MHMB Price Effect On Folg 1 -0.10 0.08 -1.29 0.20
Crs Of HlBr Price Effect On Folg 1 0.06 0.06 0.98 0.33
Crs Of CFON Price Effect On Folg 1 0.05 0.06 0.92 0.36
Crs Of Yub Price Effect On Folg 1 -0.32 0.06 -5.85 0.01
Crs Of AOB Price Effect On Folg 1 -0.31 0.06 -5.20 0.01
Crs Of RMH Featv Effect On Folg 1 -0.13 0.03 -3.75 0.01
Crs Of Yub Featv Effect On Folg 1 -0.04 0.18 -0.24 0.81
Crs Of RMH Dispv Effect On Folg 1 -0.09 0.04 -2.40 0.02
Crs Of MHMB Dispv Effect On Folg 1 0.12 0.05 2.72 0.01
Crs Of Yub Dispv Effect On Folg 1 0.01 0.21 0.04 0.97
Crs Of AOB Dispv Effect On Folg 1 0.02 0.04 0.44 0.66
Crs Of RMH Coupv Effect On Folg 1 0.03 0.04 0.68 0.50
Crs Of MHMB Coupv Effect On Folg 1 0.03 0.05 0.47 0.64
Crs Of HlBR Coupv Effect On Folg 1 1.04 0.17 6.06 0.01
Crs Of Yub Coupv Effect On Folg 1 0.30 0.18 1.66 0.10
Crs Of AOPL Coupv Effect On Folg 1 -0.06 0.04 -1.70 0.09
Crs Of Folg Price Effect On RMH 1 -0.10 0.06 -1.54 0.12
Crs Of Yub Price Effect On RMH 1 -0.05 0.04 -1.31 0.19
Crs Of AOB Price Effect On RMH 1 -0.22 0.04 -4.83 0.01
Crs Of AOPL Price Effect On RMH 1 0.17 0.04 4.63 0.01
Crs Of Folg Featv Effect On RMH 1 -0.00 0.03 -0.09 0.93
Crs Of Yub Featv Effect On RMH 1 0.19 0.17 1.12 0.26
Crs Of AOB Featv Effect On RMH 1 -0.12 0.05 -2.24 0.03
Crs Of Folg Dispv Effect On RMH 1 -0.04 0.04 -0.92 0.36
Crs Of HlBr Dispv Effect On RMH 1 -0.08 0.03 -2.37 0.02
Crs Of Yub Dispv Effect On RMH 1 -0.49 0.20 -2.46 0.01
Crs Of HlBr Coupv Effect On RMH 1 0.31 0.15 2.09 0.04
Crs Of CFON Coupv Effect On RMH 1 -0.05 0.05 -0.87 0.39

 
Parameter Estimates, Continued
    Parm Std T For H0: Prob >
Variable DF Est Err Parm=0 |T|
Crs Of Yub Coupv Effect On RMH 1 0.54 0.18 3.01 0.01
Crs Of AOB Coupv Effect On RMH 1 0.20 0.07 2.76 0.01
Crs Of Yub Price Effect On MHMB 1 -0.10 0.05 -2.10 0.04
Crs Of AOB Price Effect On MHMB 1 -0.29 0.06 -4.92 0.01
Crs Of AOPL Price Effect On MHMB 1 0.38 0.04 9.73 0.01
Crs Of RMH Featv Effect On MHMB 1 -0.04 0.03 -1.27 0.21
Crs Of Yub Featv Effect On MHMB 1 0.52 0.17 3.02 0.01
Crs Of AOB Featv Effect On MHMB 1 -0.12 0.06 -2.19 0.03
Crs Of HlBr Dispv Effect On MHMB 1 -0.09 0.03 -2.69 0.01
Crs Of Yub Dispv Effect On MHMB 1 -0.43 0.22 -2.01 0.04
Crs Of AOPL Dispv Effect On MHMB 1 -0.06 0.05 -1.01 0.31
Crs Of RMH Coupv Effect On MHMB 1 0.08 0.04 2.19 0.03
Crs Of HlBr Coupv Effect On MHMB 1 0.50 0.16 3.04 0.01
Crs Of Yub Coupv Effect On MHMB 1 0.42 0.16 2.56 0.01
Crs Of AOB Coupv Effect On MHMB 1 0.14 0.07 1.89 0.06
Crs Of AOPL Coupv Effect On MHMB 1 -0.00 0.04 -0.14 0.89
Crs Of MHMB Price Effect On HlBr 1 0.19 0.07 2.71 0.01
Crs Of AOB Price Effect On HlBr 1 0.29 0.05 5.82 0.01
Crs Of MHMB Featv Effect On HlBr 1 -0.05 0.07 -0.78 0.44
Crs Of MHMB Dispv Effect On HlBr 1 -0.00 0.08 -0.02 0.99
Crs Of CFON Dispv Effect On HlBr 1 0.03 0.04 0.78 0.43
Crs Of AOB Dispv Effect On HlBr 1 -0.04 0.05 -0.76 0.44
Crs Of RMH Price Effect On CFON 1 0.31 0.08 3.70 0.01
Crs Of MHMB Price Effect On CFON 1 -0.69 0.06 -10.81 0.01
Crs Of HlBr Price Effect On CFON 1 -0.17 0.07 -2.48 0.01
Crs Of Folg Featv Effect On CFON 1 0.10 0.06 1.72 0.09
Crs Of AOB Featv Effect On CFON 1 0.01 0.06 0.11 0.91
Crs Of AOB Dispv Effect On CFON B -0.03 0.05 -0.70 0.49
Crs Of Folg Coupv Effect On CFON 0 -0.07 0.08 -0.90 0.37
Crs Of MHMB Coupv Effect On CFON 1 -0.63 0.14 -4.39 0.01
Crs Of HlBr Coupv Effect On CFON 1 0.01 0.19 0.05 0.96
Crs Of Folg Price Effect On Yub 1 0.10 0.06 1.58 0.12
Crs Of Folg Dispv Effect On Yub 1 -0.12 0.08 -1.48 0.14
Crs Of MHMB Dispv Effect On Yub 1 0.49 0.10 4.92 0.01
Crs Of Folg Coupv Effect On Yub 1 -0.07 0.06 -1.27 0.21
Crs Of Folg Price Effect On AOB 1 0.52 0.10 5.43 0.01
Crs Of RMH Price Effect On AOB 1 0.94 0.09 10.61 0.01
Crs Of HlBr Price Effect On AOB 1 0.35 0.08 4.36 0.01
Crs Of CFON Price Effect On AOB 1 -0.00 0.08 -0.03 0.98
Crs Of Yub Price Effect On AOB 1 0.33 0.05 6.06 0.01
Crs Of AOPL Price Effect On AOB 1 0.91 0.07 13.87 0.01
Crs Of Folg Featv Effect On AOB 1 0.01 0.04 0.25 0.80
Crs Of Yub Featv Effect On AOB 1 0.09 0.05 1.74 0.08
Crs Of Folg Dispv Effect On AOB 1 -0.14 0.05 -2.88 0.01
Crs Of Yub Dispv Effect On AOB 1 -0.18 0.06 -2.97 0.01
Crs Of RMH Coupv Effect On AOB 1 0.15 0.04 3.32 0.01
Crs Of CFON Coupv Effect On AOB 1 -0.19 0.07 -2.91 0.01
Crs Of Yub Coupv Effect On AOB 1 0.06 0.13 0.46 0.65
Crs Of Folg Price Effect On AOPL 1 -0.21 0.09 -2.30 0.02
Crs Of RMH Price Effect On AOPL 1 -0.48 0.07 -6.66 0.01
Crs Of MHMB Price Effect On AOPL 1 0.09 0.03 2.66 0.01
Crs Of AOB Price Effect On AOPL 1 -0.08 0.06 -1.33 0.18

These results differ in minor ways from those previously summarized by Cooper (Cooper, Lee G. [1988b], ``Competitive Maps: The Structure Underlying Asymmetric Cross Elasticities,'' Management Science, 34, 6 (June), 707-23). There are two sources of difference. First, the article is based on the OLS results. Second, the brand-specific effects estimated in that article are based on z-scores, rather than the more traditional brand-specific intercepts adopted in this book. Only the parameter values for the brand-specific effects are substantially affected by the differences between the two approaches. A brand-by-brand summary follows.

Folgers has the largest brand-specific intercept, indicating a relatively high baseline level of attraction. If all brands were at the market average for prices and all other marketing instruments, so that only the differences in brand intercepts were reflected in the market shares, Folgers would be predicted to capture 36% of the market. This is what we will call a baseline market share. Baseline shares can differ substantially from the average shares reported in Table 5.12. Average shares are a straightforward statistical concept, but baseline shares reflect something of a brand's fundamental franchise, all other things being equal. But all other things are rarely equal. Market power can come from the way a brand uses its marketing instruments (i.e., its promotion policy) as well as from its fundamental franchise. Baseline share figures are reported for each of the brands. These can be usefully compared to the average-share figures, but should not be thought of as predictions of long-run market share. Folgers has a very strong and significant price parameter. Being priced above the market average will sharply reduce its baseline market share, while price reductions will sharply increase share. There is a positive but insignificant feature effect. There is a strong positive effect for in-store displays. The effect of store coupons is negative and statistically extreme. While we would normally expect store-coupon promotions to have a positive effect, we should note two things. First, the average number of pounds per week of Folgers sold on store coupons is 1,175, compared to 2,018 pounds sold on in-store displays and 1,397 pounds sold on newspaper features. So there is some indication in these data that this might not be a spurious coefficient. Second, the price measure is net of coupons redeemed. While this reflects the influence of manufacturers' coupons as well as store coupons, it does mean that some of the benefits of store coupons are folded into the price effect. There are four significant cross-price effects impacting Folgers. Regular Maxwell House, Maxwell House Master Blend, Yuban, and the AOB category all have significantly less price impact on Folgers than is reflected in the differential-effects model. Folgers has significantly more of a price effect on the AOB category and significantly less price impact on the AOPL brands than would otherwise be expected. For features, only the increased competitive impact of Regular Maxwell House is significant. For displays, Regular Maxwell House has more of an effect, while Master Blend has less of an effect than otherwise expected. Folgers' displays exert more pressure on the AOB category than otherwise expected. Hills Bros.' coupons put significantly less pressure on Folgers than expected from differential effects alone.
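
The 36% figure can be reproduced directly from the brand-specific intercepts in Table 5.13: with every instrument at its market average all z-scores are zero, each attraction reduces to exp(intercept), and baseline shares follow by normalization. The short sketch below (plain numpy, not the authors' code) carries out that arithmetic, with the PL 2 intercept fixed at zero as in the estimation.

```python
# Baseline shares implied by the Table 5.13 intercepts: with all z-scores at
# zero, A_i = exp(a_i), and shares are the normalized attractions.
import numpy as np

intercepts = {"Folgers": 2.54, "RMH": 1.92, "MHMB": 1.79, "Hills Bros.": -0.50,
              "CFON": 0.61, "Yuban": -0.15, "Chase & Sanborne": -0.42,
              "AOB": 0.50, "PL 1": 0.28, "PL 2": 0.00,   # PL 2 fixed at zero
              "PL 3": -0.30, "AOPL": 0.25}
A = np.exp(np.array(list(intercepts.values())))
baseline = 100 * A / A.sum()
for brand, s in zip(intercepts, baseline):
    print(f"{brand:18s} {s:5.1f}%")   # Folgers comes out near 36%
```

The same calculation yields the baseline shares quoted for the other brands in the paragraphs that follow.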

Regular Maxwell House also has a strong, positive brand-specific intercept, which translates into a baseline market share of 19%. It has significant price and coupon effects. Regular Maxwell House has significant competitive price effects on Chock Full O'Nuts and the AOB category, but it exerts significantly less competitive pressure on Folgers and AOPL with its price. AOPL has a significant competitive price effect on RMH, while the AOB category exerts significantly less price pressure. RMH features attack Folgers, and features for the AOB category exert significant pressure on RMH. RMH displays exert significant competitive pressure on Folgers, while Hills Bros. and Yuban attack RMH with their displays. RMH coupons have less competitive effect on Master Blend and the AOB category than would otherwise be expected, and coupons for Hills Bros., Yuban, and the AOB category have significantly less impact on RMH in return.

Maxwell House Master Blend has a significant intercept which translates into a baseline share of 17%. Price, feature, and display effects are significant in the expected directions. The coupon effect is insignificant and wrong signed. Master Blend receives more price pressure from AOPL, but less from Yuban and the AOB category than would otherwise be expected. In return Master Blend exerts more price pressure on Hills Bros. and AOPL, and less pressure on CFON and Folgers than the differential-effects models could reflect. AOB features are more competitive and Yuban features are less competitive due to their significant cross effects on Master Blend. Master Blend displays are less competitive with both Folgers and Yuban than otherwise expected, while displays for Hills Bros. and Yuban exert extra pressure on Master Blend. Store coupons for Regular Maxwell House, Hills Bros. and Yuban all have less effect than otherwise expected. Store coupons for Master Blend do exert pressure on Chock Full O'Nuts.

Hills Bros.' intercept translates into a baseline share of 2%. It shows strong effects for features, displays, and coupons. The self-price effect is not significant, but it does have a significant competitive price effect on the AOB category. It has less price effect on CFON than otherwise expected. Master Blend and the AOB category exert stable competitive price effects on Hills Bros. There are no feature cross effects, but Hills Bros. has significant competitive display effects on Regular Maxwell House and Maxwell House Master Blend (as already noted).

Chock Full O'Nuts has a small baseline share (5%), but strong price and feature effects. Its use of these instruments helps it maintain the third-largest average market share (15%). Regular Maxwell House has a strong competitive price effect on Chock Full O'Nuts, but both Master Blend and Hills Bros. exert significantly less price pressure on CFON. There are no significant feature or display cross effects, but CFON's store coupons exert extra pressure on the AOB category and Master Blend's store coupons exert extra pressure on CFON.

Yuban has a baseline share of 2%, but its high price results in a much smaller average share. It has significant price and display effects. Yuban exerts less price pressure on Folgers and Master Blend, but more pressure on the AOB category, than otherwise expected. Features for Yuban have less impact on Master Blend than reflected in simpler models. Yuban displays have a significant competitive effect on both Maxwell House brands and the AOB category, while Master Blend's displays are less competitive in return. The only two significant coupon cross effects involving Yuban reverse this display pattern: Yuban's store coupons exert less pressure on both Maxwell House brands than would otherwise be expected. This is such a small brand in these markets that it probably should have been folded into the AOB category. Its stronger position on the West Coast may have led the authors astray.

Chase & Sanborne also has a baseline share of 2%. Its average share is even less, due to its high price and the infrequency of promotions. Its price, display, and coupon effects are statistically significant. There are no cross effects involving Chase & Sanborne.

The premium brands in the AOB category collectively have a baseline share of 5%. There are strong price and display effects, but the feature effect is statistically extreme in the unexpected direction. With aggregates of brands such as AOB, it may be hard to get a clear signal from all the parameters. AOB exerts additional competitive price pressure on Hills Bros., but seems to complement Folgers and both Maxwell House brands. The AOB category receives extra price pressure from Folgers, Regular Maxwell House, Hills Bros., Yuban, and AOPL. Features for the AOB category have an extra competitive effect on both Maxwell House brands. Store coupons for AOB and Regular Maxwell House have less effect on each other than otherwise expected, but store coupons for CFON do hurt the AOB category.

The private-label brands (PL 1, PL 2, PL 3, and AOPL) collectively have a baseline share of 13%. All four have significant price effects, and AOPL has a significant feature effect. AOPL exerts price pressure on both Maxwell House brands and the AOB category. While Master Blend returns the pressure, both Folgers and Regular Maxwell House are less price competitive with AOPL than otherwise expected. There are no cross effects for features, displays, or store coupons for the private-label brands. This was in part dictated by the criterion requiring a minimum of 53 observations before a significant residual correlation could qualify as a cross effect, which excluded all but the AOPL brand. In the category-volume model presented later in this chapter and in the brand-planning exercise in Chapter 7 all the private-label brands are aggregated together. If this had been done in the market-share model, more cross effects involving these brands might have been identified. If market-share analysis is done as an iterative process (as was discussed early in this book), this refinement could be undertaken.

That price is a major instrument in this market is reflected in having 11 of 12 self-price effects significant. Four self-feature effects, six self-display effects, three self-coupon effects, and seven brand-specific intercepts were significant.

Residual analysis seems to be a practical means for identifying cross effects. The criterion identified 29 cross-price effects, of which 22 were statistically significant in the final model. There were 12 cross-feature effects, 4 of which were significant in the final model; 18 display effects were identified and half of these were significant in the final model. Of the 20 cross-coupon effects identified in the residuals from the differential-effects model, 10 were significant in the final model.

Reading through a regression output like this is a tedious but useful step in developing an initial understanding of market and competitive structure. But two more elements are needed before responsible brand planning can take place. First, parameters have to be converted to elasticities before an overall picture of the structure can be achieved (see Chapter 6). And second, a category-volume model must be calibrated before a market simulator can be developed. This is the topic of the next section.

5.12.2  The Category-Volume Model

A category-volume model of the style in equation (5.33) is reported in Table 5.14. Only data from grocery chains 1 - 3 are used in this model so that the results correspond to the competitive maps developed in Chapter 6 and the market simulator developed in Chapter 7. The private-label brands were aggregated into a single PL brand.

Table 5.14: Regression Results for Category-Volume Model

 
Dep Variable: LTWVOL
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 31 42.88 1.38 38.29 0.01
Error 124 4.48 0.04    
C Total 155 47.36      
Root MSE 0.19 R-Square 0.91  
Dep Mean 7.55 Adj R-Sq 0.88  
C.V. 2.52      
 
Parameter Estimates
    Parm Std T for H0  
Variable DF Est Err Parm=0 Prob > |T|
INTERCEP 1 6.73 0.84 7.98 0.01
BA4-HLBR 1 -0.13 0.35 -0.36 0.72
LPR1-Folg 1 -0.74 0.38 -1.96 0.05
LPR2-RMH 1 -0.73 0.40 -1.83 0.07
LPR3-MHMB 1 0.56 0.51 1.09 0.28
LPR4-HLBR 1 -0.13 0.40 -0.33 0.74
LPR5-CFON 1 -2.09 0.42 -4.97 0.01
LPR6-Yub 1 -0.32 0.73 -0.43 0.67
LPR7-CAS 1 0.77 1.02 0.75 0.45
LPR8-AOB 1 3.25 0.25 13.08 0.01
LPRPL-APL 1 -0.67 0.45 -1.50 0.14
D1-Folg 1 0.62 0.14 4.47 0.01
D2-RMH 1 0.50 0.10 4.79 0.01
D3-MHMB 1 0.29 0.12 2.53 0.01
D4-HLBR 1 0.13 0.06 1.98 0.05
D5-CFON 1 -0.05 0.09 -0.50 0.62
D8-AOB 1 0.38 0.12 3.13 0.01
DPL-APL 1 0.05 0.10 0.48 0.63
C1-Folg 1 -0.13 0.18 -0.70 0.49
C2-RMH 1 0.08 0.10 0.81 0.42
C3-MHMB 1 0.04 0.40 0.10 0.92
C4-HLBR 1 -2.06 0.95 -2.16 0.03
C5-CFON 1 0.30 0.23 1.28 0.20
C8-AOB 1 -0.68 0.59 -1.14 0.26
CPL-APL 1 0.17 0.12 1.44 0.15
F1-Folg 1 -0.08 0.12 -0.67 0.50
F2-RMH 1 0.01 0.09 0.07 0.95
F3-MHMB 1 0.03 0.08 0.39 0.70
F4-HLBR 1 -0.01 0.10 -0.14 0.89
F5-CFON 1 0.06 0.09 0.63 0.53
F8-AOB 1 0.56 0.12 4.68 0.01
FPL-APL 1 0.01 0.10 0.06 0.95

A preliminary model showed that lagged volume had no significant effect (t = -.96) and that there were no features, displays, or coupons in Chains 1 - 3 for either Yuban or Chase & Sanborne (so these effects were deleted). Only Hills Bros. had a distribution pattern that required a brand-absence coefficient (BA4).

The overall fit of the model is quite good (R2 = .91). This could be boosted to .99 by the inclusion of chain-specific intercepts, but this category-volume model is destined for use in the market simulator developed in Chapter 7, and we feel that the generality of the planning frame used in that chapter is enhanced by predicting volume for a generic chain rather than chain by chain. The strongest price influences on total volume come from discounts for Folgers, Maxwell House, and Chock Full O'Nuts. Discounts for these brands clearly expand the weekly volume. As prices for the aggregate AOB category increase, total volume increases - perhaps reflecting supply conditions or prestige effects for these premium brands. Displays for Folgers, both Maxwell House brands, Hills Bros., and AOB drive up category volume. Hills Bros.' store coupons seem to contract total volume, reflecting the infrequent (and apparently counter-cyclical) store-couponing policy for this brand. The only significant feature effect is associated with the AOB category.
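
Although equation (5.33) is not repeated here, the variable names in Table 5.14 suggest a log-volume regression on log prices plus display, coupon, and feature measures for each brand. The sketch below is only a schematic illustration under that assumption; the simulated arrays, the reduced brand count, and the single promotion measure shown are not the actual specification or data.

```python
# Schematic category-volume regression in the spirit of Table 5.14, assuming
# log total weekly volume (LTWVOL) regressed on log prices (LPR) and a
# promotion measure per brand. All values here are simulated placeholders.
import numpy as np

rng = np.random.default_rng(1)
weeks, brands = 156, 8
log_price = np.log(2.0 + rng.random((weeks, brands)))   # stand-in for LPR columns
display = rng.random((weeks, brands))                    # stand-in for display measures
X = np.column_stack([np.ones(weeks), log_price, display])
log_volume = rng.normal(7.5, 0.5, size=weeks)            # stand-in for LTWVOL
coef, *_ = np.linalg.lstsq(X, log_volume, rcond=None)    # intercept, price, display effects
```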

5.12.3  Combining Share and Category Volume

The choice of measures incorporated into both the market-share and category-volume models was dictated in large part by the need for a diagnostically useful market simulator. To the extent that the variables inside these markets can explain market behavior, we obtain a way of translating market history into elasticities. Chapter 6 develops methods for mapping the market and competitive structure implied by the elasticities - as well as methods for visualizing the sources driving changes in competitive structure. In Chapter 7 the market-share and category-volume models are combined into a market simulator for evaluating the consequences of marketing actions for all brands.

5.13  Large-Scale Competitive Analysis

This section addresses two questions. The first concerns whether or not market-share analysis can be done on a large enough scale to be practical. Simply stated, the issue is: how large is too large? The second centers on the fixation managers seem to have concerning the signs of parameters developed using best linear-unbiased estimation. Simply stated, the issue is: is BLUE always best? Both of these topics will be discussed using experience arising from the implementation of market-share models on optical-scanner (POS) records of weekly store sales from Nielsen Micro-Scantrack databases and IRI store-level databases.

Fifteen steps have been integrated into a SAS(R) macro program to perform the analytical tasks in estimating asymmetric market-share models; a condensed sketch of the central steps follows the list.

  1. Form the flat file containing variables [Sales plus Marketing Instruments] and observations [Brands × Stores × Weeks].
  2. Choose the model form (MCI or MNL) and the transformations of variables (zeta-scores, exp(z-scores), or raw scores).
  3. Form the differential-effects file containing the expanded set of variables [Sales + (Instruments + 1) × Brands] for the same observations.
  4. Form the differential-effects covariance matrix and store.
  5. Estimate the differential-effects model.
  6. Find the brand intercept nearest zero and delete.
  7. Re-estimate the differential-effects model.
  8. Compute the residuals and sort by brand.
  9. Cross correlate each brand's residuals with the marketing instruments of every competitor.
  10. Tally the significant cross correlations.
  11. Form the differential cross-effect variables.
  12. Compute and store complete covariances (differential effects and cross-competitive effects).
  13. Simultaneously re-estimate the parameters for all the effects in the calibration data.
  14. Estimate the GLS weights and re-estimate the parameters.
  15. Cross validate on fresh data.
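
The heart of the procedure (steps 5 through 13) can be condensed as follows. This is an illustrative sketch in Python rather than the SAS(R) macro itself; the argument names, the use of a normal approximation to test the residual correlations, and the handling of missing values are all assumptions made for the sketch.

```python
# Condensed sketch of steps 5-13: estimate differential effects, screen the
# residuals for cross-competitive effects, and pass the survivors forward.
import numpy as np

def screen_cross_effects(X_diff, y, Z_candidates, min_obs=53):
    """OLS differential effects, then flag candidate cross effects whose
    residual correlation is significant (Fisher-z normal approximation)."""
    beta, *_ = np.linalg.lstsq(X_diff, y, rcond=None)       # steps 5-7
    resid = y - X_diff @ beta                                # step 8
    keep = []
    for name, z in Z_candidates.items():                     # step 9
        ok = ~np.isnan(z)
        n = ok.sum()
        if n < min_obs:                                      # more than 52 obs required
            continue
        r = np.corrcoef(resid[ok], z[ok])[0, 1]
        stat = np.abs(np.arctanh(r)) * np.sqrt(n - 3)        # test of H0: rho = 0
        if stat > 1.96:                                      # roughly p < .05, two-sided
            keep.append(name)                                # step 10
    return beta, resid, keep   # step 11 builds the cross-effect columns from `keep`
```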

5.13.1  How Large Is Too Large?

The size implications of two applications are summarized in Table 5.15. The two applications reported there involve data from IRI and A.C. Nielsen. The IRI data are those just summarized for the ground, caffeinated coffee market. The Micro-Scantrack data involve a mature category of a frequently purchased, branded good. There were around 30 brands which were represented at the brand-size level - leading to 66 competitors in the model. The IRI data tracked four marketing instruments: prices, newspaper features, store coupons, and in-store displays. These data predate the size grading of newspaper features now standard with IRI data. The Nielsen data tracked five marketing instruments: prices, major ads, line ads, coupon ads, and in-store displays. Including the brand-specific intercepts, the Step 3 differential-effects file for the IRI example has 60 variables, while the Nielsen application contains 396 differential-effect variables. With seven grocery chains reporting 52 weeks of sales, the IRI example has about 2200 observations in the calibration data set. The Nielsen example has up to 155 stores reporting each week, which translates to about 113,000 observations in 26 weeks.

Step 10 involves a user-controlled, statistical criterion for deciding which residual correlations are translated into cross-competitive effects. In the IRI application any correlation with more than 52 observations and a significance level more extreme than .05 was selected. This produced 81 cross effects involving all marketing instruments and leading to a Step 12 covariance matrix around 140 × 140. Using the same criterion on the Nielsen example led to the identification of around 4,000 potential cross-competitive effects. This would require the computation of a 4,400 × 4,400 covariance matrix, which is too large to compute in SAS(R) on an IBM 3083. Making the required number of observations much larger and the required significance level wildly extreme still led to around 700 potential cross-competitive effects. Finally, only the 200 statistically most extreme cross-competitive effects were selected. These most-extreme effects all involved prices.

The comparison of timing results is somewhat exaggerated by the differences in the mainframes involved. The IBM 3090 model 200 on which the smaller example was run is an enormously capable computer.

Table 5.15: Computer Resources for Two Applications

 
IRI Nielsen
Chain-Level Data Micro Scantrack Data
 
12 Brands 66 Brand-Sizes
4 Instruments 5 Instruments
  Price   Price
  Features   Major Ads
      Line Ads
  Store Coupons   Coupon Ads
  Displays   Displays
60 Differential Effects 396 Differential Effects
7 Chains/Week Up to 155 Stores/Week
52 Weeks ~ 2200 Obs. 26 Weeks ~ 113000 Obs.
 
Cross Effects
 
Obs > 50 p < .05 Obs > 50 p < .05
79 Cross Effects ~ 4000 Cross Effects
    Pick 200 Most Extreme
 
Timing
 
On IBM 3090 On IBM 3083
~ 32 CPU Seconds ~ 120 CPU Minutes
Steps 1 - 15 Steps 1 - 10
    ~ 120 CPU Minutes
    Steps 11 - 12
    ~ 10 CPU Minutes
    Step 13

While neither the vector nor the parallel capabilities of this machine were really involved in this illustration, the size of the problem did not tax the resources of the 3090. All 15 steps in the analysis took around 32 CPU seconds. The IBM 3083 used in the large application is an extended-architecture (XA) machine, but the time and space required still reflected a substantial strain on the machine's resources. The first ten steps required two hours of CPU time, most of which was spent forming the large (~400 × 400) covariance matrix. Forming the extended covariance matrix, including 200 cross effects, required another two hours of CPU time. Once the covariance matrix was stored, however, trying out different specifications in search of a final model took only about 10 CPU minutes per run. The estimation step was not run on the large example.

The huge number of initial cross effects in the 66-competitor example makes it clear that we can get too large unless careful judgment is exercised. The size of the analysis is quite sensitive to the number of competitors for which a full differential-effects specification is attempted. This application would have been more manageable if the 30 brands were considered the basis of the differential-effects specification, and size had been treated as a simple variable in most cases.

The 66-competitor illustration is near the limit of practicality using the system of models employed here. For comparison, however, it is useful to assess the resources needed to estimate an illustration of this size using the analytical methods developed by Shugan (Shugan, Steven M. [1987], ``Estimating Brand Positioning Maps from Supermarket Scanning Data,'' Journal of Marketing Research, XXIV (February), 1-18) for data such as these. Shugan's method requires the computation of many simple regressions. If a very fast machine required only 40 nanoseconds to compute a regression, it would take 2 × 10^83 CPU seconds to complete the 66-competitor illustration. This means that if a super computer had begun at the moment of the creation of the universe, it would still not be done. In fact, the age of the universe could be taken to the seventh power and the computation would still be incomplete.

5.13.2  Is BLUE Always Best?

Best linear-unbiased estimation provides the robust foundation on which the competitive-analysis system relies for its parameter estimates. But, as every analyst knows, some parameters can turn up with the ``wrong signs.'' Price parameters which are positive are difficult to explain except perhaps in prestige product classes. Negative parameters for promotions or advertising are difficult to explain - particularly to the managers running the promotions.

It seems to be left to the analyst to explain such events, as managers seem to presume that they are the consequences of quirks in the models. Analysts, in turn, assume that the explanation is in the data, yet the managers typically know the market conditions reflected in the data far better than the analysts do.

There are several basic problems with this scenario. First is a problem of salience - are wrong-signed parameters more salient than they should be? The second problem concerns orientation. In simple constant-elasticity models the parameters are the elasticities. But complex market-response models recognize that elasticities vary as market conditions change. Management needs to know how markets respond to a firm's marketing efforts, but that knowledge is reflected far better in elasticities than in parameters. Third, there is an organizational problem. In the tension between management science and management, analysts should be more responsible for the models and managers more responsible for the data and how results are interpreted. But what one side does not understand should be the responsibility of both sides to figure out. Management scientists must develop and apply techniques across a number of managerial domains. They should not be expected to know the data of a domain with the kind of intimacy needed to manage. The second and third problems are addressed in more depth in Chapter 7, so that only the first is considered further here.

The problem of salience asks if wrong-signed parameter estimates get more attention than their frequency should command. Tables 5.16 and 5.17 summarize the parameter estimates for the two illustrations.

Table 5.16: Summary of BLUE Parameters - IRI Data

 
  Differential-Effects Model Cross-Effects Model
    R2 = .83, F(59, 2184) = 180   R2 = .93, F(140, 2051) = 181
 
Marketing Right No. Wrong Sign Right No. Wrong Sign
Instruments Sign Signif. p < .05 Sign Signif. p < .05
Prices 11/12 9/12 0/12 11/12 11/12 0/12
Features 7/12 3/12 1/12 9/12 4/12 1*/12
Displays 9/12 8/12 0/12 9/12 6/12 0/12
Coupons 8/12 1/12 1/12 9/12 3/12 2/12
Totals 35/48 21/48 2/48 38/48 24/48 3*/48
* One aggregate brand.

Table 5.17: Summary of BLUE Parameters - Nielsen Data

 
  Cross-Effects Model
    R2 = .67, F(446, 113000) = 503
 
Marketing     Wrong Sign
Instruments Right Sign Significant p < .05
Prices 62/66 55/66 4/66
Major Ads 50/66 35/66 1/66
Line Ads 57/66 29/66 1/66
Coupon Ads 43/66 21/66 7/66
Displays 55/66 47/66 2/66
Totals 267/330 187/330 15/330

In Table 5.16 we see that in the differential-effects model 21 of 48 parameters are statistically significant in the expected direction, while only 2 of 48 are statistically extreme with the wrong sign. Moving to the cross-effects model, 24 of 48 differential effects are statistically significant in the expected direction, in spite of the inclusion of 81 cross effects. In the cross-effects model 3 of 48 differential-effect parameters are statistically extreme in the unexpected direction, and one of these relates to a brand aggregate. Since brand aggregates are not expected to behave as regularly as brands, these parameters probably present no problems for the management scientist or the manager. This is certainly no different from what one might expect by random chance. Yet it is very likely that these parameters will be the ones questioned by managers. The analyst is forced to track the stability of the pattern of coefficients between the differential-effects model and the cross-effects model, as well as to check possible sources of collinearity among the variables or lack of variability in the instruments in question. But because of the strong prior hypotheses of managers about the directions of marketing effects, the focus is often on the two unusual parameters, rather than on the 24 significant differential effects or the 45 significant cross effects which seem to be driving the market. The burden of explanation falls on analysts who may know little about the market data from which these parameters arise.

The problem is perhaps tractable when only a few parameters require special explanation. But with large-scale applications the number of parameters to track can grow quite large. Table 5.17 summarizes the cross-effects model for the 66-competitor example. While 187 of 330 differential effects are significant in the expected direction, 15 of 330 have the wrong sign with p < .05. Fifteen of 330 beyond the .05 level is well within expectation, but explaining the source of these potentially anomalous effects is, at the least, time consuming and diverts attention from the main task of understanding market response.

Given the strong prior hypotheses of managers, there is another approach to parameter estimation which merits study. Quadratic programming would allow us to specify a set of inequality constraints on the parameters which would correspond to the prior hypotheses of managers. Consider an estimation scheme in which the differential-effect parameters estimated in Steps 5 and 7 would be bounded by a quadratic program to conform to the prior hypotheses. The residual analysis in Steps 8 - 11 would proceed as before. But at Step 13 the cross-competitive effect parameters would be estimated against the full set of residuals, rather than recombined with the differential effects in a BLUE scheme for overall recalibration against market shares. This approach gives primacy to the explanatory power of the differential effects. Whatever they can explain which is consistent with prior hypotheses is given to them. The cross-competitive effects are used to explain the systematic part of whatever is left over.
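
The quadratic-programming formulation itself is not developed here, but the flavor of sign-constrained estimation can be sketched with a simpler device: flip the columns whose parameters are expected to be negative and solve a non-negative least-squares problem. This is a stand-in used only for illustration, not the authors' proposal; the function and argument names are invented.

```python
# Sign-constrained least squares as a sketch of constrained estimation:
# impose the managers' expected signs by column flipping plus NNLS.
import numpy as np
from scipy.optimize import nnls

def sign_constrained_ls(X, y, expected_sign):
    """expected_sign: array of +1/-1 per column; estimates obey those signs."""
    flipped = X * expected_sign              # flip columns expected to be negative
    coef_pos, _ = nnls(flipped, y)           # non-negative solution
    return coef_pos * expected_sign          # restore the intended signs
```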

Whenever one considers moving away from BLUE schemes, caution and study are advised. But given the strong priors regarding the effects of marketing instruments, this avenue of research should be pursued.

5.14  Appendix for Chapter 5

5.14.1  Generalized Least Squares Estimation

Nakanishi and Cooper [1974] showed that the total covariance matrix of errors, $\Sigma_e$, is approximately the sum of the variance-covariance matrix among sampling errors, $\Sigma_{e_2}$, and the variance-covariance matrix among specification errors, $\Sigma_{e_1}$. For the simplified estimation procedures the estimate of $\Sigma_{e_2 t}$ comes from

\[
\hat{\Sigma}_{e_2 t} \;=\; \frac{1}{n_t}\left(\hat{P}_t^{-1} - J\right)
\tag{5.34}
\]

where $n_t$ is the number of individuals (purchases) in time period $t$, $\hat{P}_t^{-1}$ is an $(m_t \times m_t)$ diagonal matrix whose entries are the inverses of the market shares estimated by the OLS procedure for the $m_t$ brands in that period, and $J$ is a conformal matrix of ones.
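
As a small illustration of equation (5.34), the function below builds $\hat{\Sigma}_{e_2 t}$ from a vector of OLS-fitted shares and a sample size. The argument names are illustrative only.

```python
# Sampling-error covariance for one period, per equation (5.34).
import numpy as np

def sigma_e2_t(p_hat_t, n_t):
    """p_hat_t: OLS-fitted shares for the m_t brands in period t; n_t: sample size."""
    m_t = len(p_hat_t)
    P_inv = np.diag(1.0 / np.asarray(p_hat_t))   # diagonal of inverse fitted shares
    J = np.ones((m_t, m_t))                      # conformal matrix of ones
    return (P_inv - J) / n_t
```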

The variance-covariance matrix of specification errors, $\Sigma_{e_1}$, is assumed to be constant in each time period and is estimated by $\hat{\sigma}_{e_1}^2 I$, where

\[
\hat{\sigma}_{e_1}^2 \;=\;
\frac{\displaystyle Q \;-\; \sum_{t=1}^{T} \mathrm{tr}\,\hat{\Sigma}_{e_2 t}
\;+\; \mathrm{tr}\!\left[\left(\sum_{t=1}^{T} Z_t' Z_t\right)^{-1}
\left(\sum_{t=1}^{T} Z_t'\,\hat{\Sigma}_{e_2 t}\,Z_t\right)\right]}
{\displaystyle \sum_{t=1}^{T} m_t \;-\; gK \;-\; T}
\tag{5.35}
\]

where $Q$ is the sum of squares of the OLS errors, and $Z_t$ is an $(m_t \times [K+T])$ matrix containing the logs of the $K$ explanatory variables with the $T$ time-period dummy variables concatenated to it. This formula for $\hat{\sigma}_{e_1}^2$ is considerably simpler than the one in Nakanishi and Cooper [1974, p. 308] and also corrects a typographical error in that equation.

The total variance-covariance matrix $\hat{\Sigma}_e$ is a block-diagonal matrix in which each block is the sum $\hat{\Sigma}_{e_2 t} + \hat{\Sigma}_{e_1}$.
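
Putting the pieces together, the generalized-least-squares step can be sketched as follows. Because $\hat{\Sigma}_e$ is block diagonal, the normal equations can be accumulated period by period. The argument names are illustrative, and this is a schematic of the estimator rather than the authors' code.

```python
# GLS sketch: accumulate the normal equations block by block, where each
# period's covariance block is Sigma_e2_t + sigma2_e1 * I.
import numpy as np

def gls_estimate(Z_blocks, y_blocks, sigma_e2_blocks, sigma2_e1):
    """Z_blocks, y_blocks, sigma_e2_blocks: per-period designs, responses,
    and sampling-error covariances; sigma2_e1: specification-error variance."""
    Sigma_blocks = [S2 + sigma2_e1 * np.eye(S2.shape[0]) for S2 in sigma_e2_blocks]
    XtWX = sum(Z.T @ np.linalg.solve(S, Z)
               for Z, S in zip(Z_blocks, Sigma_blocks))
    XtWy = sum(Z.T @ np.linalg.solve(S, y)
               for Z, y, S in zip(Z_blocks, y_blocks, Sigma_blocks))
    return np.linalg.solve(XtWX, XtWy)
```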