
Chapter 5
Parameter Estimation

5.1  Calibrating Attraction Models

In Chapter 3 we presented market-share attraction models in detail. As we tried to describe the market and competitive structures realistically, more and more complex models had to be introduced, ending with a cross-effects model in which each piece of information (e.g., each price or each feature) has a unique role both for the brand to which it refers and for every other competitor. From the practical point of view, however, these complex models are not useful unless one can calibrate them from the actual market performance of brands. Calibration establishes the value or importance of each of these roles in determining the market performance of each brand. In this chapter we review the techniques for estimating the parameters of attraction models. We begin with the most basic models, i.e., the simple-effects form of the MCI and MNL models, and then proceed to more complex models such as differential-effects and cross-effects models. To remind the reader, the general specification of simple-effects attraction models is given below.

$$A_i = \exp(a_i + e_i) \prod_{k=1}^{K} f_k(X_{ki})^{b_k} \qquad (5.1)$$

$$s_i = A_i \Big/ \sum_{j=1}^{m} A_j$$

where:

si = the market share of brand i
Ai = the attraction of brand i
m = the number of brands
Xki = the value of the kth explanatory variable (Xk) for brand i (e.g., prices, product attributes, expenditures for advertising, distribution, sales force)
K = the number of explanatory variables
fk = a positive, monotone transformation of Xk
ei = the specification-error term
ai, bk (i = 1, 2, …, m; k = 1, 2, …, K) = parameters to be estimated.

We may choose either MCI or MNL models, depending on whether fk is an identity transformation or an exponential transformation. We will often use the MNL model below in order to simplify our presentation, but the corresponding derivations for the MCI model would be straightforward. Before presenting the use of regression analysis, we will first discuss other estimation techniques applicable to model (5.1).

5.1.1  Maximum-Likelihood Estimation

The maximum-likelihood approach to parameter estimation assumes that the data are obtained from a random sample (of size n) of individuals who are asked to choose one brand from a set of brands (i.e., a choice set). (See Haines, George H., Jr., Leonard S. Simon & Marcus Alexis [1972], ``Maximum Likelihood Estimation of Central-City Food Trading Areas,'' Journal of Marketing Research, IX (May), 154-59; also see McFadden [1974].) The resultant data consist of the number of individuals who selected object i, ni (i = 1, 2, …, m). This describes a typical multinomial choice process. In order for us to use this type of data, we must modify the definition of model (5.1) slightly. We assume that the probability pi that an individual chooses brand i, rather than the market share si, is specified as follows. (See sections 2.8 and 4.1 for discussions of when market shares and choice probabilities are interchangeable.)

$$p_i = A_i \Big/ \sum_{j=1}^{m} A_j \; .$$

Clearly pi is a function of the parameters of the model, that is, the a's and b's. We may write the likelihood for a set of observed choices n1, n2, …, nm as

$$L(a_1, a_2, \ldots, a_m;\, b_1, b_2, \ldots, b_K) = \prod_{i=1}^{m} p_i^{\,n_i} \qquad (5.2)$$

and the logarithm of the likelihood function as

$$\log L(a_1, a_2, \ldots, a_m;\, b_1, b_2, \ldots, b_K) = \sum_{i=1}^{m} n_i \log p_i \; .$$

By maximizing L or logL with respect to the parameters of the model, we obtain the maximum-likelihood estimates of them. The maximum-likelihood technique may be extended to the cases where observations are taken at more than one choice situation (multiple time periods, locations, customer groups, etc.) provided that an independent sample of individuals is drawn at each choice situation. For example, if a series of independent samples is drawn over time, the log-likelihood function may be written as

$$\log L(a_1, a_2, \ldots, a_m;\, b_1, b_2, \ldots, b_K) = \sum_{t=1}^{T} \sum_{i=1}^{m} n_{it} \log p_{it}$$

where nit and pit are the number of individuals who chose brand i in period t and the probability that an individual chooses brand i in period t , respectively, and T is the number of periods under observation.
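For concreteness, here is a minimal sketch of how the parameters of this multinomial-logit setup could be estimated by maximizing the log-likelihood numerically. The counts, the prices, and the use of a general-purpose optimizer are illustrative assumptions; this is not the estimation route followed in the rest of the chapter.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data: m = 3 brands observed over T = 4 independent choice situations,
# with one explanatory variable (say, price) and the observed choice counts n_it.
price = np.array([[1.9, 1.4, 1.6],
                  [1.9, 1.4, 1.7],
                  [1.8, 1.5, 1.6],
                  [1.7, 1.5, 1.5]])        # price[t, i]
counts = np.array([[10, 70, 20],
                   [12, 65, 23],
                   [20, 50, 30],
                   [30, 40, 30]])          # counts[t, i]: individuals choosing brand i in period t
T, m = price.shape

def neg_log_likelihood(theta):
    # theta = (a_2, ..., a_m, b); a_1 is fixed at 0 for identification.
    a = np.concatenate(([0.0], theta[:m - 1]))
    b = theta[-1]
    attraction = np.exp(a + b * price)                 # A_it = exp(a_i + b X_it)
    p = attraction / attraction.sum(axis=1, keepdims=True)
    return -np.sum(counts * np.log(p))                 # -log L = -sum_t sum_i n_it log p_it

result = minimize(neg_log_likelihood, np.zeros(m), method="BFGS")
print("ML estimates (a_2, a_3, b):", result.x)
```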

The maximum-likelihood procedure is a useful technique for parameter estimation in that the properties of estimated parameters are well known (see Haines, et al. [1972]), but we choose not to use it in this book for several reasons. First, since the likelihood and log-likelihood functions are nonlinear in the parameters a's and b's, the maximum-likelihood procedure requires a nonlinear mathematical-programming algorithm to obtain parameter estimates. Besides being cumbersome to use, such an algorithm does not ensure that the global maximum of the likelihood function is always found. Second, we will be using POS data primarily in calibrating the model. Since POS data generated at a store include multiple purchases in a period made by the same customers, the observed ni's may not follow the assumptions of a multinomial distribution which underlie the likelihood function. Third, we will in most cases be using observed market shares, that is, the proportions of purchases of brand i based on an unknown but large total number of purchases. (Neither multiple purchases in a single shopping trip, nor purchases of a brand on each of multiple shopping trips within a single reporting period (e.g., a week), fit well with the multinomial-sampling assumptions. Yet both such occurrences can be common in POS data. When analyzing POS data at the store-week level the market shares are not subject to the sampling variation with which maximum-likelihood procedures deal so well. Only the specification error requires special treatment. Section 5.4 presents generalized least-squares (GLS) procedures to cope with these issues.) The regression techniques developed in the next section are more easily adaptable to this type of data than the maximum-likelihood procedure.

5.1.2  Log-Linear Estimation

We will be presenting estimation procedures based on regression analysis in the next section, but the fact that logit models could be estimated by first applying a log-linear transformation and then applying a regression procedure has been known for a long time. We will review some of these procedures before we turn to the approach which we believe is the most convenient.

Over thirty-five years ago Berkson (Berkson, Joseph [1953], ``A Statistically Precise and Relatively Simple Method of Estimating the Bioassay with Quantal Response, Based on the Logistic Function,'' Journal of the American Statistical Association, 48 (September), 565-99) showed that a logistic model of binary choice becomes linear in its parameters through the so-called logit transformation. Suppose that each individual in a sample (of size n) independently chooses object 1 with probability p1, given by

$$p_1 = \frac{1}{1 + b_0 \exp\!\left(-\sum_{k=1}^{K} b_k X_{k1}\right)}$$

where:

p1 = the probability that object 1 is chosen in a binary choice
Xk1 = the kth characteristic of object 1
b0, b1, …, bK = the parameters to be estimated.

If the logit transformation is applied to the above model, we have

$$\log\!\left(\frac{p_1}{1 - p_1}\right) = -\log b_0 + \sum_{k=1}^{K} b_k X_{k1} \qquad (5.3)$$

That equation (5.3) is linear in the parameters log b0 and bk (k = 1, 2, …, K) suggests the use of regression analysis. But, since the probability p1 is unobservable, it must be replaced in the left-hand side of (5.3) by p̂1, the proportion of individuals in the sample who selected object 1. The final estimating equation takes the following form.

$$\log\!\left(\frac{\hat p_{1t}}{1 - \hat p_{1t}}\right) = -\log b_0 + \sum_{k=1}^{K} b_k X_{k1t} + e_t \; . \qquad (5.4)$$

The subscript t indicates the tth subgroup from which the p̂1's are calculated. The error term et is the difference between the logit transforms of p̂1 and p1, and is known to be a function of p1 and the sample size of the subgroup from which p̂1 is calculated. To examine the properties of the error term, expand the left-hand side of (5.4) in a Taylor series around p1, keep the first two terms, and apply the mean-value theorem to obtain

$$e = \log\!\left(\frac{\hat p_1}{1 - \hat p_1}\right) - \log\!\left(\frac{p_1}{1 - p_1}\right)
  = \frac{1}{p_1^*(1 - p_1^*)}\,(\hat p_1 - p_1)$$

where p1* is a value between p̂1 and p1. The error term is clearly a function of p1 and therefore heteroscedastic (i.e., of unequal variance). If we assume a simple binomial process for each individual selecting object 1 and a reasonably large sample size (n > 100, say), the variance of e is approximately equal to 1/[n p1(1 - p1)]. The use of a generalized least-squares procedure is called for.
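As a concrete illustration of Berkson's procedure, the sketch below applies the logit transformation to observed subgroup proportions and runs a weighted least-squares regression with weights n p̂(1 - p̂), the reciprocals of the approximate error variances derived above. The data values and the use of the statsmodels package are assumptions made for the example.

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: proportions choosing object 1 in several subgroups,
# each based on a subgroup of size n_t, with one characteristic X.
n_t   = np.array([200, 250, 180, 220, 300])        # subgroup sample sizes
p_hat = np.array([0.62, 0.48, 0.35, 0.55, 0.70])   # observed proportions
X     = np.array([1.0, 2.0, 3.5, 1.5, 0.5])        # characteristic of object 1 in each subgroup

# Logit transformation (equation 5.4): the dependent variable.
y = np.log(p_hat / (1.0 - p_hat))

# Weights proportional to the reciprocal of Var(e) ~ 1 / (n p (1 - p)).
weights = n_t * p_hat * (1.0 - p_hat)

design = sm.add_constant(X)                         # the intercept plays the role of -log b0
fit = sm.WLS(y, design, weights=weights).fit()
print(fit.params)                                   # [-log b0, b1]
```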

Berkson's method has been extended to the estimation of parameters of multinomial logit (MNL) models by Theil (Theil, Henri [1969], ``A Multinomial Extension of the Linear Logit Model,'' International Economic Review, 10 (October), 251-59). Assume a multinomial choice process in which each individual independently selects object i with probability pi from a set of m objects in a single trial, and let pi be specified by an MNL model:

$$A_i = \exp\!\left(a + \sum_{k=1}^{K} b_k X_{ki} + e_i\right), \qquad
  p_i = A_i \Big/ \sum_{j=1}^{m} A_j \; .$$

This model differs from (5.1) in that a single parameter a is specified instead of m parameters, a1, a2, …, am. Theil noted that

$$\log\!\left(\frac{p_i}{p_1}\right) = \log\!\left(\frac{A_i}{A_1}\right)
  = \sum_{k=1}^{K} b_k (X_{ki} - X_{k1}) + (e_i - e_1)$$

where object 1 is an arbitrarily chosen object, and suggested the following estimation equation, which is linear in the parameters b1, b2, …, bK.

$$\log\!\left(\frac{\hat p_{it}}{\hat p_{1t}}\right)
  = \sum_{k=1}^{K} b_k (X_{kit} - X_{k1t}) + e_{it}^* \qquad (5.5)$$

where p̂it is the proportion of individuals who chose object i in the subsample, and eit* is the combined error term. Subscript t indicates the tth subsample. It is obvious that equation (5.4) is a special case of (5.5) in which the number of objects in the choice set, m, equals 2. The total degrees of freedom for this estimation equation is (m - 1)T, where T is the number of subsamples. It is known that the variances of the eit*'s are unequal, and McFadden [1974] studied a method for correcting this problem. The estimation technique which we propose in the next section is a variant of Theil's method. Both Theil's method and our method yield identical parameter estimates with identical properties, but we believe that our method has an advantage in its ease of interpretation.
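A minimal sketch of Theil's estimating equation (5.5) follows. The proportions and the single characteristic are made-up numbers, and plain least squares is used even though, as just noted, the variances of the eit*'s are unequal.

```python
import numpy as np

# Illustrative data: m = 3 objects, T = 4 subsamples, one characteristic X.
p_hat = np.array([[0.20, 0.50, 0.30],
                  [0.30, 0.40, 0.30],
                  [0.25, 0.45, 0.30],
                  [0.15, 0.55, 0.30]])    # p_hat[t, i]: observed choice proportions
X = np.array([[1.9, 1.4, 1.6],
              [1.9, 1.4, 1.7],
              [1.8, 1.5, 1.6],
              [1.7, 1.5, 1.5]])           # X[t, i]

# Equation (5.5): regress log(p_it / p_1t) on (X_it - X_1t) for i = 2, ..., m;
# there is no intercept because the single parameter a cancels in the ratio.
y = np.log(p_hat[:, 1:] / p_hat[:, :1]).ravel()
x = (X[:, 1:] - X[:, :1]).ravel()

b_hat, *_ = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)
print("Estimate of b:", b_hat)
```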

5.2  Log-Linear Regression Techniques

As we have noted in Chapter 2, model (5.1) becomes linear in its parameters by applying the log-centering transformation. Take the MNL model, for example. First, take the logarithm of both sides of (5.1).

 
$$\log s_i = a_i + \sum_{k=1}^{K} b_k X_{ki} + e_i
  - \log\!\left[\sum_{j=1}^{m} \exp\!\left(a_j + \sum_{k=1}^{K} b_k X_{kj} + e_j\right)\right] \; .$$

If we sum the above equation over i (i = 1, 2, …, m) and divide by m, we have

 
$$\log \tilde s = \bar a + \sum_{k=1}^{K} b_k \bar X_k + \bar e
  - \log\!\left[\sum_{j=1}^{m} \exp\!\left(a_j + \sum_{k=1}^{K} b_k X_{kj} + e_j\right)\right]$$
where s̃ is the geometric mean of si and ā, X̄k and ē are the arithmetic means of ai, Xki and ei, respectively, over i. Subtracting the above equation from the preceding one, we obtain the following form, which is linear in its parameters.

$$\log\!\left(\frac{s_i}{\tilde s}\right)
  = (a_i - \bar a) + \sum_{k=1}^{K} b_k (X_{ki} - \bar X_k) + (e_i - \bar e) \; .$$

Similarly, the application of the log-centering transformation to the MCI model results in

$$\log\!\left(\frac{s_i}{\tilde s}\right)
  = (a_i - \bar a) + \sum_{k=1}^{K} b_k \log(X_{ki}/\tilde X_k) + (e_i - \bar e)$$

where X̃k is the geometric mean of Xki. Since those two equations are linear in the parameters ai* = (ai - ā) (i = 1, 2, …, m) and bk (k = 1, 2, …, K), one may estimate those parameters by regression analysis.

Suppose that we obtain market-share data for T choice situations . In the following, we often let subscript t indicate the observations in period t , but this is simply an example. Needless to say, the data do not have to be limited to time-series data, and choice situations may be stores, areas, customer groups, or combinations such as store-weeks. Applying the log-centering transformation to the market shares and the marketing variables for each situation t creates the following variables:

 
sit* = log(sit / s̃t)   (i = 1, 2, …, m)
s̃t = the geometric mean of sit
Xkit* = log(Xkit / X̃kt)   (i = 1, 2, …, m; k = 1, 2, …, K)
X̃kt = the geometric mean of Xkit.

Using the above notation, the regression models actually used to estimate the parameters are specified as follows.

MNL Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k (X_{kit} - \bar X_{kt}) + e_{it}^* \qquad (5.6)$$

MCI Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k X_{kit}^* + e_{it}^* \qquad (5.7)$$

where eit* = (eit - ēt) and ēt is the arithmetic mean of eit over i in period t. Variable dj is a dummy (binary-valued) variable which takes the value 1 if j = i and 0 otherwise. Note that the estimated values of ai′ (i = 2, 3, …, m) from (5.6 - 5.7) are not estimates of the original parameters ai, but estimates of the differences (ai - a1), where brand 1 is an arbitrarily chosen brand. Thus we have shown that the parameters of attraction model (5.1) are estimable by the simple log-linear regression models (5.6 - 5.7). However, as was surmised from the discussion of Berkson's and Theil's methods, the error term eit* in those regression models may not have an equal variance for all i and t. We will turn to this problem in a later section.
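To make the mechanics concrete, the following sketch simulates data from the simple-effects MCI model, applies the log-centering transformation, and estimates (5.7) by ordinary least squares. The simulated values and parameter settings are assumptions for illustration only, and the unequal error variances just mentioned are ignored here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from the simple-effects MCI model: m = 4 brands, T = 50 periods.
m, T, b_true = 4, 50, -3.0
a_true = np.array([0.5, 0.0, -0.3, 0.2])
price = rng.uniform(1.0, 2.0, size=(T, m))
attraction = np.exp(a_true + rng.normal(scale=0.1, size=(T, m))) * price ** b_true
share = attraction / attraction.sum(axis=1, keepdims=True)

# Log-centering transformation: subtract the within-period mean of the logs.
s_star = np.log(share) - np.log(share).mean(axis=1, keepdims=True)
x_star = np.log(price) - np.log(price).mean(axis=1, keepdims=True)   # MCI form (5.7)

# Design: intercept, brand dummies d_2 ... d_m, and the log-centered price variable.
brand = np.tile(np.arange(m), T)
D = np.column_stack([np.ones(T * m)] +
                    [(brand == j).astype(float) for j in range(1, m)] +
                    [x_star.ravel()])
coef, *_ = np.linalg.lstsq(D, s_star.ravel(), rcond=None)
print("a1, a2', a3', a4', b:", np.round(coef, 2))   # b should come out close to -3.0
```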

In earlier work (Nakanishi, Masao & Lee G. Cooper [1982], ``Simplified Estimation Procedures for MCI Models,'' Marketing Science, 1, 3 (Summer), 314-22) we showed that the regression models (5.6 - 5.7) are in turn equivalent to the following regression models.

MNL Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} b_k X_{kit} + e_{it} \qquad (5.8)$$

MCI Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} b_k \log X_{kit} + e_{it} \qquad (5.9)$$

Variable Du is another dummy variable which takes the value 1 if u = t and 0 otherwise. The corresponding models (5.6 - 5.7) and (5.8 - 5.9) yield an identical set of estimates of the ai′'s and bk's, and in this sense they are redundant. But one advantage of (5.8 - 5.9) is that it is not necessary to apply the log-centering transformation to market shares and marketing variables before regression analysis can be performed, which reduces the need for pre-processing of data. If the number of choice situations, T, is reasonably small, it is perhaps easier to use (5.8 - 5.9). If T is so large that the specification of dummy variables Du (u = 2, 3, …, T) becomes cumbersome, then the use of (5.6 - 5.7) is recommended. In addition, the properties of the error term eit in (5.8 - 5.9) are easier to analyze than those of eit* in (5.6 - 5.7).

5.2.1  Organization of Data for Estimation

Leaving theoretical issues aside for a while, let us look at the actual procedures one must follow for parameter estimation. Given a standardized statistical-program package, such as SAS(R), the first thing one must do is to arrange the data so that the regression analysis program in such a package may handle regression models (5.6 - 5.7) and (5.8 - 5.9).

Suppose that we have market-share data for m brands in T choice situations (periods, areas, customer groups, etc.), and accompanying data on marketing activities. Market-share data may be in the form of percentages (or proportions) or in absolute units. If one ignores for the moment the heteroscedasticity problems (i.e., unequal variances and nonzero covariances) associated with the error terms in regression models (5.6 - 5.9), whether the market-share data are in absolute units or in percentages is immaterial, because the log-centering transformation yields identical parameter estimates regardless of whether it is applied to proportions or to the actual numbers of units sold. (This property of log-centering is called the homogeneity of the 0th degree. The estimated value of a1 in (5.8 - 5.9) is the only term affected by the choice between proportions and actual numbers, but it does not influence the values of market shares estimated by the inverse log-centering transformation.) Table 5.1 is an example of market-share data generated by a POS system.

These data were actually obtained at a single store in 14 weeks (i.e., T = 14 ). There are five national and two regional brands of margarine ( m = 7 ). Brand 2 is the same as brand 1 and brand 4 is the same as brand 3, but in larger packages. All brands are half-pound (225g) packages except brands 2 and 4 which are one-pound (450g) packages. The market shares do not sum to one presumably due to private-label brands not listed here. Market shares are volume shares computed by first converting the numbers of units sold to weight volumes and then computing the weight-volume share of each brand. Inspection of the table will show that the market is obviously very price-sensitive.

We will now try to estimate the price elasticity of market shares based on attraction model (5.1). Also given are average daily sales volumes of margarine in this store, expressed in units of half-pound package equivalents. The first step in estimation is to create a data set which includes dummy variables dj (j = 2, 3, …, m) and Du (u = 2, 3, …, T) so that regression models (5.8 - 5.9) may be used. We chose (5.8 - 5.9) because the number of periods T is reasonably small (= 14). Table 5.2 shows a partial listing of the data set arranged for estimation with the REG procedure in the SAS(R) statistical package.

Market share and price data are taken from Table 5.1, and the logarithms of shares and prices are added. In addition two sets of dummy variables - week dummies and brand dummies - are put in the data set. The dummy variables (D1-D5) for only the first five weeks are reported to save space. If the reader examines the pattern of two sets of dummy variables, their meaning should be self-explanatory. The dummy

Table 5.1: POS Data Example (Margarine)

 
                         Brands                                   Ave. Daily
Weeks            1       2       3       4       5       6       7      Vol.
1 share 4 51 3 3 0 1 9 83
price 192 139.5 158 146 163 128 148  
2 share 2 75 2 1 0 0 5 103
price 192 140 158 170 163 128 148  
3 share 3 48 1 1 21 0 13 98
price 192 138.5 158 170 100 138 133  
4 share 4 44 24 - 0 0 11 72
price 192 139 139 170 163 148 128  
5 share 5 23 10 1 - 26 7 84
price 192 139 141 170 163 128 128  
6 share 6 6 3 2 0 36 13 61
price 192 176 158 170 163 128 128  
7 share 4 5 5 3 - 12 20 74
price 192 179 163 170 163 128 128  
8 share 3 2 2 2 41 8 11 107
price 192 169 185 161 100 134 128  
9 share 8 5 3 10 - 21 17 57
price 192 168 188 129.5 163 138 128  
10 share 19 3 1 47 - 5 8 77
price 178 179 188 120 163 138 128  
11 share 12 2 2 19 0 18 15 65
price 178 179 188 136.5 163 138 128  
12 share 6 47 1 5 0 10 9 87
price 180 139.5 188 149 163 141 128  
13 share 2 23 1 13 26 6 5 120
price 192 139 188 137 100 138 128  
14 share 28 15 10 19 3 3 6 107
price 132 139 144 134 109 143 128  
Notes: Brand 2 is the 1 lb. package of brand 1. Brand 4 is the 1 lb. package of brand 3. Market shares are in %. Prices are per 1/2 pound, in yen.

Table 5.2: Data Set for Estimation

 
  B S   P   Week Brand
W r h Log r Log Dummies Dummies
e a a   i                          
e n r Share c Price D D D D D d d d d d d d
k d e   e   1 2 3 4 5 1 2 3 4 5 6 7
 
1 1 4 1.38629 192 5.25750 1 0 0 0 0 1 0 0 0 0 0 0
1 2 51 3.93183 139 4.93806 1 0 0 0 0 0 1 0 0 0 0 0
1 3 3 1.09861 158 5.06260 1 0 0 0 0 0 0 1 0 0 0 0
1 4 3 1.09861 146 4.98361 1 0 0 0 0 0 0 0 1 0 0 0
1 5 0 . 163 5.09375 1 0 0 0 0 0 0 0 0 1 0 0
1 6 1 0.00000 128 4.85203 1 0 0 0 0 0 0 0 0 0 1 0
1 7 9 2.19722 148 4.99721 1 0 0 0 0 0 0 0 0 0 0 1
2 1 2 0.69315 192 5.25750 0 1 0 0 0 1 0 0 0 0 0 0
2 2 75 4.31749 140 4.94164 0 1 0 0 0 0 1 0 0 0 0 0
2 3 2 0.69315 158 5.06260 0 1 0 0 0 0 0 1 0 0 0 0
2 4 1 0.00000 170 5.13580 0 1 0 0 0 0 0 0 1 0 0 0
2 5 0 . 163 5.09375 0 1 0 0 0 0 0 0 0 1 0 0
2 6 0 . 128 4.85203 0 1 0 0 0 0 0 0 0 0 1 0
2 7 5 1.60944 148 4.99721 0 1 0 0 0 0 0 0 0 0 0 1
3 1 3 1.09861 192 5.25750 0 0 1 0 0 1 0 0 0 0 0 0
3 2 48 3.87120 138 4.93087 0 0 1 0 0 0 1 0 0 0 0 0
3 3 1 0.00000 158 5.06260 0 0 1 0 0 0 0 1 0 0 0 0
3 4 1 0.00000 170 5.13580 0 0 1 0 0 0 0 0 1 0 0 0
3 5 21 3.04452 100 4.60517 0 0 1 0 0 0 0 0 0 1 0 0
3 6 0 . 138 4.92725 0 0 1 0 0 0 0 0 0 0 1 0
3 7 13 2.56495 133 4.89035 0 0 1 0 0 0 0 0 0 0 0 1
4 1 4 1.38629 192 5.25750 0 0 0 1 0 1 0 0 0 0 0 0
4 2 44 3.78419 139 4.93447 0 0 0 1 0 0 1 0 0 0 0 0
4 3 24 3.17805 139 4.93447 0 0 0 1 0 0 0 1 0 0 0 0
4 4 . . 170 5.13580 0 0 0 1 0 0 0 0 1 0 0 0
4 5 0 . 163 5.09375 0 0 0 1 0 0 0 0 0 1 0 0
4 6 0 . 148 4.99721 0 0 0 1 0 0 0 0 0 0 1 0
4 7 11 2.39790 128 4.85203 0 0 0 1 0 0 0 0 0 0 0 1
5 1 5 1.60944 192 5.25750 0 0 0 0 1 1 0 0 0 0 0 0
5 2 23 3.13549 139 4.93447 0 0 0 0 1 0 1 0 0 0 0 0
5 3 10 2.30259 141 4.94876 0 0 0 0 1 0 0 1 0 0 0 0
5 4 1 0.00000 170 5.13580 0 0 0 0 1 0 0 0 1 0 0 0
5 5 . . 163 5.09375 0 0 0 0 1 0 0 0 0 1 0 0
5 6 26 3.25810 128 4.85203 0 0 0 0 1 0 0 0 0 0 1 0
5 7 7 1.94591 128 4.85203 0 0 0 0 1 0 0 0 0 0 0 1

variables for weeks graphically reflect that the influence of a particular week is constant over brands. The dummy variables for brands graphically reflect that the baseline level of attraction for each brand is constant over weeks, and thus independent of variations in market conditions.
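For readers who prefer to prepare such a data set outside SAS, a sketch of the same arrangement in Python/pandas is shown below. The small data frame and its column names are hypothetical stand-ins for Table 5.2; the point is only that the week and brand dummies can be generated mechanically rather than typed in.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format POS data with one row per (week, brand), as in Table 5.2.
df = pd.DataFrame({
    "week":  [1, 1, 1, 2, 2, 2],
    "brand": [1, 2, 3, 1, 2, 3],
    "share": [4, 51, 3, 2, 75, 2],               # market share in %
    "price": [192, 139.5, 158, 192, 140, 158],
})

df["log_share"] = np.log(df["share"].where(df["share"] > 0))   # zero shares become missing
df["log_price"] = np.log(df["price"])

# Week dummies D2, ..., DT and brand dummies d2, ..., dm (first level dropped, as in 5.8 - 5.9).
week_dummies  = pd.get_dummies(df["week"],  prefix="D", drop_first=True)
brand_dummies = pd.get_dummies(df["brand"], prefix="d", drop_first=True)
data = pd.concat([df, week_dummies, brand_dummies], axis=1)
print(data.head())
```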

5.2.2  Reading Regression-Analysis Outputs

Now we are in a position to estimate the parameters of attraction model (5.1), in which the only marketing variable is price. Letting Pit be the price of brand i in week t , there is only one attraction component for the MCI version of (5.1) which may be written as

$$A_{it} = \exp(a_i + e_{it})\, P_{it}^{\,b_p}$$

which in turn shows that regression model (5.9) is applicable here.

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + b_p \log P_{it} + e_{it} \; .$$

Table 5.3 gives the estimation results from the SAS(R) REG procedure.

The dependent variable is, of course, the logarithm of market share. The first part of the output gives the analysis-of-variance results. The most important summary statistic for us is the R2 figure of 0.735 (or the adjusted R2 value of 0.65), which indicates that almost 75% of the total variance in the dependent variable (log of share) has been explained by the independent (explanatory) variables (log of price, in this case) and the dummy variables d2 through d7 and D2 through D14. The F-test with the ``Prob > F'' figure of 0.0001 shows that the R2 value is high enough for us to put our reliance on the regression results. (This test is really against a null hypothesis that all the parameters are zero. There is less than a one-in-ten-thousand chance that this null hypothesis is true. So we can be confident that something systematic is going on, but it takes a much closer look to understand the sources and meaning of these systematic influences.) Note that the total degrees of freedom (i.e., the available number of observations minus 1) is not 97 but 83. This is because there are observations in the data set (see Table 5.2) for which the market share is zero. Since one cannot take the logarithm of zero, the program treats those observations as missing, decreasing the total degrees of freedom. The problems associated with zero market shares will be discussed in section 5.11.

The second part of the output gives the parameter estimates; the intercept gives the estimate of a1 ; D2 through D7 give estimates of

Table 5.3: Regression Results for MCI Equation (5.9)

 
Model: MODEL1      
Dep Variable: LSHARE      
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 20 77.33391 3.86670 8.765 0.0001
Error 63 27.79373 0.44117    
C Total 83 105.12764      
           
Root MSE 0.66421 R-Square 0.7356  
Dep Mean 1.92529 Adj R-Sq 0.6517  
C.V. 34.49902      
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
INTRCPT 1 44.798271 4.25812533 10.521 0.0001
D2 1 -0.623847 0.29148977 -2.140 0.0362
D3 1 -1.485840 0.26424009 -5.623 0.0001
D4 1 -1.866469 0.30893368 -6.042 0.0001
D5 1 -3.550847 0.61502980 -5.773 0.0001
D6 1 -1.971343 0.36375236 -5.419 0.0001
D7 1 -2.253214 0.37405428 -6.024 0.0001
DD2 1 0.254732 0.40530020 0.629 0.5319
DD3 1 0.117670 0.38957828 0.302 0.7636
DD4 1 0.620464 0.43444539 1.428 0.1582
DD5 1 0.269731 0.38377375 0.703 0.4847
DD6 1 0.634560 0.38485999 1.649 0.1042
DD7 1 0.644783 0.38546807 1.673 0.0993
DD8 1 0.243568 0.37504599 0.649 0.5184
DD9 1 0.778571 0.38417509 2.027 0.0469
DD10 1 0.424670 0.38363952 1.107 0.2725
DD11 1 0.742352 0.38454418 1.930 0.0581
DD12 1 0.547800 0.38402005 1.426 0.1587
DD13 1 0.274498 0.37351312 0.735 0.4651
DD14 1 -0.214251 0.37808396 -0.567 0.5729
LPRICE 1 -8.337254 0.81605692 -10.217 0.0001

a2′, …, a7′; DD2 through DD14 give the estimates of g2, g3, …, g14; the value next to LPRICE gives the estimate of bp, and so forth. From this table several important facts concerning the competitive structure of margarine in this store are learned.

First, the estimated price parameter is a large negative value, -8.34, indicating that the customers of this store are highly price-sensitive. The statistical significance of the estimate is shown by the T-value and the ``Prob > |T|'' column, both of which show that the estimate is highly significant. (To be precise, it is significantly different from zero. It should also be noted that the reported probability levels are for two-tailed tests. While nondirectional hypotheses are appropriate for time-period and brand dummy variables, we often have directional hypotheses about the influences of prices or other marketing instruments. The reported probabilities should be cut in half to assess the level of significance of one-sided tests.) Recall from Chapter 2 that the parameter value is not the same as the share elasticity for a specific brand. In the case of an MCI model, the latter is given by bp(1 - sit). For example, if a brand has a 20% share, its share elasticity with respect to price is approximately -8.34 × (1 - 0.2) = -6.67, indicating that a 10% price cut should lead to a 66.7% increase in share (from 20% to about 33%).

Second, the estimates of the brand-specific parameters, a2′, …, am′, are all negative and statistically significant. The true values of a2, …, am are estimated by adding the corresponding regression estimates to the estimated value of a1. Since a1 is estimated at 44.8, we know that brand 1 has the strongest attraction, other things being equal. Brand 5 has the weakest attraction with a1 + a5′ = (44.8 - 3.55) = 41.25. This implies that, other things being equal, brand 1 is 35 times (= exp 3.55) as attractive as brand 5. It is rather interesting to note that brand 2 (which is the one-pound package of brand 1) has approximately one-half the attraction (exp(-0.62) ≈ 0.54) of brand 1. Even within a brand, a weaker size has to resort to lower unit prices than the stronger size to gain a larger share.

Third, the estimates of g2, g3, …, gT are, with few exceptions (weeks 6, 7, and 11), statistically insignificant. This normally suggests that dummy variables D2, D3, …, DT may be deleted from the regression model, which in turn suggests that a multiplicative model of market share (discussed in Chapter 2) probably would have done as well as the attraction (MCI) model in analyzing the data in Table 5.1. However, we chose an attraction model not only because of how well it fits the data but because it represents a more logically consistent view of the market

Table 5.4: Regression Results for MNL Equation (5.8)

 
Model: MODEL1      
Dep Variable: LSHARE      
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 20 77.22749 3.86137 8.719 0.0001
Error 63 27.90015 0.44286    
C Total 83 105.12764      
           
Root MSE 0.66548 R-Square 0.7346  
Dep Mean 1.92529 Adj R-Sq 0.6504  
C.V. 34.56501      
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
INTRCPT 1 11.250720 1.01638598 11.069 0.0001
D2 1 -0.743850 0.29829963 -2.494 0.0153
D3 1 -1.582301 0.26788475 -5.907 0.0001
D4 1 -1.980421 0.31598491 -6.267 0.0001
D5 1 -3.087742 0.58245966 -5.301 0.0001
D6 1 -2.074613 0.37148467 -5.585 0.0001
D7 1 -2.309865 0.37915203 -6.092 0.0001
DD2 1 0.240284 0.40596127 0.592 0.5560
DD3 1 0.133747 0.39036655 0.343 0.7330
DD4 1 0.648161 0.43501731 1.490 0.1412
DD5 1 0.301956 0.38439750 0.786 0.4351
DD6 1 0.665472 0.38586824 1.725 0.0895
DD7 1 0.680742 0.38658443 1.761 0.0831
DD8 1 0.282518 0.37617724 0.751 0.4554
DD9 1 0.829837 0.38524729 2.154 0.0351
DD10 1 0.486656 0.38459756 1.265 0.2104
DD11 1 0.773457 0.38552149 2.006 0.0491
DD12 1 0.555236 0.38479525 1.443 0.1540
DD13 1 0.315302 0.37443996 0.842 0.4029
DD14 1 -0.236656 0.37918145 -0.624 0.5348
PRICE 1 -0.053868 0.00528884 -10.185 0.0001

and competition. (The parameters for the time periods merely serve the role of insuring that the other parameters are identical to those of the original nonlinear model. This structure guarantees that the model will produce market-share estimates which are always non-negative and always sum to one over all alternatives in a choice situation.) Since our purpose is to estimate the parameters of an attraction model correctly, it is not justified for us to drop those dummy variables from the regression equation.

Table 5.4 gives the estimation results from equation (5.8) for the MNL version of attraction model (5.1). The independent variables are the same as those of (5.9), except that price itself is used instead of the logarithm of price. The overall pattern of estimated parameters is very similar to that from (5.9). The estimated value of the price parameter, bp, is -0.054. Recall that for the MNL model the share elasticity with respect to a marketing variable (price in this case) is given by bp Pit(1 - sit). If sit is 0.2 and price is 150 yen for a brand, the price elasticity is approximately -6.5, which agrees well with the estimated elasticity value from equation (5.9).
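The two elasticity formulas can be verified with a few lines of arithmetic, using the share and price values assumed in the text and the parameter estimates from Tables 5.3 and 5.4.

```python
# Share elasticity with respect to price (cf. Chapter 2):
#   MCI model:  b_p * (1 - s)
#   MNL model:  b_p * P * (1 - s)
share = 0.20          # market share of the brand
price = 150.0         # price in yen

mci_elasticity = -8.337254 * (1 - share)            # estimate from Table 5.3
mnl_elasticity = -0.053868 * price * (1 - share)    # estimate from Table 5.4

print(round(mci_elasticity, 2))   # about -6.67
print(round(mnl_elasticity, 2))   # about -6.46
```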

5.2.3  The Analysis-of-Covariance Representation

It may be added that regression models (5.8 - 5.9) are equivalent to an analysis-of-covariance (ANCOVA) model of the following form.

MNL Model:

$$\log(s_{it}) = \mu + \mu_i + \mu_t + \sum_{k=1}^{K} b_k X_{kit} + e_{it}$$

or

MCI Model:

$$\log(s_{it}) = \mu + \mu_i + \mu_t + \sum_{k=1}^{K} b_k \log(X_{kit}) + e_{it}$$

where:

μ = the grand mean
μi = the brand main effects (i = 1, 2, …, m)
μt = the period main effects (t = 1, 2, …, T).

There is no brand-by-period interaction term because there is one observation per brand-period combination. The ANCOVA models yield parameter estimates that are identical to those obtained from models (5.8 - 5.9). This ANCOVA representation clarifies the characteristics of (5.8); an attraction model requires that the period main effects be taken out before the parameters of marketing variables are to be estimated. If we ignore the properties of the error term (discussed in the next section), the ANCOVA model may be convenient to use in practice since it does not require cumbersome specification of brand and period dummy variables.
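A sketch of this ANCOVA formulation in a standard statistical package is given below. The simulated data frame and its column names are assumptions for illustration; the categorical terms generate the brand and period main effects, and the covariate log(price) gives the MCI version (using price itself would give the MNL version).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per (week, brand) with share and price.
rng = np.random.default_rng(1)
weeks, brands = 8, 4
data = pd.DataFrame({
    "week":  np.repeat(np.arange(1, weeks + 1), brands),
    "brand": np.tile(np.arange(1, brands + 1), weeks),
    "price": rng.uniform(100, 200, size=weeks * brands),
})
data["share"] = rng.dirichlet(np.ones(brands), size=weeks).ravel()
data["log_share"] = np.log(data["share"])

# ANCOVA form of the MCI model: grand mean + brand main effects + period main effects
# + the covariate log(price).
fit = smf.ols("log_share ~ C(brand) + C(week) + np.log(price)", data=data).fit()
print(fit.params)
```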

5.3  Properties of the Error Term

We have deferred the discussion of the analysis of the error term up to this point, though it has been suggested that the error terms in regression models (5.6 - 5.7) and (5.8 - 5.9) are known to have unequal variances and non-zero covariances in some cases and may require special care in estimation. Before we show this, we will have to make some assumptions as to the composition of the error term with respect to the sources of error.

It is important to recognize two sources of errors inherent in the estimation of market-share models. The variability due to sampling is clearly one source of error, but there is another source of error we must consider. Recall that attraction model (5.1) includes an error term, ei , which arises due to the omission of some relatively minor factors from its specification of explanatory variables, the Xkit 's, in (5.1). We will call this source of error the specification error . Considering those sources of error, the error terms in regression models (5.8 - 5.9) may be expressed as

eit = e1it + e2it

where e1it is the specification-error term and e2it is the sampling-error term. (To be precise, the error term in attraction model (5.1) should be written as e1it, but we will not change the notation at this point for reasons that will become apparent later.) The error term in regression model (5.6) is given by subtracting ēt, the mean of eit over i in period t, from eit. Hence we may write

$$e_{it}^* = e_{1it}^* + e_{2it}^* = (e_{1it} - \bar e_{1t}) + (e_{2it} - \bar e_{2t})$$

where ē1t and ē2t are the respective means of e1it and e2it over i in period t.

5.3.1  Assumptions on the Specification-Error Term

We will make the following assumptions regarding the specification-error term, e1it , throughout the remainder of this book.

  1. e1it is normally distributed with mean 0 and variance σi2,
  2. the covariance between e1it and e1jt is σij for all t,
  3. there is no correlation between e1it and e1ju if u ≠ t,
  4. e1it is uncorrelated with the sampling-error term, e2it.

We have so far made no assumption about the sampling-error term (except that it is uncorrelated with the specification-error term) because the method of data collection greatly affects the properties of sampling errors. Two basic methods of data collection will be distinguished.

One is the survey method in which a sample is randomly drawn from a universe of consumers/buyers. In this case the unit of analysis is the individuals in the sample. One may ask the respondent which brand he/she selected or how many times he/she purchased each brand in a period. Individual selections or purchases are then aggregated over the sample to yield market-share estimates. It may be noted that the so-called consumer panels - diary or optical-scanner - share essentially the same characteristics as the survey method as a data collection technique because the unit of analysis is an individual consumer or household.

Another basic method concerns data gathered from POS systems. It should be emphasized that POS-generated market-share data are based on all purchases made in a store in a period, and not on the responses obtained from a sample of customers of the store. This means that we need not be concerned with the normal sources of sampling variation (i.e., sampling variations among customers within a store). Our only concern is with sampling variations between stores, since POS data currently available to syndicated users are usually based on a sample of stores. We will deal with each type of data-collection method in turn.

5.3.2  Survey Data

Let us assume that a series of samples of consumers or buyers is obtained by simple random sampling. We assume that an independent sample is drawn for each period (or choice situation). Since the following analysis is limited to a single period, the time subscript t is dropped for simplicity. As noted above, one may ask the respondent either which brand he/she chose or how many times he/she bought each brand in a period. We will have to treat those two questioning techniques separately.

First consider the case in which each respondent is asked which single brand he/she chose from a set of available brands (the choice set). In this case we may assume that the aggregated responses to the question follow a multinomial choice process. Formally stated, given a sample of size n and the probability pi (i = 1, 2, …, m) that a respondent chose brand i (m is the number of available brands), the joint probability that brand i is chosen by ni individuals (i = 1, 2, …, m) is given by

$$P(n_1, n_2, \ldots, n_m) = \frac{n!}{n_1!\, n_2! \cdots n_m!} \prod_{i=1}^{m} p_i^{\,n_i} \; .$$

The market-share estimates are p̂i = ni/n (i = 1, 2, …, m). These estimates are subject to sampling variations.

Let us now turn to the properties of the sampling-error term

$$e_{2i} = \log \hat p_i - \log p_i \qquad (i = 1, 2, \ldots, m)$$

when the market-share estimates, the p̂i's, are generated by the multinomial process described above. It is well known that for a reasonably large sample size (n > 30, say), p̂i is approximately normally distributed with mean pi and variance pi(1 - pi)/n. Given this approximate distribution, we want to know how e2i is distributed. We will use the same technique as that used by Berkson. First, expand log p̂i in a Taylor series around log pi and retain only the first two terms. Then apply the mean-value theorem to obtain

$$\log \hat p_i = \log p_i + \frac{\hat p_i - p_i}{p_i^*}$$

where pi* is a value between p̂i and pi. This shows that for a reasonably large sample size, log p̂i is approximately normally distributed with mean log pi and variance pi(1 - pi)/(n pi*2). The approximation improves as the sample size n increases. Thus the sampling error is also approximately normally distributed with mean zero and variance pi(1 - pi)/(n pi*2). Furthermore, due to the nature of a multinomial process, it is known that e2i and e2j (j ≠ i) in the same period are correlated and have an approximate covariance -pi pj/(n pi* pj*), where pj* is a value between p̂j and pj. For a reasonably large sample size, we may take

 
$$\mathrm{Var}(e_{2i}) = (1 - p_i)/(n p_i) \qquad (i = 1, 2, \ldots, m)$$
$$\mathrm{Cov}(e_{2i}, e_{2j}) = -1/n \qquad\qquad (j \neq i) \; . \qquad (5.10)$$

Clearly the variance of the error term is a function of pi; it equals 1/n when pi = 0.5 and becomes large for very small values of pi. For example, if pi = 0.01, the variance of e2i is approximately equal to 99/n. This phenomenon is called heteroscedasticity in the variance of e2i. But we must also be concerned with the covariance between e2i and e2j to the extent that 1/n is not negligible.
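A quick simulation can confirm these approximations. The choice probabilities, sample size, and number of replications below are arbitrary; the simulated variance of log p̂i should be close to (1 - pi)/(n pi) and the covariances close to -1/n.

```python
import numpy as np

rng = np.random.default_rng(42)

p = np.array([0.05, 0.15, 0.30, 0.50])   # true choice probabilities (sum to one)
n = 1000                                  # respondents per sample
reps = 20000                              # number of simulated samples

counts = rng.multinomial(n, p, size=reps)
log_phat = np.log(counts / n)             # log of estimated shares (zero counts are very unlikely here)

print("simulated Var(log p_hat):", log_phat.var(axis=0))
print("approximation (1-p)/(n p):", (1 - p) / (n * p))
print("simulated Cov(1,2):", np.cov(log_phat[:, 0], log_phat[:, 1])[0, 1])
print("approximation -1/n:", -1 / n)
```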

The above properties of the error term are based on the assumption that each respondent is asked which brand he/she chose in a given choice situation. The properties change considerably if the respondent is asked how many units of each brand he/she purchased in a period. The individual responses are aggregated over the sample to yield the number of units of brand i bought by the entire sample, xi (i = 1, 2, …, m). The estimate of the market share of brand i is given by ŝi = xi/x, where x is the sum of the xi's over i. What are the properties of the error term when the logarithm of ŝi is used as the dependent variable in regression models (5.8 - 5.9), or the log-centered value of ŝi is used in (5.6 - 5.7)? The answer depends on the assumption we make about the process which generates the xi's. In general the derivation of the properties of the error term is a complicated task, since ŝi is a ratio of two random variables, xi and x, the latter including the former as a part of it. Luckily for us, however, the estimated values of the parameters of (5.6 - 5.7) will not change if we use the log-centered value of x̄i, the mean of xi, in place of the log-centered value of ŝi in (5.6 - 5.7), since

$$\log\!\left(\frac{\bar x_i}{\tilde x}\right) = \log\!\left(\frac{\hat s_i}{\tilde{\hat s}}\right)$$

where x̃ is the geometric mean of x̄i and the tilde over ŝ denotes the geometric mean of ŝi over i in a given period. This in turn suggests that in regression models (5.8 - 5.9) we may use log(xi) as the dependent variable without changing the estimated values of parameters other than a1. This reduces our task in analyzing the properties of the error term considerably.

Suppose that the xi's are generated by an arbitrary multivariate process with means μ1, μ2, …, μm and covariance matrix Q with elements {qij}. Note that the true market share is given by si = μi/μ, where μ is the sum of the μi's over i. The sample mean of xi, x̄i, is an estimate of μi. We obtain the linear approximation of log x̄i by the usual method, that is,

$$\log \bar x_i = \log \mu_i + \frac{1}{x_i^*}(\bar x_i - \mu_i)$$

where xi* is a value between x̄i and μi. If we replace log(ŝi) in the equations leading to (5.8 - 5.9) by log x̄i, the sampling-error term becomes

$$e_{2i} = \log \bar x_i - \log \mu_i \; .$$

When the sample size is reasonably large, the approximate variances and covariances among the e2i 's are given by

 
$$\mathrm{Var}(e_{2i}) = q_{ii}/(n \mu_i^2) \qquad (i = 1, 2, \ldots, m)$$
$$\mathrm{Cov}(e_{2i}, e_{2j}) = q_{ij}/(n \mu_i \mu_j) \qquad (j \neq i) \; .$$

These results agree with those for the multinomial process, if we note that in the latter process μi = pi and x̄i = p̂i. The variances and covariances of the sampling-error term are clearly functions of the μi and may take large values if μi or μj is near zero. The existence of heteroscedasticity is obvious.

We now combine the above results with our assumptions on the specification-error term. Under the assumptions of a multinomial choice process and a single choice per individual, the approximate variances and covariances among the ei's in the same period are given by

 
$$\mathrm{Var}(e_i) = \sigma_i^2 + \mathrm{Var}(e_{2i}) \qquad (i = 1, 2, \ldots, m)$$
$$\mathrm{Cov}(e_i, e_j) = \sigma_{ij} + \mathrm{Cov}(e_{2i}, e_{2j}) \qquad (j \neq i)$$

where Var(e2i) and Cov(e2i, e2j) are given by (5.10). Because of the heteroscedasticity of the error term, it is known that the estimated parameters of regression models (5.6 - 5.9) based on the ordinary least-squares (OLS) procedure do not have the smallest variance in the class of linear regression estimators. Nakanishi and Cooper [1974] suggested the use of a two-stage generalized least-squares (GLS) procedure in the case of a multinomial choice situation to reduce the estimation errors associated with regression models (5.6 - 5.9). The interested reader is referred to Appendix 5.14 for more details of this GLS procedure.

5.3.3  POS Data

When the market-share estimates are obtained from POS systems, it is not necessary for us to consider the sampling errors within a store; but, if our market-share data are obtained by aggregating market-share figures for a number of stores, we should expect that there are variations between stores. This presents us with a heteroscedasticity problem similar to the one we encountered with survey data. But there are additional problems as well. Each store tends to offer its customers a uniquely packaged set of marketing activities. If we aggregate market-share figures from several stores, we will somehow have to aggregate marketing variables over the stores. As discussed in Chapter 4, aggregation is safe if the causal conditions (i.e., promotional variables) are homogeneous over the stores, as might be the case when stores within a grocery chain are combined. One should avoid the ambiguity which results from aggregation, either by explicitly recognizing each individual store or by aggregating only over stores (within grocery chains) with relatively homogeneous promotion policies. We will take this approach in the remainder of this book.

Stated more formally, let siht be the market share of brand i in store h in period t, and Xkiht be the value of the kth marketing variable for brand i in store h in period t. Regression models (5.6 - 5.7) may be rewritten with the new notation as

MNL Model:

$$s_{iht}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k (X_{kiht} - \bar X_{kht}) + e_{iht}^* \qquad (5.11)$$

MCI Model:

$$s_{iht}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} b_k \log(X_{kiht}/\tilde X_{kht}) + e_{iht}^* \qquad (5.12)$$

where s*iht is the log-centered value of siht in store h in period t, and X̄kht and X̃kht are the arithmetic mean and geometric mean of Xkiht over i in store h in period t.

The main advantage of a disaggregated model such as (5.11 - 5.12) is that we do not have to deal with sampling errors in estimation. Similar expressions may be obtained for (5.8 - 5.9), but in actual applications there will be too many dummy variables which have to be included in the model. It will be necessary to specify (H × T - 1) dummy variables, where H is the number of stores, in place of the (T - 1) period dummy variables in (5.8 - 5.9). With even a moderate number of stores and periods it may become impractical to try to include all the necessary dummy variables for estimation, in which case the use of models (5.11 - 5.12) is recommended.

5.4  *Generalized Least-Squares Estimation

In the preceding section we noted that the error terms in the regression models for estimating the parameters of market-share models tend to be heteroscedastic, i.e., have unequal variances and nonzero covariances. If market-share figures are computed from POS data, the error terms in regression models (5.6 - 5.12) involve only what we call specification errors. Let Σ be the variance-covariance matrix of specification errors, with variances σi2 (i = 1, 2, …, m) on the main diagonal and covariances σij (j ≠ i) as off-diagonal elements. Because of this heteroscedasticity, Bultez and Naert (Bultez, Alain V. & Philippe A. Naert [1975], ``Consistent Sum-Constrained Models,'' Journal of the American Statistical Association, 70, 351 (September), 529-35) proposed an iterative GLS procedure. The steps of the iterative GLS procedure are as follows.

  1. The OLS procedure is used to estimate the parameters in one of the regression models (5.6 - 5.12), and Σ is estimated from the residual errors. (One can simply sort the OLS residuals by brand and time period, compute the variance of each brand's residuals, and compute the covariance between the ordered residuals for each pair of brands.)
  2. The data for each period are re-weighted by the estimated Σ̂-1/2.
  3. The first two steps are repeated until the estimated values of the regression parameters converge.
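A sketch of these steps in code is given below. The data are simulated, the estimated Σ̂ is assumed to be invertible (for the log-centered models it is not, as discussed next), and the inverse square root is obtained from an eigendecomposition.

```python
import numpy as np

def inv_sqrt(S):
    # Symmetric inverse square root via eigendecomposition (assumes S is positive definite).
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def iterative_gls(X_t, y_t, n_iter=10):
    """X_t: list of (m x p) design matrices, one per period; y_t: list of m-vectors."""
    X = np.vstack(X_t)
    y = np.concatenate(y_t)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]                            # step 1: OLS
    for _ in range(n_iter):
        resid = np.array([yt - Xt @ beta for Xt, yt in zip(X_t, y_t)])     # T x m residual matrix
        Sigma = np.cov(resid, rowvar=False)                                # estimated error covariance
        W = inv_sqrt(Sigma)                                                # step 2: re-weight by Sigma^(-1/2)
        Xw = np.vstack([W @ Xt for Xt in X_t])
        yw = np.concatenate([W @ yt for yt in y_t])
        beta_new = np.linalg.lstsq(Xw, yw, rcond=None)[0]
        if np.allclose(beta_new, beta, atol=1e-8):                         # step 3: iterate to convergence
            break
        beta = beta_new
    return beta

# Tiny illustration with made-up data: T = 30 periods, m = 5 brands, p = 3 regressors.
rng = np.random.default_rng(0)
X_t = [rng.normal(size=(5, 3)) for _ in range(30)]
y_t = [Xt @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=5) for Xt in X_t]
print(iterative_gls(X_t, y_t))
```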

There is one minor problem in applying this iterative procedure. It may be remembered that in regression model (5.6) the log-centering transformation is applied to the dependent variable; as a result, the variance-covariance matrix of the e*it's is given by

$$\Sigma^* = (I - J/m)\,\Sigma\,(I - J/m)$$

where I is an identity matrix and J is a matrix all of whose elements are equal to 1. The dimensions of both I and J are m × m, where m is the number of available brands. Σ̂* computed from OLS residuals is therefore singular and not invertible. Since regression models (5.8 - 5.9) are equivalent to (5.6 - 5.7), the residuals estimated from the former are identical to those estimated from the latter, and hence the estimated covariance matrices are also identical. In general, if both brand-dummy variables and period- (or store-) dummy variables are inserted in a regression model, the estimated residual covariance matrix becomes singular. This certainly is an impediment to the GLS estimation procedure, which requires the inverses of estimated covariance matrices.

There are three methods of circumventing this problem. One is to delete one row and the corresponding column from Σ̂* and invert the result. One observation (corresponding to the deleted row/column of Σ̂*) per period is deleted and the parameters are estimated on the remaining data. The drawback of this technique is that the estimated parameters are transformations of the original parameters, and hence have to be transformed back to the original ones, a process which is rather cumbersome. A second method is to set to zero those off-diagonal elements of the estimated residual covariance matrix which are nearly zero. Though theoretically less justifiable, it has its merit in simplicity. Usually it is sufficient to set just a few elements to zero before the inverse may be obtained. (If one wishes to be more formal about this method, one may set to zero those elements which are not statistically significantly different from zero. On the other hand, by setting all off-diagonal elements to zero we obtain an easily implemented weighted least-squares (WLS) procedure which compensates only for differences in the variance of specification errors between brands.) The third method is to find a generalized inverse of Σ̂*.

5.4.1  Application of GLS to the Margarine Data

As an illustration of the GLS technique, consider the data set given in Table 5.1. The OLS estimation technique applied to regression model (5.9) yielded the parameter estimates in Table 5.3. Residual errors were then computed from the above OLS results and Σ was estimated. The estimated Σ and its inverse are shown below. Those elements of the estimated Σ which were less than 0.3 were set to zero before the matrix was inverted.

Covariance Matrix

 
B 1 2 3 4 5 6 7
 
1 0.183164 -.050222 -.009048 -.063581 0.102546 -.142236 0.020543
2 -.050222 0.386247 -.233128 -.057280 -.156411 -.261190 0.186987
3 -.009048 -.233128 0.302828 0.062024 -.066156 -.020426 -.086928
4 -.063581 -.057280 0.062024 0.234230 -.304044 -.024887 -.078644
5 0.102546 -.156411 -.066156 -.304044 0.359436 0.074360 0.020021
6 -.142236 -.261190 -.020426 -.024887 0.074360 0.880167 -.444807
7 0.020543 0.186987 -.086928 -.078644 0.020021 -.444807 0.289530

Inverse Covariance Matrix

 
B 1 2 3 4 5 6 7
 
1 -62.656 13.7604 -13.3493 14.4918 47.1276 -65.075 -108.934
2 13.760 -0.4766 3.5862 -8.6329 -13.2922 12.164 17.728
3 -13.349 3.5862 1.5801 -2.0956 6.5368 -12.807 -22.087
4 14.492 -8.6329 -2.0956 -4.2420 -14.5748 13.098 23.917
5 47.128 -13.2922 6.5368 -14.5748 -36.9378 45.266 76.131
6 -65.075 12.1642 -12.8072 13.0983 45.2664 -61.315 -102.342
7 -108.934 17.7275 -22.0868 23.9170 76.1315 -102.342 -165.360

The data matrix for each week was pre-multiplied by the square root of the above inverse matrix, and estimates of the following form were obtained:

$$(a_1, a_2, \ldots, a_m, b_p)' =
  \left[\sum_{t=1}^{T} X_t' \hat\Sigma^{-1} X_t\right]^{-1}
  \left[\sum_{t=1}^{T} X_t' \hat\Sigma^{-1} y_t\right]$$

where Xt is the independent variable matrix and yt is the vector of the dependent variable for period t . The re-estimated parameter values are shown in Table 5.5.

Table 5.5: GLS Estimates for Table 5.3

 
  Parameter   Parameter
Variable Estimate Variable Estimate
 
Intercept 45.4977    
D2 -0.6529 DD6 0.4764
D3 -1.505 DD7 0.4892
D4 -1.8942 DD8 0.0709
D5 -3.4476 DD9 0.6449
D6 -2.0313 DD10 0.5546
D7 -2.2964 DD11 0.6610
DD2 -0.1283 DD12 0.2626
DD3 -0.1412 DD13 0.0022
DD4 0.4260 DD14 -0.3082
DD5 0.1464 LOG(PRICE) -8.4395

Table 5.5 gives the so-called two-stage GLS estimates. If necessary, residual errors and Σ may be computed again from the above results and another set of GLS estimates may be obtained. But, since the parameter estimates in Table 5.3 are extremely close to those in Table 5.5, further iterations seem unnecessary. In fact it has been our experience that OLS and GLS estimates are very similar in many cases. The OLS procedure appears satisfactory in many applications.

So far we have reviewed estimation techniques applicable to relatively simple attraction models (5.1). We have shown in Chapter 3 that attraction models may be extended to include differential effects and cross effects between brands. In the following sections we will discuss more advanced issues related to the parameter estimation of differential-effects and cross-effects (fully extended) models.

5.5  Estimation of Differential-Effects Models

The differential-effects version of attraction model (5.1) is expressed as follows.

 
$$A_i = \exp(a_i + e_i) \prod_{k=1}^{K} f_k(X_{ki})^{b_{ki}}, \qquad
  s_i = A_i \Big/ \sum_{j=1}^{m} A_j \qquad (5.13)$$

where either an identity or exponential transformation may be chosen for fk , depending on whether an MCI or MNL model is desired. The chief difference between (5.1) and (5.13) is the fact that parameter bki has an additional subscript i , suggesting that the effectiveness (and hence the elasticity) of a marketing variable may differ from one brand to the next. This is certainly a plausible model in some situations and worth calibrating.

The estimation of the parameters bki (i = 1, 2, …, m) is not extremely complicated. Only a slight modification of regression models (5.6 - 5.9) achieves the result. Using the previous definitions of the dummy variables dj and Du, the differential-effects versions of regression models (5.6 - 5.7) are given by

MNL Model:

$$s_{it}^* = \sum_{j=2}^{m} a_j \left(d_j - \frac{1}{m}\right)
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj} \left(d_j - \frac{1}{m}\right) X_{kjt}
  + e_{it}^* \qquad (5.14)$$

MCI Model:

$$s_{it}^* = \sum_{j=2}^{m} a_j \left(d_j - \frac{1}{m}\right)
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj} \left(d_j - \frac{1}{m}\right) \log X_{kjt}
  + e_{it}^* \; . \qquad (5.15)$$

In regression models (5.14 - 5.15) the independent variables are replaced by each variable multiplied by (dj - 1/m), which equals (1 - 1/m) if j = i, and -1/m otherwise. Thus the number of independent variables is (m × K) + m - 1. Note that regression models (5.14 - 5.15) have to be estimated without the intercept term; most regression programs provide this option. (If an intercept term is included, its estimated value will be zero.) We cannot obtain the estimate of a1 from (5.14) or (5.15), but this poses no problem in computing market shares since the estimated value of ai is actually the difference between the true ai and a1. (Rather than automatically assigning a1 as the brand intercept to drop, one can run the regression with all brand intercepts (which will be a singular model) and find the intercept closest to zero as the one to drop.) Similarly, regression models (5.8) and (5.9) may be modified as follows for their respective differential-effects versions.

MNL Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j X_{kjt} + e_{it} \qquad (5.16)$$

MCI Model:

$$\log s_{it} = a_1 + \sum_{j=2}^{m} a_j' d_j + \sum_{u=2}^{T} g_u D_u
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j \log X_{kjt} + e_{it} \qquad (5.17)$$

Regression models (5.14 - 5.15) and (5.16 - 5.17) yield identical estimates of the parameters a's (except a1) and b's. If the number of periods (or choice situations) is large, (5.14 - 5.15) will be preferred.
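The sketch below shows how the differential-effects regressors of (5.16 - 5.17) can be generated by crossing the (logged) marketing variable with the brand dummies, which is exactly the LPD1-LPD7 arrangement used in Table 5.6 below. The small data frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (week, brand).
df = pd.DataFrame({
    "week":  [1, 1, 1, 2, 2, 2],
    "brand": [1, 2, 3, 1, 2, 3],
    "price": [192.0, 139.5, 158.0, 192.0, 140.0, 158.0],
})

log_price = np.log(df["price"])
brand_dummies = pd.get_dummies(df["brand"], prefix="d")          # d_1, ..., d_m (all kept)

# Differential-effects regressors for the MCI form (5.17): log(price) x brand dummy,
# giving one price parameter per brand (the LPD columns of Table 5.6).
lpd = brand_dummies.mul(log_price, axis=0)
lpd.columns = [c.replace("d_", "LPD") for c in brand_dummies.columns]
print(pd.concat([df, lpd], axis=1))
```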

The reader may feel that the following regression models are more straightforward modifications of (5.6 - 5.7), but this is not the case.

MNL Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j (X_{kjt} - \bar X_{kt})
  + e_{it}^* \qquad (5.18)$$

MCI Model:

$$s_{it}^* = a_1 + \sum_{j=2}^{m} a_j' d_j
  + \sum_{k=1}^{K} \sum_{j=1}^{m} b_{kj}\, d_j X_{kjt}^*
  + e_{it}^* \qquad (5.19)$$

Models (5.18 - 5.19) do not represent an attraction model, but a log-linear market-share model in which the share of brand i is specified as

$$s_i = \exp(a_i + e_i) \prod_{k=1}^{K} f_k(X_{ki}^*)^{b_{ki}}$$

where X*ki is a centered value of Xki, that is, (Xkit - X̄kt) if fk is an exponential transformation and (Xkit/X̃kt) if fk is an identity transformation. While these models themselves may have desirable features as market-share models, models (5.18 - 5.19) are not the estimating equations for (5.13). (The difference is that (5.14 - 5.15) log-center the differential-effect variable, while (5.18 - 5.19) log-center the simple-effect variable and then multiply these log-centered variables by the brand-specific dummy variables.)

Let us see what those modifications mean using the illustrative data of Table 5.1. The independent variable in this case is price. In order to estimate regression model (5.17) (the MCI version), the data must be arranged as in Table 5.6. Only the dependent variable and part of the explanatory variables (log(price) × brand dummy variables) are shown. The week and brand dummy variables follow the same pattern as in Table 5.2.

The estimation results are shown in Table 5.7. The fit of the model, as measured by R2, improved from 0.736 to 0.826. The gain from adding six more independent variables (LPD1 through LPD7 instead of LOG(PRICE)) may be measured by the incremental F-ratio 4.9386 (= (86.8406 - 77.3339)/(6 × 0.32083)), which is significant at the .99 level (df = 6, 57). This shows that the differential-effects model is a significant improvement over the explanatory power of the simple-effects model. The estimated parameter values are markedly different from one brand to the next. Looking at the price-parameter estimates, we note that a larger size tends to be more price sensitive than a smaller size even within a brand: brands 2 and 4 have greater values (in absolute terms) than brands 1 and 3. Brand 5 is the most price sensitive, with an estimated value of -24.08, but this may reflect the fact that this brand's share was zero, and hence not available for estimation, in 10 weeks out of 14. We shall discuss this issue in a later section. Two brands, 6 and 7, are not price sensitive; their price parameters are not statistically different from zero, as indicated by their respective ``Prob > |T|'' values. As to the estimates of the a's, we may note that they are negatively correlated with the price-parameter estimates over brands, but we will not attempt to make generalizations on the basis of this single example.
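The incremental F-ratio quoted above can be reproduced directly from the two analysis-of-variance tables; the only assumption in the snippet below is the availability of scipy for the reference distribution.

```python
from scipy.stats import f

ssr_simple = 77.33391       # model sum of squares, simple-effects model (Table 5.3)
ssr_diff   = 86.84061       # model sum of squares, differential-effects model (Table 5.7)
mse_diff   = 0.32083        # error mean square of the differential-effects model
q, df_error = 6, 57         # six added parameters; error degrees of freedom of the full model

F = (ssr_diff - ssr_simple) / (q * mse_diff)
print(round(F, 4))                          # about 4.94
print(f.ppf(0.99, q, df_error))             # critical value at the .99 level
```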

The arrangement of data for estimating model (5.15) is given in Table 5.8. Only the dependent variable and the price × brand dummy

Table 5.6: Data Set for Differential-Effects Model

 
                         Log(Price) × Brand Dummy Variables
Week Brand Log Share    LPD1    LPD2    LPD3    LPD4    LPD5    LPD6    LPD7
 
1 1 1.38629 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 2 3.93183 0.0000 4.9381 0.0000 0.0000 0.0000 0.0000 0.0000
1 3 1.09861 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
1 4 1.09861 0.0000 0.0000 0.0000 4.9836 0.0000 0.0000 0.0000
1 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
1 6 0.00000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
1 7 2.19722 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.9972
2 1 0.69315 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
2 2 4.31749 0.0000 4.9416 0.0000 0.0000 0.0000 0.0000 0.0000
2 3 0.69315 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
2 4 0.00000 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
2 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
2 6 . 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
2 7 1.60944 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.9972
3 1 1.09861 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
3 2 3.87120 0.0000 4.9309 0.0000 0.0000 0.0000 0.0000 0.0000
3 3 0.00000 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
3 4 0.00000 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
3 5 3.04452 0.0000 0.0000 0.0000 0.0000 4.6052 0.0000 0.0000
3 6 . 0.0000 0.0000 0.0000 0.0000 0.0000 4.9273 0.0000
3 7 2.56495 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8904
4 1 1.38629 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
4 2 3.78419 0.0000 4.9345 0.0000 0.0000 0.0000 0.0000 0.0000
4 3 3.17805 0.0000 0.0000 4.9345 0.0000 0.0000 0.0000 0.0000
4 4 . 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
4 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
4 6 . 0.0000 0.0000 0.0000 0.0000 0.0000 4.9972 0.0000
4 7 2.39790 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520
5 1 1.60944 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
5 2 3.13549 0.0000 4.9345 0.0000 0.0000 0.0000 0.0000 0.0000
5 3 2.30259 0.0000 0.0000 4.9488 0.0000 0.0000 0.0000 0.0000
5 4 0.00000 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
5 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
5 6 3.25810 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
5 7 1.94591 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520
6 1 1.79176 5.2575 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
6 2 1.79176 0.0000 5.1705 0.0000 0.0000 0.0000 0.0000 0.0000
6 3 1.09861 0.0000 0.0000 5.0626 0.0000 0.0000 0.0000 0.0000
6 4 0.69315 0.0000 0.0000 0.0000 5.1358 0.0000 0.0000 0.0000
6 5 . 0.0000 0.0000 0.0000 0.0000 5.0938 0.0000 0.0000
6 6 3.58352 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520 0.0000
6 7 2.56495 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 4.8520

Table 5.7: Regression Results for Differential-Effects Model (MCI)

 
Model: MODEL1      
Dep Variable: LSHARE      
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 26 86.84061 3.34002 10.411 0.0001
Error 57 18.28703 0.32083    
C Total 83 105.12764      
Root MSE 0.56641 R-Square 0.8260  
Dep Mean 1.92529 Adj R-Sq 0.7467  
C.V. 29.41967      
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
INTRCPT 1 36.797056 9.03858643 4.071 0.0001
D2 1 28.212012 11.56852482 2.439 0.0179
D3 1 1.325003 11.79902587 0.112 0.9110
D4 1 12.426886 11.09247984 1.120 0.2673
D5 1 77.155688 41.01572380 1.881 0.0651
D6 1 -32.861595 16.87525706 -1.947 0.0564
D7 1 -43.161568 18.20494075 -2.371 0.0211
DD2 1 0.144666 0.34886522 0.415 0.6799
DD3 1 0.160885 0.34227101 0.470 0.6401
DD4 1 0.783674 0.38370743 2.042 0.0458
DD5 1 0.560437 0.33938645 1.651 0.1042
DD6 1 1.070890 0.34384160 3.114 0.0029
DD7 1 1.087786 0.34488085 3.154 0.0026
DD8 1 0.479316 0.33693519 1.423 0.1603
DD9 1 0.997026 0.34999923 2.849 0.0061
DD10 1 0.689770 0.35708659 1.932 0.0584
DD11 1 1.035196 0.35245369 2.937 0.0048
DD12 1 0.565334 0.35135768 1.609 0.1131
DD13 1 0.176690 0.34733555 0.509 0.6129
DD14 1 0.107222 0.36223872 0.296 0.7683
LPD1 1 -6.837585 1.72929552 -3.954 0.0002
LPD2 1 -12.511968 1.47178224 -8.501 0.0001
LPD3 1 -7.357565 1.51269846 -4.864 0.0001
LPD4 1 -9.629287 1.46960177 -6.552 0.0001
LPD5 1 -24.078656 8.34529863 -2.885 0.0055
LPD6 1 -0.478779 2.78016380 -0.172 0.8639
LPD7 1 1.657518 3.33624420 0.497 0.6212

Table 5.8: Log-Centered Differential-Effects Data

 
Week  Brand  Log-Centered Share  LPD1  LPD2  LPD3  LPD4  LPD5  LPD6  LPD7
(LPDj = centered Log(Price) × Brand Dummy Variable for brand j)
 
1 1 -0.232 4.381 -0.823 -0.844 -0.831 0.000 -0.809 -0.833
1 2 2.313 -0.876 4.115 -0.844 -0.831 0.000 -0.809 -0.833
1 3 -0.520 -0.876 -0.823 4.219 -0.831 0.000 -0.809 -0.833
1 4 -0.520 -0.876 -0.823 -0.844 4.153 0.000 -0.809 -0.833
1 6 -1.619 -0.876 -0.823 -0.844 -0.831 0.000 4.043 -0.833
1 7 0.578 -0.876 -0.823 -0.844 -0.831 0.000 -0.809 4.164
2 1 -0.769 4.206 -0.988 -1.013 -1.027 0.000 0.000 -0.999
2 2 2.855 -1.052 3.953 -1.013 -1.027 0.000 0.000 -0.999
2 3 -0.769 -1.052 -0.988 4.050 -1.027 0.000 0.000 -0.999
2 4 -1.463 -1.052 -0.988 -1.013 4.109 0.000 0.000 -0.999
2 7 0.147 -1.052 -0.988 -1.013 -1.027 0.000 0.000 3.998
3 1 -0.665 4.381 -0.822 -0.844 -0.856 -0.768 0.000 -0.815
3 2 2.108 -0.876 4.109 -0.844 -0.856 -0.768 0.000 -0.815
3 3 -1.763 -0.876 -0.822 4.219 -0.856 -0.768 0.000 -0.815
3 4 -1.763 -0.876 -0.822 -0.844 4.280 -0.768 0.000 -0.815
3 5 1.281 -0.876 -0.822 -0.844 -0.856 3.838 0.000 -0.815
3 7 0.802 -0.876 -0.822 -0.844 -0.856 -0.768 0.000 4.075
4 1 -1.300 3.943 -1.234 -1.234 0.000 0.000 0.000 -1.213
4 2 1.098 -1.314 3.701 -1.234 0.000 0.000 0.000 -1.213
4 3 0.491 -1.314 -1.234 3.701 0.000 0.000 0.000 -1.213
4 7 -0.289 -1.314 -1.234 -1.234 0.000 0.000 0.000 3.639
5 1 -0.432 4.381 -0.822 -0.825 -0.856 0.000 -0.809 -0.809
5 2 1.094 -0.876 4.112 -0.825 -0.856 0.000 -0.809 -0.809
5 3 0.261 -0.876 -0.822 4.124 -0.856 0.000 -0.809 -0.809
5 4 -2.042 -0.876 -0.822 -0.825 4.280 0.000 -0.809 -0.809
5 6 1.216 -0.876 -0.822 -0.825 -0.856 0.000 4.043 -0.809
5 7 -0.096 -0.876 -0.822 -0.825 -0.856 0.000 -0.809 4.043
6 1 -0.129 4.381 -0.862 -0.844 -0.856 0.000 -0.809 -0.809
6 2 -0.129 -0.876 4.309 -0.844 -0.856 0.000 -0.809 -0.809
6 3 -0.822 -0.876 -0.862 4.219 -0.856 0.000 -0.809 -0.809
6 4 -1.227 -0.876 -0.862 -0.844 4.280 0.000 -0.809 -0.809
6 6 1.663 -0.876 -0.862 -0.844 -0.856 0.000 4.043 -0.809
6 7 0.644 -0.876 -0.862 -0.844 -0.856 0.000 -0.809 4.043

variables are shown. In addition, we need (dj - 1/m), where dj is the usual brand dummy variable for each brand. Note that all variables sum to zero within each week. Note also that those observations for which log(share) is missing are deleted prior to centering. The estimated values of α2, α3, …, αm, βp1, βp2, …, βpm based on the data in Table 5.8 are identical to those given in Table 5.7.
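
A sketch of the log-centering step that produces Table 5.8, again assuming the same long-format data frame as before; rows with missing (zero) shares are dropped first, and then the dependent variable and every differential-effect column are centered within each week.

import numpy as np
import pandas as pd

def log_centered_design(df, m=7):
    d = df.dropna(subset=["share"]).copy()        # delete zero-share rows before centering
    d["LSHARE"] = np.log(d["share"])
    for b in range(1, m + 1):                     # raw log(price) x brand-dummy columns
        d[f"LPD{b}"] = np.where(d["brand"] == b, np.log(d["price"]), 0.0)
    cols = ["LSHARE"] + [f"LPD{b}" for b in range(1, m + 1)]
    # subtract the within-week mean from each column (Table 5.8 layout);
    # the centered brand dummies (d_j - 1/m) would be built the same way
    d[cols] = d[cols] - d.groupby("week")[cols].transform("mean")
    return d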

5.6  Collinearity in Differential-Effects Models

Bultez and Naert [1975] reported that estimating the parameters of a differential-effects model by equations (5.14) and (5.15) was greatly hampered by the existence of model-induced collinearity. To see their point, consider the data set shown in Table 5.9.

Table 5.9: Hypothetical Data for Differential-Effects Model

 
Week  Brand  log share  X1D1  X1D2  X1D3  X2D1  X2D2  X2D3
(XkDj = Xk × Brand Dummy Variable for brand j)
 
1 1 log(s11) X111 0 0 X211 0 0
1 2 log(s21) 0 X121 0 0 X221 0
1 3 log(s31) 0 0 X131 0 0 X231
2 1 log(s12) X112 0 0 X212 0 0
2 2 log(s22) 0 X122 0 0 X222 0
2 3 log(s32) 0 0 X132 0 0 X232
3 1 log(s13) X113 0 0 X213 0 0
3 2 log(s23) 0 X123 0 0 X223 0
3 3 log(s33) 0 0 X133 0 0 X233
. . . . . . . . .
. . . . . . . . .

This data set is for the estimation of regression model (5.16) in which three brands and two independent variables are assumed. (In actual estimation we will need brand and week dummy variables in addition to the variables above.) Collinearity (i.e., high correlations between two or more independent variables) is observed between independent variables for the same brand, e.g., between X1D1 and X2D1, between X1D2 and X2D2, between X1D3 and X2D3, and so forth. The reason for this phenomenon is demonstrated mathematically later in this section, but is easy to understand. Take the variables X1D1 and X2D1, for example. Those two variables have many zeroes in common for the same observations (weeks). When one computes the correlation between the two variables, those common zeroes artificially inflate the correlation coefficient.
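
A tiny numerical illustration of this artifact, using made-up values for one brand's two variables: the raw series are nearly uncorrelated, yet the reduced-form columns, padded with their common zeroes, are highly correlated.

import numpy as np

rng = np.random.default_rng(1)
T, m = 20, 3                         # 20 periods, 3 brands
x1 = rng.normal(5.0, 0.2, T)         # brand 1's values of X1 (e.g., log price)
x2 = rng.normal(4.0, 0.3, T)         # brand 1's values of X2 (e.g., log advertising)
print(np.corrcoef(x1, x2)[0, 1])     # original correlation: small

# reduced form: brand 1's rows carry the values, the other brands' rows are zero
X1D1 = np.concatenate([x1, np.zeros(T * (m - 1))])
X2D1 = np.concatenate([x2, np.zeros(T * (m - 1))])
print(np.corrcoef(X1D1, X2D1)[0, 1])  # inflated by the shared zeroes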

Because of the potential for artificially inflated correlations, Bultez and Naert warned against careless use of differential-effects models. Their warning was, however, somewhat premature. There are two aspects to the problem - the first concerning numerical analysis, and the second concerning the stability of parameter estimates.

Problems arise in numerical analysis when the crossproducts matrix for a regression model becomes singular or so nearly so that it cannot be inverted accurately. But, the crossproducts matrix for regression model (5.15) has a unique structure which is robust against high correlations induced by the model structure. (This is not to say that it is robust against any high correlations.) To simplify the discussion, assume that observations are taken only for three weeks. Then the number of independent variables in regression will be 11 (the intercept term, two week dummy variables, two brand dummy variables, and six variables X1D1 through X2D3). The crossproduct matrix for this set of variables will look as follows.

\begin{pmatrix}
9 & 3 & 3 & 3 & 3 & \sum X_{11t} & \sum X_{12t} & \sum X_{13t} & \sum X_{21t} & \sum X_{22t} & \sum X_{23t} \\
3 & 3 & 0 & 1 & 1 & X_{112} & X_{122} & X_{132} & X_{212} & X_{222} & X_{232} \\
3 & 0 & 3 & 1 & 1 & X_{113} & X_{123} & X_{133} & X_{213} & X_{223} & X_{233} \\
3 & 1 & 1 & 3 & 0 & 0 & \sum X_{12t} & 0 & 0 & \sum X_{22t} & 0 \\
3 & 1 & 1 & 0 & 3 & 0 & 0 & \sum X_{13t} & 0 & 0 & \sum X_{23t} \\
\sum X_{11t} & X_{112} & X_{113} & 0 & 0 & \sum X_{11t}^2 & 0 & 0 & \sum X_{11t}X_{21t} & 0 & 0 \\
\sum X_{12t} & X_{122} & X_{123} & \sum X_{12t} & 0 & 0 & \sum X_{12t}^2 & 0 & 0 & \sum X_{12t}X_{22t} & 0 \\
\sum X_{13t} & X_{132} & X_{133} & 0 & \sum X_{13t} & 0 & 0 & \sum X_{13t}^2 & 0 & 0 & \sum X_{13t}X_{23t} \\
\sum X_{21t} & X_{212} & X_{213} & 0 & 0 & \sum X_{11t}X_{21t} & 0 & 0 & \sum X_{21t}^2 & 0 & 0 \\
\sum X_{22t} & X_{222} & X_{223} & \sum X_{22t} & 0 & 0 & \sum X_{12t}X_{22t} & 0 & 0 & \sum X_{22t}^2 & 0 \\
\sum X_{23t} & X_{232} & X_{233} & 0 & \sum X_{23t} & 0 & 0 & \sum X_{13t}X_{23t} & 0 & 0 & \sum X_{23t}^2
\end{pmatrix}

In the above matrix summation is always over t (in this case over three weeks).

Collinearity in regression becomes a numerical-analysis problem when a crossproducts matrix such as the one above is nearly singular and thus its determinant is near zero. Since this matrix is in block form, the critical issue is whether the sub-matrix

\begin{pmatrix}
\sum X_{11t}^2 & 0 & 0 & \sum X_{11t}X_{21t} & 0 & 0 \\
0 & \sum X_{12t}^2 & 0 & 0 & \sum X_{12t}X_{22t} & 0 \\
0 & 0 & \sum X_{13t}^2 & 0 & 0 & \sum X_{13t}X_{23t} \\
\sum X_{11t}X_{21t} & 0 & 0 & \sum X_{21t}^2 & 0 & 0 \\
0 & \sum X_{12t}X_{22t} & 0 & 0 & \sum X_{22t}^2 & 0 \\
0 & 0 & \sum X_{13t}X_{23t} & 0 & 0 & \sum X_{23t}^2
\end{pmatrix}

is invertible. This matrix may be put in the form of a block-diagonal matrix by simple row-column operations and thus is invertible, if each of the following three matrices is invertible.

\begin{pmatrix} \sum X_{11t}^2 & \sum X_{11t}X_{21t} \\ \sum X_{11t}X_{21t} & \sum X_{21t}^2 \end{pmatrix} \qquad
\begin{pmatrix} \sum X_{12t}^2 & \sum X_{12t}X_{22t} \\ \sum X_{12t}X_{22t} & \sum X_{22t}^2 \end{pmatrix} \qquad
\begin{pmatrix} \sum X_{13t}^2 & \sum X_{13t}X_{23t} \\ \sum X_{13t}X_{23t} & \sum X_{23t}^2 \end{pmatrix}

That is, these submatrices are invertible as long as the original correlations between X11 and X21, X12 and X22, and X13 and X23 over t are not high. This is true even if the apparent (model-induced) correlations between them are high. The important condition for the invertibility of the crossproducts matrix as a whole is that the correlations between the original variables Xkit and Xhit (h ≠ k) over t are not too high to begin with. (If the correlations between original variables are high, composite measures, such as those based on principal components, will have to be used for any differential-effects market-share model to be effective!) This conclusion does not change if the independent variables are the logarithms of the original variables Xki. Thus the numerical-analysis problems created by collinearity in the usual sense are not the real issue in this case.

Even though the matrix will usually be invertible, collinearity can still harm the regression estimates. A further look at the source and remedies for collinearity in these models is helpful. Since Bultez and Naert's [1975] discussion of the problem, their warning about collinearity in differential-effects attraction models has been echoed by Naert and Weverbergh and others.Naert, Philippe A. & Marcel Weverbergh [1981], ``On the Prediction Power of Market Share Attraction Models,'' Journal of Marketing Research, 18 (May), 146-153. Naert, Philippe A. & Marcel Weverbergh [1985], ``Market Share Specification, Estimation and Validation: Toward Reconciling Seemingly Divergent Views,'' Journal of Marketing Research, 22 (November), 453-61. Brodie, Roderick & Cornelius A. de Kluyver [1984], ``Attraction Versus Linear and Multiplicative Market Share Models: An Empirical Evaluation,'' Journal of Marketing Research, 21 (May), 194-201. Ghosh, Avijit, Scott Neslin & Robert Shoemaker [1984], ``A Comparison of Market Share Models and Estimation Procedures,'' Journal of Marketing Research, 21 (May), 202-210. Leeflang, Peter S. H. & Jan C. Reuyl [1984a], ``On the Predictive Power of Market Share Attraction Models,'' Journal of Marketing Research, 21 (May), 211-215. Leeflang, Peter S. H. & Jan C. Reuyl [1984b], ``Estimators of the Disturbances in Consistent Sum-Constrained Market Share Models,'' Working Paper, Faculty of Economics, University of Groningen, P.O. Box 9700 AV Groningen, The Netherlands.While most of these articles also investigated differential-effects versions of multiplicative and linear-additive market-share models, no mention has been made in the marketing literature of possible collinearities in these model forms.

This section shows that the linear-additive and multiplicative versions of differential-effects market-share models suffer from the same sources of collinearities as the MCI and MNL versions. It is shown that the structural sources of collinearity are largely eliminated by two standardizing transformations - zeta-scores or the exponential transform of a standard z-score - discussed in section 3.8.

5.6.1  Three Differential-Effects Models

The three basic specifications of the differential-effects market-share models - linear-additive (LIN), multiplicative (MULT), and multiplicative competitive-interaction (MCI) or attraction versions - are given in equations (5.20 - 5.22) parallel to the definitions in Naert & Weverbergh's [1984] equations:

LIN

s_{it} = \alpha_i + \sum_{k=1}^{K} \beta_{ki}\, f_t(X_{kit}) + \varepsilon_{it}     (5.20)

MULT

s_{it} = A_{it}     (5.21)

and MCI

s_{it} = A_{it} \Big/ \sum_{j=1}^{m} A_{jt}     (5.22)

where:Note here we are focusing on ft rather than fk . We will assume we have agreed on the model type (MCI in this case, that is, fk is the identity transformation) and our interest here is in the possible influence of transformations within a choice situation on collinearity.

A_{it} = (\alpha_i + \varepsilon_{it}) \prod_{k=1}^{K} \left[\, f_t(X_{kit}) \,\right]^{\beta_{ki}}\, .

All of these models are reduced to their corresponding simple-effects versions by assuming:

\beta_{ki} = \beta_{kj} = \beta_k \quad \forall\; i, j\, .

The reduced formThe reduced form is simply the variables after they are transformed to be ready for input into a multiple-regression routine.resulting from this simplified estimation procedure allows us to see the similarities among all three specifications of the differential-effects model, as seen in Tables 5.2 and 5.9. Note in Table 5.9 that each differential effect has only one nonzero entry in each time period. The difference between LIN and MULT models is just that the MULT model uses the log of the variable as the nonzero entry and the LIN model uses the raw variable. The difference between the MULT and MCI models is basically that the MCI form incorporates a series of time-period dummy variables from Table 5.2 which ensure that the estimated parameters are those of the original nonlinear model in equation (3.1). Another difference, of course, is that the estimates of market share in the MCI model come from inverse log-centering,Nakanishi & Cooper [1982].while in the MULT model the exponential transformation of the estimated dependent variable serves as the market-share estimate. Inverse log-centering and the time-period dummy variables guarantee that the MCI model will provide logically consistent market-share estimates (all estimates being between zero and one, and summing to one over all brands in each time period), while neither LIN nor MULT provides logically consistent estimates.

The problem of collinearity can be traced to within-brand effects. There is zero correlation between a time-period dummy variable and a brand-specific dummy variable. Since the time-period dummy variables cannot be a major source of collinearity, the MULT and MCI models do not differ substantially in their sources of collinearity. Nor do the correlations between effects for different brands contribute substantially to collinearity. For m brands the correlation between brand-specific dummy variables for different brands is -1/(m-1). With even ten brands there is only about 1% overlap in variance between intercepts for different brands. An analogous result holds for the correlations between dummy variables for different time periods. The within-brand effects are analyzed in the next section.

5.6.2  Within-Brand Effects

The special problems of jointly longitudinal and cross-sectional analysis have been discussed in psychometrics, econometrics, as well as the quantitative-analysis areas in education, sociology, and geography. The earliest reference is to Robinson'sRobinson, W. S. [1950], ``Ecological Correlation and the Behavior of Individuals,'' American Sociological Review, 15, 351-357.covariance theorem, which was presented by AlkerAlker, Hayward R. Jr. [1969], ``A Typology of Ecological Fallacies,'' in Mattei Dogan & Stein Rokkan (editors), Quantitative Ecological Analysis in the Social Sciences, Cambridge, MA: The M.I.T. Press, 69-86.as:

r_{XY} = {}_{W}R_{XY}\, \sqrt{1 - E_{YR}^2}\, \sqrt{1 - E_{XR}^2} + {}_{E}R_{XY}\, E_{YR}\, E_{XR}     (5.23)

where:

rXY is the correlation between column X and column Y in the reduced form of the differential-effects model. In this application X and Y represent within-brand effects such as price and advertising for one brand.
WRXY is defined to be the pooled within-period correlation of X and Y. In our case this simplifies to a congruence coefficient, giving very high values under certain conditions discussed below.
ERXY is the between-period or ecological correlation. In our case this is the simple correlation between, say, the log of price and the log of advertising values for a single brand.
EYR and EXR are the correlation ratios (i.e., the proportions of variation in X and Y, respectively, that are attributable to between-period differences). In our case these values control how much weight is given to the congruence coefficient versus the simple correlation.

Looking again at Table 5.9 shows that for differential effects within a brand, all the nonzero entries are aligned and all the zero entries are aligned in the reduced form, and there is only one nonzero entry in each time period. This results in very simplified forms for the components of Robinson's covariance theorem. If we let xt and yt be the single nonzero entries in period t for column X and Y , respectively, then for our special case:

{}_{W}R_{XY} = \frac{\sum_{t=1}^{T} x_t\, y_t}{\sqrt{\sum_{t=1}^{T} x_t^2 \; \sum_{t=1}^{T} y_t^2}}\, .

This is a congruence coefficient, often used for assessing the agreement between ratio-scaled measures.Tucker, Ledyard R [1951], ``A Method of Synthesis of Factor Analysis Studies,'' Personnel Research Section Report, No. 984, Washington, D.C., Department of the Army. Also see Korth, Bruce & Ledyard R Tucker [1975], ``The Distribution of Chance Coefficients from Simulated Data,'' Psychometrika, 40, 3 (September), 361-372. Because the mean levels of the variables influence the congruence, x and y of the same sign push WRXY toward 1.0 much faster than the simple correlation. For prices (greater than $1.00) and advertising expenditures the reduced form would have a series of positive log-values which might well have a very large value for WRXY. For these same variables in share form (price-share or advertising-share), the reduced form would have matched negative numbers, which still could lead to large values for WRXY. For variables of consistently opposite signs, WRXY could push toward -1.0 even in cases of modest simple correlations.
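
A minimal sketch of the contrast between the congruence coefficient and the ordinary correlation for the nonzero entries of two within-brand columns; the numbers are invented for illustration.

import numpy as np

def congruence(x, y):
    # Tucker's congruence coefficient: no mean-centering, so entries of a
    # common sign and level push the value toward +1
    return np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

x = np.log([2.33, 2.41, 2.29, 2.37, 2.45])   # a brand's prices over five periods (made up)
y = np.log([1.10, 1.60, 1.20, 1.45, 1.30])   # the same brand's advertising index (made up)
print(congruence(x, y))                       # pushed toward 1.0 by the common positive sign
print(np.corrcoef(x, y)[0, 1])                # the simple correlation is smaller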

For both raw variables (e.g., price and advertising) and for marketing variables in their share form (e.g., relative price and advertising share) the correlation ratios EXR2 and EYR2 have a maximum value of 1/m .

E_{XR}^2 = \frac{\dfrac{1}{m^2}\left[\, \sum_{t=1}^{T} \dfrac{X_{jt}^2}{T} - \left( \sum_{t=1}^{T} \dfrac{X_{jt}}{T} \right)^{2} \right]}{\dfrac{1}{m}\left[\, \sum_{t=1}^{T} \dfrac{X_{jt}^2}{T} - \dfrac{1}{m}\left( \sum_{t=1}^{T} \dfrac{X_{jt}}{T} \right)^{2} \right]} \;\le\; \frac{1}{m}\, .     (5.24)

So when correlating two effects within a brand we have at best:

r_{XY} = \left( \frac{m-1}{m} \right) {}_{W}R_{XY} + \left( \frac{1}{m} \right) {}_{E}R_{XY}\, .     (5.25)

Thus the correlation rXY is composed of two parts. A small part, at most 1/m, is due to the simple correlation of the X and Y values for brand j over time periods. A very large part, at least (m-1)/m, is due to the congruence coefficient WRXY. Thus, for raw-score or share-form marketing variables, pairwise collinearity is likely for any two effects within a brand in differential-effects models. But collinearity is not merely a pairwise problem in these models.For further discussion see Mahajan, Vijay, Arun K. Jain & Michel Bergier [1977], ``Parameter Estimation in Marketing Models in the Presence of Multicollinearity: An Application of Ridge Regression,'' Journal of Marketing Research, 14 (November), 586-591.Collective collinearity for all the within-brand effects is very likely indeed. This is true for the differential-effects versions of the linear-additive model and the multiplicative model, as well as the MCI model. Fortunately there exist simple remedies, which are the topic of the next section.

5.6.3  Remedies

The remedies for collinearity were hinted at in the Bultez and Naert [1975] article which first discussed the problem. They said, ``... if the variables have zero means'' the correlations in the extended model would be the same as the correlation in the simple model (p. 532). More precisely, it can be said that if the reduced form of the values for brand i for two different variables each have a mean of zero over time periods, then WRXY is equal to ERXY, and thus rXY would be equal to the simple correlation of the reduced forms of the brand i values. This remedy is not a general solution for all variables in a differential-effects model because forming deviation scores within a brand over time ignores competitive effects. One case where this remedy might be appropriate, however, is for a variable reflecting the promotion price of a brand. This variable would reflect current price as a deviation from a brand's historic average price.

As potential remedies, consider zeta-scores and the exponential transformation of standard scores discussed in Chapter 3 (section 3.8). Both transformations standardize the explanatory variables, making the information relative to the competitive context in each time period. There are several advantages to standardizing measures of marketing instruments in each time period. First, one should remember that the dependent measures (share or choice probability) are expressed in a metric which, while normalized rather than standardized, is still focused on representing within time-period relations. Representations of the explanatory variables which have a similar within time-period focus have the advantage of a compatible metric. In this respect, variables expressed in share form have as much of an advantage as zeta-scores or exp(z-scores). Any of the three would be superior to raw scores in reflecting the explanatory information in a way which aligns with the dependent variable. While raw prices might have a stronger relation with category volume or primary demand, relative prices could have more to do with how the total volume is shared among the competitors.
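
A sketch of both standardizations applied within each time period. The exp(z-score) is simply the exponential of the within-period standard score; the zeta-score formula below is our reading of the section 3.8 definition and should be checked against that section.

import numpy as np
import pandas as pd

def exp_z(x):
    z = (x - x.mean()) / x.std(ddof=0)
    return np.exp(z)

def zeta(x):
    # assumed form: (1 + z^2)^(1/2) when z >= 0, (1 + z^2)^(-1/2) when z < 0
    z = (x - x.mean()) / x.std(ddof=0)
    return np.where(z >= 0, np.sqrt(1 + z**2), 1.0 / np.sqrt(1 + z**2))

# applied within each competitive set (store-week), e.g. for price:
# df["price_expz"] = df.groupby("week")["price"].transform(exp_z)
# df["price_zeta"] = df.groupby("week")["price"].transform(zeta)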

A second advantage applies to standardizations, rather than normalizations. In the reduced form, the means (of a brand over time periods) of a zeta-score or exp(z-score) are more likely to be close to zero than the corresponding means of the reduced form of a normalized variable. Thus WRXY for a zeta-score or exp(z-score) would be less inflated (closer to the value of the simple correlation ERXY) than would be the congruence coefficient for two within-brand effects represented in share form.

Table 5.10 provides an empirical demonstration of the effects on collinearity of zeta-scores and exp(z-scores), compared with the raw scores or the share scores. The data concern price and advertising measures representing competition among 11 brands in an Australian household-products category.Carpenter, Cooper, Hanssens, and Midgley [1988].There are 11 differential-price effects, 10 differential-advertising effects, and 10 brand-specific intercepts in a differential-effects market-share model for this category. The tabled values are condition indices reflecting the extent of collinearity or near dependencies among the explanatory variables. A condition index is the ratio of the largest singular value (square root of the eigenvalue) to the smallest singular value of the reduced form of the explanatory variables in the market-share model.Belsley, David A., Edwin Kuh & Roy E. Welsch [1980], Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, New York: John Wiley & Sons, 103-4.The higher the condition index the worse the collinearity in the system of equations. Belsley, Kuh, and Welsch [1980] develop empirical evidence that weak dependencies are associated with condition indices between 5 and 10, moderate to strong relations are associated with indices of 30 to 100, and indices of 100 or more ``appear to be large indeed, causing substantial variance inflation and great potential harm to regression estimates''(p. 153). Note in Table 5.10 for raw scores, Xkit, all three models (LIN, MULT, and MCI) reflect potential problems. These problems are not remedied when marketing instruments are expressed in share form. As a market-share model, which uses the share form of marketing instruments, becomes more comprehensive, by including more brands, the problems would worsen. This is because the price shares and advertising shares would, in general, become smaller, thus making the log of the shares negative numbers of larger and larger absolute value. This would press WRXY closer to +1.0.

Table 5.10: Condition Indices Australian Household-Products Example

 
  Transformation of Raw Scores
Model Raw Scores Share Form Zeta-Scores Exp(Z-Scores)
 
LIN 3065 313 61 75
MULT 484 3320 22 17
MCI 627 3562 24 23
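
The condition indices in Table 5.10 can be computed as the ratio of the extreme singular values of the reduced-form explanatory-variable matrix; a minimal sketch, with the Belsley-Kuh-Welsch column scaling included as an option:

import numpy as np

def condition_index(X, scale=True):
    """X: two-dimensional array of reduced-form explanatory variables."""
    if scale:
        # Belsley, Kuh & Welsch scale each column to unit length first
        X = X / np.sqrt((X ** 2).sum(axis=0))
    sv = np.linalg.svd(X, compute_uv=False)   # singular values
    return sv.max() / sv.min()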

Standardizing within each competitive set using zeta-scores or exp(z-scores) has a dramatically favorable impact on the collinearity of the system of equations. The condition indices for the MULT and MCI models are less than 25. This is below the level indicating moderate collinearity, and far below the danger point.The absolute standards given by Belsley, Kuh, and Welsch [1980] for condition indices are probably too conservative. As the number of variables and observations increases we can expect the ratio of the largest and smallest singular values to grow larger. Further study is needed to see what boundaries are acceptable for large data sets.Linear or nonlinear trends in the mean level of the raw variables are major contributors to collinearity. By removing the mean level of the raw variables in each time period, the two remedies illustrated in Table 5.10 both eliminate one major source contributing to high (positive or negative) values in WRXY. By standardizing the variance over competitors in each time period, both remedies help keep the mean values for each brand over time nearer to zero.

These basic results mean that, if one standardizes variables in a manner appropriate for these multiplicative models, it is practical to use differential-effects market-share models.

5.7  Estimation of Cross-Effects Models

We now come to the estimation problems associated with the fully extended attraction (or cross-effects) model discussed in Chapter 3.

A_i = \exp(\alpha_i + \varepsilon_i) \prod_{k=1}^{K} \prod_{j=1}^{m} f_k(X_{kj})^{\beta_{kij}}     (5.26)

s_i = A_i \Big/ \sum_{j=1}^{m} A_j

As before, the fk in the above equation may be an identity (for an MCI model) or an exponential (for an MNL model) transformation. The most important property of the above model is, of course, the existence of the cross-effect parameters, βkij (i, j = 1, 2, …, m; k = 1, 2, …, K). We are now faced with the seemingly insurmountable problem of estimating (m × m × K) + m parameters.

Surprisingly, estimating parameters of a cross-effects model is not very difficult, and in some sense easier than estimating parameters of a differential-effects model. McGuire, Weiss, and HoustonMcGuire, Timothy W., Doyle L. Weiss & Frank S. Houston [1977], ``Consistent Multiplicative Market Share Models,'' in Barnett A. Greenberg & Danny N. Bellenger (editors), Contemporary Marketing Thought. 1977 Educators Proceedings (Series # 41), Chicago: American Marketing Association.showed that the following regression models estimate the parameters of (5.26).

MNL Model:

s^*_{it} = \alpha_1 + \sum_{j=2}^{m} \alpha'_j d_j + \sum_{j=1}^{m} \sum_{k=1}^{K} \sum_{h=1}^{m} \beta^*_{kij}\, d_h\, X_{kjt} + \varepsilon_{it}     (5.27)

MCI Model:

s^*_{it} = \alpha_1 + \sum_{j=2}^{m} \alpha'_j d_j + \sum_{j=1}^{m} \sum_{k=1}^{K} \sum_{h=1}^{m} \beta^*_{kij}\, d_h \log X_{kjt} + \varepsilon_{it}     (5.28)

where s*it is the log-centered value of sit, the share of brand i in period t. Variable dj is the usual brand dummy variable, but its value changes depending on where it is used in the above equation. In the first summation, dj = 1 if j = i, and dj = 0 otherwise; in the second summation, dh = 1 if h = j, and dh = 0 otherwise. It must be pointed out that the β*kij in models (5.27 - 5.28) are not the same as the parameters βkij in model (5.26), but are deviations of the form

\beta^*_{kij} = \beta_{kij} - \bar{\beta}_{k \cdot j}

where \bar{\beta}_{k \cdot j} is the arithmetic mean of βkij over all brands (i = 1, 2, …, m). But it may be shown that the estimated values of the β*kij are sufficient for computing the cross elasticities. Recall from Chapter 3 that the elasticity or cross elasticity of brand i's share with respect to a change in the kth variable for brand j is given by

MCI Model:

e_{s_i \cdot j} = \beta_{kij} - \sum_{h=1}^{m} s_h\, \beta_{khj}

MNL Model:

e_{s_i \cdot j} = \left( \beta_{kij} - \sum_{h=1}^{m} s_h\, \beta_{khj} \right) X_{kj}\, .

Take the MCI version, for example. Substitute β*kij for βkij in the above equation.

 
\beta^*_{kij} - \sum_{h=1}^{m} s_h\, \beta^*_{khj}
    = \left( \beta_{kij} - \bar{\beta}_{k \cdot j} \right) - \sum_{h=1}^{m} s_h \left( \beta_{khj} - \bar{\beta}_{k \cdot j} \right)
    = \beta_{kij} - \sum_{h=1}^{m} s_h\, \beta_{khj} - \bar{\beta}_{k \cdot j} + \bar{\beta}_{k \cdot j}
    = e_{s_i \cdot j}
 

since the sum of sh over all brands is one. Thus the knowledge of the b*kij 's is sufficient to estimate esi.j for both the MCI-type and MNL-type cross-effects models.
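
A sketch of this computation for the MCI version: given the m × m matrix of estimated β*kij for one variable k (rows indexed by i, columns by j) and the vector of shares, the point elasticities follow directly. The array layout is an assumption.

import numpy as np

def mci_cross_elasticities(beta_star, shares):
    """beta_star[i, j] = estimated b*_kij; shares is a length-m vector summing to one.
    Returns the m x m matrix of e_si.j = b*_kij - sum_h s_h b*_khj."""
    col_weighted = shares @ beta_star            # sum over h of s_h * b*_khj, per column j
    return beta_star - col_weighted[np.newaxis, :]

# for the MNL version the result would be multiplied by X_kj, per the formula above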

Let us apply the regression model proposed by McGuire et al. to the illustrative data in Table 5.1. Since the data necessary for estimation involve 56 variables (including the intercept term), no table of data set-up is shown. Only the estimation results are given in Table 5.11. The model was estimated without the intercept. The notation for independent variables, LPiDj, where i and j are appropriate numbers, indicates the effect of log(price) of the ith brand on brand j's market share. There is a warning that the model is not full rank, because there are only four observations for brand 5 with a positive market share. Direct-effect parameters, LPiDi's, for brands 1 through 4 are negative and statistically significant, while the others are non-significant. Cross-effect parameters are mostly positive and/or statistically non-significant, but one of them, LP7D6, is negative and significant. Although we should refrain from making generalizations from this one set of data, it is perhaps justified to say that, as we move toward more complex models, the limitations of the test data set have become obvious. The number of observations is too small to provide one with stable parameter estimates. Furthermore, there seem to be factors other than price which affect market shares of margarine in this store. It is desirable, then, to obtain more data, especially from more than one store, along with information on marketing variables other than price.

Table 5.11: Regression Results for Cross-Effects Model (MCI)

 
Model: MODEL1
Note : no intercept in model. R-square is redefined.
Dep Variable: LSHARE
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 52 92.58365 1.78045 8.451 0.0001
Error 32 6.74182 0.21068    
U Total 84 99.32547      
Root MSE 0.45900 R-Square 0.9321  
Dep Mean -0.00000 Adj R-Sq 0.8218  
C.V. -7.89278E+17      
NOTE : Model is not full rank. Least-squares solutions for the
  parameters are not unique. Some statistics will be
  misleading. A reported DF of 0 or B means that the
  estimate is biased. The following parameters have been
  set to 0, since the variables are a linear combination
  of other variables as shown.
LP4D5 = +2.9223*D5 + 0.9531*LP1D5 + 0.7226*LP2D5 - 0.2564*LP3D5
LP5D5 = +5.8144*D5 - 0.2300*LP1D5
LP6D5 = +6.3121*D5 - 0.2992*LP1D5 - 0.7777*LP2D5 + 0.7946*LP3D5
LP7D5 = +5.2704*D5 + 0.1566*LP1D5 - 0.0181*LP2D5 - 0.2200*LP3D5
 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
D1 1 73.924694 46.10070856 1.604 0.1186
D2 1 -38.091548 46.10070856 -0.826 0.4148
D3 1 38.942488 46.10070856 0.845 0.4045
D4 1 -79.689368 64.73512217 -1.231 0.2273
D5 B -52.022706 14.10815828 -3.687 0.0008
D6 1 59.130760 69.49275660 0.851 0.4012
D7 1 -17.793812 46.10070856 -0.386 0.7021
LP1D1 1 -5.096306 1.97465203 -2.581 0.0146
LP2D1 1 -0.365029 2.43911085 -0.150 0.8820
LP3D1 1 1.507052 2.27537629 0.662 0.5125
LP4D1 1 -2.353595 1.89200824 -1.244 0.2225
LP5D1 1 0.503063 0.70624068 0.712 0.4814
LP6D1 1 -6.100657 3.98554609 -1.531 0.1357
LP7D1 1 -2.894448 4.25472197 -0.680 0.5012
LP1D2 1 -0.252472 1.97465203 -0.128 0.8991

 
Parameter Estimates
    Parameter Standard T for H0  
Variable DF Estimate Error Parm=0 Prob > |T|
LP2D2 1 -8.625451 2.43911085 -3.536 0.0013
LP3D2 1 2.107563 2.27537629 0.926 0.3613
LP4D2 1 3.041118 1.89200824 1.607 0.1178
LP5D2 1 0.800421 0.70624068 1.133 0.2655
LP6D2 1 1.336924 3.98554609 0.335 0.7395
LP7D2 1 9.615896 4.25472197 2.260 0.0308
LP1D3 1 -0.128008 1.97465203 -0.065 0.9487
LP2D3 1 1.150772 2.43911085 0.472 0.6403
LP3D3 1 -6.671369 2.27537629 -2.932 0.0062
LP4D3 1 -0.446255 1.89200824 -0.236 0.8150
LP5D3 1 0.378551 0.70624068 0.536 0.5957
LP6D3 1 -0.622859 3.98554609 -0.156 0.8768
LP7D3 1 -1.518813 4.25472197 -0.357 0.7235
LP1D4 1 -1.081137 2.35763232 -0.459 0.6496
LP2D4 1 6.997517 3.37627497 2.073 0.0463
LP3D4 1 -3.559763 3.99419070 -0.891 0.3795
LP4D4 1 -6.089339 1.89510791 -3.213 0.0030
LP5D4 1 0.194514 0.74768568 0.260 0.7964
LP6D4 1 12.210535 7.30809600 1.671 0.1045
LP7D4 1 7.680597 4.90866219 1.565 0.1275
LP1D5 B 6.205448 2.34288476 2.649 0.0124
LP2D5 B 3.608572 3.17368514 1.137 0.2640
LP3D5 B 0.569965 3.76236952 0.151 0.8805
LP4D5 0 0 0.00000000 . .
LP5D5 0 0 0.00000000 . .
LP6D5 0 0 0.00000000 . .
LP7D5 0 0 0.00000000 . .
LP1D6 1 3.523658 2.50575576 1.406 0.1693
LP2D6 1 0.112065 3.39656633 0.033 0.9739
LP3D6 1 -0.322265 4.07062686 -0.079 0.9374
LP4D6 1 1.837399 2.12068866 0.866 0.3927
LP5D6 1 1.098908 0.83529373 1.316 0.1977
LP6D6 1 -1.221249 7.39266721 -0.165 0.8698
LP7D6 1 -17.414894 5.82079344 -2.992 0.0053
LP1D7 1 0.104280 1.97465203 0.053 0.9582
LP2D7 1 1.630654 2.43911085 0.669 0.5086
LP3D7 1 2.086093 2.27537629 0.917 0.3661
LP4D7 1 1.615467 1.89200824 0.854 0.3995
LP5D7 1 -0.313301 0.70624068 -0.444 0.6603
LP6D7 1 -2.643566 3.98554609 -0.663 0.5119
LP7D7 1 1.060157 4.25472197 0.249 0.8048

5.8  A Multivariate MCI Regression Model

It should be pointed out that the parameter estimates of Table 5.11 may be obtained by applying a simple regression model of the following form to the data for each brand separately.

\log(s^*_{it}) = \alpha_i + \sum_{j=1}^{m} \beta_{pij} \log(P_{jt}) + \varepsilon_{it} \qquad (i = 1, 2, \ldots, m)     (5.29)

In the above equation, ai is simply the intercept term for brand i . The parameters thus estimated are identical to those in Table 5.11, although the significance level of each parameter is usually different from the one in Table 5.11, because the t-statistic and associated degrees of freedom are not the same. If one wishes only parameter estimates, model (5.26) is simpler to calibrate than model (5.13).If we replace log(Pjt) with Pjt , the corresponding MNL model can be estimated.
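
A minimal sketch of the brand-by-brand estimation in (5.29): every brand's dependent measure is regressed on an intercept and the logs of all brands' prices, so the same design matrix serves for all m regressions. Array names and shapes are assumptions.

import numpy as np

def cross_effects_by_brand(S_star, P):
    """S_star: (T, m) matrix of the dependent measure in (5.29), one column per brand.
    P: (T, m) matrix of prices. Returns a (1 + m, m) matrix whose column i holds
    brand i's intercept followed by b_pi1 ... b_pim."""
    T, m = P.shape
    X = np.column_stack([np.ones(T), np.log(P)])     # identical regressors for every brand
    B, *_ = np.linalg.lstsq(X, S_star, rcond=None)   # OLS, column by column
    return B

Because the regressor matrix is identical for every brand, running OLS column by column in this way amounts to the multivariate regression in (5.30) below.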

The fact that (5.29) may be used to estimate the parameters of (5.26) has an extremely important implication. Note that, in estimating (5.29), the data for every brand involve the same set of independent variables, log(P1t),log(P2t), ¼, log(Pmt) , plus an intercept term. One may summarize model (5.29) for m brands in the following multivariate regression model.

Y = X B + E
(5.30)

where:

Y = the T × m matrix with elements {log(s*it)} (t = 1, 2, …, T; i = 1, 2, …, m)
X = the T × (1 + m × K) matrix (J | X1 | X2 | … | XK)
J = the T × 1 vector (1 1 1 … 1)′
Xk = the T × m matrix with elements {log(Xkit)} (t = 1, 2, …, T; i = 1, 2, …, m)
B = the (1 + m × K) × m matrix (B1 | B2 | … | Bm)
Bi = (αi | β1i1 … β1im | β2i1 … β2im | … | βKi1 … βKim)′
E = the T × m matrix of elements {εit} (t = 1, 2, …, T; i = 1, 2, …, m).

Recall our assumptions for the specification-error term are still applicable to the error term, eit , in the above model. It is well known that under our assumptions on the error term, the OLS procedure, applied to each column of Y in (5.30) separately, yields the best linear-unbiased estimates (BLUE) of the parameters of B.See, for example, Finn, Jeremy D. [1974], A General Model for Multivariate Analysis , New York: Holt, Rinehart & Winston.In other words, it is not necessary to resort to the GLS procedure to obtain minimum-variance estimates of a cross-effects model such as (5.27) or (5.28).

This fact, combined with the availability of equation (5.29) for brand-by-brand estimation, reduces the task of estimating the parameters of a cross-effects model and increases its usefulness as a market-diagnostic tool. When one has a sufficient number of observations (that is, T > 1 + m ×K), it is perhaps best to estimate a cross-effects model first, and then, after examining the pattern of estimated coefficients, determine if a simpler model, such as the simple attraction model or a differential-effects model, is adequate. When the number of observations is barely sufficient for a cross-effects model, one may decide to adopt a strategy to estimate a full cross-effects model first, and then decide to restrict some elements of the B matrix (the parameter matrix) to be zero (cf. Carpenter, Cooper, Hanssens, & Midgley [1988]). In this case, however, the OLS procedure is not applicable and a GLS procedure will have to be used.

5.9  Estimation of Category-Volume Models

So far we have considered the various techniques which may be used to estimate the parameters of market-share models, but the forecasting of brand sales volumes requires more than the knowledge of market shares. Because the sales volume of a given brand in a period is a product of the brand's share and (total) category sales volume for the period, one needs the forecast of category sales volumes.Hereafter we will use category volume instead of industry sales volume , since the former fits better in the context of stores and market shares.

In this section we deal with the estimation of the parameters of category-volume models. Compared with the market-share estimation, the modeling for category sales volumes is a more straightforward application of econometric techniques. The illustrative data in Table 5.1 include the average daily sales volumes of margarine for this store. We will use these data to show some examples of category-volume models.

In this particular data set, brand price is the only marketing variable. We hypothesize that if the overall price level is low, the total volume will be high. We also hypothesize that if sales are extremely high in one week, the sales in the following weeks should be low because the store customers have not used up their stock. In order to represent those two hypotheses, we propose the following model.

Q_t = a + b\, Q_{t-1} + c \log \tilde{P}_t + u_t     (5.31)

where:

Qt = the category volume (in equivalent units) in period t
\tilde{P}_t = the average price level in period t
ut = an error term
a, b, c = parameters to be estimated.

We let \tilde{P}_t be the geometric mean of prices in period t. The following is the estimation result.

 
Q_t = 508.8 - 0.4652\, Q_{t-1} - 2.5116 \log \tilde{P}_t
       (4.172)     (-1.681)        (-3.620)

R-Square = 0.5764

T-values are in the parentheses directly below the corresponding parameter estimates. The fit of the model is acceptable, judging from the R2-value of 0.58 . The estimated parameters and their t-values bear out our initial guess that the average price level in the week and the sales volume in the preceding week are influential in determining the category volume.

There is another line of thought concerning the effect of price on category volume: the prices of different brands may have differential effects on category volume. One brand's price reduction may increase its share but leave category volume unaffected, while another brand's price reduction may increase both its share and category volume. To incorporate differential effects of brand prices, we propose the following model.

Q_t = a + b\, Q_{t-1} + \sum_{i=1}^{m} c_i \log P_{it} + u_t     (5.32)

where the ci 's are the differential price-effect parameters. The estimation results for this model are given below.

Q_t = 252.62 - 0.4947\, Q_{t-1} - 0.1646 \log P_{1t} - 0.4799 \log P_{2t} - 0.06799 \log P_{3t} - 0.1881 \log P_{4t} - 0.5646 \log P_{5t} - 0.2727 \log P_{6t} + 1.0631 \log P_{7t}

R-Square = 0.9581

The fit of the model is much improved. Brands 2 and 5 have significant effects on category volume, indicating that when those brands cut prices the customers of this store purchase more than their usual amounts, and that the following week's total volume suffers as a consequence. Note that the brand sales elasticity with respect to price, which measures the overall impact of brand i's price on its sales volume, is decomposed into two components:

eQi.Pi = Category-Volume Elasticity + Share Elasticity.

For example, if we assume the differential-effects model, then

eQi.Pi = ci + bpi (1 - si)

where ci is the parameter from model (5.32) and βpi is estimated by one of the models (5.14 - 5.17).

With the R2 -value of 0.96, equation (5.32) should give reasonably good estimates of category volumes. The positive sign of the estimated parameter for logP7t poses a theoretical problem, but it probably reflects the effects of some marketing activities within the store which are not included in the model. As a forecasting model for category volume, this model should be used as it is.

Model (5.32) is in the form of a distributed-lag model. It is known that the ordinary least-squares procedure applied to (5.32) yields biased estimates of the model parameters. If an adequate number of observations is available, it is recommended that time-series analysis procedures be used for parameter estimation. Weekly data produce a sufficient number of observations in two years for a time-series analysis model. If the number of observations is less than 50, however, it is perhaps best to use the OLS procedure.
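
A sketch of the OLS fit of (5.32), with the caveat above about distributed-lag bias kept in mind; the first week is lost to the lagged term, and the variable names are assumptions.

import numpy as np

def fit_category_volume(Q, P):
    """Q: length-T array of category volumes; P: (T, m) array of brand prices.
    Fits Q_t = a + b Q_{t-1} + sum_i c_i log(P_it) + u_t by OLS."""
    y = Q[1:]                                                    # weeks 2..T
    X = np.column_stack([np.ones(len(y)), Q[:-1], np.log(P[1:])])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                                                  # [a, b, c_1, ..., c_m]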

These simplest category-volume models are linear in the effects of previous category volume while being linear in the logs of prices. As we incorporate marketing variables other than price, it is advisable to postulate more general, fully interactive models such as:

Q_t = \exp(a + u_t)\, Q_{t-1}^{\,b} \prod_{j=1}^{m} P_{jt}^{\,c_j} \prod_{k=2}^{K} \exp(b_{kj} X_{kjt})\, .     (5.33)

The reduced form of such a model may be characterized as being a log-log model in the effects of price and previous category volume, and log-linear in the other marketing variables (such as newspaper features, in-store displays and other marketing instruments which may be binary variables). This general form will be used with the coffee-market example developed in section 5.12.

5.10  Estimation of Share-Elasticities

In Chapter 6 we deal with the market-structure analysis based on the factor analysis of market-share elasticities. The reader may recall that there are two types of market-share elasticities, namely, point- and arc-share elasticities. Since the elasticities obtainable in practice are arc elasticities, one may think of factor-analyzing arc elasticities to investigate the structure of the market and competition. Unfortunately, this is not at all feasible.

Recall the definition of an arc elasticity for variable Xk .

e_{s_i} = \frac{\Delta s_i}{\Delta X_{ki}} \cdot \frac{X_{ki}}{s_i}

Δsi in the above definition is not the total change in si, but the change corresponding to the change in Xki, ΔXki. We have no means of separating the effects of various marketing variables on market shares unless, of course, we apply some model to observed market shares. Indeed it is the main purpose of the models discussed in this book to identify the effects of marketing variables. Thus, in order to estimate share elasticities specific to a marketing variable, we propose first to estimate the parameters of a market-share model from a data set (i.e., brand shares and marketing variables), and then use the theoretical expressions for point elasticities (see Chapter 3) for the relevant model to obtain elasticity estimates.

A numerical example may clarify this procedure. When we applied the raw-score attraction model to the margarine data in Table 5.1, we obtained a price-parameter estimate of -8.337. If a brand's share is 0.2, then the point-elasticity estimate is given by -8.337 × (1 - 0.2) = -6.67. Although we are unable to estimate arc elasticities in this manner, point-elasticity estimates will serve as approximations for arc elasticities.

5.11  Problems with Zero Market Shares

Since the dependent variable in log-linear regression is the logarithm of either market shares or the numbers of units sold, it is impossible to compute the value of the dependent variable if observed market shares or numbers of units are zero. In any data collection procedure one may observe a zero market share or number of units sold for some brand-period combination. There are two procedures for handling those data sets which contain zero market shares.

The first is to assign some arbitrarily small values (0.001, say) to zero market shares. But this procedure amounts to assigning a large negative value to log 0, and tends to bias the estimated parameter values. (The smaller the assigned value, the greater the absolute values of estimated parameters.)

The second procedure is to delete from the data set those brand-period combinations for which observed market-shares are zero. Young, Kan H. & Linds Y. Young [1975], ``Estimation of Regressions Involving Logarithmic Transformations of Zero Values in the Dependent Variables,'' The American Statistician, 29 (August), 118-20.Though this procedure may seem arbitrary at first glance, it has some logic of its own. First, if a brand were not bought in a certain period, that would be sufficient basis to infer that the brand was not in the consumers' choice set. Second, since one is usually more interested in estimating accurately the behavior of those brands which command large shares, it may be argued that one need not bother with those brands which often take zero market shares. Third, that zero market shares are not usable for estimation is not a problem limited to log-linear regression procedures. Consider, for example, the case in which the share estimate for brand i in period t is based on the number of consumers who purchased that brand, nit (i = 1, 2, …, m). Assuming that the numbers {n1t, n2t, …, nmt} are generated by a multinomial process (see section 5.1.1 on maximum-likelihood estimation), one may wish to use a maximum-likelihood procedure for estimating parameters of attraction models. Note, however, that those observations for which nit = 0 do not contribute at all to the likelihood function (5.2). In a sense, the maximum-likelihood procedure ignores all brand-period combinations for which nit = 0.

There are two drawbacks to the deletion of zero market shares. One is the reduction in degrees of freedom due to the deletion. But this drawback may be compensated for by a proper research design in that, if the number of brands per period is reduced by the deletion, the number of periods (or areas) may be increased to obtain adequate degrees of freedom. The second drawback is that the estimated parameters are somewhat biased (in the direction of smaller absolute values). But we believe that the biases which are introduced by this procedure are far less than those which are introduced by replacing zero shares by an arbitrarily small constant. It may be added that we found in our simulation studies that the true parameter values lie between those estimated after deleting zero-share observations and those estimated after replacing zero shares by an arbitrary constant. This finding leads us to consider another somewhat arbitrary, and so far untested, procedure, which adds a small constant to all brand-period combinations, regardless of whether they are zero-share or not. In other words, we suggest that the dependent variable, log sit, be replaced by log(sit + c), where sit is the share of brand i in period t and c is the arbitrary constant. We found that, if one selects the value of c properly, the estimated parameters are free of the biases which the other two procedures tend to create. The appropriate value of c seems to vary from one data set to the next. So far we have been unable to find a logic for determining the correct value of c that is applicable to a particular data set. Here we only indicate that a fruitful course of research may lie in the direction of this estimation procedure.
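
A sketch of the three treatments of zero shares discussed here; the choice of the constant c in the third option is left to the analyst, as the text emphasizes.

import numpy as np
import pandas as pd

def log_share(s, method="delete", c=0.01):
    """s: Series of market shares (zeros possible)."""
    if method == "delete":
        return np.log(s[s > 0])              # drop zero-share observations
    if method == "small_value":
        return np.log(s.replace(0, 0.001))   # arbitrary small value in place of zero
    if method == "add_constant":
        return np.log(s + c)                 # add c to every share, zero or not
    raise ValueError(method)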

Zero market shares create particularly difficult problems for the multivariate regression in (5.30). The missing market share for one brand may cause the observation to be deleted from all the regressions. In cases such as this, when it is particularly important to have all the dependent measures present, the EM algorithm discussed by MalhotraMalhotra, Naresh [1987], ``Analyzing Market Research Data with Incomplete Information on the Dependent Variable,'' Journal of Marketing Research, XXIV (February), 74-84. could be useful.

When imputing values which are missing in the data, one should always ask why the data are missing. The imputation literatureFor an excellent recent treatment see Little, Roderick J. A. & Donald B. Rubin [1987], Statistical Analysis with Missing Data. New York: John Wiley & Sons, Inc.treats data missing-at-random (MAR), missing-completely-at-random (MCAR), and missing-by-unknown-mechanisms (MBUM), but rarely do these conditions fit the zero market shares in POS data. If a brand simply is not distributed in one or more of the retail outlets, neither MAR, MCAR, nor MBUM assumptions are appropriate. Even if the brand is distributed, it is not always possible to tell if the zero market share results from an out-of-stock condition or simply from no sales. But, in either case, these conditions are neither random nor due to unknown mechanisms. One clue comes from the other data associated with a brand. If price and promotional variables are present for the zero-market-share brand, one can assume the brand is distributed, but nothing more. The problem concerns only imputing the value of the dependent measure. If price and promotional measures are also missing, the imputation problem is more severe. Widely differing patterns of distribution would greatly complicate the multivariate regression in (5.30). In such cases it is probably simpler to delete the missing observations in the market-share model, and use the method discussed in section 5.12 for estimating cross effects.

While simply deleting the observation is an acceptable solution to the problem of differing patterns of distribution in market-share models, it is not an acceptable approach to this problem in category-volume models. Zero market share isn't the issue, since the dependent measure is the (log of) total sales volume. But missing values for prices are particularly worrisome, since we cannot take the log of a missing value. In the market-share model for POS data, there is an observation for each brand in each store in each week. For the corresponding category-volume model there is just an observation for each store in each week. The measures in an observation reflect the influence of each brand's prices and promotional activity on total volume. If we were to delete the whole observation whenever a single brand was not in distribution, widely differing distribution patterns over stores could result in the deletion of all observations. We wish to minimize the influence that the missing value has on the parameter corresponding to that measure, but allow the other measures in the observation to have their normal influence in parameter estimation.

While developing an algorithm to minimize the influence of missing prices is a worthwhile topic for future research, there is a simple approach for achieving a reasonable result in the interim. We merely need to create brand-absence dummy variables, which take a value of one when the brand is absent and a value of zero when it is present. If we then replace the missing (log) price with a zero, the parameter of the brand-absence measure shows the penalty uniquely associated with not distributing the brand. This approach will be illustrated in the next section.
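
A sketch of the brand-absence device for the category-volume model: a dummy that equals one when a brand's price is missing, with the missing log price set to zero so that the brand-absence parameter absorbs the penalty. The data layout is assumed.

import numpy as np
import pandas as pd

def absence_adjusted_log_prices(prices):
    """prices: (T, m) DataFrame of brand prices, NaN where a brand is not distributed.
    Returns (log_prices, absence): missing log prices are set to zero, and
    absence is 1 where the brand is absent, 0 where it is present."""
    absence = prices.isna().astype(float)
    log_prices = np.log(prices).fillna(0.0)
    return log_prices, absence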

5.12  The Coffee-Market Example

To illustrate the use of these estimation techniques on POS data, consider the ground, caffeinated coffee market. Data, provided by Information Resources, Inc., from BehaviorScan stores in two cities, report price, newspaper feature, in-store display and store-coupon activity for all brands. The small-volume, premium brands were aggregated into an ``All Other Branded'' (AOB) category, and the small ``Private Label'' (PL) brands were aggregated into an ``All Other Private Label'' (AOPL) category. Consequently, twelve brands of coffee were analyzed: Folgers, Regular Maxwell House, Maxwell House Master Blend, Hills Bros., Chock Full O'Nuts, Yuban, Chase & Sanborne, AOB, PL 1, PL 2, PL 3, and AOPL. For eighteen months, each week's data for a brand were aggregated over package weights, and over stores-within-grocery chains in the two cities. These are aggregate data from stores, not discrete-choice data from BehaviorScan consumer panels. Price for each brand was aggregated into average price per pound, net of coupons redeemed. Feature, display and coupon were represented as percent of volume sold on promotions of each type to allow for aggregation over stores with slightly differing promotional environments. The data were divided into a year for calibration of the market-share model, and six months for cross-validation. The average price and market share of each brand appear in Table 5.12.

Table 5.12: Coffee Data - Average Prices and Market Shares

 
  Average Average
Brand Price/lb. Share
 
Folgers $2.33 28.5
Maxwell House $2.22 24.2
Master Blend $2.72 7.8
Hills Bros. $2.13 4.3
Chock Full O Nuts $2.02 15.3
Yuban $3.11 0.2
Chase & Sanborne $2.34 0.3
All Other Branded $2.64 2.4
Private Label 1 $1.99 3.9
Private Label 2 $1.95 3.6
Private Label 3 $1.93 3.7
All Other Private Labels $1.95 5.7

5.12.1  The Market-Share Model

With four marketing instruments per brand the full cross-effects model would have 587 parameters (4 × 12 × 12 + 11). To avoid estimating so many parameters an asymmetric market-share model was estimated by procedures similar to those discussed in Carpenter, Cooper, Hanssens, and Midgley [1988].Carpenter et al. suggest forming dynamically weighted, attraction components to deal with the lagged effects of marketing instruments. Chapter 3 discusses alternative methods for specifying the dynamic components, but neither of these approaches was used in this illustration. Store-week data are sufficiently disaggregate that they rarely have the complex time-series properties dealt with in Carpenter et al., so that no dynamically weighted, attraction components were needed.The distinctiveness of marketing efforts was incorporated by using exp(z-scores) for each marketing instrument. A differential-effects model was estimated with a unique parameter for each brand's price, feature, display, and store coupons, and a brand-specific intercept for the qualitative features of each brand, using OLS procedures. The brand-specific intercept which was closest to zero (PL 2) was set to zero to avoid singularity. The residuals from this differential-effects model were cross-correlated brand by brand with the transformed contemporaneous explanatory variables for all other brands. The cross-competitive effects which were significant in the residual analysis were entered into the model.The criteria for inclusion of a cross effect were that it had to be based on more than 52 observations and the correlation had to be significant beyond the .05 level.

This specification approach leads to a generalized attraction model:

A_{it} = \exp(\alpha_i + \varepsilon_{1i}) \prod_{k=1}^{K} \left[\, \exp(z_{kit}) \,\right]^{\beta_{ki}} \prod_{(k^* j^*) \,\in\, C_i} \left[\, \exp(z_{k^* j^* t}) \,\right]^{\beta_{k^* i j^*}}

where $a_i$ is brand $i$'s constant component of attraction, $e_{1i}$ is specification error, $b_{ki}$ is brand $i$'s market-response parameter for the $k$th marketing-mix element, $\exp(z_{kit})$ is brand $i$'s attraction component for the $k$th marketing-mix element (standardized over brands within a store-week), $C_i$ is the set of cross-competitive effects $(k^*, j^*)$ on brand $i$, $\exp(z_{k^*j^*t})$ is the standardized attraction component of the cross-competitive influence of brand $j^*$'s marketing-mix element $k^*$ on brand $i$, and $b_{k^*ij^*}$ is the cross-effect parameter for the influence of brand $j^*$'s attraction component $k^*$ on brand $i$'s market share.
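
As a quick numerical illustration of how this specification behaves, the sketch below evaluates the attraction equation above for a single store-week. The number of brands, the z-scores, the parameter values, and the single cross effect are all invented for the example.

```python
# Sketch: evaluating the generalized attraction model for one store-week.
import numpy as np

m, K = 3, 2                        # brands, instruments (illustrative)
z = np.array([[-1.0,  0.5],        # z[i, k]: brand i's z-score on instrument k
              [ 0.2, -0.3],
              [ 0.8, -0.2]])
alpha = np.array([1.0, 0.4, 0.0])  # brand-specific intercepts
beta = np.array([[-0.9, 0.2],      # beta[i, k]: own (differential) effects
                 [-0.6, 0.1],
                 [-0.4, 0.3]])
# One hypothetical cross effect: brand 1's instrument 0 (price) acting on brand 0.
cross = {0: [((1, 0), -0.25)]}     # target brand -> [((source brand, instrument), b)]

A = np.exp(alpha + (beta * z).sum(axis=1))       # own attraction components
for i, terms in cross.items():
    for (j, k), b in terms:
        A[i] *= np.exp(z[j, k]) ** b             # cross-competitive components
shares = A / A.sum()                             # attractions normalized to shares
print(shares)
```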

For the final model the residuals from the OLS estimation were used to estimate the error variances for each brand. The weights for a regression were formed as

\[
w_i \;=\; \frac{1}{\left(1 - \dfrac{1}{m}\right)\hat{\sigma}_i}
\]

where $\hat{\sigma}_i$ is the estimated standard deviation of the OLS residuals for brand $i$.

These weights compensate for heteroscedasticity of error variances over brands, but do not treat the possibility of nonzero error covariances. The results for the calibration period of 52 weeks appear in Table 5.13.
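
A compact sketch of this reweighting step follows. It is not the authors' procedure; the argument names (`X`, `y`, `resid_ols`, `brand_idx`) are hypothetical labels for the design matrix, the log-centered shares, the OLS residuals, and an integer brand index per observation.

```python
# Sketch of the reweighting step: brand-specific residual standard deviations
# define the weights, and the model is re-estimated by scaled least squares.
import numpy as np

def wls_fit(X, y, resid_ols, brand_idx, m):
    """Reweight by 1 / [(1 - 1/m) * sigma_hat_i] and re-estimate."""
    sigma = np.array([resid_ols[brand_idx == b].std(ddof=1) for b in range(m)])
    w = 1.0 / ((1.0 - 1.0 / m) * sigma[brand_idx])   # one weight per observation
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```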

The resulting model has an R2 of .93 with 140 parameters estimated and 2,051 residual degrees of freedom (F(140, 2051) = 181). Since the model is estimated without an intercept, R2 is redefined as is noted on the regression output. In models estimated without an intercept R2 is like the congruence coefficient discussed in section 5.6. If the mean of the dependent measure is equal to zero, the lack of an intercept doesn't matter, and R2 has the normal interpretation as the proportion of linearly accountable variation in the reduced form of the dependent measure. The dependent measure in the OLS-estimation phase does have a mean of zero (and an R2 of .92), but rescaling by the weights affects the mean of the dependent measure. So while it is obvious that the cross-effects model fits extremely well, it is not strictly proper to interpret .93 as the proportion of explained variation. Because reweighting changes the interpretation of R2, the simplest way to assess the incremental contribution of the cross effects is to compare the OLS differential-effects model to the OLS cross-effects model. In this case the OLS differential-effects model has an R2 of .82, so the cross effects represent a substantial improvement over an already good-fitting differential-effects model.

We cross-validate these models by combining the parameter values in Table 5.13 with fresh data to form a single composite prediction variable, and then correlating the predicted dependent measure with the actual dependent measure for the new observations; 26 weeks of fresh data were used in cross-validation. The squared cross-validity correlation is .85 using the parameters in Table 5.13. This is an excellent result for a relationship that uses just one composite variable to predict over 1,000 observations (F(1, 1012) = 5808). The OLS differential-effects model has a squared cross-validity correlation of .79, indicating that the cross effects enhance the model in a stable manner.
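
The cross-validation calculation itself is simple. A minimal sketch, with hypothetical array names for the fresh-data design matrix, the fresh dependent measure, and the calibration-period parameter estimates, is:

```python
# Squared cross-validity correlation: one composite predictor (X_new @ beta_hat)
# correlated with the actual dependent measure in the holdout weeks.
import numpy as np

def squared_cross_validity(X_new, y_new, beta_hat):
    composite = X_new @ beta_hat                 # single composite prediction variable
    r = np.corrcoef(composite, y_new)[0, 1]
    return r ** 2
```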

Table 5.13: Regression Results for Cross-Effects Model (MCI)

 
Coffee Data Base For Pittsfield And Marion Markets
Ground-Caffeinated Coffee Brands Only
MCI Regression
Model: Coffee
Dep Variable: LCSHARE Log-Centered Share
Analysis Of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 140 11556.72 82.55 181.54 0.01
Error 2051 932.62 0.45    
U Total 2191 12489.34      
Root MSE 0.67 R-Square 0.93  
Dep Mean 0.18 Adj R-Sq 0.92  
C.V. 383.34      
Note: No intercept term is used. R-Square is redefined.
 
Parameter Estimates
    Parm Std T For H0: Prob >
Variable DF Est Err Parm=0 |T|
Folg Intercept 1 2.54 0.17 15.07 0.01
Folg Price Z-Score 1 -0.96 0.07 -13.07 0.01
Folg Featv Z-Score 1 0.06 0.04 1.52 0.13
Folg Dispv Z-Score 1 0.16 0.05 3.56 0.01
Folg Coupv Z-Score 1 -0.13 0.05 -2.53 0.01
RMH Intercept 1 1.92 0.12 15.50 0.01
RMH Price Z-Score 1 -0.58 0.06 -10.02 0.01
RMH Featv Z-Score 1 0.00 0.03 0.12 0.91
RMH Dispv Z-Score 1 0.06 0.03 1.74 0.08
RMH Coupv Z-Score 1 0.11 0.04 2.92 0.01
MHMB Intercept 1 1.79 0.17 10.27 0.01
MHMB Price Z-Score 1 -0.24 0.07 -3.21 0.01
MHMB Featv Z-Score 1 0.19 0.04 5.33 0.01
MHMB Dispv Z-Score 1 0.22 0.05 4.79 0.01
MHMB Coupv Z-Score 1 -0.08 0.06 -1.43 0.15
HlBr Intercept 1 -0.50 0.11 -4.49 0.01
HlBr Price Z-Score 1 0.04 0.07 0.57 0.57
HlBr Featv Z-Score 1 0.48 0.05 8.96 0.01
HlBr Dispv Z-Score 1 0.23 0.05 4.57 0.01
HlBr Coupv Z-Score 1 1.52 0.19 7.97 0.01
CFON Intercept 1 0.61 0.11 5.37 0.01
CFON Price Z-Score 1 -1.33 0.09 -14.50 0.01
CFON Featv Z-Score 1 0.12 0.05 2.27 0.02
CFON Dispv Z-Score 1 -0.04 0.04 -0.94 0.35
CFON Coupv Z-Score 1 -0.22 0.07 -3.35 0.01
Yub Intercept 1 -0.15 0.21 -0.71 0.48
Yub Price Z-Score 1 -0.77 0.09 -8.70 0.01
Yub Featv Z-Score 1 0.21 0.21 0.98 0.33
Yub Dispv Z-Score 1 0.70 0.25 2.82 0.01
Yub Coupv Z-Score 1 0.15 0.22 0.70 0.49
C_S Intercept 1 -0.42 0.17 -2.48 0.01
C_S Price Z-Score 1 -0.27 0.14 -2.01 0.05
C_S Featv Z-Score 1 -0.07 0.31 -0.22 0.83
C_S Dispv Z-Score 1 1.19 0.33 3.65 0.01
C_S Coupv Z-Score 1 0.78 0.24 3.21 0.01

 
Parameter Estimates, Continued
    Parm Std T For H0: Prob >
Variable DF Est Err Parm=0 |T|
AOB Intercept 1 0.50 0.12 4.00 0.01
AOB Price Z-Score 1 -0.49 0.06 -8.28 0.01
AOB Featv Z-Score 1 -0.24 0.06 -3.75 0.01
AOB Dispv Z-Score 1 0.13 0.04 2.87 0.01
AOB Coupv Z-Score 1 0.16 0.09 1.84 0.07
PL1 Intercept 1 0.28 0.16 1.75 0.08
PL1 Price Z-Score 1 -1.07 0.09 -11.64 0.01
PL1 Featv Z-Score 1 -0.06 0.04 -1.47 0.14
PL1 Dispv Z-Score 1 -0.06 0.04 -1.62 0.10
PL1 Coupv Z-Score 1 0.03 0.03 0.79 0.43
PL2 Price Z-Score 1 -1.11 0.17 -6.68 0.01
PL2 Featv Z-Score 1 0.06 0.14 0.43 0.67
PL2 Dispv Z-Score 1 0.12 0.13 0.91 0.36
PL2 Coupv Z-Score 1 0.41 0.42 0.97 0.33
PL3 Intercept 1 -0.30 0.22 -1.36 0.17
PL3 Price Z-Score 1 -1.00 0.15 -6.53 0.01
PL3 Featv Z-Score 1 0.02 0.06 0.28 0.78
PL3 Dispv Z-Score 1 0.35 0.41 0.84 0.40
PL3 Coupv Z-Score 1 0.05 0.05 0.95 0.34
AOPL Intercept 1 0.25 0.15 1.68 0.09
AOPL Price Z-Score 1 -0.21 0.06 -3.47 0.01
AOPL Featv Z-Score 1 0.07 0.03 2.62 0.01
AOPL Dispv Z-Score 1 -0.04 0.05 -0.68 0.50
AOPL Coupv Z-Score 1 0.02 0.04 0.43 0.67
Crs Of RMH Price Effect On Folg 1 -0.27 0.07 -3.96 0.01
Crs Of MHMB Price Effect On Folg 1 -0.10 0.08 -1.29 0.20
Crs Of HlBr Price Effect On Folg 1 0.06 0.06 0.98 0.33
Crs Of CFON Price Effect On Folg 1 0.05 0.06 0.92 0.36
Crs Of Yub Price Effect On Folg 1 -0.32 0.06 -5.85 0.01
Crs Of AOB Price Effect On Folg 1 -0.31 0.06 -5.20 0.01
Crs Of RMH Featv Effect On Folg 1 -0.13 0.03 -3.75 0.01
Crs Of Yub Featv Effect On Folg 1 -0.04 0.18 -0.24 0.81
Crs Of RMH Dispv Effect On Folg 1 -0.09 0.04 -2.40 0.02
Crs Of MHMB Dispv Effect On Folg 1 0.12 0.05 2.72 0.01
Crs Of Yub Dispv Effect On Folg 1 0.01 0.21 0.04 0.97
Crs Of AOB Dispv Effect On Folg 1 0.02 0.04 0.44 0.66
Crs Of RMH Coupv Effect On Folg 1 0.03 0.04 0.68 0.50
Crs Of MHMB Coupv Effect On Folg 1 0.03 0.05 0.47 0.64
Crs Of HlBR Coupv Effect On Folg 1 1.04 0.17 6.06 0.01
Crs Of Yub Coupv Effect On Folg 1 0.30 0.18 1.66 0.10
Crs Of AOPL Coupv Effect On Folg 1 -0.06 0.04 -1.70 0.09
Crs Of Folg Price Effect On RMH 1 -0.10 0.06 -1.54 0.12
Crs Of Yub Price Effect On RMH 1 -0.05 0.04 -1.31 0.19
Crs Of AOB Price Effect On RMH 1 -0.22 0.04 -4.83 0.01
Crs Of AOPL Price Effect On RMH 1 0.17 0.04 4.63 0.01
Crs Of Folg Featv Effect On RMH 1 -0.00 0.03 -0.09 0.93
Crs Of Yub Featv Effect On RMH 1 0.19 0.17 1.12 0.26
Crs Of AOB Featv Effect On RMH 1 -0.12 0.05 -2.24 0.03
Crs Of Folg Dispv Effect On RMH 1 -0.04 0.04 -0.92 0.36
Crs Of HlBr Dispv Effect On RMH 1 -0.08 0.03 -2.37 0.02
Crs Of Yub Dispv Effect On RMH 1 -0.49 0.20 -2.46 0.01
Crs Of HlBr Coupv Effect On RMH 1 0.31 0.15 2.09 0.04
Crs Of CFON Coupv Effect On RMH 1 -0.05 0.05 -0.87 0.39

 
Parameter Estimates, Continued
    Parm Std T For H0: Prob >
Variable DF Est Err Parm=0 |T|
Crs Of Yub Coupv Effect On RMH 1 0.54 0.18 3.01 0.01
Crs Of AOB Coupv Effect On RMH 1 0.20 0.07 2.76 0.01
Crs Of Yub Price Effect On MHMB 1 -0.10 0.05 -2.10 0.04
Crs Of AOB Price Effect On MHMB 1 -0.29 0.06 -4.92 0.01
Crs Of AOPL Price Effect On MHMB 1 0.38 0.04 9.73 0.01
Crs Of RMH Featv Effect On MHMB 1 -0.04 0.03 -1.27 0.21
Crs Of Yub Featv Effect On MHMB 1 0.52 0.17 3.02 0.01
Crs Of AOB Featv Effect On MHMB 1 -0.12 0.06 -2.19 0.03
Crs Of HlBr Dispv Effect On MHMB 1 -0.09 0.03 -2.69 0.01
Crs Of Yub Dispv Effect On MHMB 1 -0.43 0.22 -2.01 0.04
Crs Of AOPL Dispv Effect On MHMB 1 -0.06 0.05 -1.01 0.31
Crs Of RMH Coupv Effect On MHMB 1 0.08 0.04 2.19 0.03
Crs Of HlBr Coupv Effect On MHMB 1 0.50 0.16 3.04 0.01
Crs Of Yub Coupv Effect On MHMB 1 0.42 0.16 2.56 0.01
Crs Of AOB Coupv Effect On MHMB 1 0.14 0.07 1.89 0.06
Crs Of AOPL Coupv Effect On MHMB 1 -0.00 0.04 -0.14 0.89
Crs Of MHMB Price Effect On HlBr 1 0.19 0.07 2.71 0.01
Crs Of AOB Price Effect On HlBr 1 0.29 0.05 5.82 0.01
Crs Of MHMB Featv Effect On HlBr 1 -0.05 0.07 -0.78 0.44
Crs Of MHMB Dispv Effect On HlBr 1 -0.00 0.08 -0.02 0.99
Crs Of CFON Dispv Effect On HlBr 1 0.03 0.04 0.78 0.43
Crs Of AOB Dispv Effect On HlBr 1 -0.04 0.05 -0.76 0.44
Crs Of RMH Price Effect On CFON 1 0.31 0.08 3.70 0.01
Crs Of MHMB Price Effect On CFON 1 -0.69 0.06 -10.81 0.01
Crs Of HlBr Price Effect On CFON 1 -0.17 0.07 -2.48 0.01
Crs Of Folg Featv Effect On CFON 1 0.10 0.06 1.72 0.09
Crs Of AOB Featv Effect On CFON 1 0.01 0.06 0.11 0.91
Crs Of AOB Dispv Effect On CFON B -0.03 0.05 -0.70 0.49
Crs Of Folg Coupv Effect On CFON 0 -0.07 0.08 -0.90 0.37
Crs Of MHMB Coupv Effect On CFON 1 -0.63 0.14 -4.39 0.01
Crs Of HlBr Coupv Effect On CFON 1 0.01 0.19 0.05 0.96
Crs Of Folg Price Effect On Yub 1 0.10 0.06 1.58 0.12
Crs Of Folg Dispv Effect On Yub 1 -0.12 0.08 -1.48 0.14
Crs Of MHMB Dispv Effect On Yub 1 0.49 0.10 4.92 0.01
Crs Of Folg Coupv Effect On Yub 1 -0.07 0.06 -1.27 0.21
Crs Of Folg Price Effect On AOB 1 0.52 0.10 5.43 0.01
Crs Of RMH Price Effect On AOB 1 0.94 0.09 10.61 0.01
Crs Of HlBr Price Effect On AOB 1 0.35 0.08 4.36 0.01
Crs Of CFON Price Effect On AOB 1 -0.00 0.08 -0.03 0.98
Crs Of Yub Price Effect On AOB 1 0.33 0.05 6.06 0.01
Crs Of AOPL Price Effect On AOB 1 0.91 0.07 13.87 0.01
Crs Of Folg Featv Effect On AOB 1 0.01 0.04 0.25 0.80
Crs Of Yub Featv Effect On AOB 1 0.09 0.05 1.74 0.08
Crs Of Folg Dispv Effect On AOB 1 -0.14 0.05 -2.88 0.01
Crs Of Yub Dispv Effect On AOB 1 -0.18 0.06 -2.97 0.01
Crs Of RMH Coupv Effect On AOB 1 0.15 0.04 3.32 0.01
Crs Of CFON Coupv Effect On AOB 1 -0.19 0.07 -2.91 0.01
Crs Of Yub Coupv Effect On AOB 1 0.06 0.13 0.46 0.65
Crs Of Folg Price Effect On AOPL 1 -0.21 0.09 -2.30 0.02
Crs Of RMH Price Effect On AOPL 1 -0.48 0.07 -6.66 0.01
Crs Of MHMB Price Effect On AOPL 1 0.09 0.03 2.66 0.01
Crs Of AOB Price Effect On AOPL 1 -0.08 0.06 -1.33 0.18

These results differ in minor ways from those previously summarized by Cooper (Cooper, Lee G. [1988b], ``Competitive Maps: The Structure Underlying Asymmetric Cross Elasticities,'' Management Science, 34, 6 (June), 707-23). There are two sources of difference. First, the article is based on the OLS results. Second, the brand-specific effects estimated in that article are based on z-scores, rather than the more traditional brand-specific intercepts adopted in this book. Only the parameter values for the brand-specific effects are substantially affected by the differences between the two approaches. A brand-by-brand summary follows.

Folgers has the largest brand-specific intercept, indicating a relatively high baseline level of attraction. If all brands were at the market average for prices and all other marketing instruments, so that only the differences in brand intercepts were reflected in the market shares, Folgers would be predicted to capture 36% of the market. This is what we will call a baseline market share. Baseline shares can differ substantially from the average shares reported in Table 5.12. Average shares are a straightforward statistical concept, but baseline shares reflect something of a brand's fundamental franchise, all other things being equal. But all other things are rarely equal. Market power can come from the way a brand uses its marketing instruments (i.e., its promotion policy) as well as from its fundamental franchise. Baseline share figures are reported for each of the brands. These can be usefully compared to the average-share figures, but should not be thought of as predictions of long-run market share. Folgers has a very strong and significant price parameter. Being priced above the market average will sharply reduce its baseline market share, while price reductions will sharply increase share. There is a positive but insignificant feature effect. There is a strong positive effect for in-store displays. The effect of store coupons is negative and statistically extreme. While we would normally expect store-coupon promotions to have a positive effect, we should note two things. First, the average number of pounds per week of Folgers sold on store coupons is 1,175, compared to 2,018 pounds sold on in-store displays and 1,397 pounds sold on newspaper features. So there is some indication in these data that this might not be a spurious coefficient. Second, the price measure is net of coupons redeemed. While this reflects the influence of manufacturers' coupons as well as store coupons, it does mean that some of the benefits of store coupons are folded into the price effect. There are four significant cross-price effects impacting Folgers. Regular Maxwell House, Maxwell House Master Blend, Yuban, and the AOB category all have significantly less price impact on Folgers than is reflected in the differential-effects model. Folgers has significantly more of a price effect on the AOB category and significantly less price impact on the AOPL brands than would otherwise be expected. For features, only the increased competitive impact of Regular Maxwell House is significant. For displays, Regular Maxwell House has more of an effect, while Master Blend has less of an effect than otherwise expected. Folgers' displays exert more pressure on the AOB category than otherwise expected. Hills Bros.' coupons put significantly less pressure on Folgers than expected from differential effects alone.
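
The 36% figure can be reproduced directly from the brand-specific intercepts in Table 5.13: with every instrument at its market average all z-scores are zero, each attraction reduces to exp(intercept), and baseline shares follow by normalization. The short sketch below (plain numpy, not the authors' code) carries out that arithmetic, with the PL 2 intercept fixed at zero as in the estimation.

```python
# Baseline shares implied by the Table 5.13 intercepts: with all z-scores at
# zero, A_i = exp(a_i), and shares are the normalized attractions.
import numpy as np

intercepts = {"Folgers": 2.54, "RMH": 1.92, "MHMB": 1.79, "Hills Bros.": -0.50,
              "CFON": 0.61, "Yuban": -0.15, "Chase & Sanborne": -0.42,
              "AOB": 0.50, "PL 1": 0.28, "PL 2": 0.00,   # PL 2 fixed at zero
              "PL 3": -0.30, "AOPL": 0.25}
A = np.exp(np.array(list(intercepts.values())))
baseline = 100 * A / A.sum()
for brand, s in zip(intercepts, baseline):
    print(f"{brand:18s} {s:5.1f}%")   # Folgers comes out near 36%
```

The same calculation yields the baseline shares quoted for the other brands in the paragraphs that follow.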

Regular Maxwell House also has a strong, positive brand-specific intercept, which translates into a baseline market share of 19%. It has significant price and coupon effects. Regular Maxwell House has significant competitive price effects on Chock Full O'Nuts and the AOB category, but it exerts significantly less competitive pressure on Folgers and AOPL with its price. AOPL has a significant competitive price effect on RMH, while the AOB category exerts significantly less price pressure. RMH features attack Folgers, and features for the AOB category exert significant pressure on RMH. RMH displays exert significant competitive pressure on Folgers, while Hills Bros. and Yuban attack RMH with their displays. RMH coupons have less competitive effect on Master Blend and the AOB category than would otherwise be expected, and coupons for Hills Bros., Yuban, and the AOB category have significantly less impact on RMH in return.

Maxwell House Master Blend has a significant intercept which translates into a baseline share of 17%. Price, feature, and display effects are significant in the expected directions. The coupon effect is insignificant and wrong signed. Master Blend receives more price pressure from AOPL, but less from Yuban and the AOB category than would otherwise be expected. In return Master Blend exerts more price pressure on Hills Bros. and AOPL, and less pressure on CFON and Folgers than the differential-effects models could reflect. AOB features are more competitive and Yuban features are less competitive due to their significant cross effects on Master Blend. Master Blend displays are less competitive with both Folgers and Yuban than otherwise expected, while displays for Hills Bros. and Yuban exert extra pressure on Master Blend. Store coupons for Regular Maxwell House, Hills Bros. and Yuban all have less effect than otherwise expected. Store coupons for Master Blend do exert pressure on Chock Full O'Nuts.

Hills Bros.' intercept translates into a baseline share of 2%. It shows strong effects for features, displays, and coupons. The self-price effect is not significant, but it does have a significant competitive price effect on the AOB category. It has less price effect on CFON than otherwise expected. Master Blend and the AOB category exert stable competitive price effects on Hills Bros. There are no feature cross effects, but Hills Bros. has significant competitive display effects on Regular Maxwell House and Maxwell House Master Blend (as already noted).

Chock Full O'Nuts has a small baseline share (5%), but strong price and feature effects. Its use of these instruments helps it maintain the third-largest average market share (15%). Regular Maxwell House has a strong competitive price effect on Chock Full O'Nuts, but both Master Blend and Hills Bros. exert significantly less price pressure on CFON. There are no significant feature or display cross effects, but CFON's store coupons exert extra pressure on the AOB category and Master Blend's store coupons exert extra pressure on CFON.

Yuban has a baseline share of 2%, but its high price results in a much smaller average share. It has significant price and display effects. Yuban exerts less price pressure on Folgers and Master Blend, but more pressure on the AOB category, than otherwise expected. Features for Yuban have less impact on Master Blend than reflected in simpler models. Yuban displays have a significant competitive effect on both Maxwell House brands and the AOB category, while Master Blend's displays are less competitive in return. The only two significant coupon cross effects involving Yuban reverse this display pattern: Yuban's store coupons exert less pressure on both Maxwell House brands than would otherwise be expected. This is such a small brand in these markets that it probably should have been folded into the AOB category. Its stronger position on the West Coast may have led the authors astray.

Chase & Sanborne also has a baseline share of 2%. Its average share is even less, due to its high price and the infrequency of promotions. Its price, display, and coupon effects are statistically significant. There are no cross effects involving Chase & Sanborne.

The premium brands in the AOB category collectively have a baseline share of 5%. There are strong price and display effects, but the feature effect is statistically extreme in the unexpected direction. With aggregates of brands such as AOB, it may be hard to get a clear signal from all the parameters. AOB exerts additional competitive price pressure on Hills Bros., but seems to complement Folgers and both Maxwell House brands. The AOB category receives extra price pressure from Folgers, Regular Maxwell House, Hills Bros., Yuban, and AOPL. Features for the AOB category have an extra competitive effect on both Maxwell House brands. Store coupons for AOB and Regular Maxwell House have less effect on each other than otherwise expected, but store coupons for CFON do hurt the AOB category.

The private-label brands (PL 1, PL 2, PL 3, and AOPL) collectively have a baseline share of 13%. All four have significant price effects, and AOPL has a significant feature effect. AOPL exerts price pressure on both Maxwell House brands and the AOB category. While Master Blend returns the pressure, both Folgers and Regular Maxwell House are less price competitive with AOPL than otherwise expected. There are no cross effects for features, displays, or store coupons for the private-label brands. This was in part dictated by the criterion requiring a minimum of 53 observations before a significant residual correlation could qualify as a cross effect, which excluded all but the AOPL brand. In the category-volume model presented later in this chapter and in the brand-planning exercise in Chapter 7 all the private-label brands are aggregated together. If this had been done in the market-share model, more cross effects involving these brands might have been identified. If market-share analysis is done as an iterative process (as was discussed early in this book), this refinement could be undertaken.

That price is a major instrument in this market is reflected in having 11 of 12 self-price effects significant. Four self-feature effects, six self-display effects, three self-coupon effects, and seven brand-specific intercepts were significant.

Residual analysis seems to be a practical means for identifying cross effects. The criterion identified 29 cross-price effects, of which 22 were statistically significant in the final model. There were 12 cross-feature effects, 4 of which were significant in the final model; 18 display effects were identified and half of these were significant in the final model. Of the 20 cross-coupon effects identified in the residuals from the differential-effects model, 10 were significant in the final model.

Reading through a regression output like this is a tedious but useful step in developing an initial understanding of market and competitive structure. But two more elements are needed before responsible brand planning can take place. First, parameters have to be converted to elasticities before an overall picture of the structure can be achieved (see Chapter 6). And second, a category-volume model must be calibrated before a market simulator can be developed. This is the topic of the next section.

5.12.2  The Category-Volume Model

A category-volume model of the style in equation (5.33) is reported in Table 5.14. Only data from grocery chains 1 - 3 are used in this model so that the results correspond to the competitive maps developed in Chapter 6 and the market simulator developed in Chapter 7. The private-label brands were aggregated into a single PL brand.

Table 5.14: Regression Results for Category-Volume Model

 
Dep Variable: LTWVOL
Analysis of Variance
    Sum of Mean    
Source DF Squares Square F Value Prob > F
Model 31 42.88 1.38 38.29 0.01
Error 124 4.48 0.04    
C Total 155 47.36      
Root MSE 0.19 R-Square 0.91  
Dep Mean 7.55 Adj R-Sq 0.88  
C.V. 2.52      
 
Parameter Estimates
    Parm Std T for H0  
Variable DF Est Err Parm=0 Prob > |T|
INTERCEP 1 6.73 0.84 7.98 0.01
BA4-HLBR 1 -0.13 0.35 -0.36 0.72
LPR1-Folg 1 -0.74 0.38 -1.96 0.05
LPR2-RMH 1 -0.73 0.40 -1.83 0.07
LPR3-MHMB 1 0.56 0.51 1.09 0.28
LPR4-HLBR 1 -0.13 0.40 -0.33 0.74
LPR5-CFON 1 -2.09 0.42 -4.97 0.01
LPR6-Yub 1 -0.32 0.73 -0.43 0.67
LPR7-CAS 1 0.77 1.02 0.75 0.45
LPR8-AOB 1 3.25 0.25 13.08 0.01
LPRPL-APL 1 -0.67 0.45 -1.50 0.14
D1-Folg 1 0.62 0.14 4.47 0.01
D2-RMH 1 0.50 0.10 4.79 0.01
D3-MHMB 1 0.29 0.12 2.53 0.01
D4-HLBR 1 0.13 0.06 1.98 0.05
D5-CFON 1 -0.05 0.09 -0.50 0.62
D8-AOB 1 0.38 0.12 3.13 0.01
DPL-APL 1 0.05 0.10 0.48 0.63
C1-Folg 1 -0.13 0.18 -0.70 0.49
C2-RMH 1 0.08 0.10 0.81 0.42
C3-MHMB 1 0.04 0.40 0.10 0.92
C4-HLBR 1 -2.06 0.95 -2.16 0.03
C5-CFON 1 0.30 0.23 1.28 0.20
C8-AOB 1 -0.68 0.59 -1.14 0.26
CPL-APL 1 0.17 0.12 1.44 0.15
F1-Folg 1 -0.08 0.12 -0.67 0.50
F2-RMH 1 0.01 0.09 0.07 0.95
F3-MHMB 1 0.03 0.08 0.39 0.70
F4-HLBR 1 -0.01 0.10 -0.14 0.89
F5-CFON 1 0.06 0.09 0.63 0.53
F8-AOB 1 0.56 0.12 4.68 0.01
FPL-APL 1 0.01 0.10 0.06 0.95

A preliminary model showed that lagged volume had no significant effect (t = -.96) and that there were no features, displays, or coupons in Chains 1 - 3 for either Yuban or Chase & Sanborne (so these effects were deleted). Only Hills Bros. had a distribution pattern that required a brand-absence coefficient (BA4).

The overall fit of the model is quite good (R2 = .91). This could be boosted to .99 by the inclusion of chain-specific intercepts, but this category-volume model is destined for use in the market simulator developed in Chapter 7, and we feel that the generality of the planning frame used in that chapter is enhanced by predicting volume for a generic chain rather than chain by chain. The strongest price influences on total volume come from discounts for Folgers, Maxwell House, and Chock Full O'Nuts. Discounts for these brands clearly expand the weekly volume. As prices for the aggregate AOB category increase, total volume increases - perhaps reflecting supply conditions or prestige effects for these premium brands. Displays for Folgers, both Maxwell House brands, Hills Bros., and AOB drive up category volume. Hills Bros.' store coupons seem to contract total volume, reflecting the infrequent (and apparently counter-cyclical) store-couponing policy for this brand. The only significant feature effect is associated with the AOB category.
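
Although equation (5.33) is not repeated here, the variable names in Table 5.14 suggest a log-volume regression on log prices plus display, coupon, and feature measures for each brand. The sketch below is only a schematic illustration under that assumption; the simulated arrays, the reduced brand count, and the single promotion measure shown are not the actual specification or data.

```python
# Schematic category-volume regression in the spirit of Table 5.14, assuming
# log total weekly volume (LTWVOL) regressed on log prices (LPR) and a
# promotion measure per brand. All values here are simulated placeholders.
import numpy as np

rng = np.random.default_rng(1)
weeks, brands = 156, 8
log_price = np.log(2.0 + rng.random((weeks, brands)))   # stand-in for LPR columns
display = rng.random((weeks, brands))                    # stand-in for display measures
X = np.column_stack([np.ones(weeks), log_price, display])
log_volume = rng.normal(7.5, 0.5, size=weeks)            # stand-in for LTWVOL
coef, *_ = np.linalg.lstsq(X, log_volume, rcond=None)    # intercept, price, display effects
```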

5.12.3  Combining Share and Category Volume

The choice of measures incorporated into both the market-share and category-volume models was dictated in large part by the need for a diagnostically useful market simulator. To the extent that the variables inside these markets can explain market behavior, we obtain a way of translating market history into elasticities. Chapter 6 develops methods for mapping the market and competitive structure implied by the elasticities - as well as methods for visualizing the sources driving changes in competitive structure. In Chapter 7 the market-share and category-volume models are combined into a market simulator for evaluating the consequences of marketing actions for all brands.

5.13  Large-Scale Competitive Analysis

This section addresses two questions. The first concerns whether or not market-share analysis can be done on a large enough scale to be practical. Simply stated, the issue is: how large is too large? The second centers on the fixation managers seem to have concerning the signs of parameters developed using best linear-unbiased estimation. Simply stated, the issue is: is BLUE always best? Both of these topics will be discussed using experience arising from the implementation of market-share models on optical-scanner (POS) records of weekly store sales from Nielsen Micro-Scantrack databases and IRI store-level databases.

Fifteen steps have been integrated into a SAS(R) macro program to perform the analytical tasks in estimating asymmetric market-share models; a condensed sketch of the central steps follows the list.

  1. Form the flat file containing variables [Sales plus Marketing Instruments] and observations [Brands × Stores × Weeks].
  2. Choose the model form (MCI or MNL) and the transformations of variables (zeta-scores, exp(z-scores), or raw scores).
  3. Form the differential-effects file containing the expanded set of variables [Sales + (Instruments + 1) × Brands] for the same observations.
  4. Form the differential-effects covariance matrix and store.
  5. Estimate the differential-effects model.
  6. Find the brand intercept nearest zero and delete.
  7. Re-estimate the differential-effects model.
  8. Compute the residuals and sort by brand.
  9. Cross correlate each brand's residuals with the marketing instruments of every competitor.
  10. Tally the significant cross correlations.
  11. Form the differential cross-effect variables.
  12. Compute and store complete covariances (differential effects and cross-competitive effects).
  13. Simultaneously re-estimate the parameters for all the effects in the calibration data.
  14. Estimate the GLS weights and re-estimate the parameters.
  15. Cross validate on fresh data.
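
The heart of the procedure (steps 5 through 13) can be condensed as follows. This is an illustrative sketch in Python rather than the SAS(R) macro itself; the argument names, the use of a normal approximation to test the residual correlations, and the handling of missing values are all assumptions made for the sketch.

```python
# Condensed sketch of steps 5-13: estimate differential effects, screen the
# residuals for cross-competitive effects, and pass the survivors forward.
import numpy as np

def screen_cross_effects(X_diff, y, Z_candidates, min_obs=53):
    """OLS differential effects, then flag candidate cross effects whose
    residual correlation is significant (Fisher-z normal approximation)."""
    beta, *_ = np.linalg.lstsq(X_diff, y, rcond=None)       # steps 5-7
    resid = y - X_diff @ beta                                # step 8
    keep = []
    for name, z in Z_candidates.items():                     # step 9
        ok = ~np.isnan(z)
        n = ok.sum()
        if n < min_obs:                                      # more than 52 obs required
            continue
        r = np.corrcoef(resid[ok], z[ok])[0, 1]
        stat = np.abs(np.arctanh(r)) * np.sqrt(n - 3)        # test of H0: rho = 0
        if stat > 1.96:                                      # roughly p < .05, two-sided
            keep.append(name)                                # step 10
    return beta, resid, keep   # step 11 builds the cross-effect columns from `keep`
```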

5.13.1  How Large Is Too Large?

The size implications of two applications are summarized in Table 5.15. The two applications reported there involve data from IRI and A.C. Nielsen. The IRI data are those just summarized for the ground, caffeinated coffee market. The Micro-Scantrack data involve a mature category of a frequently purchased, branded good. There were around 30 brands which were represented at the brand-size level - leading to 66 competitors in the model. The IRI data tracked four marketing instruments: prices, newspaper features, store coupons, and in-store displays. These data predate the size grading of newspaper features now standard with IRI data. The Nielsen data tracked five marketing instruments: prices, major ads, line ads, coupon ads, and in-store displays. Including the brand-specific intercepts, the Step 3 differential-effects file for the IRI example has 60 variables, while the Nielsen application contains 396 differential-effect variables. With seven grocery chains reporting 52 weeks of sales, the IRI example has about 2200 observations in the calibration data set. The Nielsen example has up to 155 stores reporting each week, which translates to about 113,000 observations in 26 weeks.

Step 10 involves a user-controlled, statistical criterion for deciding which residual correlations are translated into cross-competitive effects. In the IRI application any correlation with more than 52 observations and a significance level more extreme than .05 was selected. This produced 81 cross effects involving all marketing instruments and leading to a Step 12 covariance matrix around 140 × 140. Using the same criterion on the Nielsen example led to the identification of around 4,000 potential cross-competitive effects. This would require the computation of a 4,400 × 4,400 covariance matrix, which is too large to compute in SAS(R) on an IBM 3083. Making the required number of observations much larger and the required significance level wildly extreme still led to around 700 potential cross-competitive effects. Finally, only the 200 statistically most extreme cross-competitive effects were selected. These most-extreme effects all involved prices.

The comparison of timing results is somewhat exaggerated by the differences in the mainframes involved. The IBM 3090 model 200 on which the smaller example was run is an enormously capable computer.

Table 5.15: Computer Resources for Two Applications

 
IRI Nielsen
Chain-Level Data Micro Scantrack Data
 
12 Brands 66 Brand-Sizes
4 Instruments 5 Instruments
  Price   Price
  Features   Major Ads
      Line Ads
  Store Coupons   Coupon Ads
  Displays   Displays
60 Differential Effects 396 Differential Effects
7 Chains/Week Up to 155 Stores/Week
52 Weeks ~ 2200 Obs. 26 Weeks ~ 113000 Obs.
 
Cross Effects
 
Obs > 50 p < .05 Obs > 50 p < .05
79 Cross Effects ~ 4000 Cross Effects
    Pick 200 Most Extreme
 
Timing
 
On IBM 3090 On IBM 3083
~ 32 CPU Seconds ~ 120 CPU Minutes
Steps 1 - 15 Steps 1 - 10
    ~ 120 CPU Minutes
    Steps 11 - 12
    ~ 10 CPU Minutes
    Step 13

While neither the vector nor the parallel capabilities of this machine were really involved in this illustration, the size of the problem did not tax the resources of the 3090. All 15 steps in the analysis took around 32 CPU seconds. The IBM 3083 used in the large application is an extended-architecture (XA) machine, but the time and space required still reflected a substantial strain on the machine's resources. The first ten steps required two hours of CPU time, most of which was spent forming the large (~400 × 400) covariance matrix. Forming the extended covariance matrix, including 200 cross effects, required another two hours of CPU time. Once the covariance matrix was stored, however, trying out different specifications in search of a final model took only about 10 CPU minutes per run. The estimation step was not run on the large example.

The huge number of initial cross effects in the 66-competitor example makes it clear that we can get too large unless careful judgment is exercised. The size of the analysis is quite sensitive to the number of competitors for which a full differential-effects specification is attempted. This application would have been more manageable if the 30 brands were considered the basis of the differential-effects specification, and size had been treated as a simple variable in most cases.

The 66-competitor illustration is near the limit of practicality using the system of models employed here. For comparison, however, it is useful to assess the resources needed to estimate an illustration of this size using the analytical methods developed by Shugan (Shugan, Steven M. [1987], ``Estimating Brand Positioning Maps from Supermarket Scanning Data,'' Journal of Marketing Research, XXIV (February), 1-18) for data such as these. Shugan's method requires the computation of many simple regressions. If a very fast machine required only 40 nanoseconds to compute a regression, it would take 2 × 10^83 CPU seconds to complete the 66-competitor illustration. This means that if a super computer had begun at the moment of the creation of the universe, it would still not be done. In fact, the age of the universe could be taken to the seventh power and the computation would still be incomplete.

5.13.2  Is BLUE Always Best?

Best linear-unbiased estimation provides the robust foundation on which the competitive-analysis system relies for its parameter estimates. But, as every analyst knows, some parameters can turn up with the ``wrong signs.'' Price parameters which are positive are difficult to explain except perhaps in prestige product classes. Negative parameters for promotions or advertising are difficult to explain - particularly to the managers running the promotions.

It seems to be left to the analyst to explain such events, as managers seem to presume that they are the consequences of quirks in the models. Analysts, in turn, assume that the explanation is in the data, yet the managers typically know the market conditions reflected in the data far better than the analysts do.

There are several basic problems with this scenario. First is a problem of salience - are wrong-signed parameters more salient than they should be? The second problem concerns orientation. In simple constant-elasticity models the parameters are the elasticities. But complex market-response models recognize that elasticities vary as market conditions change. Management needs to know how markets respond to a firm's marketing efforts, but that knowledge is reflected far better in elasticities than in parameters. Third, there is an organizational problem. In the tension between management science and management, analysts should be more responsible for the models and managers more responsible for the data and how results are interpreted. But what one side does not understand should be the responsibility of both sides to figure out. Management scientists must develop and apply techniques across a number of managerial domains. They should not be expected to know the data of a domain with the kind of intimacy needed to manage. The second and third problems are addressed in more depth in Chapter 7, so that only the first is considered further here.

The problem of salience asks if wrong-signed parameter estimates get more attention than their frequency should command. Tables 5.16 and 5.17 summarize the parameter estimates for the two illustrations.

Table 5.16: Summary of BLUE Parameters - IRI Data

 
  Differential-Effects Model Cross-Effects Model
    R2 = .83, F(59, 2184) = 180   R2 = .93, F(140, 2051) = 181
 
Marketing Right No. Wrong Sign Right No. Wrong Sign
Instruments Sign Signif. p < .05 Sign Signif. p < .05
Prices 11/12 9/12 0/12 11/12 11/12 0/12
Features 7/12 3/12 1/12 9/12 4/12 1*/12
Displays 9/12 8/12 0/12 9/12 6/12 0/12
Coupons 8/12 1/12 1/12 9/12 3/12 2/12
Totals 35/48 21/48 2/48 38/48 24/48 3*/48
* One aggregate brand.

Table 5.17: Summary of BLUE Parameters - Nielsen Data

 
  Cross-Effects Model
    R2 = .67, F(446, 113000) = 503
 
Marketing     Wrong Sign
Instruments Right Sign Significant p < .05
Prices 62/66 55/66 4/66
Major Ads 50/66 35/66 1/66
Line Ads 57/66 29/66 1/66
Coupon Ads 43/66 21/66 7/66
Displays 55/66 47/66 2/66
Totals 267/330 187/330 15/330

In Table 5.16 we see that in the differential-effects model 21 of 48 parameters are statistically significant in the expected direction, while only 2 of 48 are statistically extreme with the wrong sign. Moving to the cross-effects model, 24 of 48 differential effects are statistically significant in the expected direction, in spite of the inclusion of 81 cross effects. In the cross-effects model 3 of 48 differential-effect parameters are statistically extreme in the unexpected direction, and one of these relates to a brand aggregate. Since brand aggregates are not expected to behave as regularly as brands, these parameters probably present no problems for the management scientist or the manager. This is certainly no different from what one might expect by random chance. Yet it is very likely that these parameters will be the ones questioned by managers. The analyst is forced to track the stability of the pattern of coefficients between the differential-effects model and the cross-effects model, as well as to check possible sources of collinearity among the variables or lack of variability in the instruments in question. But because of the strong prior hypotheses of managers about the directions of marketing effects, the focus is often on the two unusual parameters, rather than on the 24 significant differential effects or the 45 significant cross effects which seem to be driving the market. The burden of explanation falls on analysts who may know little about the market data from which these parameters arise.

The problem is perhaps tractable when only a few parameters require special explanation. But with large-scale applications the number of parameters to track can grow quite large. Table 5.17 summarizes the cross-effects model for the 66-competitor example. While 187 of 330 differential effects are significant in the expected direction, 15 of 330 have the wrong sign with p < .05. Fifteen of 330 beyond the .05 level is well within expectation, but explaining the source of these potentially anomalous effects is, at the least, time consuming and diverts attention from the main task of understanding market response.

Given the strong prior hypotheses of managers, there is another approach to parameter estimation which merits study. Quadratic programming would allow us to specify a set of inequality constraints on the parameters which would correspond to the prior hypotheses of managers. Consider an estimation scheme in which the differential-effect parameters estimated in Steps 5 and 7 would be bounded by a quadratic program to conform to the prior hypotheses. The residual analysis in Steps 8 - 11 would proceed as before. But at Step 13 the cross-competitive effect parameters would be estimated against the full set of residuals, rather than recombined with the differential effects in a BLUE scheme for overall recalibration against market shares. This approach gives primacy to the explanatory power of the differential effects. Whatever they can explain which is consistent with prior hypotheses is given to them. The cross-competitive effects are used to explain the systematic part of whatever is left over.
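
The quadratic-programming formulation itself is not developed here, but the flavor of sign-constrained estimation can be sketched with a simpler device: flip the columns whose parameters are expected to be negative and solve a non-negative least-squares problem. This is a stand-in used only for illustration, not the authors' proposal; the function and argument names are invented.

```python
# Sign-constrained least squares as a sketch of constrained estimation:
# impose the managers' expected signs by column flipping plus NNLS.
import numpy as np
from scipy.optimize import nnls

def sign_constrained_ls(X, y, expected_sign):
    """expected_sign: array of +1/-1 per column; estimates obey those signs."""
    flipped = X * expected_sign              # flip columns expected to be negative
    coef_pos, _ = nnls(flipped, y)           # non-negative solution
    return coef_pos * expected_sign          # restore the intended signs
```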

Whenever one considers moving away from BLUE schemes, caution and study are advised. But given the strong priors regarding the effects of marketing instruments, this avenue of research should be pursued.

5.14  Appendix for Chapter 5

5.14.1  Generalized Least Squares Estimation

Nakanishi and Cooper [1974] showed that the total covariance matrix of errors, $\Sigma_e$, is approximately the sum of the variance-covariance matrix among sampling errors, $\Sigma_{e_2}$, and the variance-covariance matrix among specification errors, $\Sigma_{e_1}$. For the simplified estimation procedures the estimate of $\Sigma_{e_2 t}$ comes from

\[
\hat{\Sigma}_{e_2 t} \;=\; \frac{1}{n_t}\left(\hat{P}_t^{-1} - J\right)
\tag{5.34}
\]

where $n_t$ is the number of individuals (purchases) in time period $t$, $\hat{P}_t^{-1}$ is an $(m_t \times m_t)$ diagonal matrix whose entries are the inverses of the market shares estimated by the OLS procedure for the $m_t$ brands in that period, and $J$ is a conformal matrix of ones.
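
As a small illustration of equation (5.34), the function below builds $\hat{\Sigma}_{e_2 t}$ from a vector of OLS-fitted shares and a sample size. The argument names are illustrative only.

```python
# Sampling-error covariance for one period, per equation (5.34).
import numpy as np

def sigma_e2_t(p_hat_t, n_t):
    """p_hat_t: OLS-fitted shares for the m_t brands in period t; n_t: sample size."""
    m_t = len(p_hat_t)
    P_inv = np.diag(1.0 / np.asarray(p_hat_t))   # diagonal of inverse fitted shares
    J = np.ones((m_t, m_t))                      # conformal matrix of ones
    return (P_inv - J) / n_t
```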

The variance-covariance matrix of specification errors, $\Sigma_{e_1}$, is assumed to be constant in each time period and is estimated by $\hat{\sigma}_{e_1}^2 I$, where

\[
\hat{\sigma}_{e_1}^2 \;=\;
\frac{\displaystyle Q \;-\; \sum_{t=1}^{T} \mathrm{tr}\,\hat{\Sigma}_{e_2 t}
\;+\; \mathrm{tr}\!\left[\left(\sum_{t=1}^{T} Z_t' Z_t\right)^{-1}
\left(\sum_{t=1}^{T} Z_t'\,\hat{\Sigma}_{e_2 t}\,Z_t\right)\right]}
{\displaystyle \sum_{t=1}^{T} m_t \;-\; gK \;-\; T}
\tag{5.35}
\]

where $Q$ is the sum of squares of the OLS errors, and $Z_t$ is an $(m_t \times [K+T])$ matrix containing the logs of the $K$ explanatory variables with the $T$ time-period dummy variables concatenated to it. This formula for $\hat{\sigma}_{e_1}^2$ is considerably simpler than the one in Nakanishi and Cooper [1974, p. 308] and also corrects a typographical error in that equation.

The total variance-covariance matrix $\hat{\Sigma}_e$ is a block-diagonal matrix in which each block is the sum $\hat{\Sigma}_{e_2 t} + \hat{\Sigma}_{e_1}$.
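
Putting the pieces together, the generalized-least-squares step can be sketched as follows. Because $\hat{\Sigma}_e$ is block diagonal, the normal equations can be accumulated period by period. The argument names are illustrative, and this is a schematic of the estimator rather than the authors' code.

```python
# GLS sketch: accumulate the normal equations block by block, where each
# period's covariance block is Sigma_e2_t + sigma2_e1 * I.
import numpy as np

def gls_estimate(Z_blocks, y_blocks, sigma_e2_blocks, sigma2_e1):
    """Z_blocks, y_blocks, sigma_e2_blocks: per-period designs, responses,
    and sampling-error covariances; sigma2_e1: specification-error variance."""
    Sigma_blocks = [S2 + sigma2_e1 * np.eye(S2.shape[0]) for S2 in sigma_e2_blocks]
    XtWX = sum(Z.T @ np.linalg.solve(S, Z)
               for Z, S in zip(Z_blocks, Sigma_blocks))
    XtWy = sum(Z.T @ np.linalg.solve(S, y)
               for Z, y, S in zip(Z_blocks, y_blocks, Sigma_blocks))
    return np.linalg.solve(XtWX, XtWy)
```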