Replication: Monte Carlo Study in Berry (1994)

23 minute read


Fixed Coefficients Random Utility (Demand) Estimation

This notebook reviews the estimation and inference of a linear random utility model when the agent is facing a finite number of alternatives.

Introduction

Consider a set of $J+1$ alternatives $\{0,1,2,\dots,J\}$. The utility that decision maker (DM) $i$ receives from buying product $j$ is \(u_{ij} = x_{ij}' \beta -\alpha p_j + \xi_j+ \epsilon_{ij}.\) The DM maximizes her utility: \(y_i =\arg \max_{j} u_{ij}.\)

We now assume that $\epsilon_{ij}$ are $i.i.d.$ across DMs and across alternatives, and that each $\epsilon_{ij}$ is distributed (standard) Type-I extreme value (T1EV). This yields the familiar logit Conditional Choice Probabilities (CCP): \(Pr(y_i = j) = \frac{e^{x_{ij}'\beta}}{\sum_{k=0}^{J}e^{x_{ik}'\beta}}.\)
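These CCPs are easy to compute directly. A minimal Julia sketch (all numbers hypothetical; `delta` stacks the mean utilities $x_{ij}'\beta$, with alternative 0 normalized to zero, a convention introduced formally below):

```julia
# Logit conditional choice probabilities.
# delta[j] is the mean utility of inside alternative j; alternative 0
# is normalized to zero, so the probabilities sum to one by construction.
function ccp(delta::Vector{Float64})
    ev = exp.(vcat(0.0, delta))   # prepend the outside alternative
    return ev ./ sum(ev)
end

probs = ccp([1.0, 0.5, -0.2])
sum(probs)   # = 1.0 up to floating-point error
```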

Aggregate, market-level data. In Berry, Levinsohn, and Pakes (1995) and in much of the empirical work following BLP, the researcher observes only market-level data. That means the characteristics vector of the alternatives is not indexed by $i$: individual-level variation is unobserved and gets absorbed by the error term $\epsilon$. The choice probabilities become \(Pr(y = j|x_j, \xi_j; \beta) = \frac{e^{x_{j}'\beta}}{\sum_{k=0}^{J}e^{x_{k}'\beta}} \ \text{for} \ j=0,\dots,J.\) The left-hand side is simply the market share of product/alternative $j$. We will denote these market shares as $s_0,s_1,\dots,s_{J}$. The CCPs above all share the same denominator and, for identification reasons, we normalize $x_0 = 0$. Therefore, \(\frac{s_j}{s_0} = e^{x_j'\beta}.\)

Using Berry’s (1994) inversion and taking logs of both sides gives us:

\[\text{ln}(s_j) - \text{ln} (s_0) = \delta_j \equiv x_j' \beta - \alpha p_j + \xi_j \qquad \text{(eq A)}\]

where

  • The mean utility level $\delta_j$ contains the product characteristics $x_j$, the price $p_j$, and the aggregate error $\xi_j$.
  • Econometricians observe the aggregate market share of good $j$, the share of the outside good, product characteristics $x_j$, and price $p_j$.
  • $\delta_j$ is uniquely identified directly from a simple algebraic calculation involving market shares.
  • Eq (A) can then be estimated by OLS, although we need instruments for the price, which is correlated with $\xi_j$.
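The inversion behind eq (A) can be checked numerically: build shares from a known $\delta$ and recover it exactly from $\ln(s_j) - \ln(s_0)$. A minimal Julia sketch with hypothetical numbers:

```julia
# Forward map: logit market shares from mean utilities delta
# (the outside good has delta = 0).
logit_shares(delta) = exp.(delta) ./ (1 + sum(exp.(delta)))

delta = [0.7, -0.3, 1.1]          # hypothetical x_j'beta - alpha*p_j + xi_j
s  = logit_shares(delta)
s0 = 1 - sum(s)                   # outside-good share

# Berry (1994) inversion: recovers delta up to floating-point error.
delta_recovered = log.(s) .- log(s0)
```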

For the rest of this notebook, we will introduce two empirical examples:

A. Estimate logit demand using BLP (1995)’s aggregate market-level data.

  • A1. Introduction of car data from BLP(1995)
  • A2. Data cleaning
  • A3. Run linear regression using eq(A)
  • A4. Run 2sls using instruments

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game

  • B1. Data Generating Process
  • B2. Obtain (numerically) equilibrium price and market shares
  • B3. Regress using OLS / IV (cost shifters and competitors’ product characteristics as instruments for price; Table 1, Berry (1994))

A1. Introduction of car data from BLP(1995)

As an empirical study, we will replicate Table 3, as in BLP (1995).

  • We obtain product characteristics from the annual issues of the Automotive News Market Data Book; the compiled data are in BLP.csv.

  • Data includes the number of cylinders, number of doors, weight, engine displacement, horsepower, length, width, wheelbase, EPA miles per gallon rating (MPG), and dummy variables for whether the car has front-wheel drive, automatic transmission, power steering, and air conditioning.
  • The data set includes this information on observed products from 1971-1990.
  • The price variable is list retail price (in $1000’s). Prices are in 1983 dollars. (We used the Consumer Price Index to deflate.)
  • The sales variable corresponds to U.S. sales (in 1000’s) by nameplate.
  • The product characteristics correspond to the characteristics of the base model for the given nameplate.
  • To capture the cost of driving, we include miles per dollar (MP$), calculated as MPG divided by price per gallon. (Both MPG and price per gallon are provided.)

  • In terms of potential market size, there is no formal definition; we use the yearly number of households in the U.S. from the Statistical Abstract of the U.S.
  • We assume that each model is produced by a single firm, to avoid a multi-product pricing problem.
# Query / DataFramesMeta are used for cleaning the dataset
# FixedEffectModels is used for running regressions
# Distributions, Random, NLsolve are used for the Monte Carlo study
using CSV, DataFrames, Query, DataFramesMeta, FixedEffectModels, Distributions, Random, NLsolve
ENV["COLUMNS"], ENV["LINES"] = 350, 50  # Jupyter display settings, not specific to Julia

dataset = CSV.read("/Users/jinkim/Dropbox/2020 Summer/dropbox_RA_work/Berry/BLP.csv", DataFrame)
#first(dataset,10)

Variable name/short description

For detailed description, please see BLP(1995) section 7.1. (Data section)

| Variable name | Description |
| --- | --- |
| name | Car |
| id | Car ID |
| year | Year |
| cy | Number of cylinders |
| dr | Number of doors |
| at | Automatic transmission |
| ps | Power steering |
| air | Air conditioning |
| drv | Front-wheel drive |
| p | Price (in $1000’s) |
| wt | Weight |
| dom | Domestic |
| disp | Engine displacement |
| hp | Horsepower |
| lng | Length |
| wdt | Width |
| wb | Wheelbase |
| mpg | Miles per gallon |
| q | Quantities |
| firmids | Firm ID |
| euro | Indicator for European car |
| reli | Reliability rating |
| dfi | Indicator for digital fuel injection |
| hp2wt | HP-to-weight ratio |
| size | Length × width (/1000) |
| japan | Indicator for Japanese car |
| cpi | CPI |
| gasprice | Gas price per gallon |
| nb_hh | Number of households (potential market size) |
| cat | Size category (used for the nested logit) |
| door2 | I(doors = 2) |
| door3 | I(doors = 3) |
| door4 | I(doors = 4) |
| door5 | I(doors = 5) |
| sampleweight | Sample weights |
| mpgd | Miles per dollar (MPG divided by gas price) |
| dpm | Dollars per mile |
| modelid | Car model name |
#summary statistics
describe(dataset,:all)

(Output: summary-statistics table, 38 rows × 13 columns — mean, std, min, quartiles, max, nunique, nmissing, and eltype for each variable; omitted here.)
### Replicate Table 1: Summary Stats
using Statistics

# Add or subtract column names as needed:
cnames = [:year,:cy,:dr,:at,:ps,:air,:drv,:p,:wt,:dom,:disp,:hp,:lng,:wdt,:wb,:mpg]
sub = dataset[!, cnames]
combine(groupby(sub, :year), names(sub, Not(:year)) .=> mean)

(Output: yearly means of the selected variables, 20 rows × 16 columns; omitted here.)

A2. Data cleaning

Step 1. Obtain the market share of each good $j$: $s_{jt} = \frac{q_{jt}}{M_t}$

For notation, let $M_t$ denote the total (potential) market size, given by the column nb_hh.

dataset = @linq dataset |> 
groupby([:year]) |>
transform(Total_Q = sum(:q))

dataset.s_j = dataset.q./dataset.nb_hh;

Step 2. Obtain the market share of the outside good 0: $s_{0t} = \frac{M_t - \sum_{k=1}^J q_{kt}}{M_t}$

dataset.s_0 = (dataset.nb_hh-dataset.Total_Q)./dataset.nb_hh;

Step 3. Construct dependent variable: $ \text{ln}(s_{jt}) - \text{ln}(s_{0t}) $

dataset.log_s_j_0 = log.(dataset.s_j) - log.(dataset.s_0);

A3. Run linear regression using eq(A)

Step 4. we use hp2wt, air, mpgd, size as product characteristics:

\(\text{ln}(s_j) - \text{ln} (s_0) \ = \delta_j \equiv x_j' \beta - \alpha p_j + \xi_j\)

result = reg(dataset, @formula(log_s_j_0 ~ p+hp2wt+air+mpgd+size ),  save = true,);
print(result)
                               Linear Model                               
===========================================================================
Number of obs:                  2217   Degrees of freedom:                6
R2:                            0.390   R2 Adjusted:                   0.388
F Statistic:                  282.47   p-value:                       0.000
===========================================================================
               Estimate  Std.Error   t value Pr(>|t|)  Lower 95%  Upper 95%
---------------------------------------------------------------------------
p            -0.0886398 0.00401348  -22.0855    0.000 -0.0965104 -0.0807692
hp2wt        -0.0731735   0.276701  -0.26445    0.791  -0.615794   0.469447
air          -0.0380115  0.0726455 -0.523246    0.601  -0.180472   0.104449
mpgd          0.0288388 0.00439518   6.56147    0.000  0.0202197  0.0374579
size            2.40052   0.126801   18.9315    0.000    2.15186    2.64919
(Intercept)    -10.2035    0.26002  -39.2412    0.000   -10.7134   -9.69356
===========================================================================

Step 5. Obtain price elasticities.

The own-price elasticity $\eta_j$ is given by: \(\begin{align} \eta_j & = \frac{\partial s_j}{\partial p_j} \frac{p_j}{s_j}, \qquad \text{where} \ s_j = \frac{e^{\delta_j}}{\sum_{k=0}^{J} e^{\delta_k}} \ \text{and} \ \delta_j = x_j'\beta - \alpha p_j + \xi_j \\ & \text{By the chain rule:} \ \frac{\partial s_j}{\partial p_j} = -\alpha \Bigg[ \frac{e^{\delta_j}}{\sum_{k=0}^{J} e^{\delta_k}} - \Big( \frac{e^{\delta_j}}{\sum_{k=0}^{J} e^{\delta_k}}\Big)^2 \Bigg] = -\alpha s_j[1- s_j] \\ & \text{Rearranging these terms gives us:} \ \eta_j = -\alpha s_j[1- s_j] \times \frac{p_j}{s_j} = \underbrace{-\alpha \times (1-s_j) \times p_j}_\text{own-price elasticity of good $j$} \end{align}\) Working with quantities $q_j = s_j M$ instead of shares leaves $\eta_j$ unchanged, since the market size $M$ cancels. Note that the regression coefficient on $p_j$ in eq (A) estimates $-\alpha$, so the code below multiplies that coefficient by $(1-s_j)\,p_j$ directly.

# Using the estimated price coefficient (the coefficient on p, i.e. -alpha), derive the own-price elasticity of each good j.

price_coef = coef(result)[2];
dataset.e = price_coef * (ones(nrow(dataset))-dataset.s_j) .* dataset.p;

q1 = @from i in dataset begin;
            @where i.e >-1
            @select {elasticity=i.e}
            @collect DataFrame
    end;
nrow(q1)

1502

Replication: BLP Table 3, IV Logit Demand column (row: No. Inelastic Dem) on page 873.

I count the number of car models with inelastic demand. My estimate is 1,502; BLP report 1,494, which is quite close.

A4. Run 2sls using instruments

Following BLP, I use the following instruments for price.

1. the sum of size over all other products in market $t$ (i.e., dropping product $j$’s own size);

2. the sum of size across rival firms’ products in market $t$.

# IV 1

dataset = @linq dataset |> 
groupby([:year]) |>
transform(Total_size = sum(:size));
dataset.iv_size1 = dataset.Total_size - dataset.size;
# IV 2

dataset = @linq dataset |> 
groupby([:year , :firmids]) |>
transform(sum_size = sum(:size));
dataset.iv_size2 = dataset.Total_size - dataset.sum_size;
# 2SLS Regression for demand estimation
# First stage: regress price on Z and X

first_stage_result = reg(dataset, @formula(p ~ iv_size1+ iv_size2+hp2wt + air+mpgd+size), save = true, );
print(first_stage_result)
                              Linear Model                              
=========================================================================
Number of obs:                 2217   Degrees of freedom:               7
R2:                           0.592   R2 Adjusted:                  0.591
F Statistic:                534.872   p-value:                      0.000
=========================================================================
              Estimate  Std.Error  t value Pr(>|t|)  Lower 95%  Upper 95%
-------------------------------------------------------------------------
iv_size1     -0.030544 0.00999712 -3.05528    0.002 -0.0501487 -0.0109393
iv_size2     0.0919841 0.00809711  11.3601    0.000  0.0761054   0.107863
hp2wt          25.8362    1.31783  19.6051    0.000    23.2518    28.4205
air            9.57023   0.327809  29.1945    0.000    8.92738    10.2131
mpgd         -0.272477  0.0270151 -10.0861    0.000  -0.325455  -0.219499
size           2.25533   0.727117  3.10174    0.002   0.829424    3.68123
(Intercept)   -5.21123    1.50736  -3.4572    0.001   -8.16721   -2.25525
=========================================================================
# Second Stage: regress log(s_j)-log(s_0) on xhat

xhat = predict(first_stage_result, dataset);
dataset.p_iv = xhat;
second_stage_result = reg(dataset, @formula(log_s_j_0 ~ p_iv+hp2wt + air+mpgd+size), save = true);
print(second_stage_result)
                                Linear Model                                
============================================================================
Number of obs:                   2217  Degrees of freedom:                 6
R2:                             0.355  R2 Adjusted:                    0.354
F Statistic:                  243.848  p-value:                        0.000
============================================================================
                Estimate  Std.Error  t value Pr(>|t|)  Lower 95%   Upper 95%
----------------------------------------------------------------------------
p_iv           -0.289427  0.0156063 -18.5455    0.000  -0.320032   -0.258823
hp2wt            5.63223   0.513603  10.9661    0.000    4.62503     6.63942
air              2.18099   0.182327  11.9619    0.000    1.82344     2.53854
mpgd         -0.00984143 0.00536771 -1.83345    0.067 -0.0203677 0.000684853
size             2.19676   0.131213  16.7419    0.000    1.93944     2.45407
(Intercept)      -9.5444   0.271767 -35.1198    0.000   -10.0773    -9.01145
============================================================================
price_coef = coef(second_stage_result)[2];
dataset.e_iv = price_coef * (ones(nrow(dataset))-dataset.s_j) .* dataset.p;
q1 = @from i in dataset begin;
            @where i.e_iv >-1
            @select {elasticity=i.e_iv}
            @collect DataFrame
    end;

Comparison with BLP Table 3, IV Logit Demand column (row: No. Inelastic Dem) on page 873.

Because my price coefficient differs slightly, I find only 2 goods with inelastic demand; BLP report 22.

nrow(q1)
2

Step 6. Discussion: IV regressions

The reported price coefficient ($\alpha$) is -0.0886 in OLS, versus -0.2894 in the IV regression. The OLS price coefficient is biased upward (toward zero), because prices are positively correlated with the unobserved quality $\xi_j$.

# OLS Results
print(result)
                               Linear Model                               
===========================================================================
Number of obs:                  2217   Degrees of freedom:                6
R2:                            0.390   R2 Adjusted:                   0.388
F Statistic:                  282.47   p-value:                       0.000
===========================================================================
               Estimate  Std.Error   t value Pr(>|t|)  Lower 95%  Upper 95%
---------------------------------------------------------------------------
p            -0.0886398 0.00401348  -22.0855    0.000 -0.0965104 -0.0807692
hp2wt        -0.0731735   0.276701  -0.26445    0.791  -0.615794   0.469447
air          -0.0380115  0.0726455 -0.523246    0.601  -0.180472   0.104449
mpgd          0.0288388 0.00439518   6.56147    0.000  0.0202197  0.0374579
size            2.40052   0.126801   18.9315    0.000    2.15186    2.64919
(Intercept)    -10.2035    0.26002  -39.2412    0.000   -10.7134   -9.69356
===========================================================================
# 2SLS Results
print(second_stage_result)
                                Linear Model                                
============================================================================
Number of obs:                   2217  Degrees of freedom:                 6
R2:                             0.355  R2 Adjusted:                    0.354
F Statistic:                  243.848  p-value:                        0.000
============================================================================
                Estimate  Std.Error  t value Pr(>|t|)  Lower 95%   Upper 95%
----------------------------------------------------------------------------
p_iv           -0.289427  0.0156063 -18.5455    0.000  -0.320032   -0.258823
hp2wt            5.63223   0.513603  10.9661    0.000    4.62503     6.63942
air              2.18099   0.182327  11.9619    0.000    1.82344     2.53854
mpgd         -0.00984143 0.00536771 -1.83345    0.067 -0.0203677 0.000684853
size             2.19676   0.131213  16.7419    0.000    1.93944     2.45407
(Intercept)      -9.5444   0.271767 -35.1198    0.000   -10.0773    -9.01145
============================================================================

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game

  • B1. Data Generating Process
  • B2. Obtain (numerically) equilibrium price and market shares
  • B3. Regress using OLS / IV

B1. Data Generating Process

Each market is a duopoly in which each firm produces a single good; the data consist of aggregate market shares and the price of each good. We assume that the duopoly firms compete in 500 “independent” (isolated) markets.

In the D.G.P., we solve a Nash-Bertrand pricing game to derive each duopoly firm’s price and market share. We use cost shifters and product characteristics to solve this game numerically. Since it is the D.G.P., we use the true parameters to obtain prices and market shares.

As econometricians, we observe each duopoly firm’s market share, price, cost shifters, and product characteristics.

The utility of each consumer $i$ in each market is given by:

\begin{equation} u_{ij} = \beta_0 + \beta_x x_j + \sigma_d \xi_j - \alpha p_j + \epsilon_{ij} \end{equation} Marginal cost is constrained to be positive and given by:

\begin{equation} c_j = e^{\gamma_0 + \gamma_x x_j + \sigma_c \xi_j + \gamma_w w_j + \sigma_\omega \omega_j} \end{equation}

The exogenous data $x_j, \xi_j, w_j,$ and $\omega_j$ are all drawn as independent standard normal random variables.

The true parameters are given by:

| Parameter | True Value | Description |
| --- | --- | --- |
| $\beta_0$ | 5 | Intercept (demand) |
| $\beta_x$ | 2 | Utility from characteristic $x$ |
| $\sigma_d$ | 1 (3 in the second Monte Carlo) | Scale of $\xi_j$ in utility |
| $\alpha$ | 1 | Price coefficient |
| $\gamma_0$ | 1 | Intercept (supply) |
| $\gamma_x$ | 0.5 | Cost of characteristic $x$ |
| $\sigma_c$ | 0.25 | Scale of $\xi_j$ in cost |
| $\gamma_w$ | 0.25 | Coefficient on the input cost $w_j$ |
| $\sigma_\omega$ | 0.25 | Scale of $\omega_j$ |

B2. Obtain (numerically) equilibrium price and market shares (still the D.G.P.)

I solve the following nonlinear system for $j=1,2$ (unknowns: $p_1, p_2, s_1(p_1, p_2), s_2(p_1,p_2)$). Here $X = \{ x_1, x_2 \}$ is the vector of the duopoly firms’ product characteristics, and $s_0 = 1-s_1-s_2$. \(\begin{align} p_1 & = c_1 + \frac{1}{\alpha (1-s_1)} \\ p_2 & = c_2 + \frac{1}{\alpha (1-s_2)} \\ & \text{where $s_1$ and $s_2$ are given by} \\ s_1(X,p_1, p_2) & = \frac{ e^{\beta_0 + \beta_x x_1 + \sigma_d \xi_1 - \alpha p_1}}{1+e^{\beta_0 + \beta_x x_1 + \sigma_d \xi_1 - \alpha p_1} +e^{\beta_0 + \beta_x x_2 + \sigma_d \xi_2 - \alpha p_2} } \\ s_2(X,p_1, p_2) & = \frac{ e^{\beta_0 + \beta_x x_2 + \sigma_d \xi_2 - \alpha p_2}}{1+e^{\beta_0 + \beta_x x_1 + \sigma_d \xi_1 - \alpha p_1} + e^{\beta_0 + \beta_x x_2 + \sigma_d \xi_2 - \alpha p_2}} \end{align}\) (In the code below, where alpha $=-1$ is the signed coefficient on price, the same first-order conditions read $p_j = c_j - \frac{1}{\alpha(1-s_j)}$.)

Using a nonlinear solver, we obtain the equilibrium outcome: $p_1, p_2, s_1, s_2$ (and $s_0 = 1-s_1-s_2$). One might worry about multiple equilibria in this game. Since we solve a single-product duopoly (a simple market), we observe a unique solution in this Monte Carlo study. See Caplin and Nalebuff (1991) on multi-product firms and the uniqueness of this game.
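That uniqueness claim can be probed informally by re-solving the system from several starting values and checking that they reach the same fixed point. Below is a self-contained Julia sketch (all parameter values hypothetical, and a damped fixed-point iteration stands in for `nlsolve`; `alpha` is the signed price coefficient, as in the code that follows):

```julia
# Map candidate shares s = (s1, s2) to the shares implied by the
# Nash-Bertrand FOC prices p_j = c_j - 1/(alpha*(1 - s_j)).
function share_map(s; c = [2.0, 2.5], delta0 = [1.0, 1.5], alpha = -1.0)
    p  = c .- 1 ./ (alpha .* (1 .- s))     # equilibrium pricing rule
    ev = exp.(delta0 .+ alpha .* p)        # exp of mean utilities
    return ev ./ (1 + sum(ev))             # logit shares (outside good normalized)
end

# Damped fixed-point iteration from a given starting point.
function solve_shares(s0; tol = 1e-12, maxit = 10_000)
    s = s0
    for _ in 1:maxit
        snew = 0.5 .* s .+ 0.5 .* share_map(s)
        maximum(abs.(snew .- s)) < tol && return snew
        s = snew
    end
    return s
end

# Different starting values converge to the same equilibrium shares.
solve_shares([0.1, 0.1]) ≈ solve_shares([0.4, 0.3])
```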

B3. Regress using OLS / IV

Following Berry’s inversion, an econometrician runs the following OLS/IV regressions. The econometrician observes prices, product characteristics, and cost shifters for 500 independent duopoly markets.

\begin{align} \text{ln} (s_j) - \text{ln} (s_0) & = \delta_j \\ & = \beta_0 + \beta_x x_j - \alpha p_j + \sigma_d \xi_j \end{align}

For the IV, I use cost shifters and the competitor’s product characteristic, as in Berry (1994).

Note that in OLS, since the econometrician cannot observe the $\xi_j$ term, the price coefficient $\alpha$ is biased upward. In the IV regression, the observed cost shifters $w_j$ and the product characteristic of the rival firm are used as instruments for price.

For each simulation, we independently draw these variables for 500 markets, as in Berry (1994).

We repeat this 100 times and report the Monte Carlo results.

# Define parameters, as in Berry's (1994) Monte Carlo.
# Note: alpha is the signed coefficient on price (utility enters as +alpha*p with alpha = -1).

beta_0 = 5.0
beta_x = 2.0
sigma_d = 1.0
alpha = -1.0

gamma_0 = 1.0
gamma_x = 0.5

sigma_c = 0.25
gamma_w = 0.25
sigma_omega = 0.25

T = 500  # markets per sample
S = 100  # Monte Carlo samples
d = Normal()  # standard normal
# Define Non-linear solver

function f!(F, x)
    
    # D.G.P. for true costs
    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])
    
    # Derive equilibrium prices from the Nash-Bertrand FOC
    # (x[1]: market share of good 1, x[2]: market share of good 2)
    price_1 = cost_1 - 1/(alpha*(1-x[1]))
    price_2 = cost_2 - 1/(alpha*(1-x[2]))
    
    denom = 1 + exp(beta_0 + beta_x*data_temp1[:x] + sigma_d*data_temp1[:xi] + alpha*price_1)  + exp(beta_0 + beta_x*data_temp2[:x] + sigma_d*data_temp2[:xi] + alpha*price_2) 
    F[1] = x[1] - exp(beta_0 + beta_x*data_temp1[:x] + sigma_d*data_temp1[:xi] + alpha*price_1)/denom
    F[2] = x[2] - exp(beta_0 + beta_x*data_temp2[:x] + sigma_d*data_temp2[:xi] + alpha*price_2)/denom
end

Replicate Table 1 in Berry (1994), columns (1) and (2), where $\sigma_d=1$

### It takes 5 seconds, at the current workflow
sigma_d = 1

for s = 1:S
    ### Step B1. D.G.P.
    
    # To reproduce the same results, fix the random seed
    Random.seed!(s*100+T+1)
    x_1 = rand(d, T);
    Random.seed!(s*100+T+2)
    xi_1 = rand(d, T);
    Random.seed!(s*100+T+3)
    w_1 = rand(d, T);
    Random.seed!(s*100+T+4)
    omega_1 = rand(d, T);
    
    Random.seed!(s*100+T+5)
    x_2 = rand(d, T);
    Random.seed!(s*100+T+6)
    xi_2 = rand(d, T);
    Random.seed!(s*100+T+7)
    w_2 = rand(d, T);
    Random.seed!(s*100+T+8)
    omega_2 = rand(d, T);

    data_1 = DataFrame(x = x_1[1:T], xi = xi_1[1:T], w = w_1[1:T], omega = omega_1[1:T], iv=x_2[1:T]);
    data_2 = DataFrame(x = x_2[1:T], xi = xi_2[1:T], w = w_2[1:T], omega = omega_2[1:T], iv=x_1[1:T]);

    # For the first periods

    data_temp1 = data_1[1,:]
    data_temp2 = data_2[1,:]
    global data_temp1, data_temp2
    
    ### Step B2. Solve Equilibrium price and market shares using nonlinear-solver
    
    a= nlsolve(f!, [0.1; 0.1])

    vector_s1 = [a.zero[1]]
    vector_s2 = [a.zero[2]]
    vector_s0 = [1-a.zero[1]-a.zero[2]]

    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

    vector_p1 = [cost_1 - 1/(alpha*(1-a.zero[1]))]
    vector_p2 = [cost_2 - 1/(alpha*(1-a.zero[2]))]

    vector_delta_1 = [beta_0 + beta_x * data_temp1[:x] + sigma_d * data_temp1[:xi] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]
    vector_delta_2 = [beta_0 + beta_x * data_temp2[:x] + sigma_d * data_temp2[:xi] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]

    # From the second market to T markets.
    t=2
    for t = 2:T

        data_temp1 = data_1[t,:]
        data_temp2 = data_2[t,:]
        
        # Step 1. Solve Equilibrium price / market shares
        
        a= nlsolve(f!, [0.0; 0.0])

        append!(vector_s1, [a.zero[1]]);
        append!(vector_s2, [a.zero[2]]);
        append!(vector_s0, [1-a.zero[1]-a.zero[2]]);
        cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
        cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

        append!(vector_p1, [cost_1 - 1/(alpha*(1-a.zero[1]))]);
        append!(vector_p2, [cost_2 - 1/(alpha*(1-a.zero[2]))]);

        append!(vector_delta_1, [beta_0 + beta_x * data_temp1[:x] + sigma_d * data_temp1[:xi] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]);
        append!(vector_delta_2, [beta_0 + beta_x * data_temp2[:x] + sigma_d * data_temp2[:xi] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]);

    end

    data_1.price = vector_p1;
    data_2.price = vector_p2;

    data_1.s = vector_s1;
    data_2.s = vector_s2;

    data_1.delta = vector_delta_1;
    data_2.delta = vector_delta_2;

    data_1.log_sj_s0 = log.(vector_s1) - log.(vector_s0);
    data_2.log_sj_s0 = log.(vector_s2) - log.(vector_s0);
    
    # Merge into dataset
    data_merged =  append!(data_1, data_2);
    
    ### B3. Regress using OLS / IV
    
    ## OLS Regression
    
    ols_result = reg(data_merged, @formula(log_sj_s0 ~ x + price), save = true, Vcov.robust());
    ols_cons = coef(ols_result)[1];
    ols_x = coef(ols_result)[2];
    ols_p = coef(ols_result)[3];

    ## IV Regression
    first_stage_result = reg(data_merged, @formula(price ~  iv + w +x), save = true, Vcov.robust());

    xhat = predict(first_stage_result, data_merged);
    data_merged.xhat = xhat;

    iv_result = reg(data_merged, @formula(log_sj_s0 ~  x + xhat), save = true, Vcov.robust());
    iv_cons = coef(iv_result)[1];
    iv_x = coef(iv_result)[2];
    iv_p = coef(iv_result)[3];
    
    if s == 1
        vector_ols_cons = [ols_cons]
        vector_ols_x = [ols_x]
        vector_ols_p = [ols_p]
    
        vector_iv_cons = [iv_cons]
        vector_iv_x = [iv_x]
        vector_iv_p = [iv_p]
        
        # Store Monte Carlo Results
        global vector_ols_cons, vector_ols_x, vector_ols_p, vector_iv_cons, vector_iv_x, vector_iv_p
    else

        append!(vector_ols_cons, [ols_cons])
        append!(vector_ols_x, [ols_x])
        append!(vector_ols_p, [ols_p])
        
        append!(vector_iv_cons, [iv_cons])
        append!(vector_iv_x, [iv_x])
        append!(vector_iv_p, [iv_p])

    end
end

print("Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 1)")

result_summary =DataFrame( True_parameter = [beta_0, beta_x, alpha], OLS_mean = [mean(vector_ols_cons),mean(vector_ols_x),mean(vector_ols_p)], OLS_se = [std(vector_ols_cons),std(vector_ols_x),std(vector_ols_p)],
    IV_mean = [mean(vector_iv_cons),mean(vector_iv_x),mean(vector_iv_p)], IV_se =[std(vector_iv_cons),std(vector_iv_x),std(vector_iv_p)]);
print("Result Summary")
print(result_summary)
Monte Carlo Parameter Estimates: 100 Random Samples of 500 Duopoly Markets, Logit Utility (sigma_d = 1)

Result Summary

3×5 DataFrame
│ Row │ True_parameter │ OLS_mean │ OLS_se    │ IV_mean  │ IV_se     │
│     │ Float64        │ Float64  │ Float64   │ Float64  │ Float64   │
├─────┼────────────────┼──────────┼───────────┼──────────┼───────────┤
│ 1   │ 5.0            │ 3.1872   │ 0.235697  │ 5.01814  │ 0.266852  │
│ 2   │ 2.0            │ 1.33611  │ 0.0742151 │ 2.01013  │ 0.0994837 │
│ 3   │ -1.0           │ -0.63979 │ 0.0482097 │ -1.00436 │ 0.0513297 │

Replicate Table 1 in Berry (1994), columns (3) and (4), where $\sigma_d=3$

sigma_d = 3.0

for s = 1:S
    ### Step B1. D.G.P.
    
    # To reproduce the same results, fix the random seed
    Random.seed!(s*100+T+1)
    x_1 = rand(d, T);
    Random.seed!(s*100+T+2)
    xi_1 = rand(d, T);
    Random.seed!(s*100+T+3)
    w_1 = rand(d, T);
    Random.seed!(s*100+T+4)
    omega_1 = rand(d, T);
    
    Random.seed!(s*100+T+5)
    x_2 = rand(d, T);
    Random.seed!(s*100+T+6)
    xi_2 = rand(d, T);
    Random.seed!(s*100+T+7)
    w_2 = rand(d, T);
    Random.seed!(s*100+T+8)
    omega_2 = rand(d, T);

    data_1 = DataFrame(x = x_1[1:T], xi = xi_1[1:T], w = w_1[1:T], omega = omega_1[1:T], iv=x_2[1:T]);
    data_2 = DataFrame(x = x_2[1:T], xi = xi_2[1:T], w = w_2[1:T], omega = omega_2[1:T], iv=x_1[1:T]);

    # First market: initialize the result vectors

    data_temp1 = data_1[1,:]
    data_temp2 = data_2[1,:]
    global data_temp1, data_temp2
    
    ### Step B2. Solve Equilibrium price and market shares using nonlinear-solver
    
    a= nlsolve(f!, [0.1; 0.1])

    vector_s1 = [a.zero[1]]
    vector_s2 = [a.zero[2]]
    vector_s0 = [1-a.zero[1]-a.zero[2]]

    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

    vector_p1 = [cost_1 - 1/(alpha*(1-a.zero[1]))]
    vector_p2 = [cost_2 - 1/(alpha*(1-a.zero[2]))]

    vector_delta_1 = [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]
    vector_delta_2 = [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]

    # Markets 2 through T
    for t = 2:T

        data_temp1 = data_1[t,:]
        data_temp2 = data_2[t,:]
        
        # Step 1. Solve Equilibrium price / market shares
        
        a= nlsolve(f!, [0.0; 0.0])

        append!(vector_s1, [a.zero[1]]);
        append!(vector_s2, [a.zero[2]]);
        append!(vector_s0, [1-a.zero[1]-a.zero[2]]);
        cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
        cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

        append!(vector_p1, [cost_1 - 1/(alpha*(1-a.zero[1]))]);
        append!(vector_p2, [cost_2 - 1/(alpha*(1-a.zero[2]))]);

        append!(vector_delta_1, [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]);
        append!(vector_delta_2, [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]);

    end

    data_1.price = vector_p1;
    data_2.price = vector_p2;

    data_1.s = vector_s1;
    data_2.s = vector_s2;

    data_1.delta = vector_delta_1;
    data_2.delta = vector_delta_2;

    data_1.log_sj_s0 = log.(vector_s1) - log.(vector_s0);
    data_2.log_sj_s0 = log.(vector_s2) - log.(vector_s0);
    
    # Stack both firms' markets into one estimation dataset
    data_merged =  append!(data_1, data_2);
    
    ### B3. Regress using OLS / IV
    
    ## OLS Regression
    
    ols_result = reg(data_merged, @formula(log_sj_s0 ~ x + price), save = true, Vcov.robust());
    ols_cons = coef(ols_result)[1];
    ols_x = coef(ols_result)[2];
    ols_p = coef(ols_result)[3];

    ## IV Regression (manual 2SLS: regress price on the instruments, then use the fitted price)
    first_stage_result = reg(data_merged, @formula(price ~  iv + w +x), save = true, Vcov.robust());

    xhat = predict(first_stage_result, data_merged);
    data_merged.xhat = xhat;

    iv_result = reg(data_merged, @formula(log_sj_s0 ~  x + xhat), save = true, Vcov.robust());
    iv_cons = coef(iv_result)[1];
    iv_x = coef(iv_result)[2];
    iv_p = coef(iv_result)[3];
    
    if s == 1
        vector_ols_cons = [ols_cons]
        vector_ols_x = [ols_x]
        vector_ols_p = [ols_p]
    
        vector_iv_cons = [iv_cons]
        vector_iv_x = [iv_x]
        vector_iv_p = [iv_p]
        
        # Store Monte Carlo Results
        global vector_ols_cons, vector_ols_x, vector_ols_p, vector_iv_cons, vector_iv_x, vector_iv_p
    else

        append!(vector_ols_cons, [ols_cons])
        append!(vector_ols_x, [ols_x])
        append!(vector_ols_p, [ols_p])
        
        append!(vector_iv_cons, [iv_cons])
        append!(vector_iv_x, [iv_x])
        append!(vector_iv_p, [iv_p])

    end
end

print("Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 3)")

result_summary = DataFrame(
    True_parameter = [beta_0, beta_x, alpha],
    OLS_mean = [mean(vector_ols_cons), mean(vector_ols_x), mean(vector_ols_p)],
    OLS_se = [std(vector_ols_cons), std(vector_ols_x), std(vector_ols_p)],
    IV_mean = [mean(vector_iv_cons), mean(vector_iv_x), mean(vector_iv_p)],
    IV_se = [std(vector_iv_cons), std(vector_iv_x), std(vector_iv_p)]);
print("Result Summary")
print(result_summary)
Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 3)
Result Summary
3×5 DataFrame
│ Row │ True_parameter │ OLS_mean  │ OLS_se    │ IV_mean  │ IV_se    │
│     │ Float64        │ Float64   │ Float64   │ Float64  │ Float64  │
├─────┼────────────────┼───────────┼───────────┼──────────┼──────────┤
│ 1   │ 5.0            │ -0.762803 │ 0.418897  │ 5.03208  │ 0.847055 │
│ 2   │ 2.0            │ 0.0195958 │ 0.115257  │ 1.99649  │ 0.301829 │
│ 3   │ -1.0           │ 0.105563  │ 0.0831606 │ -1.00709 │ 0.166926 │
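One remark on the IV step: the manual two-stage procedure above recovers the 2SLS point estimates, and the dispersion columns in the tables come from the Monte Carlo replications rather than from the second-stage standard errors (which would be invalid, since they treat the fitted `xhat` as data). As a sketch, FixedEffectModels.jl can also estimate the same 2SLS in a single call via its `(endogenous ~ instruments)` formula syntax, assuming the `data_merged` frame built inside the loop:

```julia
# Single-call 2SLS: `price` is marked endogenous and instrumented by the
# rival's characteristic (`iv`) and the cost shifter (`w`); the exogenous
# regressor `x` is included in both stages automatically.
iv_onestep = reg(data_merged, @formula(log_sj_s0 ~ x + (price ~ iv + w)), Vcov.robust())
coef(iv_onestep)   # 2SLS estimates of the constant, x, and price coefficients
```

The point estimates should match the two-step version above; the reported standard errors, unlike the second-stage ones, account for the first stage.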

References

Berry, Steven T. “Estimating discrete-choice models of product differentiation.” The RAND Journal of Economics (1994): 242-262.

Berry, Steven, James Levinsohn, and Ariel Pakes. “Automobile prices in market equilibrium.” Econometrica: Journal of the Econometric Society (1995): 841-890.