Replication: Monte Carlo Study in Berry (1994)

23 minute read


Fixed Coefficients Random Utility (Demand) Estimation

This notebook reviews the estimation and inference of a linear random utility model when the agent is facing a finite number of alternatives.

Introduction

Consider a set of $J+1$ alternatives $\{0,1,2,\dots,J\}$. The utility that decision maker (DM) $i$ receives from buying product $j$ is \(u_{ij} = x_{ij}' \beta -\alpha p_j + \xi_j+ \epsilon_{ij}.\) The DM maximizes her utility: \(y_i =\arg \max_{j} u_{ij}.\)

We now assume that $\epsilon_{ij}$ are $i.i.d.$ across DMs and across alternatives, and that each $\epsilon_{ij}$ is distributed (standard) Type-I extreme value (T1EV). This yields the familiar logit Conditional Choice Probabilities (CCP): \(Pr(y_i = j) = \frac{e^{x_{ij}'\beta}}{\sum_{k=0}^{J}e^{x_{ik}'\beta}}.\)
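These CCPs are easy to compute directly. A minimal Julia sketch (all numbers hypothetical; `delta` stacks the mean utilities $x_{ij}'\beta$, with alternative 0 normalized to zero, a convention introduced formally below):

```julia
# Logit conditional choice probabilities.
# delta[j] is the mean utility of inside alternative j; alternative 0
# is normalized to zero, so the probabilities sum to one by construction.
function ccp(delta::Vector{Float64})
    ev = exp.(vcat(0.0, delta))   # prepend the outside alternative
    return ev ./ sum(ev)
end

probs = ccp([1.0, 0.5, -0.2])
sum(probs)   # = 1.0 up to floating-point error
```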

Aggregate, market-level data. In Berry, Levinsohn, and Pakes (1995) and in much of the empirical work following BLP, the researcher observes only market-level data. That means the characteristics vector of the alternatives is not indexed by $i$: individual-level variation is unobserved and gets absorbed by the error term $\epsilon$. The choice probabilities become \(Pr(y = j|x_j, \xi_j; \beta) = \frac{e^{x_{j}'\beta}}{\sum_{k=0}^{J}e^{x_{k}'\beta}} \ \text{for} \ j=0,\dots,J.\) The left-hand side is simply the market share of product/alternative $j$. We will denote these market shares as $s_0,s_1,\dots,s_{J}$. The CCPs above all share the same denominator and, for identification reasons, we normalize $x_0 = 0$. Therefore, \(\frac{s_j}{s_0} = e^{x_j'\beta}.\)

Using Berry’s (1994) inversion and taking logs of both sides gives us:

\[\text{ln}(s_j) - \text{ln} (s_0) = \delta_j \equiv x_j' \beta - \alpha p_j + \xi_j \qquad \text{(eq A)}\]

where

  • The mean utility level $\delta_j$ contains the product characteristics $x_j$, the price $p_j$, and the aggregate error $\xi_j$.
  • Econometricians observe the aggregate market share of good $j$, the share of the outside good, product characteristics $x_j$, and price $p_j$.
  • $\delta_j$ is uniquely identified directly from a simple algebraic calculation involving market shares.
  • Eq (A) can then be estimated by OLS, although we need instruments for the price, which is correlated with $\xi_j$.
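The inversion behind eq (A) can be checked numerically: build shares from a known $\delta$ and recover it exactly from $\ln(s_j) - \ln(s_0)$. A minimal Julia sketch with hypothetical numbers:

```julia
# Forward map: logit market shares from mean utilities delta
# (the outside good has delta = 0).
logit_shares(delta) = exp.(delta) ./ (1 + sum(exp.(delta)))

delta = [0.7, -0.3, 1.1]          # hypothetical x_j'beta - alpha*p_j + xi_j
s  = logit_shares(delta)
s0 = 1 - sum(s)                   # outside-good share

# Berry (1994) inversion: recovers delta up to floating-point error.
delta_recovered = log.(s) .- log(s0)
```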

For the rest of this notebook, we will introduce two empirical examples:

A. Estimate logit demand using BLP (1995)’s aggregate market-level data.

  • A1. Introduction of car data from BLP(1995)
  • A2. Data cleaning
  • A3. Run linear regression using eq(A)
  • A4. Run 2sls using instruments

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game

  • B1. Data Generating Process
  • B2. Obtain (numerically) equilibrium price and market shares
  • B3. Regress using OLS / IV (cost shifters and competitors’ product characteristics as instruments for price; Table 1, Berry (1994))

A1. Introduction of car data from BLP(1995)

As an empirical study, we will replicate Table 3, as in BLP (1995).

  • We obtain product characteristics from the annual issues of the Automotive News Market Data Book; the compiled data are in BLP.csv.

  • Data includes the number of cylinders, number of doors, weight, engine displacement, horsepower, length, width, wheelbase, EPA miles per gallon rating (MPG), and dummy variables for whether the car has front-wheel drive, automatic transmission, power steering, and air conditioning.
  • The data set includes this information on observed products from 1971-1990.
  • The price variable is list retail price (in $1000’s). Prices are in 1983 dollars. (We used the Consumer Price Index to deflate.)
  • The sales variable corresponds to U.S. sales (in 1000’s) by nameplate.
  • The product characteristics correspond to the characteristics of the base model for the given nameplate.
  • To capture the cost of driving, we include miles per dollar (MP$), calculated as MPG divided by price per gallon. (Both MPG and price per gallon are provided.)

  • In terms of potential market size, there is no formal definition; we use the yearly number of households in the U.S. from the Statistical Abstract of the U.S.
  • We assume that each model is produced by a single firm, to avoid a multi-product pricing problem.
# Query / DataFramesMeta are used for cleaning the dataset
# FixedEffectModels is used for running regressions
# Distributions, Random, NLsolve are used for the Monte Carlo study
using CSV, DataFrames, Query, DataFramesMeta, FixedEffectModels, Distributions, Random, NLsolve
ENV["COLUMNS"], ENV["LINES"] = 350, 50  # Jupyter display settings, not specific to Julia

dataset = CSV.read("/Users/jinkim/Dropbox/2020 Summer/dropbox_RA_work/Berry/BLP.csv", DataFrame)
#first(dataset,10)

Variable name/short description

For detailed description, please see BLP(1995) section 7.1. (Data section)

| Variable name | Description |
| --- | --- |
| name | Car |
| id | Car ID |
| year | Year |
| cy | Number of cylinders |
| dr | Number of doors |
| at | Automatic transmission |
| ps | Power steering |
| air | Air conditioning |
| drv | Front-wheel drive |
| p | Price (in $1000’s) |
| wt | Weight |
| dom | Domestic |
| disp | Engine displacement |
| hp | Horsepower |
| lng | Length |
| wdt | Width |
| wb | Wheelbase |
| mpg | Miles per gallon |
| q | Quantities |
| firmids | Firm ID |
| euro | Indicator for European car |
| reli | Reliability rating |
| dfi | Indicator for digital fuel injection |
| hp2wt | HP-to-weight ratio |
| size | Length × width (/1000) |
| japan | Indicator for Japanese car |
| cpi | CPI |
| gasprice | Gas price per gallon |
| nb_hh | Number of households (potential market size) |
| cat | Size category (used for the nested logit) |
| door2 | I(doors = 2) |
| door3 | I(doors = 3) |
| door4 | I(doors = 4) |
| door5 | I(doors = 5) |
| sampleweight | Sample weights |
| mpgd | Miles per dollar (MPG divided by gas price) |
| dpm | Dollars per mile |
| modelid | Car model name |
#summary statistics
describe(dataset,:all)

(Output: summary-statistics table, 38 rows × 13 columns — mean, std, min, quartiles, max, nunique, nmissing, and eltype for each variable; omitted here.)
### Replicate Table 1: Summary Stats
using Statistics

# Add or subtract column names as needed:
cnames = [:year,:cy,:dr,:at,:ps,:air,:drv,:p,:wt,:dom,:disp,:hp,:lng,:wdt,:wb,:mpg]
sub = dataset[!, cnames]
combine(groupby(sub, :year), names(sub, Not(:year)) .=> mean)

(Output: yearly means of the selected variables, 20 rows × 16 columns; omitted here.)

A2. Data cleaning

Step 1. Obtain the market share of each good $j$: $s_{jt} = \frac{q_{jt}}{M_t}$

For notation, let $M_t$ denote the total (potential) market size, given by the column nb_hh.

dataset = @linq dataset |> 
groupby([:year]) |>
transform(Total_Q = sum(:q))

dataset.s_j = dataset.q./dataset.nb_hh;

Step 2. Obtain the market share of the outside good 0: $s_{0t} = \frac{M_t - \sum_{k=1}^J q_{kt}}{M_t}$

dataset.s_0 = (dataset.nb_hh-dataset.Total_Q)./dataset.nb_hh;

Step 3. Construct dependent variable: $ \text{ln}(s_{jt}) - \text{ln}(s_{0t}) $

dataset.log_s_j_0 = log.(dataset.s_j) - log.(dataset.s_0);

A3. Run linear regression using eq(A)

Step 4. we use hp2wt, air, mpgd, size as product characteristics:

\(\text{ln}(s_j) - \text{ln} (s_0) \ = \delta_j \equiv x_j' \beta - \alpha p_j + \xi_j\)

result = reg(dataset, @formula(log_s_j_0 ~ p+hp2wt+air+mpgd+size ),  save = true,);
print(result)
                               Linear Model                               
===========================================================================
Number of obs:                  2217   Degrees of freedom:                6
R2:                            0.390   R2 Adjusted:                   0.388
F Statistic:                  282.47   p-value:                       0.000
===========================================================================
               Estimate  Std.Error   t value Pr(>|t|)  Lower 95%  Upper 95%
---------------------------------------------------------------------------
p            -0.0886398 0.00401348  -22.0855    0.000 -0.0965104 -0.0807692
hp2wt        -0.0731735   0.276701  -0.26445    0.791  -0.615794   0.469447
air          -0.0380115  0.0726455 -0.523246    0.601  -0.180472   0.104449
mpgd          0.0288388 0.00439518   6.56147    0.000  0.0202197  0.0374579
size            2.40052   0.126801   18.9315    0.000    2.15186    2.64919
(Intercept)    -10.2035    0.26002  -39.2412    0.000   -10.7134   -9.69356
===========================================================================

Step 5. Obtain price elasticities.

The own-price elasticity $\eta_j$ is given by: \(\begin{align} \eta_j & = \frac{\partial s_j}{\partial p_j} \frac{p_j}{s_j}, \qquad \text{where} \ s_j = \frac{e^{\delta_j}}{\sum_{k=0}^{J} e^{\delta_k}} \ \text{and} \ \delta_j = x_j'\beta - \alpha p_j + \xi_j \\ & \text{By the chain rule:} \ \frac{\partial s_j}{\partial p_j} = -\alpha \Bigg[ \frac{e^{\delta_j}}{\sum_{k=0}^{J} e^{\delta_k}} - \Big( \frac{e^{\delta_j}}{\sum_{k=0}^{J} e^{\delta_k}}\Big)^2 \Bigg] = -\alpha s_j[1- s_j] \\ & \text{Rearranging these terms gives us:} \ \eta_j = -\alpha s_j[1- s_j] \times \frac{p_j}{s_j} = \underbrace{-\alpha \times (1-s_j) \times p_j}_\text{own-price elasticity of good $j$} \end{align}\) Working with quantities $q_j = s_j M$ instead of shares leaves $\eta_j$ unchanged, since the market size $M$ cancels. Note that the regression coefficient on $p_j$ in eq (A) estimates $-\alpha$, so the code below multiplies that coefficient by $(1-s_j)\,p_j$ directly.

# Using the estimated price coefficient (the coefficient on p, i.e. -alpha), derive the own-price elasticity of each good j.

price_coef = coef(result)[2];
dataset.e = price_coef * (ones(nrow(dataset))-dataset.s_j) .* dataset.p;

q1 = @from i in dataset begin;
            @where i.e >-1
            @select {elasticity=i.e}
            @collect DataFrame
    end;
nrow(q1)

1502

Replication: BLP Table 3, IV Logit Demand column (row: No. Inelastic Dem) on page 873.

I count the number of car models with inelastic demand. My estimate is 1,502; BLP report 1,494, which is quite close.

A4. Run 2sls using instruments

Following BLP, I use the following instruments for price.

1. the sum of size over all other products in market $t$ (i.e., dropping product $j$’s own size);

2. the sum of size across rival firms’ products in market $t$.

# IV 1

dataset = @linq dataset |> 
groupby([:year]) |>
transform(Total_size = sum(:size));
dataset.iv_size1 = dataset.Total_size - dataset.size;
# IV 2

dataset = @linq dataset |> 
groupby([:year , :firmids]) |>
transform(sum_size = sum(:size));
dataset.iv_size2 = dataset.Total_size - dataset.sum_size;
# 2SLS Regression for demand estimation
# First stage: regress price on Z and X

first_stage_result = reg(dataset, @formula(p ~ iv_size1+ iv_size2+hp2wt + air+mpgd+size), save = true, );
print(first_stage_result)
                              Linear Model                              
=========================================================================
Number of obs:                 2217   Degrees of freedom:               7
R2:                           0.592   R2 Adjusted:                  0.591
F Statistic:                534.872   p-value:                      0.000
=========================================================================
              Estimate  Std.Error  t value Pr(>|t|)  Lower 95%  Upper 95%
-------------------------------------------------------------------------
iv_size1     -0.030544 0.00999712 -3.05528    0.002 -0.0501487 -0.0109393
iv_size2     0.0919841 0.00809711  11.3601    0.000  0.0761054   0.107863
hp2wt          25.8362    1.31783  19.6051    0.000    23.2518    28.4205
air            9.57023   0.327809  29.1945    0.000    8.92738    10.2131
mpgd         -0.272477  0.0270151 -10.0861    0.000  -0.325455  -0.219499
size           2.25533   0.727117  3.10174    0.002   0.829424    3.68123
(Intercept)   -5.21123    1.50736  -3.4572    0.001   -8.16721   -2.25525
=========================================================================
# Second Stage: regress log(s_j)-log(s_0) on xhat

xhat = predict(first_stage_result, dataset);
dataset.p_iv = xhat;
second_stage_result = reg(dataset, @formula(log_s_j_0 ~ p_iv+hp2wt + air+mpgd+size), save = true);
print(second_stage_result)
                                Linear Model                                
============================================================================
Number of obs:                   2217  Degrees of freedom:                 6
R2:                             0.355  R2 Adjusted:                    0.354
F Statistic:                  243.848  p-value:                        0.000
============================================================================
                Estimate  Std.Error  t value Pr(>|t|)  Lower 95%   Upper 95%
----------------------------------------------------------------------------
p_iv           -0.289427  0.0156063 -18.5455    0.000  -0.320032   -0.258823
hp2wt            5.63223   0.513603  10.9661    0.000    4.62503     6.63942
air              2.18099   0.182327  11.9619    0.000    1.82344     2.53854
mpgd         -0.00984143 0.00536771 -1.83345    0.067 -0.0203677 0.000684853
size             2.19676   0.131213  16.7419    0.000    1.93944     2.45407
(Intercept)      -9.5444   0.271767 -35.1198    0.000   -10.0773    -9.01145
============================================================================
price_coef = coef(second_stage_result)[2];
dataset.e_iv = price_coef * (ones(nrow(dataset))-dataset.s_j) .* dataset.p;
q1 = @from i in dataset begin;
            @where i.e_iv >-1
            @select {elasticity=i.e_iv}
            @collect DataFrame
    end;

Comparison with BLP Table 3, IV Logit Demand column (row: No. Inelastic Dem) on page 873.

Because my price coefficient differs slightly, I find only 2 goods with inelastic demand; BLP report 22.

nrow(q1)
2

Step 6. Discussion: IV regressions

The reported price coefficient ($\alpha$) is -0.0886 in OLS, versus -0.2894 in the IV regression. The OLS price coefficient is biased upward (toward zero), because prices are positively correlated with the unobserved quality $\xi_j$.

# OLS Results
print(result)
                               Linear Model                               
===========================================================================
Number of obs:                  2217   Degrees of freedom:                6
R2:                            0.390   R2 Adjusted:                   0.388
F Statistic:                  282.47   p-value:                       0.000
===========================================================================
               Estimate  Std.Error   t value Pr(>|t|)  Lower 95%  Upper 95%
---------------------------------------------------------------------------
p            -0.0886398 0.00401348  -22.0855    0.000 -0.0965104 -0.0807692
hp2wt        -0.0731735   0.276701  -0.26445    0.791  -0.615794   0.469447
air          -0.0380115  0.0726455 -0.523246    0.601  -0.180472   0.104449
mpgd          0.0288388 0.00439518   6.56147    0.000  0.0202197  0.0374579
size            2.40052   0.126801   18.9315    0.000    2.15186    2.64919
(Intercept)    -10.2035    0.26002  -39.2412    0.000   -10.7134   -9.69356
===========================================================================
# 2SLS Results
print(second_stage_result)
                                Linear Model                                
============================================================================
Number of obs:                   2217  Degrees of freedom:                 6
R2:                             0.355  R2 Adjusted:                    0.354
F Statistic:                  243.848  p-value:                        0.000
============================================================================
                Estimate  Std.Error  t value Pr(>|t|)  Lower 95%   Upper 95%
----------------------------------------------------------------------------
p_iv           -0.289427  0.0156063 -18.5455    0.000  -0.320032   -0.258823
hp2wt            5.63223   0.513603  10.9661    0.000    4.62503     6.63942
air              2.18099   0.182327  11.9619    0.000    1.82344     2.53854
mpgd         -0.00984143 0.00536771 -1.83345    0.067 -0.0203677 0.000684853
size             2.19676   0.131213  16.7419    0.000    1.93944     2.45407
(Intercept)      -9.5444   0.271767 -35.1198    0.000   -10.0773    -9.01145
============================================================================

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game

  • B1. Data Generating Process
  • B2. Obtain (numerically) equilibrium price and market shares
  • B3. Regress using OLS / IV

B1. Data Generating Process

Each market is a duopoly in which each firm produces a single good; the data consist of aggregate market shares and the price of each good. We assume that the duopoly firms compete in 500 “independent” (isolated) markets.

In the D.G.P., we solve a Nash-Bertrand pricing game to derive each duopoly firm’s price and market share. We use cost shifters and product characteristics to solve this game numerically. Since it is the D.G.P., we use the true parameters to obtain prices and market shares.

As econometricians, we observe each duopoly firm’s market share, price, cost shifters, and product characteristics.

The utility of each consumer $i$ in each market is given by:

\begin{equation} u_{ij} = \beta_0 + \beta_x x_j + \sigma_d \xi_j - \alpha p_j + \epsilon_{ij} \end{equation} Marginal cost is constrained to be positive and given by:

\begin{equation} c_j = e^{\gamma_0 + \gamma_x x_j + \sigma_c \xi_j + \gamma_w w_j + \sigma_\omega \omega_j} \end{equation}

The exogenous data $x_j, \xi_j, w_j,$ and $\omega_j$ are all drawn as independent standard normal random variables.

The true parameters are given by:

| Parameter | True Value | Description |
| --- | --- | --- |
| $\beta_0$ | 5 | Intercept (demand) |
| $\beta_x$ | 2 | Utility from characteristic $x$ |
| $\sigma_d$ | 1 (3 in the second Monte Carlo) | Scale of $\xi_j$ in utility |
| $\alpha$ | 1 | Price coefficient |
| $\gamma_0$ | 1 | Intercept (supply) |
| $\gamma_x$ | 0.5 | Cost of characteristic $x$ |
| $\sigma_c$ | 0.25 | Scale of $\xi_j$ in cost |
| $\gamma_w$ | 0.25 | Coefficient on the input cost $w_j$ |
| $\sigma_\omega$ | 0.25 | Scale of $\omega_j$ |

B2. Obtain (numerically) equilibrium price and market shares (still the D.G.P.)

I solve the following nonlinear system for $j=1,2$ (unknowns: $p_1, p_2, s_1(p_1, p_2), s_2(p_1,p_2)$). Here $X = \{ x_1, x_2 \}$ is the vector of the duopoly firms’ product characteristics, and $s_0 = 1-s_1-s_2$. \(\begin{align} p_1 & = c_1 + \frac{1}{\alpha (1-s_1)} \\ p_2 & = c_2 + \frac{1}{\alpha (1-s_2)} \\ & \text{where $s_1$ and $s_2$ are given by} \\ s_1(X,p_1, p_2) & = \frac{ e^{\beta_0 + \beta_x x_1 + \sigma_d \xi_1 - \alpha p_1}}{1+e^{\beta_0 + \beta_x x_1 + \sigma_d \xi_1 - \alpha p_1} +e^{\beta_0 + \beta_x x_2 + \sigma_d \xi_2 - \alpha p_2} } \\ s_2(X,p_1, p_2) & = \frac{ e^{\beta_0 + \beta_x x_2 + \sigma_d \xi_2 - \alpha p_2}}{1+e^{\beta_0 + \beta_x x_1 + \sigma_d \xi_1 - \alpha p_1} + e^{\beta_0 + \beta_x x_2 + \sigma_d \xi_2 - \alpha p_2}} \end{align}\) (In the code below, where alpha $=-1$ is the signed coefficient on price, the same first-order conditions read $p_j = c_j - \frac{1}{\alpha(1-s_j)}$.)

Using a nonlinear solver, we obtain the equilibrium outcome: $p_1, p_2, s_1, s_2$ (and $s_0 = 1-s_1-s_2$). One might worry about multiple equilibria in this game. Since we solve a single-product duopoly (a simple market), we observe a unique solution in this Monte Carlo study. See Caplin and Nalebuff (1991) on multi-product firms and the uniqueness of this game.
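That uniqueness claim can be probed informally by re-solving the system from several starting values and checking that they reach the same fixed point. Below is a self-contained Julia sketch (all parameter values hypothetical, and a damped fixed-point iteration stands in for `nlsolve`; `alpha` is the signed price coefficient, as in the code that follows):

```julia
# Map candidate shares s = (s1, s2) to the shares implied by the
# Nash-Bertrand FOC prices p_j = c_j - 1/(alpha*(1 - s_j)).
function share_map(s; c = [2.0, 2.5], delta0 = [1.0, 1.5], alpha = -1.0)
    p  = c .- 1 ./ (alpha .* (1 .- s))     # equilibrium pricing rule
    ev = exp.(delta0 .+ alpha .* p)        # exp of mean utilities
    return ev ./ (1 + sum(ev))             # logit shares (outside good normalized)
end

# Damped fixed-point iteration from a given starting point.
function solve_shares(s0; tol = 1e-12, maxit = 10_000)
    s = s0
    for _ in 1:maxit
        snew = 0.5 .* s .+ 0.5 .* share_map(s)
        maximum(abs.(snew .- s)) < tol && return snew
        s = snew
    end
    return s
end

# Different starting values converge to the same equilibrium shares.
solve_shares([0.1, 0.1]) ≈ solve_shares([0.4, 0.3])
```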

B3. Regress using OLS / IV

Following Berry’s inversion, an econometrician runs the following OLS/IV regressions. The econometrician observes prices, product characteristics, and cost shifters for 500 independent duopoly markets.

\begin{align} \text{ln} (s_j) - \text{ln} (s_0) & = \delta_j \\ & = \beta_0 + \beta_x x_j - \alpha p_j + \sigma_d \xi_j \end{align}

For the IV, I use cost shifters and the competitor’s product characteristic, as in Berry (1994).

Note that in OLS, since the econometrician cannot observe the $\xi_j$ term, the price coefficient $\alpha$ is biased upward. In the IV regression, the observed cost shifters $w_j$ and the product characteristic of the rival firm are used as instruments for price.

For each simulation, we independently draw these variables for 500 markets, as in Berry (1994).

We repeat this 100 times and report the Monte Carlo results.

# Define parameters, as in Berry's (1994) Monte Carlo.
# Note: alpha is the signed coefficient on price (utility enters as +alpha*p with alpha = -1).

beta_0 = 5.0
beta_x = 2.0
sigma_d = 1.0
alpha = -1.0

gamma_0 = 1.0
gamma_x = 0.5

sigma_c = 0.25
gamma_w = 0.25
sigma_omega = 0.25

T = 500  # markets per sample
S = 100  # Monte Carlo samples
d = Normal()  # standard normal
# Define Non-linear solver

function f!(F, x)
    
    # D.G.P. for true costs
    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])
    
    # Derive equilibrium prices from the Nash-Bertrand FOC
    # (x[1]: market share of good 1, x[2]: market share of good 2)
    price_1 = cost_1 - 1/(alpha*(1-x[1]))
    price_2 = cost_2 - 1/(alpha*(1-x[2]))
    
    denom = 1 + exp(beta_0 + beta_x*data_temp1[:x] + sigma_d*data_temp1[:xi] + alpha*price_1)  + exp(beta_0 + beta_x*data_temp2[:x] + sigma_d*data_temp2[:xi] + alpha*price_2) 
    F[1] = x[1] - exp(beta_0 + beta_x*data_temp1[:x] + sigma_d*data_temp1[:xi] + alpha*price_1)/denom
    F[2] = x[2] - exp(beta_0 + beta_x*data_temp2[:x] + sigma_d*data_temp2[:xi] + alpha*price_2)/denom
end

Replicate Table 1 in Berry (1994), columns (1) and (2), where $\sigma_d=1$

### It takes 5 seconds, at the current workflow
sigma_d = 1

for s = 1:S
    ### Step B1. D.G.P.
    
    # To reproduce the same results, fix the random seed
    Random.seed!(s*100+T+1)
    x_1 = rand(d, T);
    Random.seed!(s*100+T+2)
    xi_1 = rand(d, T);
    Random.seed!(s*100+T+3)
    w_1 = rand(d, T);
    Random.seed!(s*100+T+4)
    omega_1 = rand(d, T);
    
    Random.seed!(s*100+T+5)
    x_2 = rand(d, T);
    Random.seed!(s*100+T+6)
    xi_2 = rand(d, T);
    Random.seed!(s*100+T+7)
    w_2 = rand(d, T);
    Random.seed!(s*100+T+8)
    omega_2 = rand(d, T);

    data_1 = DataFrame(x = x_1[1:T], xi = xi_1[1:T], w = w_1[1:T], omega = omega_1[1:T], iv=x_2[1:T]);
    data_2 = DataFrame(x = x_2[1:T], xi = xi_2[1:T], w = w_2[1:T], omega = omega_2[1:T], iv=x_1[1:T]);

    # For the first periods

    data_temp1 = data_1[1,:]
    data_temp2 = data_2[1,:]
    global data_temp1, data_temp2
    
    ### Step B2. Solve Equilibrium price and market shares using nonlinear-solver
    
    a= nlsolve(f!, [0.1; 0.1])

    vector_s1 = [a.zero[1]]
    vector_s2 = [a.zero[2]]
    vector_s0 = [1-a.zero[1]-a.zero[2]]

    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

    vector_p1 = [cost_1 - 1/(alpha*(1-a.zero[1]))]
    vector_p2 = [cost_2 - 1/(alpha*(1-a.zero[2]))]

    vector_delta_1 = [beta_0 + beta_x * data_temp1[:x] + sigma_d * data_temp1[:xi] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]
    vector_delta_2 = [beta_0 + beta_x * data_temp2[:x] + sigma_d * data_temp2[:xi] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]

    # From the second market to T markets.
    t=2
    for t = 2:T

        data_temp1 = data_1[t,:]
        data_temp2 = data_2[t,:]
        
        # Step 1. Solve Equilibrium price / market shares
        
        a= nlsolve(f!, [0.0; 0.0])

        append!(vector_s1, [a.zero[1]]);
        append!(vector_s2, [a.zero[2]]);
        append!(vector_s0, [1-a.zero[1]-a.zero[2]]);
        cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
        cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

        append!(vector_p1, [cost_1 - 1/(alpha*(1-a.zero[1]))]);
        append!(vector_p2, [cost_2 - 1/(alpha*(1-a.zero[2]))]);

        append!(vector_delta_1, [beta_0 + beta_x * data_temp1[:x] + sigma_d * data_temp1[:xi] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]);
        append!(vector_delta_2, [beta_0 + beta_x * data_temp2[:x] + sigma_d * data_temp2[:xi] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]);

    end

    data_1.price = vector_p1;
    data_2.price = vector_p2;

    data_1.s = vector_s1;
    data_2.s = vector_s2;

    data_1.delta = vector_delta_1;
    data_2.delta = vector_delta_2;

    data_1.log_sj_s0 = log.(vector_s1) - log.(vector_s0);
    data_2.log_sj_s0 = log.(vector_s2) - log.(vector_s0);
    
    # Merge into dataset
    data_merged =  append!(data_1, data_2);
    
    ### B3. Regress using OLS / IV
    
    ## OLS Regression
    
    ols_result = reg(data_merged, @formula(log_sj_s0 ~ x + price), save = true, Vcov.robust());
    ols_cons = coef(ols_result)[1];
    ols_x = coef(ols_result)[2];
    ols_p = coef(ols_result)[3];

    ## IV Regression
    first_stage_result = reg(data_merged, @formula(price ~  iv + w +x), save = true, Vcov.robust());

    xhat = predict(first_stage_result, data_merged);
    data_merged.xhat = xhat;

    iv_result = reg(data_merged, @formula(log_sj_s0 ~  x + xhat), save = true, Vcov.robust());
    iv_cons = coef(iv_result)[1];
    iv_x = coef(iv_result)[2];
    iv_p = coef(iv_result)[3];
    
    if s == 1
        vector_ols_cons = [ols_cons]
        vector_ols_x = [ols_x]
        vector_ols_p = [ols_p]
    
        vector_iv_cons = [iv_cons]
        vector_iv_x = [iv_x]
        vector_iv_p = [iv_p]
        
        # Store Monte Carlo Results
        global vector_ols_cons, vector_ols_x, vector_ols_p, vector_iv_cons, vector_iv_x, vector_iv_p
    else

        append!(vector_ols_cons, [ols_cons])
        append!(vector_ols_x, [ols_x])
        append!(vector_ols_p, [ols_p])
        
        append!(vector_iv_cons, [iv_cons])
        append!(vector_iv_x, [iv_x])
        append!(vector_iv_p, [iv_p])

    end
end

print("Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 1)")

result_summary =DataFrame( True_parameter = [beta_0, beta_x, alpha], OLS_mean = [mean(vector_ols_cons),mean(vector_ols_x),mean(vector_ols_p)], OLS_se = [std(vector_ols_cons),std(vector_ols_x),std(vector_ols_p)],
    IV_mean = [mean(vector_iv_cons),mean(vector_iv_x),mean(vector_iv_p)], IV_se =[std(vector_iv_cons),std(vector_iv_x),std(vector_iv_p)]);
print("Result Summary")
print(result_summary)
Monte Carlo Parameter Estimates: 100 Random Samples of 500 Duopoly Markets, Logit Utility (sigma_d = 1)

Result Summary

3×5 DataFrame
│ Row │ True_parameter │ OLS_mean │ OLS_se    │ IV_mean  │ IV_se     │
│     │ Float64        │ Float64  │ Float64   │ Float64  │ Float64   │
├─────┼────────────────┼──────────┼───────────┼──────────┼───────────┤
│ 1   │ 5.0            │ 3.1872   │ 0.235697  │ 5.01814  │ 0.266852  │
│ 2   │ 2.0            │ 1.33611  │ 0.0742151 │ 2.01013  │ 0.0994837 │
│ 3   │ -1.0           │ -0.63979 │ 0.0482097 │ -1.00436 │ 0.0513297 │

Replicate Table 1 in Berry (1994), columns (3) and (4), where $\sigma_d=3$

sigma_d = 3.0

for s = 1:S
    ### Step B1. D.G.P.
    
    # To reproduce the same results, fix the random seed
    Random.seed!(s*100+T+1)
    x_1 = rand(d, T);
    Random.seed!(s*100+T+2)
    xi_1 = rand(d, T);
    Random.seed!(s*100+T+3)
    w_1 = rand(d, T);
    Random.seed!(s*100+T+4)
    omega_1 = rand(d, T);
    
    Random.seed!(s*100+T+5)
    x_2 = rand(d, T);
    Random.seed!(s*100+T+6)
    xi_2 = rand(d, T);
    Random.seed!(s*100+T+7)
    w_2 = rand(d, T);
    Random.seed!(s*100+T+8)
    omega_2 = rand(d, T);

    data_1 = DataFrame(x = x_1[1:T], xi = xi_1[1:T], w = w_1[1:T], omega = omega_1[1:T], iv=x_2[1:T]);
    data_2 = DataFrame(x = x_2[1:T], xi = xi_2[1:T], w = w_2[1:T], omega = omega_2[1:T], iv=x_1[1:T]);

    # First market: initialize the result vectors

    data_temp1 = data_1[1,:]
    data_temp2 = data_2[1,:]
    global data_temp1, data_temp2
    
    ### Step B2. Solve Equilibrium price and market shares using nonlinear-solver
    
    a= nlsolve(f!, [0.1; 0.1])

    vector_s1 = [a.zero[1]]
    vector_s2 = [a.zero[2]]
    vector_s0 = [1-a.zero[1]-a.zero[2]]

    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

    vector_p1 = [cost_1 - 1/(alpha*(1-a.zero[1]))]
    vector_p2 = [cost_2 - 1/(alpha*(1-a.zero[2]))]

    vector_delta_1 = [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]
    vector_delta_2 = [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]

    # Markets 2 through T
    for t = 2:T

        data_temp1 = data_1[t,:]
        data_temp2 = data_2[t,:]
        
        # Step 1. Solve Equilibrium price / market shares
        
        a= nlsolve(f!, [0.0; 0.0])

        append!(vector_s1, [a.zero[1]]);
        append!(vector_s2, [a.zero[2]]);
        append!(vector_s0, [1-a.zero[1]-a.zero[2]]);
        cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
        cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

        append!(vector_p1, [cost_1 - 1/(alpha*(1-a.zero[1]))]);
        append!(vector_p2, [cost_2 - 1/(alpha*(1-a.zero[2]))]);

        append!(vector_delta_1, [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]);
        append!(vector_delta_2, [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]);

    end

    data_1.price = vector_p1;
    data_2.price = vector_p2;

    data_1.s = vector_s1;
    data_2.s = vector_s2;

    data_1.delta = vector_delta_1;
    data_2.delta = vector_delta_2;

    data_1.log_sj_s0 = log.(vector_s1) - log.(vector_s0);
    data_2.log_sj_s0 = log.(vector_s2) - log.(vector_s0);
    
    # Stack both firms' markets into one estimation dataset
    data_merged =  append!(data_1, data_2);
    
    ### B3. Regress using OLS / IV
    
    ## OLS Regression
    
    ols_result = reg(data_merged, @formula(log_sj_s0 ~ x + price), save = true, Vcov.robust());
    ols_cons = coef(ols_result)[1];
    ols_x = coef(ols_result)[2];
    ols_p = coef(ols_result)[3];

    ## IV Regression (manual 2SLS: regress price on the instruments, then use the fitted price)
    first_stage_result = reg(data_merged, @formula(price ~  iv + w +x), save = true, Vcov.robust());

    xhat = predict(first_stage_result, data_merged);
    data_merged.xhat = xhat;

    iv_result = reg(data_merged, @formula(log_sj_s0 ~  x + xhat), save = true, Vcov.robust());
    iv_cons = coef(iv_result)[1];
    iv_x = coef(iv_result)[2];
    iv_p = coef(iv_result)[3];
    
    if s == 1
        vector_ols_cons = [ols_cons]
        vector_ols_x = [ols_x]
        vector_ols_p = [ols_p]
    
        vector_iv_cons = [iv_cons]
        vector_iv_x = [iv_x]
        vector_iv_p = [iv_p]
        
        # Store Monte Carlo Results
        global vector_ols_cons, vector_ols_x, vector_ols_p, vector_iv_cons, vector_iv_x, vector_iv_p
    else

        append!(vector_ols_cons, [ols_cons])
        append!(vector_ols_x, [ols_x])
        append!(vector_ols_p, [ols_p])
        
        append!(vector_iv_cons, [iv_cons])
        append!(vector_iv_x, [iv_x])
        append!(vector_iv_p, [iv_p])

    end
end

print("Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 3)")

result_summary = DataFrame(
    True_parameter = [beta_0, beta_x, alpha],
    OLS_mean = [mean(vector_ols_cons), mean(vector_ols_x), mean(vector_ols_p)],
    OLS_se = [std(vector_ols_cons), std(vector_ols_x), std(vector_ols_p)],
    IV_mean = [mean(vector_iv_cons), mean(vector_iv_x), mean(vector_iv_p)],
    IV_se = [std(vector_iv_cons), std(vector_iv_x), std(vector_iv_p)]);
print("Result Summary")
print(result_summary)
Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 3)
Result Summary
3×5 DataFrame
│ Row │ True_parameter │ OLS_mean  │ OLS_se    │ IV_mean  │ IV_se    │
│     │ Float64        │ Float64   │ Float64   │ Float64  │ Float64  │
├─────┼────────────────┼───────────┼───────────┼──────────┼──────────┤
│ 1   │ 5.0            │ -0.762803 │ 0.418897  │ 5.03208  │ 0.847055 │
│ 2   │ 2.0            │ 0.0195958 │ 0.115257  │ 1.99649  │ 0.301829 │
│ 3   │ -1.0           │ 0.105563  │ 0.0831606 │ -1.00709 │ 0.166926 │
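One remark on the IV step: the manual two-stage procedure above recovers the 2SLS point estimates, and the dispersion columns in the tables come from the Monte Carlo replications rather than from the second-stage standard errors (which would be invalid, since they treat the fitted `xhat` as data). As a sketch, FixedEffectModels.jl can also estimate the same 2SLS in a single call via its `(endogenous ~ instruments)` formula syntax, assuming the `data_merged` frame built inside the loop:

```julia
# Single-call 2SLS: `price` is marked endogenous and instrumented by the
# rival's characteristic (`iv`) and the cost shifter (`w`); the exogenous
# regressor `x` is included in both stages automatically.
iv_onestep = reg(data_merged, @formula(log_sj_s0 ~ x + (price ~ iv + w)), Vcov.robust())
coef(iv_onestep)   # 2SLS estimates of the constant, x, and price coefficients
```

The point estimates should match the two-step version above; the reported standard errors, unlike the second-stage ones, account for the first stage.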

References

Berry, Steven T. “Estimating discrete-choice models of product differentiation.” The RAND Journal of Economics (1994): 242-262.

Berry, Steven, James Levinsohn, and Ariel Pakes. “Automobile prices in market equilibrium.” Econometrica: Journal of the Econometric Society (1995): 841-890.