# Query / DataFramesMeta is used for cleaning dataset
# FixedEffectModels is used for running regression
# Distributions, Random, NLsolve are used for Monte Carlo study
using CSV, DataFrames, Query, DataFramesMeta, FixedEffectModels, Distributions, Random, NLsolve


ENV["COLUMNS"],ENV["LINES"] = 350,50  #This is not specific to Julia, it's a Jupyter notebook environment variable

#dataset = CSV.read("c:\\data\\BLP.csv");  # <---- change this
dataset = CSV.read("/Users/jinkim/Dropbox/2020 Summer/dropbox_RA_work/Berry/BLP.csv")
first(dataset,10)

┌ Warning: `CSV.read(input; kw...)` is deprecated in favor of `using DataFrames; CSV.read(input, DataFrame; kw...)
│   caller = read(::String) at CSV.jl:40
└ @ CSV /Users/jinkim/.julia/packages/CSV/MKemC/src/CSV.jl:40


#summary statistics
describe(dataset,:all)


### Replicate Table 1: Summary Stats
using Statistics

#Add or substrct column names as needed:
cnames = [:year,:cy,:dr,:at,:ps,:air,:drv,:p,:wt,:dom,:disp,:hp,:lng,:wdt,:wb,:mpg]
aggregate(dataset[!,cnames], [:year], mean)

┌ Warning: `aggregate(d, cols, f, sort=false, skipmissing=false)` is deprecated. Instead use combine(groupby(d, cols, sort=false, skipmissing=false), names(d, Not(cols)) .=> f)`
│   caller = top-level scope at In[4]:6
└ @ Core In[4]:6


dataset = @linq dataset |> 
groupby([:year]) |>
transform(Total_Q = sum(:q))

dataset.s_j = dataset.q./dataset.nb_hh;


dataset.s_0 = (dataset.nb_hh-dataset.Total_Q)./dataset.nb_hh;


dataset.log_s_j_0 = log.(dataset.s_j) - log.(dataset.s_0);


result = reg(dataset, @formula(log_s_j_0 ~ p+hp2wt+air+mpgd+size ),  save = true,);
print(result)

                               Linear Model                               
===========================================================================
Number of obs:                  2217   Degrees of freedom:                6
R2:                            0.390   R2 Adjusted:                   0.388
F Statistic:                  282.47   p-value:                       0.000
===========================================================================
               Estimate  Std.Error   t value Pr(>|t|)  Lower 95%  Upper 95%
---------------------------------------------------------------------------
p            -0.0886398 0.00401348  -22.0855    0.000 -0.0965104 -0.0807692
hp2wt        -0.0731735   0.276701  -0.26445    0.791  -0.615794   0.469447
air          -0.0380115  0.0726455 -0.523246    0.601  -0.180472   0.104449
mpgd          0.0288388 0.00439518   6.56147    0.000  0.0202197  0.0374579
size            2.40052   0.126801   18.9315    0.000    2.15186    2.64919
(Intercept)    -10.2035    0.26002  -39.2412    0.000   -10.7134   -9.69356
===========================================================================


# Following price elasticities, I can derive price elasticities for each good j, using price coefficients alpha.

price_coef = coef(result)[2];
dataset.e = price_coef * (ones(nrow(dataset))-dataset.s_j) .* dataset.p;

q1 = @from i in dataset begin;
            @where i.e >-1
            @select {elasticity=i.e}
            @collect DataFrame
    end;
nrow(q1)

1502


# IV 1

dataset = @linq dataset |> 
groupby([:year]) |>
transform(Total_size = sum(:size));
dataset.iv_size1 = dataset.Total_size - dataset.size;


# IV 2

dataset = @linq dataset |> 
groupby([:year , :firmids]) |>
transform(sum_size = sum(:size));
dataset.iv_size2 = dataset.Total_size - dataset.sum_size;


# 2SLS Regression for demand estimation
# First stage: regress price on Z and X

first_stage_result = reg(dataset, @formula(p ~ iv_size1+ iv_size2+hp2wt + air+mpgd+size), save = true, );
print(first_stage_result)

                              Linear Model                              
=========================================================================
Number of obs:                 2217   Degrees of freedom:               7
R2:                           0.592   R2 Adjusted:                  0.591
F Statistic:                534.872   p-value:                      0.000
=========================================================================
              Estimate  Std.Error  t value Pr(>|t|)  Lower 95%  Upper 95%
-------------------------------------------------------------------------
iv_size1     -0.030544 0.00999712 -3.05528    0.002 -0.0501487 -0.0109393
iv_size2     0.0919841 0.00809711  11.3601    0.000  0.0761054   0.107863
hp2wt          25.8362    1.31783  19.6051    0.000    23.2518    28.4205
air            9.57023   0.327809  29.1945    0.000    8.92738    10.2131
mpgd         -0.272477  0.0270151 -10.0861    0.000  -0.325455  -0.219499
size           2.25533   0.727117  3.10174    0.002   0.829424    3.68123
(Intercept)   -5.21123    1.50736  -3.4572    0.001   -8.16721   -2.25525
=========================================================================


# Second Stage: regress log(s_j)-log(s_0) on xhat

xhat = predict(first_stage_result, dataset);
dataset.p_iv = xhat;
second_stage_result = reg(dataset, @formula(log_s_j_0 ~ p_iv+hp2wt + air+mpgd+size), save = true);
print(second_stage_result)

                                Linear Model                                
============================================================================
Number of obs:                   2217  Degrees of freedom:                 6
R2:                             0.355  R2 Adjusted:                    0.354
F Statistic:                  243.848  p-value:                        0.000
============================================================================
                Estimate  Std.Error  t value Pr(>|t|)  Lower 95%   Upper 95%
----------------------------------------------------------------------------
p_iv           -0.289427  0.0156063 -18.5455    0.000  -0.320032   -0.258823
hp2wt            5.63223   0.513603  10.9661    0.000    4.62503     6.63942
air              2.18099   0.182327  11.9619    0.000    1.82344     2.53854
mpgd         -0.00984143 0.00536771 -1.83345    0.067 -0.0203677 0.000684853
size             2.19676   0.131213  16.7419    0.000    1.93944     2.45407
(Intercept)      -9.5444   0.271767 -35.1198    0.000   -10.0773    -9.01145
============================================================================


price_coef = coef(second_stage_result)[2];
dataset.e_iv = price_coef * (ones(nrow(dataset))-dataset.s_j) .* dataset.p;


q1 = @from i in dataset begin;
            @where i.e_iv >-1
            @select {number_of_children=i.e_iv}
            @collect DataFrame
    end;


nrow(q1)

2


# OLS Results
print(result)

                               Linear Model                               
===========================================================================
Number of obs:                  2217   Degrees of freedom:                6
R2:                            0.390   R2 Adjusted:                   0.388
F Statistic:                  282.47   p-value:                       0.000
===========================================================================
               Estimate  Std.Error   t value Pr(>|t|)  Lower 95%  Upper 95%
---------------------------------------------------------------------------
p            -0.0886398 0.00401348  -22.0855    0.000 -0.0965104 -0.0807692
hp2wt        -0.0731735   0.276701  -0.26445    0.791  -0.615794   0.469447
air          -0.0380115  0.0726455 -0.523246    0.601  -0.180472   0.104449
mpgd          0.0288388 0.00439518   6.56147    0.000  0.0202197  0.0374579
size            2.40052   0.126801   18.9315    0.000    2.15186    2.64919
(Intercept)    -10.2035    0.26002  -39.2412    0.000   -10.7134   -9.69356
===========================================================================


# 2SLS Results
print(second_stage_result)

                                Linear Model                                
============================================================================
Number of obs:                   2217  Degrees of freedom:                 6
R2:                             0.355  R2 Adjusted:                    0.354
F Statistic:                  243.848  p-value:                        0.000
============================================================================
                Estimate  Std.Error  t value Pr(>|t|)  Lower 95%   Upper 95%
----------------------------------------------------------------------------
p_iv           -0.289427  0.0156063 -18.5455    0.000  -0.320032   -0.258823
hp2wt            5.63223   0.513603  10.9661    0.000    4.62503     6.63942
air              2.18099   0.182327  11.9619    0.000    1.82344     2.53854
mpgd         -0.00984143 0.00536771 -1.83345    0.067 -0.0203677 0.000684853
size             2.19676   0.131213  16.7419    0.000    1.93944     2.45407
(Intercept)      -9.5444   0.271767 -35.1198    0.000   -10.0773    -9.01145
============================================================================


# Define parameters, as in Berry's 1994 Monte Carlo

beta_0 = 5.0
beta_x = 2.0
sigma_d = 1.0
alpha = -1.0

gamma_0 = 1.0
gamma_x = 0.5

sigma_c = 0.25
gamma_w = 0.25
sigma_omega = 0.25

T = 500
S = 100
d = Normal()

Normal{Float64}(μ=0.0, σ=1.0)


# Define Non-linear solver

function f!(F, x)
    
    # D.G.P for true costs
    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])
    
    # Derive equillibrium price / quantity(markset shares)
    price_1 = cost_1 - 1/(alpha*(1-x[1]))
    price_2 = cost_2 - 1/(alpha*(1-x[2]))    
    w
    #x[1]: market share of good 1
    #x[2]: market share of good 2
    
    denom = 1 + exp(beta_0 + beta_x*data_temp1[:x] + sigma_d*data_temp1[:xi] + alpha*price_1)  + exp(beta_0 + beta_x*data_temp2[:x] + sigma_d*data_temp2[:xi] + alpha*price_2) 
    F[1] = x[1] - exp(beta_0 + beta_x*data_temp1[:x] + sigma_d*data_temp1[:xi] + alpha*price_1)/denom
    F[2] = x[2] - exp(beta_0 + beta_x*data_temp2[:x] + sigma_d*data_temp2[:xi] + alpha*price_2)/denom
end


### It takes 5 seconds, at the current workflow
sigma_d = 1

for s = 1:S
    ### Step B1. D.G.P.
    
    # If you want to have same results, you need to assign random seed
    Random.seed!(s*100+T+1)
    x_1 = rand(d, T);
    Random.seed!(s*100+T+2)
    xi_1 = rand(d, T);
    Random.seed!(s*100+T+3)
    w_1 = rand(d, T);
    Random.seed!(s*100+T+4)
    omega_1 = rand(d, T);
    
    Random.seed!(s*100+T+5)
    x_2 = rand(d, T);
    Random.seed!(s*100+T+6)
    xi_2 = rand(d, T);
    Random.seed!(s*100+T+7)
    w_2 = rand(d, T);
    Random.seed!(s*100+T+8)
    omega_2 = rand(d, T);

    data_1 = DataFrame(x = x_1[1:T], xi = xi_1[1:T], w = w_1[1:T], omega = omega_1[1:T], iv=x_2[1:T]);
    data_2 = DataFrame(x = x_2[1:T], xi = xi_2[1:T], w = w_2[1:T], omega = omega_2[1:T], iv=x_1[1:T]);

    # For the first periods

    data_temp1 = data_1[1,:]
    data_temp2 = data_2[1,:]
    global data_temp1, data_temp2
    
    ### Step B2. Solve Equilibrium price and market shares using nonlinear-solver
    
    a= nlsolve(f!, [0.1; 0.1])

    vector_s1 = [a.zero[1]]
    vector_s2 = [a.zero[2]]
    vector_s0 = [1-a.zero[1]-a.zero[2]]

    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

    vector_p1 = [cost_1 - 1/(alpha*(1-a.zero[1]))]
    vector_p2 = [cost_2 - 1/(alpha*(1-a.zero[2]))]

    vector_delta_1 = [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]
    vector_delta_2 = [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]

    # From the second market to T markets.
    t=2
    for t = 2:T

        data_temp1 = data_1[t,:]
        data_temp2 = data_2[t,:]
        
        # Step 1. Solve Equilibrium price / market shares
        
        a= nlsolve(f!, [0.0; 0.0])

        append!(vector_s1, [a.zero[1]]);
        append!(vector_s2, [a.zero[2]]);
        append!(vector_s0, [1-a.zero[1]-a.zero[2]]);
        cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
        cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

        append!(vector_p1, [cost_1 - 1/(alpha*(1-a.zero[1]))]);
        append!(vector_p2, [cost_2 - 1/(alpha*(1-a.zero[2]))]);

        append!(vector_delta_1, [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]);
        append!(vector_delta_2, [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]);

    end

    data_1.price = vector_p1;
    data_2.price = vector_p2;

    data_1.s = vector_s1;
    data_2.s = vector_s2;

    data_1.delta = vector_delta_1;
    data_2.delta = vector_delta_2;

    data_1.log_sj_s0 = log.(vector_s1) - log.(vector_s0);
    data_2.log_sj_s0 = log.(vector_s2) - log.(vector_s0);
    
    # Merge into dataset
    data_merged =  append!(data_1, data_2);
    
    ### B3. Regress using OLS / IV
    
    ## OLS Regression
    
    ols_result = reg(data_merged, @formula(log_sj_s0 ~ x + price), save = true, Vcov.robust());
    ols_cons = coef(ols_result)[1];
    ols_x = coef(ols_result)[2];
    ols_p = coef(ols_result)[3];

    ## IV Regression
    first_stage_result = reg(data_merged, @formula(price ~  iv + w +x), save = true, Vcov.robust());

    xhat = predict(first_stage_result, data_merged);
    data_merged.xhat = xhat;

    iv_result = reg(data_merged, @formula(log_sj_s0 ~  x + xhat), save = true, Vcov.robust());
    iv_cons = coef(iv_result)[1];
    iv_x = coef(iv_result)[2];
    iv_p = coef(iv_result)[3];
    
    if s == 1
        vector_ols_cons = [ols_cons]
        vector_ols_x = [ols_x]
        vector_ols_p = [ols_p]
    
        vector_iv_cons = [iv_cons]
        vector_iv_x = [iv_x]
        vector_iv_p = [iv_p]
        
        # Store Monte Carlo Results
        global vector_ols_cons, vector_ols_x, vector_ols_p, vector_iv_cons, vector_iv_x, vector_iv_p
    else

        append!(vector_ols_cons, [ols_cons])
        append!(vector_ols_x, [ols_x])
        append!(vector_ols_p, [ols_p])
        
        append!(vector_iv_cons, [iv_cons])
        append!(vector_iv_x, [iv_x])
        append!(vector_iv_p, [iv_p])

    end
end

print("Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 1)")

result_summary =DataFrame( True_parameter = [beta_0, beta_x, alpha], OLS_mean = [mean(vector_ols_cons),mean(vector_ols_x),mean(vector_ols_p)], OLS_se = [std(vector_ols_cons),std(vector_ols_x),std(vector_ols_p)],
    IV_mean = [mean(vector_iv_cons),mean(vector_iv_x),mean(vector_iv_p)], IV_se =[std(vector_iv_cons),std(vector_iv_x),std(vector_iv_p)]);
print("Result Summary")
print(result_summary)

Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 1)Result Summary3×5 DataFrame
│ Row │ True_parameter │ OLS_mean │ OLS_se    │ IV_mean  │ IV_se     │
│     │ Float64        │ Float64  │ Float64   │ Float64  │ Float64   │
├─────┼────────────────┼──────────┼───────────┼──────────┼───────────┤
│ 1   │ 5.0            │ 3.1872   │ 0.235697  │ 5.01814  │ 0.266852  │
│ 2   │ 2.0            │ 1.33611  │ 0.0742151 │ 2.01013  │ 0.0994837 │
│ 3   │ -1.0           │ -0.63979 │ 0.0482097 │ -1.00436 │ 0.0513297 │


sigma_d = 3.0

for s = 1:S
    ### Step B1. D.G.P.
    
    # If you want to have same results, you need to assign random seed
    Random.seed!(s*100+T+1)
    x_1 = rand(d, T);
    Random.seed!(s*100+T+2)
    xi_1 = rand(d, T);
    Random.seed!(s*100+T+3)
    w_1 = rand(d, T);
    Random.seed!(s*100+T+4)
    omega_1 = rand(d, T);
    
    Random.seed!(s*100+T+5)
    x_2 = rand(d, T);
    Random.seed!(s*100+T+6)
    xi_2 = rand(d, T);
    Random.seed!(s*100+T+7)
    w_2 = rand(d, T);
    Random.seed!(s*100+T+8)
    omega_2 = rand(d, T);

    data_1 = DataFrame(x = x_1[1:T], xi = xi_1[1:T], w = w_1[1:T], omega = omega_1[1:T], iv=x_2[1:T]);
    data_2 = DataFrame(x = x_2[1:T], xi = xi_2[1:T], w = w_2[1:T], omega = omega_2[1:T], iv=x_1[1:T]);

    # For the first periods

    data_temp1 = data_1[1,:]
    data_temp2 = data_2[1,:]
    global data_temp1, data_temp2
    
    ### Step B2. Solve Equilibrium price and market shares using nonlinear-solver
    
    a= nlsolve(f!, [0.1; 0.1])

    vector_s1 = [a.zero[1]]
    vector_s2 = [a.zero[2]]
    vector_s0 = [1-a.zero[1]-a.zero[2]]

    cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
    cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

    vector_p1 = [cost_1 - 1/(alpha*(1-a.zero[1]))]
    vector_p2 = [cost_2 - 1/(alpha*(1-a.zero[2]))]

    vector_delta_1 = [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]
    vector_delta_2 = [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]

    # From the second market to T markets.
    t=2
    for t = 2:T

        data_temp1 = data_1[t,:]
        data_temp2 = data_2[t,:]
        
        # Step 1. Solve Equilibrium price / market shares
        
        a= nlsolve(f!, [0.0; 0.0])

        append!(vector_s1, [a.zero[1]]);
        append!(vector_s2, [a.zero[2]]);
        append!(vector_s0, [1-a.zero[1]-a.zero[2]]);
        cost_1 = exp(gamma_0 + gamma_x * data_temp1[:x] + sigma_c * data_temp1[:xi] + gamma_w * data_temp1[:w] + sigma_omega * data_temp1[:omega])
        cost_2 = exp(gamma_0 + gamma_x * data_temp2[:x] + sigma_c * data_temp2[:xi] + gamma_w * data_temp2[:w] + sigma_omega * data_temp2[:omega])

        append!(vector_p1, [cost_1 - 1/(alpha*(1-a.zero[1]))]);
        append!(vector_p2, [cost_2 - 1/(alpha*(1-a.zero[2]))]);

        append!(vector_delta_1, [beta_0 + beta_x * data_temp1[:x] + alpha*(cost_1 - 1/(alpha*(1-a.zero[1])) )]);
        append!(vector_delta_2, [beta_0 + beta_x * data_temp2[:x] + alpha*(cost_2 - 1/(alpha*(1-a.zero[2])) )]);

    end

    data_1.price = vector_p1;
    data_2.price = vector_p2;

    data_1.s = vector_s1;
    data_2.s = vector_s2;

    data_1.delta = vector_delta_1;
    data_2.delta = vector_delta_2;

    data_1.log_sj_s0 = log.(vector_s1) - log.(vector_s0);
    data_2.log_sj_s0 = log.(vector_s2) - log.(vector_s0);
    
    # Merge into dataset
    data_merged =  append!(data_1, data_2);
    
    ### B3. Regress using OLS / IV
    
    ## OLS Regression
    
    ols_result = reg(data_merged, @formula(log_sj_s0 ~ x + price), save = true, Vcov.robust());
    ols_cons = coef(ols_result)[1];
    ols_x = coef(ols_result)[2];
    ols_p = coef(ols_result)[3];

    ## IV Regression
    first_stage_result = reg(data_merged, @formula(price ~  iv + w +x), save = true, Vcov.robust());

    xhat = predict(first_stage_result, data_merged);
    data_merged.xhat = xhat;

    iv_result = reg(data_merged, @formula(log_sj_s0 ~  x + xhat), save = true, Vcov.robust());
    iv_cons = coef(iv_result)[1];
    iv_x = coef(iv_result)[2];
    iv_p = coef(iv_result)[3];
    
    if s == 1
        vector_ols_cons = [ols_cons]
        vector_ols_x = [ols_x]
        vector_ols_p = [ols_p]
    
        vector_iv_cons = [iv_cons]
        vector_iv_x = [iv_x]
        vector_iv_p = [iv_p]
        
        # Store Monte Carlo Results
        global vector_ols_cons, vector_ols_x, vector_ols_p, vector_iv_cons, vector_iv_x, vector_iv_p
    else

        append!(vector_ols_cons, [ols_cons])
        append!(vector_ols_x, [ols_x])
        append!(vector_ols_p, [ols_p])
        
        append!(vector_iv_cons, [iv_cons])
        append!(vector_iv_x, [iv_x])
        append!(vector_iv_p, [iv_p])

    end
end

print("Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 1)")

result_summary =DataFrame( True_parameter = [beta_0, beta_x, alpha], OLS_mean = [mean(vector_ols_cons),mean(vector_ols_x),mean(vector_ols_p)], OLS_se = [std(vector_ols_cons),std(vector_ols_x),std(vector_ols_p)],
    IV_mean = [mean(vector_iv_cons),mean(vector_iv_x),mean(vector_iv_p)], IV_se =[std(vector_iv_cons),std(vector_iv_x),std(vector_iv_p)]);
print("Result Summary")
print(result_summary)

Monte Carlo Parameter Estimates 100 Random Samples of 500 Duopoly Markets  Logit Utility (sigma_d  = 3)3×5 DataFrame
│ Row │ True_parameter │ OLS_mean  │ OLS_se    │ IV_mean  │ IV_se    │
│     │ Float64        │ Float64   │ Float64   │ Float64  │ Float64  │
├─────┼────────────────┼───────────┼───────────┼──────────┼──────────┤
│ 1   │ 5.0            │ -0.762803 │ 0.418897  │ 5.03208  │ 0.847055 │
│ 2   │ 2.0            │ 0.0195958 │ 0.115257  │ 1.99649  │ 0.301829 │
│ 3   │ -1.0           │ 0.105563  │ 0.0831606 │ -1.00709 │ 0.166926 │

	name	id	year	cy	dr	at	ps	air	drv	p	wt	dom	disp	hp	lng	wdt	wb	mpg	q	firmids	euro	reli	dfi	hp2wt	size	japan	cpi	gasprice	nb_hh	cat	door2	door3	door4	door5	sampleweight	mpgd	dpm	model
	String	Int64	Int64	Int64	Int64	Int64	Int64	Int64	Int64	Float64	Int64	Int64	Float64	Int64	Float64	Float64	Float64	Float64	Float64	Int64	Int64	Int64	Int64	Float64	Float64	Int64	Float64	Float64	Int64	String	Int64	Int64	Int64	Int64	Int64	Float64	Float64	String
1	ACINTE	3735	1986	4	3	0	1	0	1	8.48358	2249	0	97.0	113	168.5	65.6	96.5	27.0	27.807	3	0	5	0	0.502446	1.10536	1	109.6	0.826794	88458	compact	0	1	0	0	27807	32.6562	0.030622	ACINTE1986
2	ACINTE	4030	1987	4	3	0	1	0	1	8.6787	2326	0	97.0	113	168.5	65.6	96.5	26.0	54.757	3	0	5	0	0.485813	1.10536	1	113.6	0.818662	89479	compact	0	1	0	0	54757	31.7591	0.031487	ACINTE1987
3	ACINTE	4327	1988	4	5	0	1	0	1	9.55199	2390	0	97.0	118	171.5	65.6	99.2	17.0	57.468	3	0	5	0	0.493724	1.12504	1	118.3	0.785362	91066	compact	0	0	0	1	57468	21.6461	0.0461978	ACINTE1988
4	ACINTE	4421	1989	4	2	0	1	0	1	10.5403	2313	0	97.0	118	168.7	65.6	96.5	26.0	77.4	3	0	5	0	0.51016	1.10667	1	124.0	0.810081	92830	compact	1	0	0	0	77400	32.0956	0.0311569	ACINTE1989
5	ACINTE	5421	1990	4	2	0	1	0	1	9.14308	2549	0	1.8	130	172.9	67.4	100.4	21.0	83.599	3	0	5	0	0.510004	1.16535	1	130.7	0.875542	93347	compact	1	0	0	0	83599	23.9851	0.0416925	ACINTE1990
6	ACLEGE	3736	1986	6	4	0	1	1	1	17.6077	2970	0	152.0	151	189.4	68.3	108.6	20.0	25.062	3	0	5	0	0.508417	1.2936	1	109.6	0.826794	88458	midsize	0	0	1	0	25062	24.1898	0.0413397	ACLEGE1986
7	ACLEGE	4031	1987	6	4	0	1	1	1	17.8327	3078	0	152.0	151	189.4	68.3	108.7	20.0	54.713	3	0	5	0	0.490578	1.2936	1	113.6	0.818662	89479	midsize	0	0	1	0	54713	24.4301	0.0409331	ACLEGE1987
8	ACLEGE	4328	1988	6	4	0	1	1	1	17.7599	3067	0	163.2	161	189.4	68.3	108.7	19.0	70.77	3	0	5	0	0.524943	1.2936	1	118.3	0.785362	91066	midsize	0	0	1	0	70770	24.1927	0.0413348	ACLEGE1988
9	ACLEGE	4422	1989	6	4	0	1	1	1	18.2258	3170	0	163.0	160	190.6	68.9	108.7	19.0	64.6	3	0	5	0	0.504732	1.31323	1	124.0	0.810081	92830	midsize	0	0	1	0	64600	23.4545	0.0426358	ACLEGE1989
10	ACLEGE	5422	1990	6	2	0	1	1	1	18.9441	3139	0	2.7	160	188.0	68.7	106.5	19.0	53.666	3	0	5	0	0.509716	1.29156	1	130.7	0.875542	93347	midsize	1	0	0	0	53666	21.7008	0.0460811	ACLEGE1990

Variable name	Description
name	Car
id	Car ID
ye	Year
cy	Cylinder
dr	Number of Doors
at	Automatic Transmission
ps	Power Steering
air	Air Conditioning
drv	Front Wheel Drive
p	Price (in \$ 1000's)
wt	Weight
dom	Domestic
disp	Engine Displacement
hp	Horse Power
lng	Length
wdt	Width
wb	Wheelbase
mpg	Miles per Gallon
q	Quantities
firmids	Firm ID
euro	Indicator for EURO car
reli	Rating
dfi	Indicator for Digital Fuel Injection
hp2wt	HP to Weight (ratio)
size	Length X Width (/1000)
japan	Japan
cpi	CPI
gasprice	Gas Price per gallon
nb_hh	Size of Household (Potential Market Size)
cat	Size Cateogry
cat	(Using for nested logit)
door2	I(door=2)
door3	I(door=3)
door4	I(door=4)
door5	I(door=5)
sampleweight	Weights
mpgd	Miles per gallon (imputed from gas prices)
dpm	Dollars per miles (imputed from gas prices)
modelid	Car name

	variable	mean	std	min	q25	median	q75	max	nunique	nmissing	first	last	eltype
	Symbol	Union…	Union…	Any	Union…	Union…	Union…	Any	Union…	Nothing	Any	Any	DataType
1	name			ACINTE				YUYUGO	542		ACINTE	YUYUGO	String
2	id	2560.89	1517.55	129	1309.0	2325.0	3927.0	5592			3735	4506	Int64
3	year	1981.54	5.74082	1971	1977.0	1982.0	1987.0	1990			1986	1989	Int64
4	cy	5.3207	1.55712	0	4.0	4.0	6.0	12			4	4	Int64
5	dr	3.29409	0.965154	2	2.0	4.0	4.0	5			3	2	Int64
6	at	0.326116	0.468896	0	0.0	0.0	1.0	1			0	0	Int64
7	ps	0.533153	0.499012	0	0.0	1.0	1.0	1			1	0	Int64
8	air	0.241768	0.428251	0	0.0	0.0	0.0	1			0	0	Int64
9	drv	0.354984	0.478616	0	0.0	0.0	1.0	1			1	1	Int64
10	p	11.7614	8.64378	3.39327	6.71375	8.72865	13.0741	68.5968			8.48358	3.50726	Float64
11	wt	2930.47	722.366	1445	2375.0	2861.0	3383.0	5362			2249	1832	Int64
12	dom	0.589986	0.491947	0	0.0	1.0	1.0	1			0	0	Int64
13	disp	177.746	102.032	1.0	109.0	151.0	231.0	500.0			97.0	68.0	Float64
14	hp	117.005	46.6881	39	88.0	105.0	140.0	365			113	52	Int64
15	lng	186.705	20.0657	139.0	172.2	185.0	200.0	236.0			168.5	139.0	Float64
16	wdt	69.6557	5.29795	53.0	65.9	69.0	73.0	81.0			65.6	60.7	Float64
17	wb	104.616	9.817	14.3	97.0	103.4	110.8	212.1			96.5	84.6	Float64
18	mpg	20.9964	5.8107	9.13	17.0	20.0	25.0	53.0			27.0	28.0	Float64
19	q	78.804	89.0799	0.049	15.603	47.35	109.002	646.526			27.807	10.5	Float64
20	firmids	13.7438	6.25909	1	8.0	16.0	19.0	26			3	23	Int64
21	euro	0.23816	0.426053	0	0.0	0.0	0.0	1			0	1	Int64
22	reli	3.0433	1.29108	1	2.0	3.0	4.0	5			5	1	Int64
23	dfi	0.0135318	0.115563	0	0.0	0.0	0.0	1			0	0	Int64
24	hp2wt	0.394375	0.0966429	0.170455	0.336585	0.375049	0.427509	0.947581			0.502446	0.283843	Float64
25	size	1.31016	0.237637	0.756	1.13128	1.26983	1.4527	1.888			1.10536	0.84373	Float64
26	japan	0.171854	0.377338	0	0.0	0.0	0.0	1			1	0	Int64
27	cpi	88.3488	28.8772	40.5	60.6	96.5	113.6	130.7			109.6	124.0	Float64
28	gasprice	1.02727	0.206509	0.785362	0.826794	1.01788	1.13523	1.47121			0.826794	0.810081	Float64
29	nb_hh	81539.0	8770.71	64778	74142.0	83527.0	89479.0	93347			88458	92830	Int64
30	cat			compact				midsize	3		compact	compact	String
31	door2	0.341903	0.474454	0	0.0	0.0	1.0	1			0	1	Int64
32	door3	0.0419486	0.200517	0	0.0	0.0	0.0	1			1	0	Int64
33	door4	0.596301	0.490749	0	0.0	1.0	1.0	1			0	0	Int64
34	door5	0.0198466	0.139505	0	0.0	0.0	0.0	1			0	0	Int64
35	sampleweight	78804.0	89079.9	49	15603.0	47350.0	109002.0	646526			27807	10500	Int64
36	mpgd	21.1248	6.94301	8.61178	15.9838	20.5059	24.8007	65.4256			32.6562	34.5645	Float64
37	dpm	0.052389	0.0167817	0.0152845	0.0403215	0.0487664	0.0625634	0.11612			0.030622	0.0289315	Float64
38	model			ACINTE1986				YUYUGO1989	2172		ACINTE1986	YUYUGO1989	String

	year	cy_mean	dr_mean	at_mean	ps_mean	air_mean	drv_mean	p_mean	wt_mean	dom_mean	disp_mean	hp_mean	lng_mean	wdt_mean	wb_mean	mpg_mean
	Int64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64	Float64
1	1986	4.94615	3.43846	0.330769	0.684615	0.323077	0.576923	11.7826	2733.55	0.569231	160.212	110.192	180.583	67.9892	101.829	23.5546
2	1987	5.0	3.43357	0.321678	0.72028	0.391608	0.615385	13.4363	2785.15	0.538462	163.357	117.028	180.416	67.9965	101.909	22.6503
3	1988	5.09333	3.44	0.4	0.766667	0.446667	0.613333	14.8857	2847.8	0.533333	163.836	125.053	181.756	68.2853	102.443	21.88
4	1989	4.9932	2.9932	0.37415	0.795918	0.503401	0.680272	16.6905	2895.97	0.496599	163.503	133.837	181.348	68.5537	103.037	21.7891
5	1990	5.0687	2.85496	0.40458	0.816794	0.458015	0.717557	14.0384	2923.47	0.51145	2.7084	133.588	182.156	68.655	101.843	21.8168
6	1971	6.04348	3.36957	0.152174	0.163043	0.0	0.0	8.85633	3176.08	0.684783	245.548	171.891	195.446	72.9891	110.787	17.1995
7	1975	5.86022	3.26882	0.311828	0.387097	0.107527	0.0752688	9.64184	3328.6	0.645161	238.5	119.473	195.312	71.7419	108.129	16.2688
8	1976	5.62626	3.28283	0.282828	0.343434	0.10101	0.0909091	9.49007	3159.45	0.616162	218.66	112.889	192.515	70.8889	106.833	18.5556
9	1977	5.61053	3.24211	0.294737	0.378947	0.0842105	0.136842	9.86429	3070.06	0.610526	204.085	109.463	190.747	70.4947	106.038	19.7789
10	1980	5.23301	3.35922	0.252427	0.359223	0.174757	0.252427	10.7269	2776.99	0.601942	179.446	99.699	185.398	69.4854	103.684	21.6699
11	1981	5.16379	3.21552	0.336207	0.491379	0.241379	0.232759	13.0352	2819.37	0.517241	171.325	102.017	186.56	69.3276	103.929	22.5
12	1982	4.98182	3.32727	0.336364	0.490909	0.236364	0.381818	11.5913	2747.25	0.6	158.536	98.3	185.273	68.8636	103.151	23.7727
13	1983	4.88696	3.30435	0.33913	0.530435	0.2	0.434783	11.1408	2736.82	0.617391	156.169	97.8522	184.087	68.7913	102.857	25.1478
14	1984	4.92035	3.35398	0.345133	0.663717	0.283186	0.477876	11.6477	2765.41	0.59292	166.069	107.265	183.822	68.4637	102.735	24.1504
15	1985	4.96324	3.31618	0.345588	0.669118	0.330882	0.522059	12.4764	2759.99	0.566176	163.106	108.081	182.096	68.1625	102.349	22.3868
16	1978	5.57895	3.33684	0.263158	0.347368	0.0947368	0.178947	10.6021	2964.8	0.642105	199.494	107.695	189.347	70.3579	105.337	19.9263
17	1979	5.33333	3.28431	0.235294	0.313725	0.0882353	0.215686	10.4513	2840.49	0.598039	184.635	104.108	186.559	69.5	103.849	20.1078
18	1972	6.20225	3.44944	0.337079	0.348315	0.0449438	0.0	9.04282	3253.78	0.696629	256.947	134.348	196.775	73.3146	111.563	16.3317
19	1973	6.37209	3.44186	0.395349	0.372093	0.0697674	0.0	9.0452	3337.42	0.709302	261.949	131.256	198.384	73.0233	111.215	16.2506
20	1974	6.0	3.30556	0.375	0.375	0.125	0.0	9.25473	3268.29	0.652778	239.179	122.556	196.194	71.9306	108.874	16.3329

Parameter	True Value	Description
$ \beta_0$	5	Intercept (demand)
$ \beta_x$	2	Utility from good $x$
$ \sigma_d$	1 / 3 (second monte carlo)	Covariance
$\alpha $	1	Price coefficients
$ \gamma_0$	1	Intercept (Supply)
$\gamma_x $	5	Cost from good $x$
$ \sigma_c$	0.25	Covariance $(\xi_j)$
$ \gamma_w$	0.25	Parameters for input costs
$ \sigma_\omega$	0.25	Covariance $(\omega_j)$

Fixed Coefficients Random Utility (Demand) Estimation¶

Introduction¶

For the rest of this notebook, we will introduce two empirical examples:¶

A. Estimate logit-demand using, BLP(1995)'s aggregate market level data.¶

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game¶

A1.Introduction of car data from BLP(1995)¶

As an empirical study, we will replicate Table 3, as in BLP (1995).¶

Variable name/short description¶

For detailed description, please see BLP(1995) section 7.1. (Data section)¶

A2. Data cleaning¶

Step 3. Construct dependent variable: $ \text{ln}(s_{jt}) - \text{ln}(s_{0t}) $¶

A3. Run linear regression using eq(A)¶

Step 4. we use hp2wt, air, mpgd, size as product characteristics:¶

Step 5. Obtain Price elasticities:¶

Replication: BLP Table 3, IV Logit Demand Column (Row: No. Inelastic De) in page 873.¶

I derive the number of inelastic car model. My estimates are 1,502. BLP's estimates were 1,494, which is pretty close.¶

A4. Run 2sls using instruments¶

1. the sum of size at market $t$. (Note that you need to drop product $j$'s own size.)¶

2. the sum of size across rival firm products at market $t$.¶

Comparision with BLP Table 3, IV Logit Demand Column (Row: No. Inelastic De) in page 873.¶

Note that I have slightly different price coefficients, I observe number of inelastic demand good is 2. BLP estimates were 22.¶

Step 6. Discussion : IV regressions¶

Reported price coefficient ($\alpha$) is -0.0886 in OLS.¶

Now we have -0.2894 ($\alpha$) in IV regression. Prices are upward biased in OLS.¶

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game¶

B1. Data Generating Process¶

B2. Obtain (numerically) equilibrium price and market shares (still D.G.P)¶

B3. Regress using OLS / IV¶

For each simulation, we independently draw these for 500 markets, as in Berry(1994).¶

Repeat 100 times, and report Monte Racrlo results¶

Replicate Table 1 in Berry(1994), column (1) and (2), where $\sigma_d=1$¶

Replicate Table 1 in Berry(1994), column (1) and (2), where $\sigma_d=3$¶

Reference¶

Fixed Coefficients Random Utility (Demand) Estimation¶

Introduction¶

For the rest of this notebook, we will introduce two empirical examples:¶

A. Estimate logit-demand using, BLP(1995)'s aggregate market level data.¶

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game¶

A1.Introduction of car data from BLP(1995)¶

As an empirical study, we will replicate Table 3, as in BLP (1995).¶

Variable name/short description¶

For detailed description, please see BLP(1995) section 7.1. (Data section)¶

A2. Data cleaning¶

Step 1. Obtain market share for each good $j$: $s_{jt}$ = $\frac{q_{jt}}{nb\_hh_{t}}$¶

Step 2. Obtain market share for outside good 0: $s_{0t}$ = $\frac{ \Big(nb\_hh_t - \sum_{k=1}^J(q_{kt}) \Big)}{nb\_hh_t}$¶

Step 3. Construct dependent variable: $ \text{ln}(s_{jt}) - \text{ln}(s_{0t}) $¶

A3. Run linear regression using eq(A)¶

Step 4. we use hp2wt, air, mpgd, size as product characteristics:¶

Step 5. Obtain Price elasticities:¶

Replication: BLP Table 3, IV Logit Demand Column (Row: No. Inelastic De) in page 873.¶

I derive the number of inelastic car model. My estimates are 1,502. BLP's estimates were 1,494, which is pretty close.¶

A4. Run 2sls using instruments¶

1. the sum of size at market $t$. (Note that you need to drop product $j$'s own size.)¶

2. the sum of size across rival firm products at market $t$.¶

Comparision with BLP Table 3, IV Logit Demand Column (Row: No. Inelastic De) in page 873.¶

Note that I have slightly different price coefficients, I observe number of inelastic demand good is 2. BLP estimates were 22.¶

Step 6. Discussion : IV regressions¶

Reported price coefficient ($\alpha$) is -0.0886 in OLS.¶

Now we have -0.2894 ($\alpha$) in IV regression. Prices are upward biased in OLS.¶

B. Monte Carlo Example: estimate logit-demand after solving Nash-Bertrand game¶

B1. Data Generating Process¶

B2. Obtain (numerically) equilibrium price and market shares (still D.G.P)¶

B3. Regress using OLS / IV¶

For each simulation, we independently draw these for 500 markets, as in Berry(1994).¶

Repeat 100 times, and report Monte Racrlo results¶

Replicate Table 1 in Berry(1994), column (1) and (2), where $\sigma_d=1$¶

Replicate Table 1 in Berry(1994), column (1) and (2), where $\sigma_d=3$¶

Reference¶