Coarsened Exact Matching (CEM) Analysis

In this document, I’ll go through the process of Coarsened Exact Matching (CEM), which allows us to only include comparable counties in the dataset we have been working on. Through this process, we can minimize the imbalance in covariates, which can be problematic for the full regression model.

Coarsened Exact Matching (CEM)

To conduct CEM in R, we will use the cem package. The method and the package have been developed and advanced by a team of researchers. You can find more details here.

set.seed(5000)
# install.packages("cem")
library(tidyverse)
library(cem)
library(ggplot2)

Clean the workspace and import the raw dataset from the GitHub repository.

#rm(list = ls())     # Clean workspace

d <- read_csv("https://raw.githubusercontent.com/texastipi/broadband_entrepreneurship/master/Broadband-Entrepreneurship-TXKSME.csv")
glimpse(d)

Treatment, Outcome, and Covariate Variables

CEM involves defining treatment, outcome, and covariate variables of interest. In our analysis, we are interested in the treatment effect of broadband-related variables on entrepreneurial outcome variables. The reason for conducting CEM is to address the imbalance in terms of various economic and demographic factors of different counties. Below are the treatment, outcome, and covariates.

Treatment

For the listed broadband treatment variables, we split each variable into two categories (Low, High). Gallardo, Whitacre, Kumar, & Upendram (2021) used three categories, eliminated the ‘Medium’ category, and considered ‘High’ the treated group and ‘Low’ the untreated group. However, our sample is substantially smaller, and given the number of pre-treatment covariates, dropping a middle category would likely leave too few matched observations for any inferential statistical analysis. Therefore, we will convert the following variables to mean-based treatment categories (above the mean = High, below the mean = Low):

pct25_3_dec_2019_fcc: FCC 477 availability of broadband at 25/3Mbps per county (%, data as of Dec 2019)
pct100_10_dec_2019_fcc: FCC 477 availability of broadband at 100/10Mbps per county (%, data as of Dec 2019)
pct250_25_dec_2019_fcc: FCC 477 availability of broadband at 250/25Mbps per county (%, data as of Dec 2019)
pct1000_100_dec_2019_fcc: FCC 477 availability of broadband at 1000/100Mbps per county (%, data as of Dec 2019)
pct_fixed_acs_2019: Fixed broadband service subscription according to ACS 5-year estimates (%, 2019)
pct_bb_qos: Composite broadband quality of service (QoS) measure based on Microsoft’s broadband throughput data (2020) and M-Lab’s broadband test data (2018-2019) (%)
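
The mean-based split described above can be sketched in a few lines of base R. This is only an illustration: the helper name mean_split and the toy values are assumptions, not part of the dataset or of the analysis code. It codes values above the mean as 1 (High) and values below the mean as 0 (Low), and leaves missing values or values exactly equal to the mean as NA.

```r
# Hypothetical helper: recode a numeric share into 0 (below mean) / 1 (above mean).
# Missing values, and values exactly equal to the mean, are returned as NA.
mean_split <- function(x) {
  m <- mean(x, na.rm = TRUE)
  ifelse(x > m, 1, ifelse(x < m, 0, NA_real_))
}

# Toy availability shares (made-up numbers, not taken from the dataset)
avail <- c(0.10, 0.40, 0.50, 0.90, NA)
treated <- mean_split(avail)

print(treated)
print(table(treated, useNA = "ifany"))
```

Because the split point is the sample mean, the two groups are generally not of equal size when a variable is skewed.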

Outcome

For the entrepreneurial outcome variables, we investigate several measures of entrepreneurship that were used in the traditional multiple regression analysis. In addition, we explore non-farm proprietorship per capita as a new possible representation of entrepreneurship in a county.

pct_nonfarm_bea_2019: Share of non-farm proprietors over total employment of a county (%, 2019; Source: BEA)
vd_mean_20: Average venture density (2020; Source: GoDaddy)
havd_mean_20: Average highly active venture density (2020; Source: GoDaddy)
nonfarmprop_percapita: Number of non-farm proprietors per capita (2019; Source: BEA)

Covariates

For covariates, we include several economic and demographic variables that could be closely related to the imbalance within the sample. To give an example, Gallardo et al. (2021) used natural log of population, % of people with bachelor’s degree, unemployment rate, and industry diversity index. Our full regression models used industry characteristics, education, age, and rurality as control variables. Here, we will use log of population, % of people with bachelor’s degree, unemployment rate, industry diversity index as covariates for matching. Rurality measures will be investigated in relation to the broadband treatments in the following regression models.

LNpop_2019: Natural log of county population (2019), computed from population_2019
indstry_diversity: Industry diversity index calculated based on methods described by Gallardo et al. (2021)
pctbachelors_2019: % of people with Bachelor’s degree (2019; Source: ACS 5-year estimates)
pct_unemployment_2019: Unemployment rate (2019; Source: ACS 5-year estimates)

Rurality measures as additional independent variable

RUCC_2013: Rural-Urban Continuum Codes developed by USDA (2013)
metro_f: Metro/Non-Metro categories based on the RUCC documentation
IRR2010: Rurality index created by Waldorf & Kim (2019). Continuous measure ranging from 0 to 1

Data Preparation and Matching

Let’s first create a subset of data with only the variables we need and some basic identities.

## Subset data with the aforementioned variables, create LNpop_2019 (natural log of population)

d2 <- d %>% select(county_FIPS, county.x,
                   pct25_3_dec_2019_fcc, pct100_10_dec_2019_fcc, pct250_25_dec_2019_fcc, pct1000_100_dec_2019_fcc,
                   pct_fixed_acs_2019, pct_bb_qos,
                   pct_nonfarm_bea_2019, vd_mean_20, havd_mean_20, nonfarmprop_percapita,
                   population_2019, indstry_diversity, pctbachelors_2019, pct_unemployment_2019, RUCC_2013, metro_f, IRR2010) %>% 
  mutate(LNpop_2019 = log(population_2019))

Let’s create treatment variables by recoding the original variables. First, we’ll take a look at the variables’ distributions and then create categories. For matching to work and be useful in the final analysis, we create categories based on each variable’s mean.

## Take a look at the broadband variable summaries

d2 %>% select(pct25_3_dec_2019_fcc, pct100_10_dec_2019_fcc, pct250_25_dec_2019_fcc, pct1000_100_dec_2019_fcc,
             pct_fixed_acs_2019, pct_bb_qos) %>% summary()

##  pct25_3_dec_2019_fcc pct100_10_dec_2019_fcc pct250_25_dec_2019_fcc pct1000_100_dec_2019_fcc pct_fixed_acs_2019   pct_bb_qos    
##  Min.   :0.0009918    Min.   :0.0000         Min.   :0.00000        Min.   :0.00000          Min.   :0.0960     Min.   :0.0000  
##  1st Qu.:0.7885686    1st Qu.:0.4003         1st Qu.:0.04303        1st Qu.:0.00000          1st Qu.:0.4280     1st Qu.:0.2158  
##  Median :0.9193920    Median :0.6730         Median :0.44859        Median :0.02025          Median :0.5190     Median :0.3572  
##  Mean   :0.8468896    Mean   :0.6052         Mean   :0.44310        Mean   :0.16860          Mean   :0.5176     Mean   :0.3930  
##  3rd Qu.:0.9910699    3rd Qu.:0.8776         3rd Qu.:0.78722        3rd Qu.:0.24036          3rd Qu.:0.6025     3rd Qu.:0.5643  
##  Max.   :1.0000000    Max.   :1.0000         Max.   :1.00000        Max.   :1.00000          Max.   :0.8410     Max.   :1.0000

## Categories are created with a mean split for every measure
## (a first/third-quartile split was not workable: for the 1000/100 Mbps measure, the 1st quartile is 0)
## Values exactly equal to the mean are coded NA

d2 <- d2 %>% 
  mutate(BB_25_3_high = case_when(pct25_3_dec_2019_fcc < mean(pct25_3_dec_2019_fcc, na.rm = T) ~ 0,
                                  pct25_3_dec_2019_fcc > mean(pct25_3_dec_2019_fcc, na.rm = T) ~ 1,
                                  TRUE ~ NA_real_),
         BB_100_10_high = case_when(pct100_10_dec_2019_fcc < mean(pct100_10_dec_2019_fcc, na.rm = T) ~ 0,
                                  pct100_10_dec_2019_fcc > mean(pct100_10_dec_2019_fcc, na.rm = T) ~ 1,
                                  TRUE ~ NA_real_),
         BB_250_25_high = case_when(pct250_25_dec_2019_fcc < mean(pct250_25_dec_2019_fcc, na.rm = T) ~ 0,
                                  pct250_25_dec_2019_fcc > mean(pct250_25_dec_2019_fcc, na.rm = T) ~ 1,
                                  TRUE ~ NA_real_),
         BB_1000_100_high = case_when(pct1000_100_dec_2019_fcc < mean(pct1000_100_dec_2019_fcc, na.rm = T) ~ 0,
                                  pct1000_100_dec_2019_fcc > mean(pct1000_100_dec_2019_fcc, na.rm = T) ~ 1,
                                  TRUE ~ NA_real_),
         BB_adoption_high = case_when(pct_fixed_acs_2019 < mean(pct_fixed_acs_2019, na.rm = T) ~ 0,
                                  pct_fixed_acs_2019 > mean(pct_fixed_acs_2019, na.rm = T) ~ 1,
                                  TRUE ~ NA_real_),
         BB_qos_high = case_when(pct_bb_qos < mean(pct_bb_qos, na.rm = T) ~ 0,
                                  pct_bb_qos > mean(pct_bb_qos, na.rm = T) ~ 1,
                                  TRUE ~ NA_real_))

str(d2)

## tibble [375 × 26] (S3: tbl_df/tbl/data.frame)
##  $ county_FIPS             : num [1:375] 20001 20003 20005 20007 20009 ...
##  $ county.x                : chr [1:375] "Allen County" "Anderson" "Atchison" "Barber" ...
##  $ pct25_3_dec_2019_fcc    : num [1:375] 1 1 0.815 0.876 0.931 ...
##  $ pct100_10_dec_2019_fcc  : num [1:375] 0.746 0.382 0.765 0.835 0.931 ...
##  $ pct250_25_dec_2019_fcc  : num [1:375] 0.6507 0 0.0748 0.7789 0.7619 ...
##  $ pct1000_100_dec_2019_fcc: num [1:375] 0.00509 0 0.07478 0.77886 0.02444 ...
##  $ pct_fixed_acs_2019      : num [1:375] 0.568 0.45 0.531 0.641 0.622 0.548 0.571 0.59 0.411 0.366 ...
##  $ pct_bb_qos              : num [1:375] 0.618 0.161 0.419 0.334 0.476 ...
##  $ pct_nonfarm_bea_2019    : num [1:375] 0.186 0.268 0.181 0.366 0.271 ...
##  $ vd_mean_20              : num [1:375] 0.148 2.255 1.288 2.123 1.492 ...
##  $ havd_mean_20            : num [1:375] 0.0249 0.4723 0.2745 0.7061 0.4667 ...
##  $ nonfarmprop_percapita   : num [1:375] 0.122 0.1367 0.0909 0.2634 0.1918 ...
##  $ population_2019         : num [1:375] 12556 7835 16268 4624 26453 ...
##  $ indstry_diversity       : num [1:375] 0.849 0.868 0.865 0.862 0.889 ...
##  $ pctbachelors_2019       : num [1:375] 0.203 0.181 0.22 0.207 0.209 0.217 0.212 0.298 0.255 0.159 ...
##  $ pct_unemployment_2019   : num [1:375] 0.046 0.034 0.028 0.025 0.039 0.023 0.034 0.038 0.045 0.068 ...
##  $ RUCC_2013               : num [1:375] 7 6 6 9 7 6 6 2 9 8 ...
##  $ metro_f                 : chr [1:375] "Nonmetro" "Nonmetro" "Nonmetro" "Nonmetro" ...
##  $ IRR2010                 : num [1:375] 0.55 0.57 0.53 0.61 0.54 0.55 0.56 0.5 0.62 0.6 ...
##  $ LNpop_2019              : num [1:375] 9.44 8.97 9.7 8.44 10.18 ...
##  $ BB_25_3_high            : num [1:375] 1 1 0 1 1 1 0 1 0 0 ...
##  $ BB_100_10_high          : num [1:375] 1 0 1 1 1 0 0 1 0 0 ...
##  $ BB_250_25_high          : num [1:375] 1 0 0 1 1 0 0 1 0 0 ...
##  $ BB_1000_100_high        : num [1:375] 0 0 0 1 0 0 0 0 0 0 ...
##  $ BB_adoption_high        : num [1:375] 1 0 1 1 1 1 1 1 0 0 ...
##  $ BB_qos_high             : num [1:375] 1 0 1 0 1 1 0 1 0 0 ...

We’ll now use the cem package to match the sample to treatment based on the covariates we mentioned above.

Note that the package does not allow NAs in the treatment variable. Therefore, we have to create separate datasets for each treatment variable of interest.

Also, our sample size after omitting NAs will become substantially smaller, especially as we are matching on several pre-treatment covariates (the four listed above plus, in the code below, the IRR2010 rurality index). This could be problematic, making the following regression analyses less effective and meaningful. Therefore, we will apply user-chosen coarsening to the covariates, which relaxes the categories used for matching. Specifically, we will set custom cut points for each covariate: five equally spaced break points across the total range of each variable, which define four equal-width bins.
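
To make the coarsening concrete, here is a minimal sketch on made-up values (the vector x is an assumption standing in for a covariate such as LNpop_2019). With length.out = 5, seq() returns five equally spaced break points, which define four equal-width bins. The cut() call only illustrates how values fall into those bins; it is not the cem package's internal code.

```r
# Toy covariate (made-up values), standing in for something like LNpop_2019
x <- c(6.2, 7.9, 8.4, 9.1, 10.3, 11.0, 12.6, 14.2)

# Five equally spaced break points span the range and define four bins
breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = 5)
n_bins <- length(breaks) - 1

# cut() assigns each value to a bin; include.lowest keeps the minimum in bin 1
bin <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

print(breaks)
print(n_bins)
print(table(bin))
```

Fewer, wider bins make it easier for treated and control counties to share a stratum, at the cost of tolerating larger covariate differences within each stratum.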

## Inspect how many observations there are in the treated and control groups
table(d2$BB_25_3_high)

## 
##   0   1 
## 125 250

table(d2$BB_100_10_high)

## 
##   0   1 
## 166 209

table(d2$BB_250_25_high)

## 
##   0   1 
## 185 190

table(d2$BB_1000_100_high)

## 
##   0   1 
## 262 113

table(d2$BB_adoption_high)

## 
##   0   1 
## 185 190

table(d2$BB_qos_high)

## 
##   0   1 
## 208 167

## Set cut points for each covariate
popcut <- hist(d2$LNpop_2019, br=seq(min(d2$LNpop_2019, na.rm = T), max(d2$LNpop_2019, na.rm = T), length.out = 5), plot = F)$breaks

indstrycut <- hist(d2$indstry_diversity, br=seq(min(d2$indstry_diversity), max(d2$indstry_diversity), length.out = 5), plot = F)$breaks

educut <- hist(d2$pctbachelors_2019, br=seq(min(d2$pctbachelors_2019), max(d2$pctbachelors_2019), length.out = 5), plot = F)$breaks

unempcut <- hist(d2$pct_unemployment_2019, br=seq(min(d2$pct_unemployment_2019), max(d2$pct_unemployment_2019), length.out = 5), plot = F)$breaks

# Ruralities
irrcut <- hist(d2$IRR2010, br = seq(min(d2$IRR2010), max(d2$IRR2010), length.out = 5), plot = F)$breaks


mycp <- list(LNpop_2019 = popcut, indstry_diversity = indstrycut, pctbachelors_2019 = educut, pct_unemployment_2019 = unempcut, IRR2010 = irrcut)

## List of covariates
vars <- c("LNpop_2019", "indstry_diversity", "pctbachelors_2019", "pct_unemployment_2019", "IRR2010")

droplst <- c("pct_nonfarm_bea_2019", "vd_mean_20", "havd_mean_20", "nonfarmprop_percapita", "metro_f")
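
Because the treatment variable cannot contain NAs, each treatment needs its own complete-case data frame, which is what the six near-identical blocks that follow build by hand. As a rough sketch of the same idea in base R, using a made-up data frame (toy, its column names, and the helper make_treatment_data are all hypothetical):

```r
# Toy data (made-up values); column names are illustrative only
toy <- data.frame(
  treat_a = c(1, 0, NA, 1, 0),
  treat_b = c(0, 1, NA, 1, NA),
  outcome = c(0.2, 0.3, 0.1, NA, 0.5),
  covar   = c(9.4, 8.9, 9.7, 8.4, 10.2)
)

# Hypothetical helper: keep one treatment plus the shared columns, then drop
# every row that has a missing value in any of the kept columns
make_treatment_data <- function(treatment, data, shared) {
  out <- data[, c(treatment, shared), drop = FALSE]
  data.frame(na.omit(out))
}

shared_cols <- c("outcome", "covar")
treatments  <- c("treat_a", "treat_b")

# One complete-case data frame per treatment, stored in a named list
by_treatment <- lapply(setNames(treatments, treatments),
                       make_treatment_data, data = toy, shared = shared_cols)

print(sapply(by_treatment, nrow))
```

Each list element can then be passed to the matching step on its own, so a missing value in one treatment does not remove a county from the analysis of another.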

#### Treatment: BB_25_3_high ####

d25_3 <- d2 %>% select(BB_25_3_high,
                    pct_nonfarm_bea_2019, vd_mean_20, havd_mean_20, nonfarmprop_percapita,
                    LNpop_2019, indstry_diversity, pctbachelors_2019, pct_unemployment_2019, IRR2010, metro_f)
d25_3 <- data.frame(na.omit(d25_3))

## Overall imbalance check

imbalance(group = d25_3$BB_25_3_high, data = d25_3[vars])

## 
## Multivariate Imbalance Measure: L1=0.926
## Percentage of local common support: LCS=5.6%
## 
## Univariate Imbalance Measures:
## 
##                          statistic   type        L1       min        25%       50%       75%       max
## LNpop_2019             1.277501460 (diff) 0.2739712  1.183512  0.9751537  1.116906  1.361330  4.016635
## indstry_diversity      0.010816166 (diff) 0.0000000  0.119824  0.0104620  0.009591  0.003016 -0.000035
## pctbachelors_2019      0.036170988 (diff) 0.0000000  0.063000  0.0220000  0.026000  0.035000  0.160000
## pct_unemployment_2019  0.001061008 (diff) 0.0000000  0.000000  0.0100000  0.004000 -0.008000  0.010000
## IRR2010               -0.066790123 (diff) 0.0000000 -0.340000 -0.0600000 -0.040000 -0.050000 -0.030000
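
As a reading aid for the multivariate L1 statistic reported above: it compares the relative frequencies of treated and control units across the cells of a multivariate histogram of the covariates, as half the sum of absolute differences. A value of 0 means the two distributions coincide on that histogram, and 1 means they do not overlap at all. The sketch below uses made-up, already-binned data; the cem package chooses its own binning, so this will not reproduce the numbers in the output.

```r
# Toy data (made-up): treatment indicator and two already-coarsened covariates
treat <- c(1, 1, 1, 1, 0, 0, 0, 0)
bin_x <- c(1, 1, 2, 2, 1, 2, 2, 3)
bin_z <- c(1, 2, 1, 2, 1, 1, 2, 2)

# Each observed combination of bins is one cell of the multivariate histogram
cell <- interaction(bin_x, bin_z, drop = TRUE)

# Relative frequency of each cell within the treated and control groups
# (subsetting a factor keeps all levels, so the two tables line up cell by cell)
f_treated <- prop.table(table(cell[treat == 1]))
f_control <- prop.table(table(cell[treat == 0]))

# L1 = half the sum of absolute differences between the two distributions
l1 <- 0.5 * sum(abs(f_treated - f_control))
print(l1)
```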

## CEM match using user-choice coarsening

match_fcc_25_3 <- cem(treatment = "BB_25_3_high", data = d25_3,
                      drop = droplst,
                      cutpoints = mycp, keep.all = T)

## 
## Using 'BB_25_3_high'='1' as baseline group

match_fcc_25_3

##            G0  G1
## All       120 243
## Matched    94 167
## Unmatched  26  76

#### Treatment: BB_100_10_high ####

d100_10 <- d2 %>% select(BB_100_10_high,
                    pct_nonfarm_bea_2019, vd_mean_20, havd_mean_20, nonfarmprop_percapita,
                    LNpop_2019, indstry_diversity, pctbachelors_2019, pct_unemployment_2019, IRR2010, metro_f)
d100_10 <- data.frame(na.omit(d100_10))

## Overall imbalance check

imbalance(group = d100_10$BB_100_10_high, data = d100_10[vars])

## 
## Multivariate Imbalance Measure: L1=0.920
## Percentage of local common support: LCS=5.0%
## 
## Univariate Imbalance Measures:
## 
##                           statistic   type       L1       min        25%       50%       75%       max
## LNpop_2019             1.3064855207 (diff) 0.295197  1.329880  0.9158668  1.251601  1.431673  3.625198
## indstry_diversity      0.0042943540 (diff) 0.000000  0.119824  0.0059060  0.001717 -0.002152 -0.003139
## pctbachelors_2019      0.0409810961 (diff) 0.000000  0.063000  0.0170000  0.035000  0.053000  0.165000
## pct_unemployment_2019 -0.0002424261 (diff) 0.000000  0.008000  0.0040000 -0.005000 -0.007000  0.010000
## IRR2010               -0.0645649631 (diff) 0.000000 -0.320000 -0.0800000 -0.040000 -0.040000 -0.020000

## CEM match using user-choice coarsening

match_fcc_100_10 <- cem(treatment = "BB_100_10_high", data = d100_10,
                        drop = droplst,
                        cutpoints = mycp, keep.all = T)

## 
## Using 'BB_100_10_high'='1' as baseline group

match_fcc_100_10

##            G0  G1
## All       160 203
## Matched   131 135
## Unmatched  29  68

#### Treatment: BB_250_25_high ####

d250_25 <- d2 %>% select(BB_250_25_high,
                    pct_nonfarm_bea_2019, vd_mean_20, havd_mean_20, nonfarmprop_percapita,
                    LNpop_2019, indstry_diversity, pctbachelors_2019, pct_unemployment_2019, IRR2010, metro_f)
d250_25 <- data.frame(na.omit(d250_25))

## Overall imbalance check

imbalance(group = d250_25$BB_250_25_high, data = d250_25[vars])

## 
## Multivariate Imbalance Measure: L1=0.886
## Percentage of local common support: LCS=6.5%
## 
## Univariate Imbalance Measures:
## 
##                           statistic   type        L1       min       25%       50%       75%       max
## LNpop_2019             1.3464351904 (diff) 0.2958093  1.183512  1.039657  1.313773  1.504972  3.625198
## indstry_diversity      0.0037317343 (diff) 0.0000000  0.119824  0.004863  0.001845 -0.001019 -0.005135
## pctbachelors_2019      0.0458836623 (diff) 0.0000000  0.063000  0.024000  0.037000  0.063000  0.064000
## pct_unemployment_2019  0.0008724264 (diff) 0.0000000  0.012000  0.005000 -0.001000 -0.006000  0.010000
## IRR2010               -0.0707904646 (diff) 0.0000000 -0.220000 -0.080000 -0.050000 -0.050000 -0.030000

## CEM match using user-choice coarsening

match_fcc_250_25 <- cem(treatment = "BB_250_25_high", data = d250_25,
                        drop = droplst,
                        cutpoints = mycp, keep.all = T)

## 
## Using 'BB_250_25_high'='1' as baseline group

match_fcc_250_25

##            G0  G1
## All       178 185
## Matched   146 129
## Unmatched  32  56

#### Treatment: BB_1000_100_high ####

d1000_100 <- d2 %>% select(BB_1000_100_high,
                    pct_nonfarm_bea_2019, vd_mean_20, havd_mean_20, nonfarmprop_percapita,
                    LNpop_2019, indstry_diversity, pctbachelors_2019, pct_unemployment_2019, IRR2010, metro_f)
d1000_100 <- data.frame(na.omit(d1000_100))

## Overall imbalance check

imbalance(group = d1000_100$BB_1000_100_high, data = d1000_100[vars])

## 
## Multivariate Imbalance Measure: L1=0.956
## Percentage of local common support: LCS=3.4%
## 
## Univariate Imbalance Measures:
## 
##                          statistic   type       L1       min        25%        50%       75%       max
## LNpop_2019            -0.406529632 (diff) 0.207529 -1.183512  0.1440635  0.1948011 -0.899226 -1.692590
## indstry_diversity     -0.003268772 (diff) 0.000000 -0.126105 -0.0026150 -0.0020620  0.000650  0.005135
## pctbachelors_2019     -0.047284427 (diff) 0.000000 -0.063000 -0.0400000 -0.0470000 -0.054000 -0.084000
## pct_unemployment_2019  0.007024775 (diff) 0.000000 -0.012000  0.0060000  0.0070000  0.011000  0.008000
## IRR2010                0.031886529 (diff) 0.000000  0.190000  0.0500000 -0.0100000  0.000000  0.030000

## CEM match using user-choice coarsening

match_fcc_1000_100 <- cem(treatment = "BB_1000_100_high", data = d1000_100,
                        drop = droplst,
                        cutpoints = mycp, keep.all = T)

## 
## Using 'BB_1000_100_high'='1' as baseline group

match_fcc_1000_100

##            G0  G1
## All       252 111
## Matched   199  91
## Unmatched  53  20

#### Treatment: BB_adoption_high ####

dadoption <- d2 %>% select(BB_adoption_high,
                    pct_nonfarm_bea_2019, vd_mean_20, havd_mean_20, nonfarmprop_percapita,
                    LNpop_2019, indstry_diversity, pctbachelors_2019, pct_unemployment_2019, IRR2010, metro_f)
dadoption <- data.frame(na.omit(dadoption))

## Overall imbalance check

imbalance(group = dadoption$BB_adoption_high, data = dadoption[vars])

## 
## Multivariate Imbalance Measure: L1=0.918
## Percentage of local common support: LCS=4.6%
## 
## Univariate Imbalance Measures:
## 
##                          statistic   type        L1        min        25%        50%       75%       max
## LNpop_2019             0.541803129 (diff) 0.3163024 -0.8740613 -0.3496337  0.5117201  1.243733  1.692590
## indstry_diversity     -0.005574648 (diff) 0.0000000 -0.0618870 -0.0044110 -0.0067190 -0.006251 -0.005158
## pctbachelors_2019      0.071844627 (diff) 0.0000000  0.0950000  0.0470000  0.0590000  0.082000  0.255000
## pct_unemployment_2019 -0.012052095 (diff) 0.0000000  0.0000000 -0.0070000 -0.0100000 -0.014000 -0.037000
## IRR2010               -0.031935337 (diff) 0.0000000 -0.2000000 -0.0600000 -0.0200000  0.020000  0.010000

## CEM match using user-choice coarsening

match_fcc_adoption <- cem(treatment = "BB_adoption_high", data = dadoption,
                        drop = droplst,
                        cutpoints = mycp, keep.all = T)

## 
## Using 'BB_adoption_high'='1' as baseline group

match_fcc_adoption

##            G0  G1
## All       180 183
## Matched   147 133
## Unmatched  33  50

#### Treatment: BB_qos_high ####

dqos <- d2 %>% select(BB_qos_high,
                    pct_nonfarm_bea_2019, vd_mean_20, havd_mean_20, nonfarmprop_percapita,
                    LNpop_2019, indstry_diversity, pctbachelors_2019, pct_unemployment_2019, IRR2010, metro_f)
dqos <- data.frame(na.omit(dqos))

## Overall imbalance check

imbalance(group = dqos$BB_qos_high, data = dqos[vars])

## 
## Multivariate Imbalance Measure: L1=0.919
## Percentage of local common support: LCS=5.0%
## 
## Univariate Imbalance Measures:
## 
##                          statistic   type      L1       min       25%      50%       75%       max
## LNpop_2019             1.840290979 (diff) 0.40363  1.004286  1.800797  1.66487  2.026217  4.048585
## indstry_diversity      0.007027928 (diff) 0.00000  0.119824  0.008401  0.00513  0.001449 -0.005158
## pctbachelors_2019      0.051348535 (diff) 0.00000  0.063000  0.025000  0.04300  0.061000  0.160000
## pct_unemployment_2019  0.002502580 (diff) 0.00000  0.012000  0.011000  0.00300 -0.004000  0.010000
## IRR2010               -0.097286715 (diff) 0.00000 -0.340000 -0.100000 -0.08000 -0.070000 -0.040000

## CEM match using user-choice coarsening

match_fcc_qos <- cem(treatment = "BB_qos_high", data = dqos,
                        drop = droplst,
                        cutpoints = mycp, keep.all = T)

## 
## Using 'BB_qos_high'='1' as baseline group

match_fcc_qos

##            G0  G1
## All       201 162
## Matched   160 108
## Unmatched  41  54

Causal Inference on the Matched Data

In this section, I will use a linear model to estimate the sample average treatment effect on the treated (SATT), using the weights produced by CEM and stored in the matched dataset. Each subsection covers a different broadband measure (i.e., treatment measure). Four models are fitted for each broadband treatment variable, one per entrepreneurship outcome; each estimates the effect of the broadband treatment on that dependent variable. Additionally, the models estimate the main effect of the metro/non-metro classification and its interaction with the treatment.
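
To show what a weighted linear model on matched data estimates, here is a minimal base R sketch on made-up data. It assumes the CEM weighting scheme as I understand it from the cem documentation: matched treated units get weight 1, and matched control units in stratum s get (m_C / m_T) × (m_T^s / m_C^s), where m_T and m_C are the total numbers of matched treated and control units and m_T^s and m_C^s are the counts within the stratum. The data frame toy and all values are hypothetical, and this is not the internal code of att().

```r
# Toy matched data (made-up): stratum id, treatment indicator, outcome
toy <- data.frame(
  stratum = c("a", "a", "a", "b", "b", "b", "b"),
  treat   = c(1, 0, 0, 1, 1, 1, 0),
  y       = c(5, 3, 4, 9, 8, 10, 6)
)

m_t <- sum(toy$treat == 1)   # matched treated units overall
m_c <- sum(toy$treat == 0)   # matched control units overall

# Per-stratum treated and control counts, mapped back onto each row
m_t_s <- ave(toy$treat, toy$stratum, FUN = function(z) sum(z == 1))
m_c_s <- ave(toy$treat, toy$stratum, FUN = function(z) sum(z == 0))

# Treated units get weight 1; controls are reweighted to the treated distribution
toy$w <- ifelse(toy$treat == 1, 1, (m_c / m_t) * (m_t_s / m_c_s))

# Weighted regression of the outcome on the treatment indicator
fit  <- lm(y ~ treat, data = toy, weights = w)
satt <- unname(coef(fit)["treat"])

# The same quantity written as a difference in weighted means
diff_means <- with(toy, weighted.mean(y[treat == 1], w[treat == 1]) -
                        weighted.mean(y[treat == 0], w[treat == 0]))

print(c(satt = satt, diff_means = diff_means))
```

With a single binary regressor, the treatment coefficient equals the difference between the treated mean and the reweighted control mean, which is why the intercept in the output tables can be read as the weighted control-group mean.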

Broadband 25/3Mbps on Entrepreneurship measures

#### Treatment: BB_25_3_high & Outcome: pct_nonfarm_bea_2019 ####

mod_25_3_nonfarm <- att(match_fcc_25_3, pct_nonfarm_bea_2019 ~ BB_25_3_high, data = d25_3)
summary(mod_25_3_nonfarm)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       120 243
## Matched    94 167
## Unmatched  26  76
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                Estimate Std. Error t value p-value    
## (Intercept)  2.5843e-01 6.5989e-03 39.1623  <2e-16 ***
## BB_25_3_high 7.8631e-05 8.2497e-03  0.0095  0.9924    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_25_3_high & Outcome: vd_mean_20 ####

mod_25_3_vd <- att(match_fcc_25_3, vd_mean_20 ~ BB_25_3_high, data = d25_3)
summary(mod_25_3_vd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       120 243
## Matched    94 167
## Unmatched  26  76
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##               Estimate Std. Error t value p-value    
## (Intercept)   1.884890   0.177555 10.6158  <2e-16 ***
## BB_25_3_high -0.074391   0.221970 -0.3351  0.7378    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_25_3_high & Outcome: havd_mean_20 ####

mod_25_3_havd <- att(match_fcc_25_3, havd_mean_20 ~ BB_25_3_high, data = d25_3)
summary(mod_25_3_havd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       120 243
## Matched    94 167
## Unmatched  26  76
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##               Estimate Std. Error t value p-value    
## (Intercept)   0.420377   0.038092 11.0358  <2e-16 ***
## BB_25_3_high -0.037303   0.047621 -0.7833  0.4341    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_25_3_high & Outcome: nonfarmprop_percapita ####

mod_25_3_nonfarmpc <- att(match_fcc_25_3, nonfarmprop_percapita ~ BB_25_3_high, data = d25_3)
summary(mod_25_3_nonfarmpc)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       120 243
## Matched    94 167
## Unmatched  26  76
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##               Estimate Std. Error t value p-value    
## (Intercept)  0.1358864  0.0042729 31.8021 < 2e-16 ***
## BB_25_3_high 0.0106670  0.0053417  1.9969 0.04688 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Broadband 100/10Mbps on Entrepreneurship measures

#### Treatment: BB_100_10_high & Outcome: pct_nonfarm_bea_2019 ####

mod_100_10_nonfarm <- att(match_fcc_100_10, pct_nonfarm_bea_2019 ~ BB_100_10_high, data = d100_10)
summary(mod_100_10_nonfarm)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       160 203
## Matched   131 135
## Unmatched  29  68
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                  Estimate Std. Error t value   p-value    
## (Intercept)     0.2718149  0.0055232 49.2134 < 2.2e-16 ***
## BB_100_10_high -0.0219542  0.0077529 -2.8317  0.004986 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_100_10_high & Outcome: vd_mean_20 ####

mod_100_10_vd <- att(match_fcc_100_10, vd_mean_20 ~ BB_100_10_high, data = d100_10)
summary(mod_100_10_vd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       160 203
## Matched   131 135
## Unmatched  29  68
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                Estimate Std. Error t value p-value    
## (Intercept)    1.887963   0.161668  11.678  <2e-16 ***
## BB_100_10_high 0.044708   0.226933   0.197   0.844    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_100_10_high & Outcome: havd_mean_20 ####

mod_100_10_havd <- att(match_fcc_100_10, havd_mean_20 ~ BB_100_10_high, data = d100_10)
summary(mod_100_10_havd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       160 203
## Matched   131 135
## Unmatched  29  68
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                  Estimate Std. Error t value p-value    
## (Intercept)     0.4189506  0.0350164 11.9644  <2e-16 ***
## BB_100_10_high -0.0067669  0.0491525 -0.1377  0.8906    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_100_10_high & Outcome: nonfarmprop_percapita ####

mod_100_10_nonfarmpc <- att(match_fcc_100_10, nonfarmprop_percapita ~ BB_100_10_high, data = d100_10)
summary(mod_100_10_nonfarmpc)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       160 203
## Matched   131 135
## Unmatched  29  68
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                 Estimate Std. Error t value p-value    
## (Intercept)    0.1427568  0.0040512 35.2378  <2e-16 ***
## BB_100_10_high 0.0063926  0.0056867  1.1241   0.262    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Broadband 250/25Mbps on Entrepreneurship measures

#### Treatment: BB_250_25_high & Outcome: pct_nonfarm_bea_2019 ####

mod_250_25_nonfarm <- att(match_fcc_250_25, pct_nonfarm_bea_2019 ~ BB_250_25_high, data = d250_25)
summary(mod_250_25_nonfarm)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       178 185
## Matched   146 129
## Unmatched  32  56
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                  Estimate Std. Error t value p-value    
## (Intercept)     0.2639721  0.0054080 48.8111  <2e-16 ***
## BB_250_25_high -0.0104403  0.0078961 -1.3222  0.1872    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_250_25_high & Outcome: vd_mean_20 ####

mod_250_25_vd <- att(match_fcc_250_25, vd_mean_20 ~ BB_250_25_high, data = d250_25)
summary(mod_250_25_vd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       178 185
## Matched   146 129
## Unmatched  32  56
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                Estimate Std. Error t value p-value    
## (Intercept)     1.89618    0.14177 13.3753  <2e-16 ***
## BB_250_25_high  0.20625    0.20699  0.9964  0.3199    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_250_25_high & Outcome: havd_mean_20 ####

mod_250_25_havd <- att(match_fcc_250_25, havd_mean_20 ~ BB_250_25_high, data = d250_25)
summary(mod_250_25_havd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       178 185
## Matched   146 129
## Unmatched  32  56
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                Estimate Std. Error t value p-value    
## (Intercept)    0.406713   0.030151 13.4894  <2e-16 ***
## BB_250_25_high 0.045464   0.044022  1.0328  0.3026    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_250_25_high & Outcome: nonfarmprop_percapita ####

mod_250_25_nonfarmpc <- att(match_fcc_250_25, nonfarmprop_percapita ~ BB_250_25_high, data = d250_25)
summary(mod_250_25_nonfarmpc)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       178 185
## Matched   146 129
## Unmatched  32  56
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                 Estimate Std. Error t value p-value    
## (Intercept)    0.1444826  0.0035362 40.8585  <2e-16 ***
## BB_250_25_high 0.0057561  0.0051630  1.1149  0.2659    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Broadband 1000/100Mbps on Entrepreneurship measures

#### Treatment: BB_1000_100_high & Outcome: pct_nonfarm_bea_2019 ####

mod_1000_100_nonfarm <- att(match_fcc_1000_100, pct_nonfarm_bea_2019 ~ BB_1000_100_high, data = d1000_100)
summary(mod_1000_100_nonfarm)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       252 111
## Matched   199  91
## Unmatched  53  20
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                   Estimate Std. Error t value   p-value    
## (Intercept)      0.2490891  0.0043451 57.3260 < 2.2e-16 ***
## BB_1000_100_high 0.0278605  0.0077568  3.5918 0.0003861 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_1000_100_high & Outcome: vd_mean_20 ####

mod_1000_100_vd <- att(match_fcc_1000_100, vd_mean_20 ~ BB_1000_100_high, data = d1000_100)
summary(mod_1000_100_vd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       252 111
## Matched   199  91
## Unmatched  53  20
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                   Estimate Std. Error t value p-value    
## (Intercept)      2.5428191  0.2208176  11.515  <2e-16 ***
## BB_1000_100_high 0.0051431  0.3941958   0.013  0.9896    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_1000_100_high & Outcome: havd_mean_20 ####

mod_1000_100_havd <- att(match_fcc_1000_100, havd_mean_20 ~ BB_1000_100_high, data = d1000_100)
summary(mod_1000_100_havd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       252 111
## Matched   199  91
## Unmatched  53  20
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                  Estimate Std. Error t value p-value    
## (Intercept)      0.533124   0.042329 12.5946  <2e-16 ***
## BB_1000_100_high 0.023844   0.075565  0.3155  0.7526    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_1000_100_high & Outcome: nonfarmprop_percapita ####

mod_1000_100_nonfarmpc <- att(match_fcc_1000_100, nonfarmprop_percapita ~ BB_1000_100_high, data = d1000_100)
summary(mod_1000_100_nonfarmpc)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       252 111
## Matched   199  91
## Unmatched  53  20
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                   Estimate Std. Error t value   p-value    
## (Intercept)      0.1447441  0.0030059 48.1529 < 2.2e-16 ***
## BB_1000_100_high 0.0260261  0.0053661  4.8501 2.023e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
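As a reading aid, the t values and approximate 95% confidence intervals for the four BB_1000_100_high models can be recomputed from the estimates and standard errors printed above. The sketch below copies those numbers by hand. The data frame name `res_1000_100` and the residual degrees of freedom (199 + 91 matched counties minus 2 coefficients) are assumptions made for illustration, not objects or values returned by `att()`.

```r
# Coefficients for BB_1000_100_high, copied from the printed output above
res_1000_100 <- data.frame(
  outcome  = c("pct_nonfarm_bea_2019", "vd_mean_20",
               "havd_mean_20", "nonfarmprop_percapita"),
  estimate = c(0.0278605, 0.0051431, 0.023844, 0.0260261),
  se       = c(0.0077568, 0.3941958, 0.075565, 0.0053661)
)

# Assumed residual degrees of freedom: matched counties minus 2 coefficients
df_resid <- 199 + 91 - 2

# t value = estimate / standard error
res_1000_100$t_value <- res_1000_100$estimate / res_1000_100$se

# Approximate 95% confidence interval based on the t distribution
crit <- qt(0.975, df = df_resid)
res_1000_100$ci_low  <- res_1000_100$estimate - crit * res_1000_100$se
res_1000_100$ci_high <- res_1000_100$estimate + crit * res_1000_100$se

print(res_1000_100, digits = 3)
```

Under these assumptions, only the intervals for pct_nonfarm_bea_2019 and nonfarmprop_percapita exclude zero, which agrees with the p-values reported above.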

Broadband Adoption on Entrepreneurship measures

#### Treatment: BB_adoption_high & Outcome: pct_nonfarm_bea_2019 ####

mod_adoption_nonfarm <- att(match_fcc_adoption, pct_nonfarm_bea_2019 ~ BB_adoption_high, data = dadoption)
summary(mod_adoption_nonfarm)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       180 183
## Matched   147 133
## Unmatched  33  50
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                    Estimate Std. Error t value   p-value    
## (Intercept)       0.2791793  0.0050709 55.0554 < 2.2e-16 ***
## BB_adoption_high -0.0295425  0.0073576 -4.0152 7.646e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_adoption_high & Outcome: vd_mean_20 ####

mod_adoption_vd <- att(match_fcc_adoption, vd_mean_20 ~ BB_adoption_high, data = dadoption)
summary(mod_adoption_vd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       180 183
## Matched   147 133
## Unmatched  33  50
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                  Estimate Std. Error t value p-value    
## (Intercept)       2.50706    0.19293 12.9945  <2e-16 ***
## BB_adoption_high -0.35765    0.27994 -1.2776  0.2025    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_adoption_high & Outcome: havd_mean_20 ####

mod_adoption_havd <- att(match_fcc_adoption, havd_mean_20 ~ BB_adoption_high, data = dadoption)
summary(mod_adoption_havd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       180 183
## Matched   147 133
## Unmatched  33  50
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                   Estimate Std. Error t value p-value    
## (Intercept)       0.538286   0.038742 13.8941  <2e-16 ***
## BB_adoption_high -0.071035   0.056213 -1.2637  0.2074    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_adoption_high & Outcome: nonfarmprop_percapita ####

mod_adoption_nonfarmpc <- att(match_fcc_adoption, nonfarmprop_percapita ~ BB_adoption_high, data = dadoption)
summary(mod_adoption_nonfarmpc)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       180 183
## Matched   147 133
## Unmatched  33  50
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##                   Estimate Std. Error t value p-value    
## (Intercept)      0.1536612  0.0038629  39.779  <2e-16 ***
## BB_adoption_high 0.0046801  0.0056048   0.835  0.4044    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
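The four adoption models above differ only in the outcome variable, so the formulas can also be generated rather than typed by hand. The following is a minimal sketch using base R's `reformulate()`. The helper `make_att_formulas()` and the object names `outcomes`, `adoption_formulas`, and `mods_adoption` are hypothetical and introduced here only for illustration.

```r
# Outcome variables used throughout this section
outcomes <- c("pct_nonfarm_bea_2019", "vd_mean_20",
              "havd_mean_20", "nonfarmprop_percapita")

# Hypothetical helper: builds one `outcome ~ treatment` formula per outcome
make_att_formulas <- function(treatment, outcomes) {
  formulas <- lapply(outcomes, function(y) reformulate(treatment, response = y))
  names(formulas) <- outcomes
  formulas
}

adoption_formulas <- make_att_formulas("BB_adoption_high", outcomes)

# Inspect one of the generated formulas as text
print(deparse(adoption_formulas$vd_mean_20))

# With the objects created earlier in this document, the models could then be
# fitted in one pass (left commented out because it needs those objects):
# mods_adoption <- lapply(adoption_formulas, function(f) {
#   att(match_fcc_adoption, f, data = dadoption)
# })
```

Generating the formulas from a single treatment name keeps the treatment and outcome labels in sync across models, at the cost of less explicit code than the individual calls shown above.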

Broadband Quality of Service on Entrepreneurship measures

#### Treatment: BB_qos_high & Outcome: pct_nonfarm_bea_2019 ####

mod_qos_nonfarm <- att(match_fcc_qos, pct_nonfarm_bea_2019 ~ BB_qos_high, data = dqos)
summary(mod_qos_nonfarm)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       201 162
## Matched   160 108
## Unmatched  41  54
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##               Estimate Std. Error t value p-value    
## (Intercept)  0.2687571  0.0054175 49.6095 < 2e-16 ***
## BB_qos_high -0.0191461  0.0085340 -2.2435 0.02569 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_qos_high & Outcome: vd_mean_20 ####

mod_qos_vd <- att(match_fcc_qos, vd_mean_20 ~ BB_qos_high, data = dqos)
summary(mod_qos_vd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       201 162
## Matched   160 108
## Unmatched  41  54
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##             Estimate Std. Error t value p-value    
## (Intercept)  2.27336    0.17623  12.900  <2e-16 ***
## BB_qos_high -0.16739    0.27761  -0.603   0.547    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_qos_high & Outcome: havd_mean_20 ####

mod_qos_havd <- att(match_fcc_qos, havd_mean_20 ~ BB_qos_high, data = dqos)
summary(mod_qos_havd)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       201 162
## Matched   160 108
## Unmatched  41  54
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##              Estimate Std. Error t value p-value    
## (Intercept)  0.484142   0.036154 13.3912  <2e-16 ***
## BB_qos_high -0.027525   0.056952 -0.4833  0.6293    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### Treatment: BB_qos_high & Outcome: nonfarmprop_percapita ####

mod_qos_nonfarmpc <- att(match_fcc_qos, nonfarmprop_percapita ~ BB_qos_high, data = dqos)
summary(mod_qos_nonfarmpc)

## 
## Treatment effect estimation for data:
## 
##            G0  G1
## All       201 162
## Matched   160 108
## Unmatched  41  54
## 
## Linear regression model estimated on matched data only
## 
## Coefficients:
##              Estimate Std. Error t value p-value    
## (Intercept) 0.1385996  0.0032522 42.6172  <2e-16 ***
## BB_qos_high 0.0056309  0.0051231  1.0991  0.2727    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
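Because every model is estimated on matched data only, it helps to keep track of how many counties each match retains. The sketch below recomputes the retained shares from the "All" and "Matched" rows printed in this section. The data frame `match_counts` and its column names are assumptions for illustration, and the counts are copied by hand from the output rather than extracted from the match objects.

```r
# Group sizes copied from the "All" and "Matched" rows printed in this section
match_counts <- data.frame(
  treatment  = c("BB_250_25_high", "BB_1000_100_high",
                 "BB_adoption_high", "BB_qos_high"),
  all_g0     = c(178, 252, 180, 201),
  all_g1     = c(185, 111, 183, 162),
  matched_g0 = c(146, 199, 147, 160),
  matched_g1 = c(129,  91, 133, 108)
)

# Share of each group (G0, G1) that survives matching
match_counts$kept_g0 <- match_counts$matched_g0 / match_counts$all_g0
match_counts$kept_g1 <- match_counts$matched_g1 / match_counts$all_g1

# Share of all counties that survives matching
match_counts$kept_total <-
  (match_counts$matched_g0 + match_counts$matched_g1) /
  (match_counts$all_g0 + match_counts$all_g1)

print(match_counts, digits = 3)
```

For the four treatments shown here, roughly 74% to 80% of the 363 counties remain after matching, so each estimate describes the matched subset rather than the full sample.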

Jaewon R. Choi