Applied Statistics

Survival Analysis

Published:

This post walks through survival analysis on a panel dataset. It replicates a model that estimates the duration of democratic regime survival using data on 105 countries from 1950 to 2004.

Unordered Categorical Models

Published:

This post replicates and extends a flagship American Political Science Review article that uses a multinomial logit to predict which mode of warfare an insurgent group adopts, given the presence or absence of the Cold War.

Binary Regression

Published:

In October 1988, a plebiscite was held in Chile to determine whether or not Augusto Pinochet should extend his rule for another eight years. The package carData contains Chilean national survey data collected in April and May 1988. In this analysis, we use binary regression models to evaluate the effect that several variables have on a voter’s likelihood of voting to keep Pinochet in power.

Time Series Analysis

Published:

This post applies time series analysis to data provided by Kwon (2015), a study that seeks to empirically understand the causal relationship between political polarization and income inequality in the U.S.

Linear Models

Published:

What causes differences in people’s life satisfaction across countries? The Association of Religion Data Archives (ARDA) has assembled a dataset that stitches together economic, social and demographic variables across 252 countries. In this analysis, we use linear models to inspect factors associated with life satisfaction.

Ordered Categorical Models

Published:

In 2009, a scandal broke out across England. Many British Members of Parliament (MPs) were exposed as misusing the allowances and expenses permitted to them as elected officials. In 2010, British voters were surveyed as to whether they thought the MPs implicated in the scandal should resign prior to parliamentary elections. At the time of the scandal, the Labour Party, led by PM Gordon Brown, was in power. The Conservative Party, led by David Cameron, won the largest number of votes and seats in the 2010 general election on the heels of the scandal. In this analysis, we use generalized linear models for ordered categorical data to further explore the survey data.

Count Model Analysis

Published:

Gelman and Hill (2007) collected New York City (NYC) “stop and frisk” data for 175,000 stops over a 15-month period in 1998-1999. In this analysis, we use count models to analyze the data.

Multiple Imputation

Published:

Missingness refers to observations or measurements that, in principle, could have been carried out but, for whatever reason, failed to occur (Ward and Ahlquist 2018). The data we collect, whether observational or experimental, will almost certainly have some missing values. The general idea behind missing data imputation algorithms is that they fill in the missing data with estimates of what the real data would look like if it were available. Because the estimated data is by nature uncertain, we impute the missing data many times to incorporate that uncertainty into the analysis. This post walks through a cutting-edge multiple imputation technique developed by Hollenbach et al. (2018).
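
To make the "fill in many times, then combine" idea concrete, here is a minimal sketch in Python. It is not the method of Hollenbach et al. (2018); it assumes a deliberately simple imputation model (draws from a normal distribution fitted to the observed values) and pools the results with Rubin's rules. The data values and the function names `impute_once` and `pool_means` are made up for illustration.

```python
import random
import statistics


def impute_once(values, rng):
    """Return a completed copy of `values`, replacing each None with a random draw.

    Assumption: missing values are drawn from a normal distribution fitted to
    the observed values. Real imputation models are richer than this.
    """
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu, sigma) for v in values]


def pool_means(values, m=20, seed=1):
    """Impute `m` times, estimate the mean each time, and pool with Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.mean(completed))
        # Squared standard error of the mean within this completed dataset
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)        # average within-imputation variance
    between = statistics.variance(estimates)   # variance across imputations
    total_var = within + (1 + 1 / m) * between # Rubin's total variance
    return pooled, total_var


# Hypothetical data: None marks a missing measurement
data = [4.1, None, 5.3, 6.0, None, 4.8, 5.5, None, 5.1, 4.6]
pooled_mean, total_variance = pool_means(data)
print(pooled_mean, total_variance)
```

The between-imputation term is what carries the extra uncertainty: filling in the gaps only once would report the within-imputation variance alone and understate how unsure we are.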

Numerical Optimization

Published:

Maximum likelihood fixes our observed data and asks: what parameter values most likely generated the data we have? In a likelihood framework, the joint probability of our data is viewed as a function of the parameter values of a specified mass or density function, and that joint probability is maximized with respect to the parameters. A maximum likelihood estimate is the parameter value under which the density or mass function has the highest likelihood of generating the observed data. Numerical optimizers provide the means to estimate our parameters by finding the values that maximize this likelihood. This guide walks step by step through how optimization algorithms find the best estimates for our coefficients.
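
As a small illustration of what an optimizer does at each step, the sketch below runs Newton-Raphson on a Poisson log-likelihood, parameterized by the log of the rate. The choice of model, the made-up counts, and the function names are assumptions for illustration, not the example used in the post itself.

```python
import math


def poisson_loglik(theta, counts):
    """Poisson log-likelihood at log-rate `theta` (constant term dropped)."""
    return sum(y * theta - math.exp(theta) for y in counts)


def newton_raphson_mle(counts, theta=0.0, tol=1e-10, max_iter=50):
    """Maximize the Poisson log-likelihood over the log-rate with Newton-Raphson.

    Returns the estimate and the list of values visited along the way.
    """
    n, total = len(counts), sum(counts)
    history = [theta]
    for _ in range(max_iter):
        rate = math.exp(theta)
        gradient = total - n * rate  # first derivative of the log-likelihood
        hessian = -n * rate          # second derivative (always negative here)
        step = gradient / hessian
        theta -= step                # Newton update: theta - gradient / hessian
        history.append(theta)
        if abs(step) < tol:          # stop once the updates become negligible
            break
    return theta, history


# Hypothetical count data
counts = [2, 3, 1, 4, 0, 2]
theta_hat, history = newton_raphson_mle(counts)
rate_hat = math.exp(theta_hat)

for i, value in enumerate(history):
    print(i, value, poisson_loglik(value, counts))
```

For this model the maximum likelihood estimate of the rate is known in closed form (the sample mean), which makes it a convenient check that the iterations converge where they should.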