Posts by Tags

API

Nutritionalcart

How healthy is the average Instacart user? Are certain types of food buyers (e.g., vegetarians, carnivores) healthier than others? I bring new data to bear on these questions to better understand how healthy the average Instacart user is and what health benefits are afforded to Instacart users who choose some types of food (e.g., plant-based, meat-based) over others. To determine the relative health of Instacart users, I matched the top 10 most-ordered products in each aisle with USDA nutrient data, using the USDA’s API, which returns results as JavaScript Object Notation (JSON). To view this project, please click here. An upgraded algorithm that searches the USDA database more effectively can be found here.
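
The post doesn’t reproduce the matching code, but the lookup it describes might look something like this minimal sketch against the USDA FoodData Central search endpoint; the DEMO_KEY, helper name, and top-match heuristic are illustrative assumptions, not the project’s actual code:

```python
import requests

# USDA FoodData Central search endpoint (DEMO_KEY is a public, rate-limited key).
API_URL = "https://api.nal.usda.gov/fdc/v1/foods/search"

def nutrient_profile(product_name: str, api_key: str = "DEMO_KEY") -> dict:
    """Return the nutrient list for the top USDA match of a product name."""
    resp = requests.get(API_URL, params={"api_key": api_key, "query": product_name})
    resp.raise_for_status()
    foods = resp.json().get("foods", [])
    if not foods:
        return {}
    top = foods[0]  # take the best-ranked match as the product's proxy
    return {n["nutrientName"]: n["value"] for n in top.get("foodNutrients", [])}

print(nutrient_profile("banana"))
```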

Applied Machine Learning

NLP: Natural Language Propaganda

Who are the targets of insurgent propaganda? I investigate the ability to classify the targets (e.g., the U.S. or Kabul) of insurgent propaganda messages using a novel corpus containing over 11,000 Taliban statements from 2014 to 2020. In experiments with Convolutional Neural Network (CNN) and transformer architectures, I demonstrate that the audiences of insurgent messages are best captured by transformers, likely owing to their encoder-decoder architecture. This paper’s contribution is twofold: First, it offers a novel data set with utility in machine learning classification and summarization tasks. Second, it suggests that since the audience of messaging can be reliably identified, analysts have new opportunities to look more closely at contrasts in language to better understand the targets of information.
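
A transformer audience classifier of the kind described can be sketched as follows; the checkpoint, label set, and inference helper are illustrative placeholders, not the paper’s actual configuration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Example audience classes drawn from the abstract; the real label set may differ.
LABELS = ["U.S.", "Kabul"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

def classify_audience(statement: str) -> str:
    """Predict which audience a propaganda statement targets."""
    inputs = tokenizer(statement, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```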

BERT

BERT-Vision

What compression methods can extract regularity from BERT during fine-tuning? Drawing on research that demonstrates the utility of information found across all of BERT’s layers, we propose a compression method, BERT-Vision, that captures the regularities produced by BERT during fine-tuning. BERT-Vision’s contribution is twofold: First, we show that compression during fine-tuning can yield comparable, and sometimes better, performance than BERT; second, we show that this performance is realized with a model 209x smaller than BERT in parameter count. To view this project, please click here.
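
The core ingredient, collecting activations from every BERT layer rather than only the last one, can be sketched as follows; the checkpoint and the stacking step are illustrative, and this is not BERT-Vision’s actual compression head:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# output_hidden_states=True exposes every layer, not just the final one.
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("an example sentence", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple of (embeddings + 12 layers), each [batch, seq, 768].
# Stacking them gives a [batch, 13, seq, 768] tensor a small head can compress.
all_layers = torch.stack(outputs.hidden_states, dim=1)
print(all_layers.shape)
```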

Causal Inference

Typos: A Survey Experiment

Command of language is one of the most significant cognitive abilities we possess and is often the most pervasive signal we encounter in a social media setting. When we notice overt and unintentional grammatical errors in social media posts, do we make unconscious assumptions about the authors’ general intelligence? Do we attribute difficulty with written language to other traits, such as lower verbal acuity or lower overall intelligence? Further, are some categories of grammatical errors more injurious than others, or do we take all these trespasses in stride? To view this project, please click here.

Class Programming

Map Off

Map Off is a game designed to test your geography skills in the United States or around the world. The inspiration for this game comes from my wife, Hannah, because we often test our spatial skills against one another whenever a map is at hand. Now we have access to maps and competition anytime we want.

Databricks

PetaFlights

What accounts for flight delays in the U.S.? This project presents the machine learning end of a large data engineering effort that merged 630 million rows of weather data with 31 million rows of flight data. I use the state of the art in distributed deep learning, leveraging Petastorm, Horovod, and PyTorch to produce a multilayer perceptron model distributed across 8 workers in Databricks. Importantly, I use a novel approach to transform categorical data into continuous features through an embedding table. To view this project, please click here.
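
The embedding-table idea can be sketched in PyTorch as follows; the feature names, cardinalities, and layer sizes are illustrative assumptions, and the Petastorm/Horovod distribution across workers is omitted:

```python
import torch
import torch.nn as nn

class FlightDelayMLP(nn.Module):
    """Toy multilayer perceptron mixing embedded categorical and numeric features."""

    def __init__(self, n_carriers=20, n_airports=400, n_numeric=16, emb_dim=8):
        super().__init__()
        # Each category ID (e.g., carrier, origin airport) maps to a learned
        # dense vector, turning categorical inputs into continuous features.
        self.carrier_emb = nn.Embedding(n_carriers, emb_dim)
        self.airport_emb = nn.Embedding(n_airports, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim + n_numeric, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # delay logit; apply sigmoid in the loss
        )

    def forward(self, carrier_id, airport_id, numeric):
        x = torch.cat(
            [self.carrier_emb(carrier_id), self.airport_emb(airport_id), numeric],
            dim=-1,
        )
        return self.mlp(x)
```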

Experiments

Typos: A Survey Experiment

Command of language is one of the most significant cognitive abilities we possess and is often the most pervasive signal we encounter in a social media setting. When we notice overt and unintentional grammatical errors in social media posts, do we make unconscious assumptions about the authors’ general intelligence? Do we attribute difficulty with written language to other traits, such as lower verbal acuity or lower overall intelligence? Further, are some categories of grammatical errors more injurious than others, or do we take all these trespasses in stride? To view this project, please click here.

Geospatial Analysis

Latent Control: Hidden Markov Models

Who controls territory in civil war? This is a central variable in the research and analysis of civil wars, yet it is incredibly difficult to measure. In this post, I model territorial control as a latent variable: an unobserved variable presumed to cause its observable indicators. This project models the latent variable across the entire country of Afghanistan using sub-national event data, a Hidden Markov Model, Uber’s hexagonal spatial index, and logistic spatial and temporal decay functions to treat data that is serially correlated in time and space. To view this project, please click here.
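
Two of the ingredients, hexagonal indexing and logistic decay, can be sketched as follows; the h3 v4 API is assumed, and the resolution, midpoint, and steepness constants are illustrative stand-ins, not the project’s calibrated values:

```python
import math
import h3  # Uber's hexagonal spatial index (v4 API assumed)

def event_cell(lat: float, lng: float, resolution: int = 5) -> str:
    """Bin an event's coordinates into an H3 hexagon at the given resolution."""
    return h3.latlng_to_cell(lat, lng, resolution)

def logistic_decay(distance_km: float, midpoint_km: float = 50.0,
                   steepness: float = 0.1) -> float:
    """Weight an event's influence on a cell: ~1 nearby, ~0 far away.
    An analogous function handles decay over time."""
    return 1.0 / (1.0 + math.exp(steepness * (distance_km - midpoint_km)))

print(event_cell(34.52, 69.18))                    # a cell covering Kabul
print(logistic_decay(10.0), logistic_decay(120.0))  # near vs. distant event
```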

H3

Latent Control: Hidden Markov Models

Who controls territory in civil war? This is a central variable in the research and analysis of civil wars, yet it is incredibly difficult to measure. In this post, I model territorial control as a latent variable: an unobserved variable presumed to cause its observable indicators. This project models the latent variable across the entire country of Afghanistan using sub-national event data, a Hidden Markov Model, Uber’s hexagonal spatial index, and logistic spatial and temporal decay functions to treat data that is serially correlated in time and space. To view this project, please click here.

Hidden Markov Models

Latent Control: Hidden Markov Models

Who controls territory in civil war? This is a central variable in the research and analysis of civil wars, yet it is incredibly difficult to measure. In this post, I model territorial control as a latent variable: an unobserved variable presumed to cause its observable indicators. This project models the latent variable across the entire country of Afghanistan using sub-national event data, a Hidden Markov Model, Uber’s hexagonal spatial index, and logistic spatial and temporal decay functions to treat data that is serially correlated in time and space. To view this project, please click here.

Horovod

PetaFlights

What accounts for flight delays in the U.S.? This project presents the machine learning end of a large data engineering effort that merged 630 million rows of weather data with 31 million rows of flight data. I use the state of the art in distributed deep learning, leveraging Petastorm, Horovod, and PyTorch to produce a multilayer perceptron model distributed across 8 workers in Databricks. Importantly, I use a novel approach to transform categorical data into continuous features through an embedding table. To view this project, please click here.

Linear Regression

Reexamining Civilian Preferences in Civil War: A Survey in Afghanistan

How do civilians react to changing authority in civil war? I investigate this question in Afghanistan using survey data from The Asia Foundation following the end of U.S.-led combat operations in 2014. I demonstrate clear evidence that civilian attitudes are conditional on a three-way interaction among territorial control, ethnicity, and survival. For instance, there is a notable and statistically significant difference between Pashtuns and non-Pashtuns under Taliban control in their approval of the Afghan Government. I bring largely unused country-wide, individual-level data to bear on analyzing civilian wartime beliefs. To view this research project, please click here.
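
A three-way interaction of this kind is a one-liner in a regression framework; the sketch below uses illustrative variable names and a hypothetical file path, not the survey’s actual columns:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("survey.csv")  # hypothetical path to the survey data

# The formula control * pashtun * survival expands to all main effects,
# all two-way interactions, and the three-way interaction term.
model = smf.ols("approval ~ control * pashtun * survival", data=df).fit()
print(model.summary())
```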

NLP

NLP: Natural Language Propaganda

Who are the targets of insurgent propaganda? I investigate the ability to classify the targets (e.g., the U.S. or Kabul) of insurgent propaganda messages using a novel corpus containing over 11,000 Taliban statements from 2014 to 2020. In experiments with Convolutional Neural Network (CNN) and transformer architectures, I demonstrate that the audiences of insurgent messages are best captured by transformers, likely owing to their encoder-decoder architecture. This paper’s contribution is twofold: First, it offers a novel data set with utility in machine learning classification and summarization tasks. Second, it suggests that since the audience of messaging can be reliably identified, analysts have new opportunities to look more closely at contrasts in language to better understand the targets of information.

Natural Language Processing

BERT-Vision

What compression methods can extract regularity from BERT during fine-tuning? Drawing on research that demonstrates the utility of information found across all of BERT’s layers, we propose a compression method, BERT-Vision, that captures the regularities produced by BERT during fine-tuning. BERT-Vision’s contribution is twofold: First, we show that compression during fine-tuning can yield comparable, and sometimes better, performance than BERT; second, we show that this performance is realized with a model 209x smaller than BERT in parameter count. To view this project, please click here.

Petastorm

PetaFlights

What accounts for flight delays in the U.S.? This project presents the machine learning end of a large data engineering effort that merged 630 million rows of weather data with 31 million rows of flight data. I use the state of the art in distributed deep learning, leveraging Petastorm, Horovod, and PyTorch to produce a multilayer perceptron model distributed across 8 workers in Databricks. Importantly, I use a novel approach to transform categorical data into continuous features through an embedding table. To view this project, please click here.

PyTorch

BERT-Vision

What compression methods can extract regularity from BERT during fine-tuning? Drawing on research that demonstrates the utility of information found across all of BERT’s layers, we propose a compression method, BERT-Vision, that captures the regularities produced by BERT during fine-tuning. BERT-Vision’s contribution is twofold: First, we show that compression during fine-tuning can yield comparable, and sometimes better, performance than BERT; second, we show that this performance is realized with a model 209x smaller than BERT in parameter count. To view this project, please click here.

PetaFlights

What accounts for flight delays in the U.S.? This project presents the machine learning end of a large data engineering effort that merged 630 million rows of weather data with 31 million rows of flight data. I use the state of the art in distributed deep learning, leveraging Petastorm, Horovod, and PyTorch to produce a multilayer perceptron model distributed across 8 workers in Databricks. Importantly, I use a novel approach to transform categorical data into continuous features through an embedding table. To view this project, please click here.

Python

Nutritionalcart

How healthy is the average Instacart user? Are certain types of food buyers (e.g., vegetarians, carnivores) healthier than others? I bring new data to bear on these questions to better understand how healthy the average Instacart user is and what health benefits are afforded to Instacart users who choose some types of food (e.g., plant-based, meat-based) over others. To determine the relative health of Instacart users, I matched the top 10 most-ordered products in each aisle with USDA nutrient data, using the USDA’s API, which returns results as JavaScript Object Notation (JSON). To view this project, please click here. An upgraded algorithm that searches the USDA database more effectively can be found here.

Map Off

Map Off is a game designed to test your geography skills in the United States or around the world. The inspiration for this game comes from my wife, Hannah, because we often test our spatial skills against one another whenever a map is at hand. Now we have access to maps and competition anytime we want.

Spark

PetaFlights

What accounts for flight delays in the U.S.? This project presents the machine learning end of a large data engineering effort that merged 630 million rows of weather data with 31 million rows of flight data. I use the state of the art in distributed deep learning, leveraging Petastorm, Horovod, and PyTorch to produce a multilayer perceptron model distributed across 8 workers in Databricks. Importantly, I use a novel approach to transform categorical data into continuous features through an embedding table. To view this project, please click here.

Survey Data

Reexamining Civilian Preferences in Civil War: A Survey in Afghanistan

How do civilians react to changing authority in civil war? I investigate this question in Afghanistan using survey data from The Asia Foundation following the end of U.S.-led combat operations in 2014. I demonstrate clear evidence that civilian attitudes are conditional on a three-way interaction among territorial control, ethnicity, and survival. For instance, there is a notable and statistically significant difference between Pashtuns and non-Pashtuns under Taliban control in their approval of the Afghan Government. I bring largely unused country-wide, individual-level data to bear on analyzing civilian wartime beliefs. To view this research project, please click here.

Transformers

BERT-Vision

What compression methods can extract regularity from BERT during fine-tuning? Drawing on research that demonstrates the utility of information found across all of BERT’s layers, we propose a compression method, BERT-Vision, that captures the regularities produced by BERT during fine-tuning. BERT-Vision’s contribution is twofold: First, we show that compression during fine-tuning can yield comparable, and sometimes better, performance than BERT; second, we show that this performance is realized with a model 209x smaller than BERT in parameter count. To view this project, please click here.