Festival of Science 2016 Student Abstracts

Festival of Science 2016 was held April 29, 2016.

Josiah Bartlett Senior Year Experience, Computer Science  Lisa Torrey and Michael Schuckers
Title: Gameplay Analysis Utilizing Reactive Programming
Collecting performance data from sports gameplay can be an extremely tedious task. Once the information is compiled, it usually must be handled a second time for analysis and graphical representation. This project entails plotting points (plays) on a two-dimensional image of the playing surface and binding key performance attributes to each type and occurrence of an event. This allows gameplay data to be displayed visually the moment it is plotted, while the associated values are stored for analysis and future use. Soccer was the sport of interest for this project; the Shiny package for R was the medium for implementation, and a MySQL database was used to store the data. Shiny is a web application development library chosen for its reactive programming capabilities and its access to the data analysis functions of the R ecosystem.
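
As a rough illustration of the reactive pattern described (not the project's actual code), the minimal Shiny sketch below records a play wherever the user clicks on a plot of the pitch; the field dimensions and event types are hypothetical, and the MySQL persistence step is omitted:

```r
# Minimal sketch of reactive play plotting in Shiny (hypothetical names;
# the actual project also persists events to a MySQL database).
library(shiny)

ui <- fluidPage(
  selectInput("event_type", "Event type", c("Shot", "Pass", "Tackle")),
  plotOutput("field", click = "field_click")
)

server <- function(input, output) {
  # Reactive store of plotted plays: x, y, and event type
  plays <- reactiveVal(data.frame(x = numeric(), y = numeric(),
                                  type = character()))

  observeEvent(input$field_click, {
    plays(rbind(plays(), data.frame(x = input$field_click$x,
                                    y = input$field_click$y,
                                    type = input$event_type)))
  })

  output$field <- renderPlot({
    # A 105 x 68 m soccer pitch as the plotting surface
    plot(NULL, xlim = c(0, 105), ylim = c(0, 68),
         xlab = "Length (m)", ylab = "Width (m)")
    d <- plays()
    if (nrow(d) > 0) points(d$x, d$y, pch = 19, col = factor(d$type))
  })
}

shinyApp(ui, server)
```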

Elizabeth Escobar Senior Year Experience, Statistics  Ivan Ramler
Title: Methods and Applications of Quantile Regression Models
Quantile regression predicts changes in the different quantiles of the response variable based on changes in the predictors. This is in contrast to standard linear regression, which models changes in the average response. Quantile regression allows us to analyze the complex manner in which predictors affect the response variable, since we can consider the impact of a predictor on the entire conditional distribution of the response, not just on its conditional mean. Although it can be used in any case where linear regression is used, quantile regression is often applied by ecologists and in the analysis of growth charts and income. This presentation will introduce the concept of quantile regression and discuss how some common standard regression ideas, such as model selection and resampling-based inference, can be incorporated into quantile regression.
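
A minimal sketch of the idea in R, using the quantreg package and its bundled engel demand data (the package and dataset are my choices for illustration, not necessarily those used in the talk):

```r
# Quantile regression with quantreg: fit the 10th, 50th, and 90th
# conditional percentiles of food expenditure given income.
library(quantreg)
data(engel)

# rq() plays the role of lm(); tau selects which quantile(s) to model
fits <- rq(foodexp ~ income, tau = c(0.1, 0.5, 0.9), data = engel)
summary(fits)

# Compare to the conditional-mean fit from ordinary least squares
ols <- lm(foodexp ~ income, data = engel)
coef(ols)
```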

Janelle Fredericks Senior Year Experience, Statistics  Ivan Ramler
Title: An Introduction to Gaussian Mixture Modeling for Model-Based Clustering
There are many ways to partition a dataset into clusters. First, I will provide an overview of cluster analysis by introducing the classic k-means and hierarchical clustering algorithms. Next, we will take an in-depth look at mixture model clustering and illustrate how the Expectation-Maximization (EM) algorithm is used to determine how clusters are formed. Finally, I will apply mixture model clustering to a dataset from a sleep study conducted by two St. Lawrence Psychology professors (Serge Onyper and Pamela Thacher) and use the derived clusters to examine happiness, depression, anxiety, and stress ratings of the students.
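
A compact illustration of Gaussian mixture model clustering in R via the mclust package, which fits the mixture with the EM algorithm and selects the number of clusters by BIC (shown on a built-in dataset, not the sleep-study data from the talk):

```r
# Model-based clustering of the Old Faithful eruption data with mclust
library(mclust)

fit <- Mclust(faithful)          # EM fitting plus BIC model selection
summary(fit)                     # chosen model and cluster sizes
head(fit$z)                      # EM posterior membership probabilities
plot(fit, what = "classification")
```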

Curtis Hurlbut Senior Year Experience, Statistics  Ivan Ramler
Title: Exploring Robust Alternatives to Least Squares Regression
Robust regression is a form of regression analysis designed not to be overly affected by violations of assumptions. Robust regression models are less sensitive to outliers than ordinary least squares estimates. When outliers are present, least squares estimation is inefficient and can be biased, making robust regression a viable alternative. This poster will introduce the robust regression method and apply it to an NFL dataset, as well as examine how model selection and resampling-based inference are done in the robust case.
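
A small sketch of the contrast on simulated data (my own toy example, not the NFL analysis), comparing ordinary least squares with the Huber M-estimator from MASS::rlm:

```r
# OLS versus a robust M-estimator on data with one planted gross outlier
library(MASS)

set.seed(1)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30)
y[30] <- 40                     # single gross outlier

coef(lm(y ~ x))                 # OLS slope is pulled toward the outlier
coef(rlm(y ~ x))                # rlm downweights it via Huber weights
```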

Ruoshi Li Senior Year Experience, Mathematics  Duncan Melville
Title: Volatilities across Sectors in the U.S. Stock Market
The purpose of this research is to calculate and compare the betas of stocks across different sectors. In finance, the beta of a sector represents the relative volatility of the sector portfolio in relation to the market. A higher sector beta indicates a higher systematic risk of investment in that particular sector. In this project, we collected price data and market capitalization data for 484 different stocks that have been listed on the S&P 500 and actively traded during the last nine years (2006-2014). Using this dataset as our sample and the S&P 500 stocks as the market portfolio, we calculated the betas of ten different sectors with the Capital Asset Pricing Model (CAPM). Our analysis indicates that the betas of the different sectors are all close to 1 and that there is not a significant difference between sector betas. Specifically, the Consumer Staples sector exhibits the lowest beta and the Financials sector the highest, at 0.85 and 1.13, respectively. As this research quantitatively measures the risk of investing in different sectors, we hope to construct an accurate risk profile of sector portfolios and thus provide valuable information for future investment.
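
The core CAPM computation amounts to a regression of sector returns on market returns, with the slope as beta. A minimal sketch on simulated monthly returns (the real project used S&P 500 price data):

```r
# CAPM beta as a regression slope: sector returns on market returns.
# Illustrative simulated data; excess returns (net of the risk-free
# rate) would be used in a full CAPM treatment.
set.seed(42)
market <- rnorm(108, mean = 0.005, sd = 0.04)     # monthly market returns
sector <- 0.002 + 1.1 * market + rnorm(108, sd = 0.02)

fit <- lm(sector ~ market)
coef(fit)["market"]     # estimated beta, close to the true 1.1 here
```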

Nanjiang Liu Senior Year Experience, Statistics  Robin Lock
Title: Comparing Methods for Constructing Confidence Intervals Using Simulations in R
A confidence interval (CI) is a pair of numbers, based on sample data, designed to capture the value of some population parameter. There are several different methods for constructing CIs. For example, we can find a CI for a proportion using a formula based on the normal distribution, a bootstrap distribution of simulated sample proportions, a "plus 4" adjustment to the proportion, or a Bayesian credible interval. We use R simulations to generate many samples from different populations and then compare the coverage rates and widths of the intervals for each method. We vary the population proportion and sample size to explore which methods might work best in different situations.
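
A sketch of this style of simulation for two of the methods mentioned, estimating coverage of the normal-based interval and the "plus 4" interval for a small sample and small proportion (parameter values are my choices for illustration):

```r
# Estimate coverage rates of two CI methods for a proportion
set.seed(2016)
p <- 0.1; n <- 30; reps <- 10000

covers <- function(lo, hi) mean(lo <= p & p <= hi)

x <- rbinom(reps, n, p)

# Normal-based ("Wald") interval: phat +/- 1.96 * sqrt(phat(1-phat)/n)
phat <- x / n
se   <- sqrt(phat * (1 - phat) / n)
wald <- covers(phat - 1.96 * se, phat + 1.96 * se)

# Plus-4 interval: add 2 successes and 2 failures, then the same formula
pt    <- (x + 2) / (n + 4)
se4   <- sqrt(pt * (1 - pt) / (n + 4))
plus4 <- covers(pt - 1.96 * se4, pt + 1.96 * se4)

c(wald = wald, plus4 = plus4)   # plus-4 is typically closer to 95% here
```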

Xiaoying Claire Lu Senior Year Experience, Mathematics  Jim DeFranza and Natasha Komarov
Title: Counting Triangles in Graphical Realizations of Degree Sequences with Unique k-Tuples
The number of triangles in a large graph has been studied extensively, since it can be used in the study of networks and complex systems. The degree sequence S of a graph G is a list of the degrees of the vertices of G in non-increasing order. A finite non-increasing sequence of positive integers S is called a graphical degree sequence if and only if there is a graph G with degree sequence S. In 2014, DeFranza et al. showed that given a graphical degree sequence that contains a unique triple (a sequence with one term repeated three times, all other terms distinct, and no gaps in the degree sequence), the number of triangles in the graph can be expressed as a polynomial. Our work considers the extension of these results to graphical degree sequences with one term repeated k times. The degree sequences have highest degree 4p or 4p+3 for k greater than or equal to 1, to guarantee they are graphical. Using a specific method of construction, the data collected for k versus the number of triangles, when separated by k mod (2p+2) or k mod (2p+3), yields a consistent pattern.
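
For the computational side, the igraph package in R can realize a graph from a graphical degree sequence and count its triangles; the sketch below uses an arbitrary example sequence, not the specific construction from this project:

```r
# Realize a graph from a degree sequence and count triangles with igraph
library(igraph)

degs <- c(6, 5, 4, 4, 4, 3, 2, 2)   # example sequence with a repeated term
is_graphical(degs)                   # Erdos-Gallai graphicality check

g <- sample_degseq(degs, method = "simple.no.multiple")
sum(count_triangles(g)) / 3          # each triangle is counted at 3 vertices
```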

Brooke McGraw Senior Year Experience, Statistics  Ivan Ramler
Title: Using Linear Discriminant Analysis to Predict Beer Styles
Linear discriminant analysis (LDA) is a classification technique commonly used for dimensionality reduction. LDA uses existing information to compute latent explanatory variables that maximize separation between multiple classes. Like other classification techniques (such as logistic regression), LDA can be used for predictions and to determine important variables in the model. This talk will introduce LDA and apply it to the classic Fisher's Iris dataset as well as to classifying beer styles based on home-brew recipes. LDA does a very good job classifying species correctly with the Iris dataset. Within the beer dataset, LDA correctly classifies Stouts, IPAs, and American Ales, but struggles to correctly classify Light Hybrid Beers and the Belgian and French Ales.
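
A compact illustration of the Iris portion using MASS::lda (the beer analysis would follow the same pattern with style as the response):

```r
# LDA on Fisher's Iris data: fit, predict, and inspect the confusion matrix
library(MASS)

fit  <- lda(Species ~ ., data = iris)
pred <- predict(fit)

table(Predicted = pred$class, Actual = iris$Species)  # confusion matrix
head(pred$x)    # the latent discriminant variables (LD1, LD2)
```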

Xuehang Pan Senior Year Experience, Statistics  Robin Lock
Title: Giving Real People Access to Big Data – Analyzing Bike Rentals in NYC
We are now in an era of "Big Data" but are challenged to find ways for people to extract meaning from the data effectively. We discuss the process of scraping data, building a database, and giving a user tools to investigate the data. We build these tools using R and provide a user interface with Shiny apps. We illustrate these ideas using data from the Citi Bike service in New York City, covering 330 stations and millions of rentals in 2015.
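
A minimal sketch of the kind of Shiny tool described: filter rentals by station and plot trip durations. The `rentals` data frame here is a hypothetical stand-in for the Citi Bike database, not the project's actual schema:

```r
# Toy Shiny interface for exploring bike rentals by station
library(shiny)

rentals <- data.frame(
  station      = sample(paste("Station", 1:5), 1000, replace = TRUE),
  duration_min = rexp(1000, rate = 1 / 15)
)

ui <- fluidPage(
  selectInput("station", "Station", sort(unique(rentals$station))),
  plotOutput("durations")
)

server <- function(input, output) {
  output$durations <- renderPlot({
    d <- rentals[rentals$station == input$station, ]
    hist(d$duration_min, main = input$station, xlab = "Duration (min)")
  })
}

shinyApp(ui, server)
```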

Scarlett Qi Independent Study, Statistics  Ivan Ramler
Title: Using Support Vector Machines to Predict Final Grades in Introductory Statistics
Supervised learning is the machine learning task of inferring a function from labeled training data. Support vector machines (SVMs) are supervised learning models used for classification and regression. This project uses the "Blue Jays" dataset provided by the Stat2Data package in R to illustrate the SVM algorithm: body measurements of captured blue jays are used to predict their sex. Moreover, I will show how an R Shiny web application can be combined with an SVM to build a webpage that allows Stat 113 students to input their current exam grades and predict a final grade from an SVM model running in the background.
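
A sketch of the classification step using e1071::svm (the e1071 package is my choice of SVM implementation; column names follow the Stat2Data documentation for BlueJays and should be checked against the installed package):

```r
# SVM classification of blue jay sex from body measurements
library(e1071)
library(Stat2Data)
data(BlueJays)

fit <- svm(KnownSex ~ BillDepth + BillWidth + BillLength + Mass,
           data = BlueJays, kernel = "radial")

# In-sample confusion matrix of predicted versus recorded sex
table(Predicted = predict(fit), Actual = BlueJays$KnownSex)
```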

Scarlett Qi Senior Year Experience, Statistics  Robin Lock
Title: Shiny Bayes: Developing an App to Illustrate Bayesian Inference
Bayesian inference is a statistical method for using data to update beliefs about the location of a parameter. I use the R Shiny package to create an interactive web app that allows users to specify a prior distribution, input data, and observe the resulting posterior distribution. I will discuss how this app was developed in RStudio and how it can be used to demonstrate Bayesian inference for parameters such as a binomial proportion, Poisson mean, or normal mean.
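
For the binomial proportion case, the update is conjugate: a Beta(a, b) prior combined with x successes in n trials gives a Beta(a + x, b + n - x) posterior. A plain-R sketch of the prior-to-posterior plot such an app would render reactively (prior and data values are arbitrary):

```r
# Beta-binomial updating: prior Beta(a, b), data x successes in n trials
a <- 2; b <- 2          # prior parameters
x <- 7; n <- 10         # observed data

p <- seq(0, 1, length.out = 200)
plot(p, dbeta(p, a + x, b + n - x), type = "l", ylab = "Density")
lines(p, dbeta(p, a, b), lty = 2)   # prior, dashed
legend("topleft", c("Posterior", "Prior"), lty = c(1, 2))
```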

Kersey Reed Senior Year Experience, Statistics  Ivan Ramler
Title: The Effect of Principal Components Analysis on the Accuracy of Clustering Algorithms
In economics it is fairly common practice to use principal components analysis (or its close relative, factor analysis) to simplify large real-world datasets. The goal of my study is to show that using principal components analysis prior to clustering may in fact decrease the ability of different clustering algorithms to properly cluster data and thus provide inferior results. I designed a simulation study to analyze the performance of different clustering algorithms before and after applying principal components analysis as a data reduction method. Based on the simulation results, I will argue that using principal components analysis in hopes of improving the accuracy of clustering algorithms may not be the best idea in empirical studies.
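
A sketch of one replicate of this kind of simulation, scoring k-means with and without a PCA reduction against the true labels via the adjusted Rand index (the simulated data and the use of mclust::adjustedRandIndex as the accuracy measure are my choices for illustration):

```r
# One replicate: cluster raw data versus PCA-reduced data, score both
library(mclust)   # provides adjustedRandIndex()

set.seed(7)
labels <- rep(1:3, each = 50)
X <- matrix(rnorm(150 * 10), ncol = 10)
X[, 1] <- X[, 1] + 3 * labels           # separation lives in one variable

km_raw <- kmeans(X, centers = 3, nstart = 20)

pcs    <- prcomp(X, scale. = TRUE)$x[, 1:2]   # keep two components
km_pca <- kmeans(pcs, centers = 3, nstart = 20)

c(raw = adjustedRandIndex(km_raw$cluster, labels),
  pca = adjustedRandIndex(km_pca$cluster, labels))
```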

Kayla Trainer Senior Year Experience, Computer Science  Ed Harcourt
Title: Utilizing Sensors on Android Devices
Over two semesters of research and testing, I have implemented a two-part Android application that simultaneously uses both the sensors of a device and Bluetooth connectivity. This app connects two devices and sends the sensor data of one Android device (the server in this relationship) to the other Android device (the client). After receiving the data, the client device's application graphs the sensor data to give the user a visual representation rather than raw numbers. Obtaining the sensor data of a device is important for applications that involve multiple cooperating devices, such as geo-location, weather, positional, and orientation applications. We decided to observe the accelerometer sensors of our server device. The application was developed using Android Studio, an Integrated Development Environment provided by Google for creating Android applications, which uses Java as the main programming language and XML for the graphical layout of the application.

Christian Yarros Senior Year Experience, Computer Science  Lisa Torrey
Title: St. Lawrence University Faculty Elections: A Web-Based Software Solution
To solve St. Lawrence University's aggravating and complicated faculty election process of previous years, I built a faculty election website with HTML, CSS, MySQL, and PHP. I will discuss the web development process, the component technologies and why they were necessary, and the importance of design in the user interface and database. Finally, I will discuss future improvements and features that the faculty election website could implement.