Festival of Science 2017 Abstracts
Mathematics, Computer Science and Statistics Department
Madison Rusch "APR2.1: A Google Chrome Extension"
Advisor: Choong-Soo Lee, Computer Science
As my Senior Year Honors Project, I worked on improving the end-user experience of APR2. APR2 is currently used by students, faculty, and staff, making this an ideal project to reach a wide audience within the St. Lawrence community. Using JavaScript, HTML, and CSS, I built and debugged a Google Chrome extension to augment the APR2 website. My extension provides a course catalog with additional features to the current website in an attempt to make the catalog more user-friendly and navigable. Throughout the project, I improved my understanding of web programming and computer networks as well as my ability to code. The goal of this extension is to help the St. Lawrence community utilize the APR2 website to its fullest potential.
Mary Beth Benzing "Sentiment Analysis of Amazon Customer Reviews"
Advisor: Ivan Ramler, Statistics
Since many people are transitioning to electronic commerce shopping options, like Amazon, customer product reviews play an increasingly important role in future customer purchasing decisions. Analyzing the sentiment of customer reviews is a useful method to address the overall feeling and emotion the reviewers possess about a particular product. I will begin by providing a graphical overview of the results of the sentiment analysis. Then, I will discuss the results of the Ordinal Logistic Regression, where I am able to predict the rating score (1-5 stars) based off the average sentiment score of the review.
Maxime Bost-Brown "A Sentiment Analysis: Star Wars versus Star Trek"
Advisor: Ivan Ramler, Statistics
For several decades, science fiction fans have been waging the war of Star Wars versus Star Trek – which is better? This research investigates one aspect of the debate by analyzing the scripts of each of the movies (excluding animated movies). We use the R language and the syuzhet package to calculate the sentiment scores (i.e., the difference in the number of positive and negative words) for each script. We compare and contrast both within and across series to draw conclusions based on their sentimental impact.
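The score described above (positive words minus negative words) can be sketched in a few lines of Python. This is only an illustration of the counting idea, not the syuzhet implementation; the two word lists and the function name `sentiment_score` are hypothetical stand-ins for a full sentiment lexicon.

```python
import re

# Hypothetical mini-lexicons; a real analysis would use a full sentiment dictionary.
POSITIVE = {"hope", "good", "friend", "love"}
NEGATIVE = {"fear", "dark", "hate", "destroy"}


def sentiment_score(text):
    """Return (number of positive words) - (number of negative words)."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


# Two positive words and no negative words give a score of 2.
example = sentiment_score("A new hope, my friend")
```

Scoring each script in this way gives one number per movie, which can then be compared within and across the two series.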
Maxime Bost-Brown "Player Tracking for Division I Women’s College Hockey"
Advisor: Michael Schuckers, Statistics
The purpose of this project is to analyze data from a Division I collegiate hockey team, the St. Lawrence University (SLU) Saints. Using video footage of multiple games from the 2016-17 season, students from the St. Lawrence University Sports Analytics Club recorded shot attempts by the SLU women’s team. For each shot, several metrics were recorded, including shooter, outcome, and (x,y) location. In this poster we will present some visualizations and results from this project.
Morgan Darby "Investigating Lyrics through Stylometric Techniques"
Advisor: Ivan Ramler, Statistics
This study investigates whether creativity of popular artists has changed over the past decade using stylometrics. After web scraping all lyrics for songs by Beyoncé, Justin Bieber, Adele, Drake, Kanye, and Taylor Swift, style measurements are used to understand the structure of the lyrics. For example, type/token ratios (TTR), which measure the total number of different words occurring divided by total number of words, can give insight about the structure of lyrics. TTR and other stylometric measures are used to compare and contrast lyrics both within and across artists.
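As a small illustration of the type/token ratio defined above, the following Python sketch divides the number of distinct words by the total number of words. The tokenization rule (lowercased runs of letters and apostrophes) and the function name are assumptions made for the example, not details taken from the study.

```python
import re


def type_token_ratio(text):
    """Return (number of distinct words) / (total number of words)."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # Avoid dividing by zero on empty lyrics.
    return len(set(tokens)) / len(tokens)


# Six tokens, three distinct words: the ratio is 3 / 6 = 0.5.
ratio = type_token_ratio("shake it off shake it off")
```

A highly repetitive chorus pushes the ratio toward 0, while lyrics that rarely repeat a word push it toward 1.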
Elsa Fecke "Who’s in the Money? Using a Random Forest to Predict Performance in a Horse Race"
Advisor: Robin Lock, Statistics
The goal of this project is to apply a Random Forest algorithm to a thoroughbred racing dataset in order to predict the placement of horses in a future race. The predictor variables are collected from a daily racing form that includes information such as post position, morning odds, previous workouts, and past performances. Since the data is in XML format, the first step of this project consists of parsing the XML files and extracting the desired variables into a data frame. The Random Forest procedure uses this data frame to grow many classification trees, where each tree is based on a random subset of predictor variables. We then use a majority vote to assess the chances that a specific horse will place in the top three.
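The voting step at the end of the procedure can be sketched as follows. This covers only how individual tree predictions are combined, not how the trees are grown; the labels "top3" and "other" and the function name are hypothetical.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Combine per-tree class labels into one label and its vote share."""
    counts = Counter(tree_predictions)
    # most_common(1) returns the label with the most votes; on a tie it
    # returns the label that appeared first in the input.
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)


# Hypothetical predictions from five trees for one horse.
label, share = majority_vote(["top3", "top3", "other", "top3", "other"])
```

The vote share (here three of five trees) can be read as a rough estimate of the chance that the horse places in the top three.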
James Holley-Grisham "Developing a Sentiment Analysis to Improve Selection of NFL Draft Pick"
Advisor: Ivan Ramler, Statistics
In this poster, we try to improve the selection of players from the NFL draft. To do this, we use web scraping and text analytics to collect scouting reports on players who were potential NFL draft picks. These reports include the pros and cons of possible draft picks from the Walter Football Scouting Report. Using the idea of sentiment analysis, we developed a classification model to produce a score based on the sentiment of the articles from the scouting report. For better results, the players were separated by position, because different positions require different body sizes and skill sets.
Julia Holter "Sentiment Analysis of Individual Characters in the Works of JRR Tolkien"
Advisor: Ivan Ramler, Statistics
Considered by many to be the Father of Modern Fantasy, JRR Tolkien’s works have had tremendous impact on the genre of fantasy and on the world of literature at large. His characters have spoken to readers for decades, not only for the detailed world in which their stories take place, but also for their individualism, expressed primarily through emotion. In this project I quantify those emotions surrounding specific characters in Tolkien’s two best-loved, or at least most renowned, works: The Hobbit and The Lord of the Rings. I utilize neighborhoods of varying lengths to determine the ways in which Tolkien manipulates the emotions of his characters and his readers and the extent to which emotional language fluctuates from character to character. Of particular interest is Tolkien’s emotional treatment of his female characters—notoriously overlooked—as compared to his male characters, with whom he was obviously more comfortable.
Andrew Jarombek "Building a Cross-Platform Running Application"
Advisor: Ed Harcourt, Computer Science
For my computer science senior project, I built a website and Android phone app for the St. Lawrence Cross Country and Track & Field teams. The website and app allow users to log their runs and track mileage statistics. Users can join teams, allowing them to view and comment on their teammates' runs. There are group leaderboards and message boards. Users can see their workouts in a number of different formats. They can look at a monthly calendar of their runs as well as a weekly graph. Individual runs are also viewable for more details. Each run is color coded based on how the person felt during the workout, which allows them to notice trends and give feedback to the coaching staff.
The website portion of the project uses the LAMP stack (Linux, Apache, MySQL, and PHP) with a heavy use of JavaScript and CSS for dynamic pages. The database has been separated out and is accessed by a REST API. Both the Android app and the website hit endpoints on the API to get information which is used to populate their respective applications. This separation of concerns allows for the project to be easily scalable for more platforms. With the use of the API users can make changes and submit logs on their Android app and immediately see those changes reflected on the website. Future improvements for the project include added features and an iOS app.
Shihao Li "Analysis of Dialogue in 'Friends'"
Advisor: Ivan Ramler, Statistics
Friends, one of the most popular television shows of all time, highlights the relationships revolving around six friends living in Manhattan. Over its 10 seasons, the show generated a large body of scripts. At almost 900,000 words, the scripts offer an ocean of information to examine. After extracting dialogue from the scripts, I apply statistical tools such as text analysis and sentiment analysis to investigate questions such as: Who got the most spotlight out of the six? How did the characters develop throughout the series?
Ina Maloney "Android College Search App"
Advisor: Ed Harcourt, Computer Science
I created an Android app for students to use in their search for colleges and universities in the United States. The app allows the user to search for and add schools to their list by gathering data from the U.S. Department of Education’s College Scorecard database. That data is then made available for the student to access via each individual college’s page. A school’s page consists of an overview of the school and an embedded view of the school’s main website, which allows the student to look up more information. The final component of the app is a map that marks the various schools on the user’s list. The user can optionally add their hometown in order to view the distance between their home and the various schools. Overall, the app makes students’ college searches easier by storing all of their information in one place.
Ketura Mason "A Statistical Look at Lane Effects in the 2016 Rio Olympic Swimming Events"
Advisor: Michael Schuckers, Statistics
The Olympic Swimming events are among the most highly anticipated international sporting competitions by spectators across the globe. Questions have been raised about the impact of individual swimming pools and the lanes therein on times of swimmers. Brammer, Cornett, and Stager (2013) analyzed results from the 2013 Swimming World Championships in Barcelona and identified a pattern. Athletes who were assigned to higher-numbered lanes tended to swim consistently slower when swimming away from the starting blocks, and faster back towards them. The present study created a statistical model and used it to analyze data from four longer freestyle events at the Rio Olympics: Men’s 400m and 1500m; and Women’s 400m and 800m. Our model included terms for athletes, the heat being swum, an athlete by heat interaction, an indicator for whether or not the length was from or toward the starting blocks, and a lane effects contrast. From our analysis, we did not find significant lane effects in these events. This differs from Brammer, Cornett, and Stager (2016), who provided evidence for a lane effect. We do see some indication of a lane effect, but the magnitude of the effect is not significant.
Samantha Ormsby "Applying Sentiment Analysis to the Harry Potter Series"
Advisor: Ivan Ramler, Statistics
Sentiment analysis in literature is the process of categorizing words within a piece of text according to emotions; when complete, the categorization reveals the overall tone of the text, whether positive or negative. Through the use of R and the Syuzhet package, emotions in novels can be classified into eight basic emotions: anger, fear, joy, anticipation, trust, disgust, surprise, and sadness. This process can be applied throughout a novel and demonstrates how the tone of a novel changes through the course of plot development. Sentiment analysis can be applied to a single novel or to a series of novels, such as the Harry Potter series. Modeling the sentiment of Harry Potter books will be used to illustrate the development across the entire series. This technique is applicable to fitting other novel series of various length and number and will allow for comparisons within and across novels.
Taylor Pellerin "The Data Collection Process and Play Selection in Division-I College Football"
Advisor: Michael Schuckers, Statistics
Over the course of a summer fellowship and fall semester senior research, I downloaded a slightly dated but nonetheless massive data set, scraped more recent data, and then ran several regression models. The original data set I downloaded from cfbstats.com contained play-by-play stats of every NCAA Division I College Football game spanning the 2005 to 2013 seasons. I then built a set of linear and ridge regression models which looked at how well the run-pass decision and a few other factors did in predicting the change in expected points caused by each play of the games, where expected points is taken using a nearest neighbor approach. Nearest neighbor is taken to be the average points gained at the end of a drive for each scenario of down, distance, and spot on the field. All scoring and turnover possibilities were handled, with a hefty negative weight being given to turnovers and defensive points.
With this set of models, the next step turned to gathering more data. The website that had provided the first 9 years of stats became a paid service, so scraping the rest of the data became necessary. To do this, I built an R package that, given a season schedule containing the date and teams involved in each game, produces a table of all of the play-by-play stats, formatted in the same way as the data provided by cfbstats.com. The actual information is pulled from the same JSON files that are used by ESPN.com to fill out their play-by-play stats pages. This was done primarily using the jsonlite and dplyr R packages. With this extra data, I then reran all of the original models, as well as a handful of others with new predictive factors, in order to make the analysis more robust.
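The expected-points idea in this abstract (the average points at the end of a drive for each combination of down, distance, and field position) can be sketched as a lookup table. The Python below is a minimal illustration under assumed inputs: the tuple layout, function names, and sample numbers are hypothetical, and the weighting of turnovers and defensive points is left out.

```python
from collections import defaultdict


def expected_points_table(plays):
    """Average end-of-drive points for each (down, distance, yardline) state.

    plays: iterable of (down, distance, yardline, drive_points) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # state -> [sum of points, count]
    for down, distance, yardline, drive_points in plays:
        cell = totals[(down, distance, yardline)]
        cell[0] += drive_points
        cell[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


def play_value(table, state_before, state_after):
    """Change in expected points caused by a play."""
    return table[state_after] - table[state_before]


# Hypothetical plays: (down, distance, yardline, points the drive ended with).
sample_plays = [
    (1, 10, 25, 7), (1, 10, 25, 0), (1, 10, 25, 3),
    (2, 5, 30, 7), (2, 5, 30, 3),
]
table = expected_points_table(sample_plays)
gain = play_value(table, (1, 10, 25), (2, 5, 30))
```

The per-play change computed by `play_value` is the kind of response that a regression on the run-pass decision could then try to explain.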
Carrie Pomainville "An Emotion based Sentiment Analysis of The Hunger Games Characters and Books"
Advisor: Ivan Ramler, Statistics
How do characters connect to the book you are reading? How does their emotion connect to the book as a whole? This research applies text analytics and sentiment analysis to the Hunger Games books. Using R and the syuzhet package, the emotion of the book and the emotion of the characters were calculated. The emotions of the characters are compared to the emotion of the book to see how much each character deviates from the book as a whole. The methods used in this analysis can be applied to any literature in which a set of main characters appears.
Molli Richards "Auctions from a Game Theory Perspective"
Advisor: Natasha Komarov, Mathematics
This study will look at different types of auctions and how they can be analyzed from a game theory perspective. For example, in a type of auction called the second-price sealed-bid auction, the high bidder only pays the second highest bid amount. The Nash equilibrium will be key to analysis and understanding strategies. I will also look at how auctions can be modeled as non-cooperative games where bidders can have conflicting interests.
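To make the second-price sealed-bid rule concrete, here is a short Python sketch. The function names, bidder names, and amounts are hypothetical, and the tie-breaking rule in `payoff` (a tied bid loses) is an assumption made for the example.

```python
def second_price_auction(bids):
    """Return (winner, price): the high bidder pays the second-highest bid.

    bids: dict mapping bidder name -> bid amount.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


def payoff(value, own_bid, other_bids):
    """Payoff to one bidder: value minus price if they win, otherwise 0.

    Assumption: a bid that only ties the highest competing bid loses.
    """
    highest_other = max(other_bids)
    return value - highest_other if own_bid > highest_other else 0


winner, price = second_price_auction({"ann": 100, "bob": 80, "cat": 60})
```

With a private value of 100 against a competing bid of 80, bidding 100 and bidding 120 both yield a payoff of 20, while bidding 70 yields 0. Against a competing bid of 110, bidding 120 yields -10. This is the standard argument that bidding one's true value is a weakly dominant strategy in this auction format.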
Lilly Schwarz "Stylometrics and Sentiment Analysis of The Weekend Update SNL Scripts"
Advisor: Ivan Ramler, Statistics
Sentiment analysis refers to the natural language processing task of determining whether a piece of text contains subjective information and what that information expresses, for example whether the author's attitude is positive, negative, or neutral. In contrast, stylometric analysis measures features of literary style such as sentence length, vocabulary richness, and various frequencies. Over the past 13 years of SNL episodes, 8 different screenwriters have had the job of writing the Weekend Update part of the episode. The purpose of this research is to combine both sentiment and stylometric analysis to compare and contrast “The Weekend Update” scripts on Saturday Night Live with an emphasis on the differences in authors both within and across seasons.
Eric Sweetman "ECAC Recruits: The Future of SLU Hockey Players"
Advisor: Michael Schuckers, Statistics
One of the most important tasks an NCAA Division I coach has is to recruit players for the future of their program. Coaches look at many attributes of potential recruits, such as positioning, strength, speed, goals, and assists. In this model, we answer the question of who is a worthy candidate for the ECAC (Eastern College Athletic Conference) based on their “feeder” league. A feeder league is the place where players develop before they enter collegiate Division I ice hockey. For example, these leagues can be the USHL (United States Hockey League), NAHL (North American Hockey League), AJHL (Alberta Junior Hockey League), BCHL (British Columbia Hockey League), CCHL (Central Canada Hockey League), CHL (Central Hockey League), CISAA (Conference of Independent Schools Athletics Association), CJHL (Canadian Junior Hockey League), European teams, or EJHL (Eastern Junior Hockey League). To see possible future ECAC players, we use a multiple linear regression analysis, where the response is goals per game in the NCAA. This means we are predicting the number of goals a possible recruit will score in Division I ice hockey in the ECAC. This will help give coaches a better sense of which leagues and players to spend more time scouting.
Leif Swenson "Understanding Design Principles and Procedures for Creating Consumer Level Virtual Reality Software"
Advisor: Ed Harcourt, Computer Science
As an avid consumer of modern technology, I chose to extend my Senior Year Seminar class that focused on software development for the Android mobile operating system to a field that has many applications beyond on-the-go computing. Virtual Reality (VR) is an emerging field that is currently being pursued by several major technology companies. As such, the process for generating media on platforms powered by VR has not yet been standardized, and resources for learning it are not as widely available as those for other aspects of mobile development. This project investigated the process behind the planning, implementation, and execution of creating a virtual reality application targeted for Android mobile phones. Over the course of half a year, I researched the inner workings of creating digital graphics by learning OpenGL ES 2.0 and how it is integrated natively in the Android development environment, followed by utilizing Unity, a game design engine, to streamline development and avoid the arduous work of creating graphics by hand. Simultaneously, this project aimed to explore different services and information that the Android OS provides to the developer, specifically by creating an SMS texting application that could be displayed by a virtual reality application. My results include a functioning “beta” application that is prepared to be deployed on the Google Play Store, as well as a detailed set of instructions for anyone looking to develop software for the GVR platform themselves, including my own project as a base on which to structure future endeavors, and warnings about common places where one might run into trouble.
John Tank "QMJHL Rinks and their Effects"
Advisor: Michael Schuckers, Statistics
Using data from the QMJHL, we constructed a dataset containing tens of thousands of individual game statistics for many junior hockey players from multiple recent seasons. The data consists of goals, assists, shots, and plus/minus that each player tallied in every game played throughout the course of a season. Using the Schuckers-Macdonald (2014) model for rink effects of NHL teams as a guide, a new log-linear regression model was constructed to investigate how rinks affected individual players instead of teams. This new model used the variables of player strength, team strength, opposing team strength, home/away, and rink played in. Like the Schuckers-Macdonald model, this model was used to predict outcomes, such as how many shots a player would record throughout a game, giving us a sense of how specific rinks affected junior hockey players’ performance.
Yunjia Wang "Applying Cluster Analysis to Festival of Science Abstracts"
Advisor: Ivan Ramler, Statistics
Cluster analysis focuses on grouping a set of objects by their characteristics so that objects in one group are similar to each other and different from those in other groups. This project applies cluster analysis to text mining, which derives information by analyzing text. The Festival of Science is an event for SLU Senior science students to present their Senior research projects. Prior to the event, students submit an abstract to the conference. Festival of Science abstracts from 2014 to 2017 are clustered and analyzed to find connections between project abstracts, major departments, and the sponsoring faculty.
Zhongxin Wang "Statistical Methods for Stock Picking and Portfolio Construction"
Advisor: Robin Lock, Statistics
Quantitative hedge funds use data and statistical methods to make decisions about stock purchases and portfolio development. For example, two common strategies are momentum (betting that “winner” stocks will continue to do well and “loser” stocks will continue to underperform) and mean reversion (betting that “winners” are ready to fall and “losers” are ready to recover). We will discuss aspects of these strategies and use simulations to evaluate performance and assess risk.
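The contrast between the two strategies can be sketched as a simple ranking rule: momentum buys the recent winners and mean reversion buys the recent losers. The Python below is a toy illustration only; the tickers, returns, and function name are hypothetical, and it says nothing about how the project's simulations were built.

```python
def pick_stocks(past_returns, k, strategy):
    """Choose k stocks to buy based on their past returns.

    past_returns: dict mapping ticker -> return over a lookback period.
    strategy: "momentum" buys the k best past performers;
              "mean_reversion" buys the k worst past performers.
    """
    if not 1 <= k <= len(past_returns):
        raise ValueError("k must be between 1 and the number of stocks")
    ranked = sorted(past_returns, key=past_returns.get, reverse=True)
    if strategy == "momentum":
        return ranked[:k]
    if strategy == "mean_reversion":
        return ranked[-k:]
    raise ValueError("unknown strategy: " + strategy)


# Hypothetical tickers and lookback returns.
returns = {"AAA": 0.30, "BBB": 0.10, "CCC": -0.05, "DDD": -0.20}
winners = pick_stocks(returns, 2, "momentum")
losers = pick_stocks(returns, 2, "mean_reversion")
```

Running both rules on the same simulated return series is one simple way to compare how the resulting portfolios perform and how risky they are.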
Palchen Wangchuk, Seung Hyun Lee "Android: Using Technology to Enhance Learning Efficiency"
Advisor: Ed Harcourt, Computer Science
For our final SYE project, we created an Android app that will help make studying around campus easier for students. We used Android Studio to write our code and Nexus 7 tablets to run our app. With our app, users will be able to “check in/out” of the many different study rooms on campus, notifying other users about the availability of the study rooms. We implemented a basic client-server relationship to update and connect the individual users to one another. Google Maps is a major component of our app. A user is only allowed to change the status of a study room if they are in the room. Our app will use two main layouts, one that displays the actual map of where the user is and another that shows a list of the study rooms and their availability. Our plan is to start with Madill Science Library and map out the study rooms on each floor. Hopefully, after we have finished this, we will just have to replicate our code and apply it to different buildings around campus (ODY, Hepburn, Carnegie, Johnson, etc.) so that our app covers more of the St. Lawrence campus. Our goal with this app is to create an easier and more convenient way for students to study around campus by getting rid of the hassle of physically walking around campus to find a place to study.
James White "A Shiny App for Investigation of NHL Draft Pick Career Trajectories"
Advisor: Michael Schuckers, Statistics
The National Hockey League (NHL) uses an annual draft to allocate available players to its member teams. For teams making these selections, an understanding of how past selections have performed is helpful for future planning. Using data collected from hockey-reference.com on draft classes from 1998 to present, we have built an R Shiny application that allows for the comparison of players based upon the player’s selection. We average together annual performance in the NHL using Games Played and TOI for the user to get a better sense of how draft selection affects playing time by year since the players were drafted. Users can manipulate the range of picks they would like to compare along with what draft classes they would like to see. More specifically, the application also details the average number of minutes players played in each year of their career.
Shuai Xiao "Android Development: Gaming in Reality"
Advisor: Ed Harcourt, Computer Science
Video games have been one of the most popular stress getaways for coping with our dramatically changing world, but they also have the reputation of taking gamers away from reality. Hence, the goal of my app, Baller, is to bring basketball games into the real world. We want users to spend time and have fun with each other face to face instead of screen to screen.
The nickname for my app is “the tinder for hoopers”, which implies that users receive the benefits after using the app rather than while they are using it; I followed this strategy in designing the app. The app consists of multiple activities for functionalities such as creating accounts and interacting with other users. There is also a client-server part for storing users’ information and responding to users’ actions. Strictly speaking, Baller is the underlying structure of a game rather than a game itself, but it is crucial to understand that this feature makes the concept of “games in reality” happen. Users first go through the routine of creating their accounts. After this, users can choose their positions: Point Guard, Shooting Guard, Small Forward, Power Forward, or Center. In addition, they need to enter their basic information to generate a profile. The profile helps users find ideal teammate matches in the future if needed. Once users have their basic information set up and a location chosen, they can choose their play mode, including 1 on 1 (solo), 3 on 3 (half court), and 5 on 5 (full court), to play basketball with other users in the real world. The app also tracks users’ statistics as long as they are consistent about adding them. Baller is not only an app but also a game model which provides a framework to users so they have the freedom to choose the settings of a game.
Huazhen Zhao "The Determinant of the Adjacency Matrix of a Graph"
Advisor: Jim Defranza, Mathematics
Let $G$ be a simple graph. The adjacency matrix $A = (a_{ij})$ of $G$ with vertex set $V(G)$ and edge set $E(G)$ is defined by $a_{ij} = 1$ if there is an edge between vertices $v_i$ and $v_j$, and $a_{ij} = 0$ otherwise. The determinant of $G$ is $\det(A)$. The determinant of a graph is used in various contexts describing properties of the graph. We consider the problem of finding the determinant of a class of matrices. The class of matrices we focus on is the adjacency matrices of UT graphs, graphs whose degree sequence contains a unique pair of repeated terms, with all other terms distinct. Our objective is to develop an algorithm and obtain a formula for the determinant of the adjacency matrix of UT graphs.
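As a concrete illustration of the definitions above, the following Python sketch builds the adjacency matrix of a small simple graph and computes its determinant by exact Gaussian elimination. It is a generic determinant routine, not the algorithm or formula the project develops for UT graphs; the function names and the 0-based vertex labels are assumptions made for the example.

```python
from fractions import Fraction


def adjacency_matrix(n, edges):
    """Adjacency matrix of a simple graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1  # Undirected edge: the matrix is symmetric.
        A[j][i] = 1
    return A


def determinant(matrix):
    """Exact determinant of an integer matrix via Gaussian elimination."""
    n = len(matrix)
    M = [[Fraction(x) for x in row] for row in matrix]
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return 0  # No pivot available: the matrix is singular.
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # A row swap flips the sign of the determinant.
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return int(det)  # The determinant of an integer matrix is an integer.


# Triangle graph K3: its adjacency matrix has determinant 2.
triangle = adjacency_matrix(3, [(0, 1), (1, 2), (0, 2)])
det_triangle = determinant(triangle)
```

Using `Fraction` keeps the arithmetic exact, which matters when checking a conjectured integer formula against many small graphs.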