Events

Final Defense: Matt Guthrie
Friday, April 20, 2018, 01:00pm

Matt Guthrie, UT-Austin

"Grouping and comparing Texas high schools through machine learning and visualization techniques"

Abstract: The Texas Assessment of Knowledge and Skills (TAKS) was administered from Spring 2003 to Spring 2012 to every public school student in the state of Texas. Student test scores from the mathematics portion of the test were analyzed in this study. These standardized test scores form a large and complex data set which cannot be easily understood. Simplification of the educational system and analysis of the data in aggregate are necessary for nuanced features of the data set to be studied.

The Texas Education Agency uses a k-nearest neighbor algorithm with a small number of factors (e.g., the number of students relative to the largest school, the percentage of English language learners, and the percentage of students who qualify for free or reduced-price lunch) to calculate a campus comparison group for every school in the state. However, this approach poses challenges: the comparison groups are highly interconnected and asymmetric (they vary from school to school). This asymmetry neither simplifies the system under study nor allows the types of groups that are created to be classified.
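To make the asymmetry concrete, here is a minimal sketch of nearest-neighbor comparison groups. It is not the Texas Education Agency's actual procedure: the four schools, their feature values, the Euclidean distance, and k = 2 are all made up for illustration. It also assumes one reading of "asymmetric", namely that a school's group can contain schools whose own groups do not contain it.

```python
import math

# Hypothetical feature vectors, invented for illustration only:
# (relative size, fraction of English language learners,
#  fraction qualifying for free or reduced-price lunch)
schools = {
    "A": (0.10, 0.05, 0.20),
    "B": (0.12, 0.06, 0.22),
    "C": (0.15, 0.08, 0.25),
    "D": (0.40, 0.30, 0.60),
}

def comparison_group(name, k=2):
    """Return the k schools nearest to `name` by Euclidean distance."""
    others = [s for s in schools if s != name]
    others.sort(key=lambda s: math.dist(schools[name], schools[s]))
    return set(others[:k])

# Every school gets its own group, so groups overlap heavily.
groups = {name: comparison_group(name) for name in schools}

# Pairs (s, t) where t is in s's group but s is not in t's group.
asymmetric = [
    (s, t)
    for s in sorted(schools)
    for t in sorted(groups[s])
    if s not in groups[t]
]

print(groups)
print(asymmetric)
```

In this toy data, school D must borrow its neighbors from the tight A, B, C cluster, but none of those schools lists D in return, so the groups do not partition the schools.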

To address these limitations, a method for forming disjoint groups of schools in Texas was developed using modern machine learning techniques (dimensionality reduction, k-means clustering, and agglomerative clustering). Test scores for students from each high school in the state were visualized, and the stratification of test scores by student poverty concentration was observed. Inter-cluster comparisons allowed for the identification of a group of schools that significantly outperform other groups with similar poverty levels. Intra-cluster comparisons identified exemplary schools based on their outstanding performance relative to the average performance of their cluster. In addition, students' test language was shown to have a significant effect for students changing from the Spanish version of the test to the English version. Overall, grouping schools through clustering algorithms was shown to be superior to the current method for the purposes of this study, and allowed for the identification of different types of exemplary schools in the state of Texas.
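As a rough illustration of disjoint grouping followed by an intra-cluster comparison, the sketch below runs a plain k-means loop on made-up data. It is not the method used in the study: the six schools, their values, the starting centroids, and the 0.10 threshold are hypothetical, and the dimensionality reduction and agglomerative clustering steps are omitted.

```python
import math

# Hypothetical data, invented for illustration only:
# name -> (poverty fraction, mean math score scaled to 0..1)
schools = {
    "A": (0.10, 0.90),
    "B": (0.15, 0.85),
    "C": (0.20, 0.88),
    "D": (0.80, 0.55),
    "E": (0.85, 0.50),
    "F": (0.90, 0.75),
}

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to one cluster, then update means."""
    centroids = list(initial_centroids)
    labels = {}
    for _ in range(iterations):
        # Assignment step: each school gets exactly one label,
        # so the resulting groups are disjoint.
        labels = {
            name: min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
            for name, p in points.items()
        }
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [points[n] for n, lab in labels.items() if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels

labels = kmeans(schools, [schools["A"], schools["D"]])

# Intra-cluster comparison: mean score per cluster.
cluster_mean = {}
for j in set(labels.values()):
    scores = [schools[n][1] for n in schools if labels[n] == j]
    cluster_mean[j] = sum(scores) / len(scores)

# Flag schools well above their own cluster's mean (threshold is arbitrary).
exemplary = sorted(
    n for n in schools
    if schools[n][1] - cluster_mean[labels[n]] > 0.10
)

print(labels)
print(exemplary)
```

Here the high-poverty school F is flagged because it scores well above the other schools in its own cluster, even though its raw score is below every school in the low-poverty cluster.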

Location: RLM 11.204