Presidential Fellows in Data Science


How Do We Interpret Climate Change with Big Data?

James Ascher, English

Bommae Kim, Psychology


James and Bommae study the ways people make arguments using data and about data. Popular media often treats data as unbiased, raw, and empirical, but in fact all data is grounded in subjective observation and assumptions. Aggregating data compounds the role of these assumptions, which may confirm each other across data sets. When we try to make claims using this data, especially across time and space, we are doing something quite different from direct observation or traditional argument. Understanding claims about data requires new modes of discourse and analysis.

In this project, they will investigate discourse around climate change, where environmental data play a central role in arguments. Specifically, they are interested in the rhetorical patterns that climate change contrarians use to argue against climate change. They are qualitatively analyzing a small set of arguments using traditional rhetoric, and will use the results to guide the application of text mining methods to a larger volume of online publications about climate change.

Bommae Kim

Bommae Kim is a fifth-year Ph.D. student in quantitative psychology. Her research interests focus on the intersection of social science, statistics, and computer science. Her work is motivated by the fact that, while data are being generated by users and machines constantly, such data differ in character from the traditional data systematically collected by researchers. As a methodologist with a background in social science, she seeks to take advantage of these non-traditional data for social and behavioral research. In her dissertation, she is working on a new data collection and analysis framework that uses the behavioral patterns stored in smartphones while protecting user privacy. In this Data Science project, Bommae will apply text mining methods to online publications about climate change. In the future, she hopes to expand this project to understand how people comprehend data-based arguments.

James Ascher

A doctoral candidate in the English Department who studies 18th-century literature and bibliography, James puts media studies, textual criticism, ephemera, and paperwork history under the big tent of bibliography. He’s interested in resistance to the Enlightenment, epistolary communities, and the history of features in books, as well as the light they shed on clandestine publishing, the history of science, and the rhetoric of data. He is also a 2015 Praxis Fellow.

Previously, he was Assistant Professor in the Libraries and in English at the University of Colorado Boulder where he cataloged rare books, taught book history, and directed the ScriptaLab colloquium. His published work includes bibliographical methods, issues in diplomatic transcription, and processes for collection surveys. He also serves as a lab instructor teaching descriptive bibliography at the Rare Book School at the University of Virginia.

Exploring the Effects of Violence Against Women on Health Care Utilization and Academic Performance

Xiaoqian Liu, Data Science Institute

Colleen Sanders, Nursing


Sexual violence and intimate partner violence have garnered national attention during the past few years. Even though college-aged women are highly vulnerable to violence, there is little research on the relationship between violence and academic progress. This research will propose a comprehensive way to measure academic performance by collecting data from the emergency room, the student enrollment system (SIS), and a course evaluation website. In addition to traditional regression and classification methods, this research will apply distance metric learning methods to discover a set of evaluation metrics of academic progress and to test hypotheses on the impact of abuse history and health care utilization. Furthermore, this research aims to automatically extract geo-temporal information relevant to sexual assaults from text documents such as online news reports and emergency records. Findings will assist the development of evidence-based policies to promote the physical, psychological, and social health of victims and foster their academic progress.

Xiaoqian Liu

Xiaoqian Liu is a second-year M.S. student in Systems Engineering. Her research interests are machine learning and text mining. She is working on analyzing and tracking the impact of sexual abuse history on the academic progression of college students. In this project, she is mainly working on data preprocessing and modeling. She will extract assault information from text documents and feed it into the model. She will also work with her team partner on defining the features of academic performance.

Colleen Sanders


A Data Driven Approach for Uncovering Patterned Life Cycles of Ideas and Methods within the Sciences

Claire Maiers, Sociology

Nicholas Napoli, Systems and Information Engineering


Our project seeks to uncover the patterned ways by which ideas move through social systems.  We approach our analysis through an examination of the ways in which concepts and methods diffuse, are sustained, and decline within academic fields.  We will track the prevalence and movement of concepts and methods through a textual analysis of journal articles contained within the JSTOR database.  This database contains over 4.5 million academic articles that date between 1900 and 2015.  We will explore two different methods for tracking concepts and methods over time within this corpus.  First, we will use a supervised approach to identify words and concepts of interest.  We will compare these results with an unsupervised topic modeling approach.  Through this analysis we will develop a typology of the patterns by which concepts and methods move through individual disciplines and spread across disciplines.
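
As a rough illustration of the supervised side of this comparison, the sketch below counts the share of articles per year that mention each term from a hand-picked list. It is not the project's actual pipeline: the function name `term_prevalence_by_year`, the toy corpus, and the whitespace tokenization are all assumptions made for the example, and nothing here reflects how the JSTOR data is accessed.

```python
from collections import Counter, defaultdict

def term_prevalence_by_year(articles, terms):
    """Return {term: {year: share of that year's articles mentioning the term}}.

    `articles` is an iterable of (year, text) pairs; `terms` is a list of
    single words chosen in advance (the "supervised" part of the sketch).
    """
    totals = Counter()            # number of articles per year
    hits = defaultdict(Counter)   # hits[term][year] = articles mentioning term
    wanted = [t.lower() for t in terms]
    for year, text in articles:
        totals[year] += 1
        words = set(text.lower().split())  # naive tokenization (assumption)
        for term in wanted:
            if term in words:
                hits[term][year] += 1
    return {
        term: {year: hits[term][year] / totals[year] for year in sorted(totals)}
        for term in wanted
    }

# Hypothetical toy corpus of (publication year, article text) pairs.
toy_articles = [
    (1950, "regression analysis of survey data"),
    (1950, "a theory of social roles"),
    (2000, "network analysis and regression models"),
    (2000, "regression in network data"),
]
prevalence = term_prevalence_by_year(toy_articles, ["regression", "network"])
```

Plotting each term's yearly share would give one simple view of how a concept rises, persists, or declines; an unsupervised topic model would instead infer the groupings of words rather than take them as input.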

Nicholas Napoli

Nicholas Napoli is a doctoral student in Systems and Information Engineering at the University of Virginia. His expertise in signal processing, statistical learning, and pattern recognition provides a novel approach for examining these social systems.  Currently, Napoli is developing tracking and detection methods in order to describe how concepts diffuse and move over time. He believes that if we are able to classify and understand why specific concepts are able to prevail and diffuse into other disciplines, we can develop approaches to further impactful research in the future.   

Napoli’s research interests are broad, and his work ranges from neuroscience to communication systems to theoretical work in the field of probability. For his dissertation, Napoli is working with NASA to detect degrading cognitive performance in pilots by examining their physiological signals.

Claire Maiers

Claire Maiers is a doctoral candidate in Sociology at the University of Virginia. She brings to the project knowledge of the current sociological literature on the spread and development of ideas. Maiers will develop methods for selecting the search criteria required for supervised approaches, and will guide investigations of the historical context relevant to significant results of the data analysis. For her dissertation, Maiers examines data science as a mode of knowledge production. Through a comparative study of medical informatics, predictive policing and security, and consumer research, her dissertation explores the constraints and affordances of data science practices. In addition to considering how such practices influence knowledge processes and products, her work also brings attention to the variation between developer and user contexts.

Multi-Agent Modeling and Analysis of Large Scale Brain Networks with a Big fMRI Data Set

Marlen Gonzalez, Psychology

Shize Su, Electrical and Computer Engineering

Qiannan Yin, Statistics


This project will analyze large-scale brain networks involved in the social regulation of emotion using both statistical methods and engineering tools. Using data from a social support functional neuroimaging study, we will model the brain as a dynamic network in which nodes represent brain regions and edges represent the interactions between each pair of regions. Currently, the available tools in the literature can help us look at activations in individual brain areas, but they do not provide a holistic view of the interactions between them. This project will use functional data analysis methods from statistics to estimate and evaluate the interactions between any pair of brain areas. However, because brain networks include a large set of data points that also change over time, computation using this method alone is not feasible. We will therefore also use engineering tools to prune the data to the more important interactions between nodes, cluster these interactions into functional networks, and meaningfully model the interactions between these networks as they relate to behavior. Putting all of this together, the project will not only create a new method to analyze dynamic brain networks, but will also contribute to our understanding of how the brain processes the receipt of social support in a holistic and dynamic manner. These findings would also help us understand how social support confers many health benefits to individuals.
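
To make the idea of pruning a brain network to its more important interactions concrete, here is a minimal sketch. It deliberately swaps in plain Pearson correlation as the edge weight instead of the functional data analysis estimates described above, and the region names, signal values, and threshold are hypothetical values chosen only for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def pruned_network(signals, threshold):
    """Keep only edges whose absolute weight reaches the threshold.

    `signals` maps a region name to its time series. The result maps
    (region, region) pairs to the edge weight that survived pruning.
    """
    edges = {}
    for a, b in combinations(sorted(signals), 2):
        weight = pearson(signals[a], signals[b])  # stand-in edge estimate
        if abs(weight) >= threshold:
            edges[(a, b)] = weight
    return edges

# Hypothetical toy time series for three regions.
toy_signals = {
    "region_A": [1.0, 2.0, 3.0, 4.0],
    "region_B": [2.0, 4.0, 6.0, 8.0],
    "region_C": [1.0, -1.0, 1.0, -1.0],
}
network = pruned_network(toy_signals, threshold=0.8)
```

In this toy case only the edge between `region_A` and `region_B` survives, because the third signal is only weakly related to the other two. A real analysis would repeat the estimate over time windows to capture how the network changes.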

Marlen Gonzalez

Marlen is a 5th-year PhD student in the psychology department working on affective neuroscience questions with Dr. James A. Coan. Her research focuses on how developmental context shapes neural endophenotypes of important constructs such as vigilance, reward sensitivity, and emotion regulation. Her recent work has focused on how lower adolescent neighborhood quality is associated with increased neural activation to rewards and to experiences of ostracism. She also works on an epigenetics study looking at the associations between genotype and methylation of the oxytocin receptor, life history, and adult neural reward sensitivity. In this project, she is responsible for generating the topical questions on the social regulation of emotion, communicating the experimental manipulations behind the data, and interpreting the findings from the developed methodology.

Qiannan Yin

Qiannan Yin, a 3rd-year PhD student in the Department of Statistics, is working on the development of new statistical models and algorithms for analyzing high-dimensional human brain data. Under the guidance of her advisor, she has developed a Bayesian model for inference on cluster-structured high-dimensional ordinary differential equations, with applications to an auditory electrocorticography dataset. She is also interested in statistical computing and in using statistical knowledge and properties to simplify the computation required for statistical models. In this project, she is primarily in charge of developing the statistical model to estimate the interactions between each pair of brain areas. She uses time series functions to estimate the fMRI data, and adopts cubic B-spline bases to estimate the parameters in the functions given the observed data. Finally, she aims to extract subject-specific characteristics of the interactions in the brain network, to be further studied and interpreted by other team members.

Shize Su

Shize Su, a 4th-year PhD student in the Department of Electrical and Computer Engineering, works mainly on large-scale network modeling and analysis under the guidance of his advisor. He is interested and experienced in applying data mining and machine learning skills to Big Data application projects. In this project, he is primarily in charge of developing and applying engineering multi-agent network theory to model and analyze the patterns and behaviors of the large-scale dynamic brain network, where each edge of the network is computed by the statistical model Qiannan develops for the interaction between each pair of brain areas. He is also working on using engineering algorithms to handle the large data size effectively and address the computational cost concern in this project.

Violent Beliefs, Violent People?  Data Driven Analysis of Religious Belief and Social Action in Heian Japan

Why did Buddhist institutions fight other Buddhist institutions in pre-modern Japan? Like other instances of religious violence, this phenomenon is hard to understand with any intellectual approach, but is especially difficult from an empirical perspective.  Conventionally, attempts to understand this and similar situations have relied on human collection and interpretation of a vast amount of data.  Machine learning tools could make this process more efficient and accurate, but are constrained by the need to have data in a clean, monolithic format.  In actuality, the information relevant to religious violence exists in a much more complex, relational format.  Our project is to create a new model that uses both the traditional data-driven methods used by data scientists and the contextual narratives used by humanities scholars to better model and understand complex social phenomena.

Emily Thomas

Emily is interested in religious violence. Some questions that motivate her include: What religious symbols and language are used to justify group violence? Do religious doctrines directly cause religious violence? Is there a connection between violence and the divine? If so, what is it? She is specifically interested in these questions as applied to Japan. Through this project, Emily has the opportunity to explore these questions in pre-modern Japan. This is an especially interesting context because Buddhist temples would attack Buddhist temples, while these same temples would use symbols of the native kami 神 (“spirits”) to protest in the capital. Currently, she is finishing her M.A. in East Asian Studies at UVa, and plans to move to Japan after completing the program. Ultimately, she aims to get her Ph.D. and then teach on Japanese religion and religious violence.

Alex Pape

Alex is interested in identifying and developing means of data-driven analysis that are more relevant to socially-oriented problems. Rather than being limited by a paucity of information or insufficient computing power, data science is frequently held back simply by its own assumptions. Motivated in part by Dan Dennett’s aphorism that “There is no such thing as philosophy-free science, just science that has been conducted without any consideration of its underlying philosophical assumptions,” Alex is interested in how critical examination of the frameworks within which data science is done can reveal new and more relevant experimental methodologies. Specifically, Alex is currently working on exploring alternative assumptions about data ontology; that is, looking at assumptions about the structure within which the data can or should be represented. Alternative ontological assumptions can be used to form new hypotheses that are more germane to important social questions than their conventional counterparts and yet remain rigorously testable with the tools of data science.

Data Driven Design of Movement and Sound

Lin Bai, Electrical and Computer Engineering

Jon Bellona, Music   

Faculty advisor: Luke Dahl, Music


Robots are usually thought of as tools with purely functional movements; however, all movements contain expression, which also enhances function. Our project is motivated by improving the perception of qualitative, expressive robotic movement. By capturing and analyzing human movement and sound data and studying the correspondences that articulate how humans move, the project will develop robotic movements synchronized with perceptually designed sounds as a way to make robotic movements more expressive and thus improve how humans perceive the quality of robotic movement. The improved feature variation of robotic movement makes robots more functional as well.

Our project combines the fields of engineering and music to accomplish this task. The ‘timing’ and ‘quality’ features will be extracted from synchronized sound and movement data with control algorithms and signal analysis tools. The project will study how sonic features map onto features of movement. By improving the available variation that is algorithmically defined in a robot's movement while supplementing the movement with perceptually driven sounds, the project will enhance the quality and function of robotic movement. These advancements will make robotic movements more expressive and relatable to human perception, enabling robots to carry out more functions in more diverse fields and thus making robots better tools.

Jon Bellona

Jon Bellona is an intermedia artist/composer who specializes in digital technologies. Jon’s research includes data-driven control of electronic music performance; constructing musical spaces for deep listening; and collaborating with other disciplines to inform how we create musical experiences. With the current project, Jon's interests lie with the investigation of sound-movement correspondences, with unpacking how visual information informs sound perception, and with developing music compositions or a set of tools for composition derived from movement data. Jon’s music and intermedia work have been shown internationally including Kyma International Sound Symposium (KISS); Society for Electro-Acoustic Music in the United States (SEAMUS); International Computer Music Conference (ICMC); with special performances at the Casa da Musica (Porto, Portugal) and CCRMA (Palo Alto, CA). Jon is currently pursuing his Ph.D. in Composition and Computer Technologies (CCT) at the University of Virginia and is part of the art collective Harmonic Laboratory.


Lin Bai

Lin Bai is a Ph.D. student in the Robotics, Automation, and Dance (RAD) Lab and the Department of Electrical and Computer Engineering. Her research interests involve control algorithm designs to ensure desired system performance, system modeling, and optimization and estimation methods. She has been working on improving the variation in robotic movement to make robots more expressive when interacting with humans and on developing tools to make robots easier to use. Lin designed methods to make sure the robotic motion trajectories generated by high-level controllers are executable on physical robotic platforms and implemented the scheme on the Rethink Robotics Baxter Research Robot. Her work interfaces with a web-based robotic application. As a result of this work, users can choose movement qualities and sequences of motion through an intuitive web interface to easily control the robotic movements. This project will further improve robotic movement from the perspective of human perception as well as the functionality of robots. Lin hopes this project will help catalyze the integration of robots into human society.

Applying Machine Learning to Text Communications to Model Suicide Risk in Real-time

Jeffrey Glenn, Psychology

Alicia Nobles, Systems and Information Engineering


Suicide is a serious and ongoing public health issue. However, the primary method for assessing acute suicide risk relies on patients’ self-report and clinicians’ judgments, which has been shown not to be accurate. In this study, we will apply text mining techniques to social media and personal communications to identify features indicative of heightened suicide risk.  This study is novel because it will apply text analytic approaches to communications temporally, allowing not only for prediction of suicide risk, but for insight into how these predictive attributes change as an individual draws closer to their suicide attempt. Such knowledge may lead to passive monitoring tools that can be utilized by patients and clinicians to more objectively assess level of risk and ultimately reduce rates of suicide.


Alicia Nobles

Alicia is a second-year PhD student in Systems & Information Engineering. She recently changed careers from civil engineering to pursue her long-standing personal interest in public health, especially accessibility to medical care, women’s health, and mental health. She is interested in using computational health informatics, specifically machine learning and data mining, to create novel interventions and advance health policies. In this study, she is excited to expand her technical skills to include text analytics while acquiring more knowledge about mental health through collaboration with clinical psychology. In addition to this study, she is currently working on assessing the data quality for the College Health Surveillance Network, assessing health insurance literacy of college students, and identifying parental attributes associated with the vaccination of teenagers against HPV.

Jeffrey Glenn

Jeff is a fourth-year Ph.D. student in Clinical Psychology. He is interested in understanding how cognitive and emotional processes contribute to psychopathology, with a specific interest in the roles of future thinking, decision making, and implicit cognition in suicide. This project provides an exciting opportunity for him to develop experimental and analytic skills in machine learning and data mining through an interdisciplinary collaboration. He hopes this research contributes to the development of novel, data-driven assessment tools for suicide prediction and prevention. Prior to arriving at UVA, Jeff worked as a research assistant for Dr. Matthew Nock at Harvard University. He holds bachelor’s degrees in English and Human Biology from Stanford University, a Master’s degree in Mind, Brain & Education from Harvard University, and a Master’s degree in Clinical Psychology from UVA.