I am a PhD candidate in Computer Science and Engineering at the University of Michigan, where I am a member of the LIT research group in the Michigan AI Lab, advised by Dr. Rada Mihalcea. I earned my bachelor's degree in computer science from Grove City College in 2015 and my master's degree from the University of Michigan in 2017.

My research is in natural language processing, where I am interested in word embeddings, computational social science, machine learning techniques, and multimodal problems combining vision and language.

Materials of Interest: Teaching Statement (pdf), Research Statement (pdf), DEI Statement (pdf), Teaching Demo (YouTube)

Publications

Analyzing Connections Between User Attributes, Images, and Text

Laura Burdick, Rada Mihalcea, Ryan L. Boyd, James W. Pennebaker
Cognitive Computation, Forthcoming

This work explores the relationship between a person's demographic/psychological traits (e.g., gender, personality) and their self-identity images and captions. We use a dataset of images and captions provided by approximately 1,350 individuals, and we automatically extract features from both the images and the captions. We identify several visual and textual properties that show reliable relationships with individual differences between participants. The automated techniques presented here allow us to draw conclusions from our data that would be difficult to identify manually, and these techniques are extensible to other large datasets. Additionally, we consider the task of predicting gender and personality using both single-modality and multimodal features. We show that a multimodal predictive approach outperforms purely visual and purely textual methods. We believe that our work on the relationship between user characteristics and user data is relevant to online settings, where users upload billions of images each day (Meeker, Internet Trends 2014, Code Conference, 2014).
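
As an illustration of the correlational analysis described above (a sketch only, not the paper's code), suppose the extracted image/caption features and the trait scores live in a pandas DataFrame with one row per participant; correlating each feature with a trait is then a few lines. The column names here are hypothetical.

# Hypothetical sketch: correlate automatically extracted features with a trait.
# df is assumed to be a pandas DataFrame with one row per participant, numeric
# feature columns (e.g., image brightness, caption word count), and a trait column.
from scipy.stats import pearsonr

def correlate_features(df, trait_column, feature_columns):
    """Return (feature, Pearson r, p-value) tuples, strongest correlations first."""
    results = []
    for column in feature_columns:
        r, p = pearsonr(df[column], df[trait_column])
        results.append((column, r, p))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)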

Building a Flexible Knowledge Graph to Capture Real-World Events

Laura Burdick, Mingzhe Wang, Oana Ignat, Steve Wilson, Yiming Zhang, Yumou Wei, Rada Mihalcea, Jia Deng
Text Analysis Conference (TAC), 2019

PDF

Events and situations unfold quickly in our modern world, generating streams of Internet articles, photos, and videos. The ability to automatically sort through this wealth of information would allow us to identify which pieces of information are most important and credible, and how trends unfold over time. In this paper, we present the first piece of a system to sort through large amounts of political data from the web. Our system takes in raw multimodal input (e.g., text, images, and videos), and generates a knowledge graph connecting entities, events, and relations in meaningful ways. This work is part of the DARPA-funded Active Interpretation of Disparate Alternatives (AIDA) project, which aims to automatically build a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, building the first step of the overall system.
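
As a rough illustration of the output structure described above (a toy sketch, not the AIDA system itself or its ontology), a knowledge graph connecting entities and events through typed relations can be built with networkx; all node names, types, and relations below are made up.

# Toy sketch of a knowledge graph linking events and entities with typed relations.
# Node identifiers, types, and relation names are illustrative, not the AIDA ontology.
import networkx as nx

graph = nx.MultiDiGraph()
graph.add_node("event_01", kind="event", type="Conflict.Attack")
graph.add_node("entity_01", kind="entity", type="GPE", name="Example City")
graph.add_node("entity_02", kind="entity", type="ORG", name="Example Group")
graph.add_edge("event_01", "entity_01", relation="Place")
graph.add_edge("event_01", "entity_02", relation="Attacker")

# Query: which entities participate in a given event, and in what role?
for _, entity, data in graph.out_edges("event_01", data=True):
    print(entity, data["relation"])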

@article{Burdick19Building,
author = {Burdick, Laura and Wang, Mingzhe and Ignat, Oana and Wilson, Steve and Zhang, Yiming and Wei, Yumou and Mihalcea, Rada and Deng, Jia},
title = {Building a Flexible Knowledge Graph to Capture Real-World Events},
journal = {Text Analysis Conference (TAC)},
year = {2019}
}

Identifying Visible Actions in Lifestyle Vlogs

Oana Ignat, Laura Burdick, Jia Deng, Rada Mihalcea
ACL, 2019

Oana Ignat won a Best Poster Award at the Eastern European Machine Learning Summer School 2019 for this work.

PDF Data

We consider the task of identifying human actions visible in online videos. We focus on the widespread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify whether actions mentioned in the speech description of a video are visually present. We construct a dataset with crowdsourced manual annotations of visible actions, and introduce a multimodal algorithm that leverages information derived from visual and linguistic clues to automatically infer which actions are visible in a video. We demonstrate that our multimodal algorithm outperforms algorithms based on a single modality.
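
A minimal sketch of the multimodal idea (not the model from the paper): given precomputed textual features for each mentioned action and visual features for the corresponding video segment, an early-fusion classifier predicts whether the action is visible. The feature arrays and their dimensions are assumptions.

# Early-fusion sketch: concatenate textual and visual features for each
# (action mention, video segment) pair and train a binary "visible" classifier.
# Feature extraction (e.g., sentence and frame embeddings) is assumed to be done.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_visibility_classifier(text_features, video_features, labels):
    """text_features: (n, d_text); video_features: (n, d_video); labels: 0/1 array."""
    fused = np.concatenate([text_features, video_features], axis=1)
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(fused, labels)
    return classifier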

@inproceedings{Ignat19Actions,
author = {Ignat, Oana and Burdick, Laura and Deng, Jia and Mihalcea, Rada},
title = {Identifying Visible Actions in Lifestyle Vlogs},
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1643",
doi = "10.18653/v1/P19-1643",
pages = "6406--6417",
year = {2019}
}

Factors Influencing the Surprising Instability of Word Embeddings

Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea
NAACL-HLT, 2018

I wrote a blog post (Michigan AI Blog) for the general public about this work.

PDF Code Poster Slides

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.
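
One way to make the notion of stability concrete (a minimal sketch in the spirit of the paper, not its released code) is nearest-neighbor overlap: a word is stable if its closest neighbors stay the same across embedding spaces trained under different conditions. Here space_a and space_b are assumed to be dictionaries mapping words to NumPy vectors from two such spaces.

# Minimal sketch of stability as nearest-neighbor overlap across two embedding
# spaces. space_a and space_b map words to NumPy vectors (e.g., from two
# word2vec runs with different random seeds or training corpora).
import numpy as np

def nearest_neighbors(space, word, k=10):
    """Return the k words whose vectors are closest to word by cosine similarity."""
    target = space[word]
    sims = {}
    for other, vec in space.items():
        if other == word:
            continue
        sims[other] = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
    return set(sorted(sims, key=sims.get, reverse=True)[:k])

def stability(space_a, space_b, word, k=10):
    """Percent overlap between the word's k nearest neighbors in the two spaces."""
    overlap = nearest_neighbors(space_a, word, k) & nearest_neighbors(space_b, word, k)
    return 100.0 * len(overlap) / k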

@inproceedings{Wendlandt18Surprising,
author = {Wendlandt, Laura and Kummerfeld, Jonathan K. and Mihalcea, Rada},
title = {Factors Influencing the Surprising Instability of Word Embeddings},
pages = "2092--2102",
url = "https://www.aclweb.org/anthology/N18-1190",
doi = "10.18653/v1/N18-1190",
booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies",
year = {2018}
}

Named Entity Recognition and Linking with Knowledge Base. Phan Cong Minh. Diss. Nanyang Technological University, 2019. Web. January 7, 2020.

"Ideological Drifts in the U.S. Constitution: Detecting Areas of Contention with Models of Semantic Change." Abdul Z. Abdulrahim. NeurIPS Joint Workshop on AI for Social Good. 2019.

dish2vec: A Comparison of Word Embedding Methods in an Unsupervised Setting. Guus Verstegen. Erasmus University Rotterdam Master's Thesis in Econometrics and Management Science, Business Analytics and Quantitative Marketing. 2019.

Political Semantics: Methods and Applications in the Study of Meaning for Political Science. Pedro L. Rodriguez Sosa. New York University Dissertation. 2019.

"Weighted posets: Learning surface order from dependency trees." William Dyer. 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest). 2019.

"Tkol, Httt, and r/radiohead: High Affinity Terms in Reddit Communities." Abhinav Bhandari and Caitrin Armstrong. 5th Workshop on Noisy User-generated Text (W-NUT). 2019.

"Low Supervision, Low Corpus size, Low Similarity! Challenges in cross-lingual alignment of word embeddings." Andrew Dyer. Uppsala University Master's Thesis in Language Technologies. 2019.

Embeddings: Reliability & Semantic Change. Johannes Hellrich. IOS Press. August 8, 2019.

"A Metrological Framework for Evaluating Crowd-powered Instruments." Chris Welty, Lora Aroyo, and Praveen Paritosh. The seventh AAAI Conference on Human Computation and Crowdsourcing. 2019.

"Comparing the Performance of Feature Representations for the Categorization of the Easy-to-Read Variety vs Standard Language." Marina Santini, Benjamin Danielsson, and Arne Jönsson. 22nd Nordic Conference on Computational Linguistics. 2019.

"A Framework for Anomaly Detection Using Language Modeling, and its Applications to Finance." Armineh Nourbakhsh and Grace Bang. 2nd KDD Workshop on Anomaly Detection in Finance. 2019.

"Estimating Topic Modeling Performance with Sharma–Mittal Entropy." Sergei Koltcov, Vera Ignatenko, and Olessia Koltsova. Entropy. 2019, 21(7), 660.

"Data Shift in Legal AI Systems" Venkata Nagaraju Buddarapu and Arunprasath Shankar. Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL). 2019.

"Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection." Johannes Hellrich, Sven Buechel, and Udo Hahn. Workshop on Language Technologies for the Socio-Economic Sciences and Humanities (LaTeCH-CLfL). 2019.

"Investigating the Stability of Concrete Nouns in Word Embeddings." Bénédicte Pierrejean and Ludovic Tanguy. International Conference on Computational Semantics. 2019.

"Can prediction-based distributional semantic models predict typicality?" Tom Heyman and Geert Heyman. Quarterly Journal of Experimental Psychology. 2019.

"Density Matching for Bilingual Word Embedding." Chunting Zhou, Xuezhe Ma, Di Wang, and Graham Neubig. NAACL-HLT. 2019.

"CluWords: Exploiting Semantic Word Clustering Representation for Enhanced Topic Modeling" Felipe Viegas, Sérgio Canuto Christian Gomes, Washington Luis, Thierson Rosa, Sabir Ribas, Leonardo Rocha, and Marcos André Gonçalves. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 2019.

Computational approaches for German particle verbs: compositionality, sense discrimination and non-literal language. Maximilian Köper. Diss. Universität Stuttgart, 2018. Web. November 26, 2018.

"Transparent, Efficient, and Robust Word Embedding Access with WOMBAT." Mark-Christoph Müller and Michael Strube. COLING: System Demonstrations. 2018.

"What’s in Your Embedding, And How It Predicts Task Performance." Anna Rogers, Shashwatch Hosur Ananthakrishna, and Anna Rumshisky. COLING. 2018.

"Analyzing Hypersensitive AI: Instability in Corporate-Scale Machine Learning." Michaela Regneri, Malte Hoffmann, Jurij Kost, Niklas Pietsch, Timo Schulz and Sabine Stamm. IJCAI-ECAI Workshop on Explainable AI. 2018.

"Subcharacter Information in Japanese Embeddings: When Is It Worth It?" Marzena Karpinska, Bofang Li, Anna Rogers, and Aleksandr Drozd. Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP (RepL4NLP). 2018.

Entity and Event Extraction from Scratch Using Minimal Training Data

Laura Wendlandt, Steve Wilson, Oana Ignat, Charles Welch, Li Zhang, Mingzhe Wang, Jia Deng, Rada Mihalcea
Text Analysis Conference (TAC), 2018

PDF

Understanding current world events in real-time involves sifting through news articles, tweets, photos, and videos from many different perspectives. The goal of the DARPA-funded AIDA project is to automate much of this process, building a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, and we are building the first step of the overall system. Given raw multimodal input (e.g., text, images, video), our goal is to generate a knowledge graph with entities, events, and relations.

@article{Wendlandt18Entity,
author = {Wendlandt, Laura and Wilson, Steve and Ignat, Oana and Welch, Charles and Zhang, Li and Wang, Mingzhe and Deng, Jia and Mihalcea, Rada},
title = {Entity and Event Extraction from Scratch Using Minimal Training Data},
journal = {Text Analysis Conference (TAC)},
year = {2018}
}

Multimodal Analysis and Prediction of Latent User Dimensions

Laura Wendlandt, Rada Mihalcea, Ryan L. Boyd, James W. Pennebaker
SocInfo, 2017

PDF Code Poster Slides

Humans upload over 1.8 billion digital images to the internet each day, yet the relationship between the images that a person shares with others and his/her psychological characteristics remains poorly understood. In the current research, we analyze the relationship between images, captions, and the latent demographic/psychological dimensions of personality and gender. We consider a wide range of automatically extracted visual and textual features of images/captions that are shared by a large sample of individuals (N ~ 1,350). Using correlational methods, we identify several visual and textual properties that show strong relationships with individual differences between participants. Additionally, we explore the task of predicting user attributes using a multimodal approach that simultaneously leverages images and their captions. Results from these experiments suggest that images alone have significant predictive power and, additionally, multimodal methods outperform both visual features and textual features in isolation when attempting to predict individual differences.
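
The modality comparison reported above can be illustrated with a short sketch (the actual features and models in the paper may differ; image_features and caption_features are assumed to be per-user NumPy arrays, and y the attribute being predicted, e.g., gender):

# Sketch: compare visual-only, textual-only, and multimodal feature sets for
# predicting a user attribute via cross-validated accuracy. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def compare_modalities(image_features, caption_features, y):
    feature_sets = {
        "visual only": image_features,
        "textual only": caption_features,
        "multimodal": np.concatenate([image_features, caption_features], axis=1),
    }
    for name, features in feature_sets.items():
        scores = cross_val_score(LogisticRegression(max_iter=1000), features, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")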

@inproceedings{Wendlandt17Multimodal,
author = {Wendlandt, Laura and Mihalcea, Rada and Boyd, Ryan L. and Pennebaker, James W.},
title = {Multimodal Analysis and Prediction of Latent User Dimensions},
booktitle={International Conference on Social Informatics},
pages={323--340},
organization={Springer},
year = {2017}
}

"Do Machines Replicate Humans? Toward a Unified Understanding of Radicalizing Content on the Open Social Web." Margeret Hall, Michael Logan, Gina S. Ligon, and Douglas C. Derrick. Policy & Internet. 2019.

"Detecting and classifying online dark visual propaganda." Mahdi Hashemi and Margeret Hall. Image and Vision Computing 89 (2019): 95-105.

Author Profiling in Social Media with Multimodal Information. Miguel Ángel Álvarez Carmona. Diss. Instituto Nacional de Astrofísica, Óptica y Electrónica, 2019. Web. March 28, 2019.

Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences

Jacob Abernethy, Cyrus Anderson, Chengyu Dai, John Dryden, Eric Schwartz, Wenbo Shen, Jonathan Stroud, Laura Wendlandt, Sheng Yang, Daniel Zhang
Bloomberg Data for Good Exchange, 2016

PDF

Performing arts organizations aim to enrich their communities through the arts. To do this, they strive to match their performance offerings to the taste of those communities. Success relies on understanding audience preferences and predicting audience behavior. Similar to most e-commerce or digital entertainment firms, arts presenters need to recommend the right performance to the right customer at the right time. As part of the Michigan Data Science Team (MDST), we partnered with the University Musical Society (UMS), a non-profit performing arts presenter housed at the University of Michigan, Ann Arbor. We are providing UMS with analysis and business intelligence, utilizing historical individual-level sales data. We built a recommendation system based on collaborative filtering, gaining insights into the artistic preferences of customers, along with the similarities between performances. To better understand audience behavior, we used statistical methods from customer-base analysis. We characterized customer heterogeneity via segmentation, and we modeled customer cohorts to understand and predict ticket purchasing patterns. Finally, we combined statistical modeling with natural language processing (NLP) to explore the impact of wording in program descriptions. These ongoing efforts provide a platform to launch targeted marketing campaigns, helping UMS carry out its mission by allocating its resources more efficiently. Celebrating its 138th season, UMS is a 2014 recipient of the National Medal of Arts, and it continues to enrich communities by connecting world-renowned artists with diverse audiences, especially students in their formative years. We aim to contribute to that mission through data science and customer analytics.
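
As a hedged illustration of the collaborative-filtering component (a sketch only, not the MDST code), one common approach is to factor the customer-by-performance purchase matrix and recommend the performances with the highest reconstructed affinity for a given customer:

# Illustrative collaborative-filtering sketch: low-rank factorization of a
# customer-by-performance purchase-count matrix, then recommend the unseen
# performances with the highest reconstructed scores. Not the MDST system.
import numpy as np
from scipy.sparse.linalg import svds

def recommend(purchases, customer_index, n_recommendations=5, rank=20):
    """purchases: (n_customers, n_performances) array; rank must be < both dimensions."""
    u, s, vt = svds(purchases.astype(float), k=rank)
    scores = u[customer_index] @ np.diag(s) @ vt       # reconstructed affinities
    scores[purchases[customer_index] > 0] = -np.inf    # skip already-purchased shows
    return np.argsort(scores)[::-1][:n_recommendations]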

@inproceedings{Abernethy2016Data,
author = {Abernethy, J. and Anderson, C. and Dai, C. and Dryden, J. and Schwartz, E. and Shen, W. and Stroud, J. and Wendlandt, L. and Yang, S. and Zhang, D.},
title = {Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences},
booktitle = {Bloomberg Data for Good Exchange},
year = {2016},
}

"The Michigan Data Science Team: A Data Science Education Program with Significant Social Impact." Arya Farahi and Jonathan C. Stroud. IEEE Data Science Workshop. 2018.