In January 2021, I will be joining the Computer Science and Engineering department at the University of Michigan (U-M) as a Lecturer III.

I finished my Ph.D. in 2020 in the Computer Science and Engineering department at U-M, where I was a member of the LIT research group (part of the Michigan AI Lab), supervised by Dr. Rada Mihalcea. I earned my bachelor's degree in computer science from Grove City College in 2015 and my master's degree from U-M in 2017.

My research is in the area of natural language processing, where I am interested in word embeddings, computational social science, machine learning techniques, and multimodal problems with vision and language.

Outside of work, I love spending time with family and friends, crocheting blankets, taking long walks outside, and reading novels.

Publications

Ph.D. Dissertation: Understanding Word Embedding Stability Across Languages and Applications

Laura Burdick
August 2020

Slides Code (Ch. 3) Code (Ch. 4) Code (Ch. 5)

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this thesis, we consider several aspects of embedding spaces, including their stability. First, we propose a definition of stability, and show that common English word embeddings are surprisingly unstable. We explore how properties of data, words, and algorithms relate to instability. We extend this work to approximately 100 world languages, considering how linguistic typology relates to stability. Additionally, we consider contextualized output embedding spaces. Using paraphrases, we explore properties and assumptions of BERT, a popular embedding algorithm.

Second, we consider how stability and other word embedding properties affect tasks where embeddings are commonly used. We consider both word embeddings used as features in downstream applications and corpus-centered applications, where embeddings are used to study characteristics of language and individual writers. In addition to stability, we also consider other word embedding properties, specifically batching and curriculum learning, and how methodological choices made for these properties affect downstream tasks.

Finally, we consider how knowledge of stability affects how we use word embeddings. Throughout this thesis, we discuss strategies to mitigate instability and provide analyses highlighting the strengths and weaknesses of word embeddings in different scenarios and languages. We show areas where more work is needed to improve embeddings, and we show where embeddings are already a strong tool.

Analyzing Connections Between User Attributes, Images, and Text

Laura Burdick, Rada Mihalcea, Ryan L. Boyd, James W. Pennebaker
Cognitive Computation, February 2020

PDF

This work explores the relationship between a person's demographic/psychological traits (e.g., gender, personality) and self-identity images and captions. We use a dataset of images and captions provided by approx. 1,350 individuals, and we automatically extract features from both the images and captions. We identify several visual and textual properties that show reliable relationships with individual differences between participants. The automated techniques presented here allow us to draw interesting conclusions from our data that would be difficult to identify manually, and these techniques are extensible to other large datasets. Additionally, we consider the task of predicting gender and personality using both single-modality features and multimodal features. We show that a multimodal predictive approach outperforms purely visual methods and purely textual methods. We believe that our work on the relationship between user characteristics and user data has relevance in online settings, where users upload billions of images each day (Meeker M, 2014. Internet trends 2014-Code conference. Retrieved May 28, 2014).

@article{10.1007/s12559-019-09695-3,
author = {Burdick, Laura and Mihalcea, Rada and Boyd, Ryan L and Pennebaker, James W},
title = {{Analyzing Connections Between User Attributes, Images, and Text}},
issn = {1866-9956},
doi = {10.1007/s12559-019-09695-3},
abstract = {{This work explores the relationship between a person’s demographic/psychological traits (e.g., gender and personality) and self-identity images and captions. We use a dataset of images and captions provided by N ≈ 1350 individuals, and we automatically extract features from both the images and captions. We identify several visual and textual properties that show reliable relationships with individual differences between participants. The automated techniques presented here allow us to draw interesting conclusions from our data that would be difficult to identify manually, and these techniques are extensible to other large datasets. Additionally, we consider the task of predicting gender and personality using both single modality features and multimodal features. We show that a multimodal predictive approach outperforms purely visual methods and purely textual methods. We believe that our work on the relationship between user characteristics and user data has relevance in online settings, where users upload billions of images each day.}},
pages = {1--20},
journal = {Cognitive Computation},
year = {2020}
}

"The Personality Panorama: Conceptualizing Personality Through Big Behavioural Data." Ryan L. Boyd, Paola Pasca, Kevin Lanning. European Journal of Personality. 2020.

Building a Flexible Knowledge Graph to Capture Real-World Events

Laura Burdick, Mingzhe Wang, Oana Ignat, Steve Wilson, Yiming Zhang, Yumou Wei, Rada Mihalcea, Jia Deng
Text Analysis Conference (TAC), 2019

PDF

Events and situations unfold quickly in our modern world, generating streams of Internet articles, photos, and videos. The ability to automatically sort through this wealth of information would allow us to identify which pieces of information are most important and credible, and how trends unfold over time. In this paper, we present the first piece of a system to sort through large amounts of political data from the web. Our system takes in raw multimodal input (e.g., text, images, and videos), and generates a knowledge graph connecting entities, events, and relations in meaningful ways. This work is part of the DARPA-funded Active Interpretation of Disparate Alternatives (AIDA) project, which aims to automatically build a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, building the first step of the overall system.

@article{Burdick19Building,
author = {Burdick, Laura and Mingzhe Wang and Oana Ignat and Steve Wilson and Yiming Zhang and Yumou Wei and Rada Mihalcea and Jia Deng},
title = {Building a Flexible Knowledge Graph to Capture Real-World Events},
journal = {Text Analysis Conference (TAC)},
year = {2019}
}

Identifying Visible Actions in Lifestyle Vlogs

Oana Ignat, Laura Burdick, Jia Deng, Rada Mihalcea
ACL, 2019

Oana Ignat won a Best Poster Award at the Eastern European Machine Learning Summer School 2019 for this work.

PDF Data

We consider the task of identifying human actions visible in online videos. We focus on the widely spread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify if actions mentioned in the speech description of a video are visually present. We construct a dataset with crowdsourced manual annotations of visible actions, and introduce a multimodal algorithm that leverages information derived from visual and linguistic clues to automatically infer which actions are visible in a video. We demonstrate that our multimodal algorithm outperforms algorithms based only on one modality at a time.

@inproceedings{Ignat19Actions,
author = {Ignat, Oana and Laura Burdick and Jia Deng and Rada Mihalcea},
title = {Identifying Visible Actions in Lifestyle Vlogs},
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1643",
doi = "10.18653/v1/P19-1643",
pages = "6406--6417",
year = {2019}
}

"Speech2Action: Cross-modal Supervision for Action Recognition." Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid, Andrew Zisserman. CVPR. 2020.

Factors Influencing the Surprising Instability of Word Embeddings

Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea
NAACL-HLT, 2018

I wrote a blog post (Michigan AI Blog) for the general public about this work.

PDF Code Poster Slides

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.

@inproceedings{Wendlandt18Surprising,
author = {Wendlandt, Laura and Kummerfeld, Jonathan K. and Mihalcea, Rada},
title = {Factors Influencing the Surprising Instability of Word Embeddings},
pages = "2092--2102",
url = "https://www.aclweb.org/anthology/N18-1190",
doi = "10.18653/v1/N18-1190",
booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies",
year = {2018}
}

Improving Computer Network Operations Through Automated Interpretation of State. Abhishek Dwaraki. UMass Amherst Ph.D. Dissertation in Electrical and Computer Engineering. 2020.

"On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks." Alexander Henlein, Alexander Mehler. LREC. 2020.

"Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora." Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg. ACL. 2020.

Natural Language Processing Methods for Language Modeling. Dávid Márk Nemeskey. Eötvös Loránd University Ph.D. Dissertation in School of Informatics. 2020.

"SAMPO: Unsupervised Knowledge Base Construction for Opinions and Implications." Nikita Bhutani, Aaron Traylor, Chen Chen, Xiaolan Wang, Behzad Golshan, Wang-Chiew Tan. Automated Knowledge Base Construction. 2020.

Context matters: Classifying Swedish texts using BERT's deep bidirectional word embeddings. Daniel Holmer. Linköping University Bachelor's Thesis in Computer and Information Science. 2020.

"Stolen Probability: A Structural Weakness of Neural Language Models." David Demeter, Gregory Kimmel, Doug Downey. ACL. 2020.

"Towards Understanding the Instability of Network Embedding." Chenxu Wang, Wei Rao, Wenna Guo, Pinghui Wang, Jun Liu, Xiaohong Guan. IEEE Transactions on Knowledge and Data Engineering. 2020.

"Automated Event Identification from System Logs Using Natural Language Processing." Abhishek Dwaraki, Shachi Kumary, Tilman Wolf. International Conference on Computing, Networking and Communications (ICNC). 2020.

"Understanding the Downstream Instability of Word Embeddings." Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Re. 3rd MLSys Conference. 2020.

"Learning Variable-Length Representation of Words." Debasis Ganguly. Pattern Recognition. 2020.

Named Entity Recognition and Linking with Knowledge Base. Phan Cong Minh. Diss. Nanyang Technological University, 2019. Web. January 7, 2020.

"Ideological Drifts in the U.S. Constitution: Detecting Areas of Contention with Models of Semantic Change." Abdul Z. Abdulrahim. NeurIPS Joint Workshop on AI for Social Good. 2019.

dish2vec: A Comparison of Word Embedding Methods in an Unsupervised Setting. Guus Verstegen. Erasmus University Rotterdam Master's Thesis in Econometrics and Management Science, Business Analytics and Quantitative Marketing. 2019.

Political Semantics: Methods and Applications in the Study of Meaning for Political Science. Pedro L. Rodriguez Sosa. New York University Dissertation. 2019.

"Weighted posets: Learning surface order from dependency trees." William Dyer. 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest). 2019.

"Tkol, Httt, and r/radiohead: High Affinity Terms in Reddit Communities." Abhinav Bhandari and Caitrin Armstrong. 5th Workshop on Noisy User-generated Text (W-NUT). 2019.

"Low Supervision, Low Corpus size, Low Similarity! Challenges in cross-lingual alignment of word embeddings." Andrew Dyer. Uppsala University Master's Thesis in Language Technologies. 2019.

Embeddings: Reliability & Semantic Change. Johannes Hellrich. IOS Press. August 8, 2019.

"A Metrological Framework for Evaluating Crowd-powered Instruments." Chris Welty, Lora Aroyo, and Praveen Paritosh. The seventh AAAI Conference on Human Computation and Crowdsourcing. 2019.

"Comparing the Performance of Feature Representations for the Categorization of the Easy-to-Read Variety vs Standard Language." Marina Santini, Benjamin Danielsson, and Arne Jönsson. 22nd Nordic Conference on Computational Linguistics. 2019.

"A Framework for Anomaly Detection Using Language Modeling, and its Applications to Finance." Armineh Nourbakhsh and Grace Bang. 2nd KDD Workshop on Anomaly Detection in Finance. 2019.

"Estimating Topic Modeling Performance with Sharma–Mittal Entropy." Sergei Koltcov, Vera Ignatenko, and Olessia Koltsova. Entropy. 2019, 21(7), 660.

"Data Shift in Legal AI Systems" Venkata Nagaraju Buddarapu and Arunprasath Shankar. Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL). 2019.

"Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection." Johannes Hellrich, Sven Buechel, and Udo Hahn. Workshop on Language Technologies for the Socio-Economic Sciences and Humanities (LaTeCH-CLfL). 2019.

"Investigating the Stability of Concrete Nouns in Word Embeddings." Bénédicte Pierrejean and Ludovic Tanguy. International Conference on Computational Semantics. 2019.

"Can prediction-based distributional semantic models predict typicality?" Tom Heyman and Geert Heyman. Quarterly Journal of Experimental Psychology. 2019.

"Density Matching for Bilingual Word Embedding." Chunting Zhou, Xuezhe Ma, Di Wang, and Graham Neubig. NAACL-HLT. 2019.

"CluWords: Exploiting Semantic Word Clustering Representation for Enhanced Topic Modeling" Felipe Viegas, Sérgio Canuto Christian Gomes, Washington Luis, Thierson Rosa, Sabir Ribas, Leonardo Rocha, and Marcos André Gonçalves. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 2019.

Computational approaches for German particle verbs: compositionality, sense discrimination and non-literal language. Maximilian Köper. Diss. Universität Stuttgart, 2018. Web. November 26, 2018.

"Transparent, Efficient, and Robust Word Embedding Access with WOMBAT." Mark-Christoph Müller and Michael Strube. COLING: System Demonstrations. 2018.

"What’s in Your Embedding, And How It Predicts Task Performance." Anna Rogers, Shashwatch Hosur Ananthakrishna, and Anna Rumshisky. COLING. 2018.

"Analyzing Hypersensitive AI: Instability in Corporate-Scale Machine Learning." Michaela Regneri, Malte Hoffmann, Jurij Kost, Niklas Pietsch, Timo Schulz and Sabine Stamm. IJCAI-ECAI Workshop on Explainable AI. 2018.

"Subcharacter Information in Japanese Embeddings: When Is It Worth It?" Marzena Karpinska, Bofang Li, Anna Rogers, and Aleksandr Drozd. Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP (RepL4NLP). 2018.

Entity and Event Extraction from Scratch Using Minimal Training Data

Laura Wendlandt, Steve Wilson, Oana Ignat, Charles Welch, Li Zhang, Mingzhe Wang, Jia Deng, Rada Mihalcea
Text Analysis Conference (TAC), 2018

PDF

Understanding current world events in real-time involves sifting through news articles, tweets, photos, and videos from many different perspectives. The goal of the DARPA-funded AIDA project is to automate much of this process, building a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, and we are building the first step of the overall system. Given raw multimodal input (e.g., text, images, video), our goal is to generate a knowledge graph with entities, events, and relations.

@article{Wendlandt18Entity,
author = {Wendlandt, Laura and Steve Wilson and Oana Ignat and Charles Welch and Li Zhang and Mingzhe Wang and Jia Deng and Rada Mihalcea},
title = {Entity and Event Extraction from Scratch Using Minimal Training Data},
journal = {Text Analysis Conference (TAC)},
year = {2018}
}

Multimodal Analysis and Prediction of Latent User Dimensions

Laura Wendlandt, Rada Mihalcea, Ryan L. Boyd, James W. Pennebaker
SocInfo, 2017

PDF Code Poster Slides

Humans upload over 1.8 billion digital images to the internet each day, yet the relationship between the images that a person shares with others and his/her psychological characteristics remains poorly understood. In the current research, we analyze the relationship between images, captions, and the latent demographic/psychological dimensions of personality and gender. We consider a wide range of automatically extracted visual and textual features of images/captions that are shared by a large sample of individuals (N ~ 1,350). Using correlational methods, we identify several visual and textual properties that show strong relationships with individual differences between participants. Additionally, we explore the task of predicting user attributes using a multimodal approach that simultaneously leverages images and their captions. Results from these experiments suggest that images alone have significant predictive power and, additionally, multimodal methods outperform both visual features and textual features in isolation when attempting to predict individual differences.

@inproceedings{Wendlandt17Multimodal,
author = {Wendlandt, Laura and Rada Mihalcea and Ryan L. Boyd and James W. Pennebaker},
title = {Multimodal Analysis and Prediction of Latent User Dimensions},
booktitle={International Conference on Social Informatics},
pages={323--340},
organization={Springer},
year = {2017}
}

"Inferring Social Media Users’ Mental Health Status from Multimodal Information." Zhentao Xu, Verónica Pérez-Rosas, Rada Mihalcea. LREC. 2020.

"Open Intent Extraction from Natural Language Interactions." Nikhita Vedula, Nedim Lipka, Pranav Maneriker, Srinivasan Parthasarathy. The Web Conference (WWW). 2020.

"Do Machines Replicate Humans? Toward a Unified Understanding of Radicalizing Content on the Open Social Web." Margeret Hall, Michael Logan, Gina S. Ligon, and Douglas C. Derrick. Policy & Internet. 2019.

"Detecting and classifying online dark visual propaganda." Mahdi Hashemi and Margeret Hall. Image and Vision Computing 89 (2019): 95-105.

Author Profiling in Social Media with Multimodal Information. Miguel Ángel Álvarez Carmona. Diss. Instituto Nacional de Astrofísica, Óptica y Electrónica, 2019. Web. March 28, 2019.

Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences

Jacob Abernethy, Cyrus Anderson, Chengyu Dai, John Dryden, Eric Schwartz, Wenbo Shen, Jonathan Stroud, Laura Wendlandt, Sheng Yang, Daniel Zhang
Bloomberg Data for Good Exchange, 2016

PDF

Performing arts organizations aim to enrich their communities through the arts. To do this, they strive to match their performance offerings to the taste of those communities. Success relies on understanding audience preference and predicting their behavior. Similar to most e-commerce or digital entertainment firms, arts presenters need to recommend the right performance to the right customer at the right time. As part of the Michigan Data Science Team (MDST), we partnered with the University Musical Society (UMS), a non-profit performing arts presenter housed in the University of Michigan, Ann Arbor. We are providing UMS with analysis and business intelligence, utilizing historical individual-level sales data. We built a recommendation system based on collaborative filtering, gaining insights into the artistic preferences of customers, along with the similarities between performances. To better understand audience behavior, we used statistical methods from customer-base analysis. We characterized customer heterogeneity via segmentation, and we modeled customer cohorts to understand and predict ticket purchasing patterns. Finally, we combined statistical modeling with natural language processing (NLP) to explore the impact of wording in program descriptions. These ongoing efforts provide a platform to launch targeted marketing campaigns, helping UMS carry out its mission by allocating its resources more efficiently. Celebrating its 138th season, UMS is a 2014 recipient of the National Medal of Arts, and it continues to enrich communities by connecting world-renowned artists with diverse audiences, especially students in their formative years. We aim to contribute to that mission through data science and customer analytics.

@inproceedings{Abernethy2016Data,
author = {Abernethy, J. and C. Anderson and C. Dai and J. Dryden and E. Schwartz and W. Shen and J. Stroud and L. Wendlandt and S. Yang and D. Zhang},
title = {Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences},
booktitle = {Bloomberg Data for Good Exchange},
year = {2016},
}

"The Michigan Data Science Team: A Data Science Education Program with Significant Social Impact." Arya Farahi and Jonathan C. Stroud. IEEE Data Science Workshop. 2018.