Please note: As teaching faculty, I do not supervise Ph.D. students. I am also not currently looking for research students or interns. If I begin looking for research students, I will update this webpage; there is no need to e-mail to double-check.

My research is in the area of natural language processing, where I am interested in word embeddings, computational social science, machine learning techniques, and multimodal problems with vision and language.

Publications

Using Paraphrases to Study Properties of Contextual Embeddings

Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea
NAACL, 2022

PDF Slides Video Presentation

We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database's alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns than prior work in terms of the level of contextualization across BERT's layers.
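
For illustration, the core measurement can be sketched in a few lines of Python (a minimal sketch, not the paper's code; the model name, sentences, and target word below are stand-ins rather than Paraphrase Database alignments):

# Embed the same word in two paraphrased sentences with BERT and measure how
# similar its contextual representations are (assumes the HuggingFace
# transformers library; sentences and word are illustrative placeholders).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Mean last-layer BERT vector of `word`'s subword tokens in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, 768)
    ids = enc["input_ids"][0].tolist()
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    for i in range(len(ids) - len(word_ids) + 1):  # first occurrence of the word
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in {sentence!r}")

a = word_vector("The firm released its annual report.", "report")
b = word_vector("The company published its yearly report.", "report")
print("cosine similarity:", torch.cosine_similarity(a, b, dim=0).item())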

@inproceedings{burdick-etal-2022-using,
title = "Using Paraphrases to Study Properties of Contextual Embeddings",
author = "Burdick, Laura and
Kummerfeld, Jonathan and
Mihalcea, Rada",
booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.naacl-main.338",
pages = "4558--4568",
abstract = "We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database{'}s alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns than prior work in terms of the level of contextualization across BERT{'}s layers.",
}

Analyzing the Surprising Variability in Word Embedding Stability Across Languages

Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea
EMNLP, 2021

PDF Code Slides Poster Video Presentation

Word embeddings are powerful representations that form the foundation of many natural language processing architectures, both in English and in other languages. To gain further insight into word embeddings, we explore their stability (e.g., overlap between the nearest neighbors of a word in different embedding spaces) in diverse languages. We discuss linguistic properties that are related to stability, drawing out insights about correlations with affixing, language gender systems, and other features. This has implications for embedding use, particularly in research that uses them to study language trends.
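
For illustration, the stability measure can be sketched as follows (a minimal sketch, not the released code; the corpus and query word are placeholders, and gensim's word2vec stands in for the embedding algorithms studied in the paper):

# Train two embedding spaces on the same data with different random seeds and
# measure the overlap of a word's ten nearest neighbors across the two spaces.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    # ... a real corpus would go here
]

def neighbors(model, word, k=10):
    return {w for w, _ in model.wv.most_similar(word, topn=k)}

m1 = Word2Vec(corpus, vector_size=100, min_count=1, seed=1, workers=1)
m2 = Word2Vec(corpus, vector_size=100, min_count=1, seed=2, workers=1)

word = "cat"
overlap = neighbors(m1, word) & neighbors(m2, word)
print(f"stability of {word!r}: {len(overlap) / 10:.0%} ten-nearest-neighbor overlap")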

@inproceedings{Burdick21Analyzing,
author = {Burdick, Laura and Kummerfeld, Jonathan K. and Mihalcea, Rada},
title = {Analyzing the Surprising Variability in Word Embedding Stability Across Languages},
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
year = {2021}
}

"NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task." Alexander Gutkin, Richard Sproat. Second Workshop on Computational Research in Linguistic Typology. 2020.

To Batch or Not to Batch? Comparing Batching and Curriculum Learning Strategies Across Tasks and Datasets

Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea
Mathematics (MDPI), September 2021

PDF Code

Many natural language processing architectures are greatly affected by seemingly small design decisions, such as batching and curriculum learning (how the training data is ordered during training). In order to better understand the impact of these decisions, we present a systematic analysis of different curriculum learning strategies and different batching strategies. We consider multiple datasets for three tasks: text classification, sentence and phrase similarity, and part-of-speech tagging. Our experiments demonstrate that certain curriculum learning and batching decisions do increase performance substantially for some tasks.
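
For illustration, the contrast between a random-shuffle baseline and a simple curriculum can be sketched as follows (a minimal sketch, not the experimental setup from the paper; sentence length is used here as a crude difficulty proxy, and the tiny dataset is a placeholder):

# Order training examples either randomly or easy-to-hard, then cut the ordered
# list into mini-batches.
import random

def make_batches(examples, batch_size, curriculum=False):
    examples = list(examples)
    if curriculum:
        examples.sort(key=lambda ex: len(ex["text"].split()))  # shortest first
    else:
        random.shuffle(examples)
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

train = [
    {"text": "great movie", "label": 1},
    {"text": "i did not enjoy this film at all", "label": 0},
    {"text": "a masterpiece of modern cinema", "label": 1},
    {"text": "boring", "label": 0},
]

for batch in make_batches(train, batch_size=2, curriculum=True):
    print([ex["text"] for ex in batch])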

@Article{math9182234,
AUTHOR = {Burdick, Laura and Kummerfeld, Jonathan K. and Mihalcea, Rada},
TITLE = {To Batch or Not to Batch? Comparing Batching and Curriculum Learning Strategies across Tasks and Datasets},
JOURNAL = {Mathematics},
VOLUME = {9},
YEAR = {2021},
NUMBER = {18},
ARTICLE-NUMBER = {2234},
URL = {https://www.mdpi.com/2227-7390/9/18/2234},
ISSN = {2227-7390},
DOI = {10.3390/math9182234}
}

Ph.D. Dissertation: Understanding Word Embedding Stability Across Languages and Applications

Laura Burdick
August 2020

PDF Slides Code (Ch. 3) Code (Ch. 4) Code (Ch. 5)

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this thesis, we consider several aspects of embedding spaces, including their stability. First, we propose a definition of stability, and show that common English word embeddings are surprisingly unstable. We explore how properties of data, words, and algorithms relate to instability. We extend this work to approximately 100 world languages, considering how linguistic typology relates to stability. Additionally, we consider contextualized output embedding spaces. Using paraphrases, we explore properties and assumptions of BERT, a popular embedding algorithm.

Second, we consider how stability and other word embedding properties affect tasks where embeddings are commonly used. We consider both word embeddings used as features in downstream applications and corpus-centered applications, where embeddings are used to study characteristics of language and individual writers. In addition to stability, we also consider other word embedding properties, specifically batching and curriculum learning, and how methodological choices made for these properties affect downstream tasks.

Finally, we consider how knowledge of stability affects how we use word embeddings. Throughout this thesis, we discuss strategies to mitigate instability and provide analyses highlighting the strengths and weaknesses of word embeddings in different scenarios and languages. We show areas where more work is needed to improve embeddings, and we show where embeddings are already a strong tool.

@phdthesis{burdickThesis2020,
author = "Burdick, Laura",
title = "Understanding Word Embedding Stability Across Languages and Applications",
school = "University of Michigan",
year = "2020"
}

Analyzing Connections Between User Attributes, Images, and Text

Laura Burdick, Rada Mihalcea, Ryan L. Boyd, James W. Pennebaker
Cognitive Computation, February 2020

PDF

This work explores the relationship between a person's demographic/psychological traits (e.g., gender, personality) and self-identity images and captions. We use a dataset of images and captions provided by approx. 1,350 individuals, and we automatically extract features from both the images and captions. We identify several visual and textual properties that show reliable relationships with individual differences between participants. The automated techniques presented here allow us to draw interesting conclusions from our data that would be difficult to identify manually, and these techniques are extensible to other large datasets. Additionally, we consider the task of predicting gender and personality using both single-modality features and multimodal features. We show that a multimodal predictive approach outperforms purely visual methods and purely textual methods. We believe that our work on the relationship between user characteristics and user data has relevance in online settings, where users upload billions of images each day (Meeker M, 2014. Internet trends 2014-Code conference. Retrieved May 28, 2014).
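
For illustration, the multimodal (early-fusion) comparison can be sketched as follows (a minimal sketch, not the paper's pipeline; the feature matrices and labels are random placeholders standing in for automatically extracted image and caption features):

# Concatenate image features with caption features and compare a single
# classifier trained on each modality alone versus both together.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
image_feats = rng.normal(size=(n, 128))  # stand-in for visual features
text_feats = rng.normal(size=(n, 50))    # stand-in for caption features
labels = rng.integers(0, 2, size=n)      # stand-in binary attribute (e.g., gender)

multimodal = np.hstack([image_feats, text_feats])  # early fusion by concatenation

for name, X in [("image only", image_feats), ("text only", text_feats),
                ("multimodal", multimodal)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5).mean()
    print(f"{name}: mean accuracy {acc:.2f}")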

@article{10.1007/s12559-019-09695-3,
author = {Burdick, Laura and Mihalcea, Rada and Boyd, Ryan L and Pennebaker, James W},
title = {{Analyzing Connections Between User Attributes, Images, and Text}},
issn = {1866-9956},
doi = {10.1007/s12559-019-09695-3},
abstract = {{This work explores the relationship between a person’s demographic/psychological traits (e.g., gender and personality) and self-identity images and captions. We use a dataset of images and captions provided by N ≈ 1350 individuals, and we automatically extract features from both the images and captions. We identify several visual and textual properties that show reliable relationships with individual differences between participants. The automated techniques presented here allow us to draw interesting conclusions from our data that would be difficult to identify manually, and these techniques are extensible to other large datasets. Additionally, we consider the task of predicting gender and personality using both single modality features and multimodal features. We show that a multimodal predictive approach outperforms purely visual methods and purely textual methods. We believe that our work on the relationship between user characteristics and user data has relevance in online settings, where users upload billions of images each day.}},
pages = {1--20},
journal = {Cognitive Computation},
year = {2020}
}

"The Personality Panorama: Conceptualizing Personality Through Big Behavioural Data." Ryan L. Boyd, Paola Pasca, Kevin Lanning. European Journal of Personality. 2020.

"A novel approach to stance detection in social media tweets by fusing ranked lists and sentiments." Abdulrahman I.Al-Ghadir, Aqil M.Azmi, Amir Hussain. Information Fusion. Elsevier, March 2021.

"A meta-analysis of linguistic markers of extraversion: Positive emotion and social process words." Jiayu Chen, Lin Qiu, Moon-Ho Ringo Ho. Journal of Research in Personality. Elsevier, December 2020.

"Multimodal sentiment and emotion recognition in hyperbolic space." Keith April Araño, Carlotta Orsenigo, Mauricio Soto, Carlo Vercellis. Expert Systems with Applications. Elsevier, December 2021.

"Applying Attention-Based Models for Detecting Cognitive Processes and Mental Health Conditions." Esaú Villatoro-Tello, Shantipriya Parida, Sajit Kumar, Petr Motlicek. Cognitive Computation. 2021.

Building a Flexible Knowledge Graph to Capture Real-World Events

Laura Burdick, Mingzhe Wang, Oana Ignat, Steve Wilson, Yiming Zhang, Yumou Wei, Rada Mihalcea, Jia Deng
Text Analysis Conference (TAC), 2019

PDF

Events and situations unfold quickly in our modern world, generating streams of Internet articles, photos, and videos. The ability to automatically sort through this wealth of information would allow us to identify which pieces of information are most important and credible, and how trends unfold over time. In this paper, we present the first piece of a system to sort through large amounts of political data from the web. Our system takes in raw multimodal input (e.g., text, images, and videos), and generates a knowledge graph connecting entities, events, and relations in meaningful ways. This work is part of the DARPA-funded Active Interpretation of Disparate Alternatives (AIDA) project, which aims to automatically build a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, building the first step of the overall system.
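
For illustration, the target data structure can be sketched as a small labeled graph (a toy example, not the AIDA system; the entities, events, and relations below are made up):

# Nodes are entities and events; labeled, directed edges are relations.
import networkx as nx

kg = nx.MultiDiGraph()
kg.add_node("protest_001", kind="event", description="street protest")
kg.add_node("City Hall", kind="entity", type="Facility")
kg.add_node("Jane Doe", kind="entity", type="Person")

kg.add_edge("protest_001", "City Hall", relation="located_at")
kg.add_edge("Jane Doe", "protest_001", relation="participant_in")

for src, dst, data in kg.edges(data=True):
    print(f"{src} --{data['relation']}--> {dst}")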

@article{Burdick19Building,
author = {Burdick, Laura and Mingzhe Wang and Oana Ignat and Steve Wilson and Yiming Zhang and Yumou Wei and Rada Mihalcea and Jia Deng},
title = {Building a Flexible Knowledge Graph to Capture Real-World Events},
journal = {Text Analysis Conference (TAC)},
year = {2019}
}

"Toward Benevolent AGI by Integrating Knowledge Graphs for Classical Economics, Education, and Health: AI Governed by Ethics and Trust-Based Social Capital." Don MacRae. Technological Breakthroughs and Future Business Opportunities in Education, Health, and Outer Space. IGI Global, 2021. 163-186.

Identifying Visible Actions in Lifestyle Vlogs

Oana Ignat, Laura Burdick, Jia Deng, Rada Mihalcea
ACL, 2019

Oana Ignat won a Best Poster Award at the Eastern European Machine Learning Summer School 2019 for this work.

PDF Data

We consider the task of identifying human actions visible in online videos. We focus on the widely spread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify if actions mentioned in the speech description of a video are visually present. We construct a dataset with crowdsourced manual annotations of visible actions, and introduce a multimodal algorithm that leverages information derived from visual and linguistic clues to automatically infer which actions are visible in a video. We demonstrate that our multimodal algorithm outperforms algorithms based only on one modality at a time.

@inproceedings{Ignat19Actions,
author = {Ignat, Oana and Laura Burdick and Jia Deng and Rada Mihalcea},
title = {Identifying Visible Actions in Lifestyle Vlogs},
booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1643",
doi = "10.18653/v1/P19-1643",
pages = "6406--6417",
year = {2019}
}

"Speech2Action: Cross-modal Supervision for Action Recognition." Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid, Andrew Zisserman. CVPR. 2020.

"Condensed Movies: Story Based Retrieval with Contextual Embeddings." Max Bain, Arsha Nagrani, Andrew Brown, Andrew Zisserman. ACCV. 2020.

"What is More Likely to Happen Next? Video-and-Language Future Event Prediction." Jie Lei, Licheng Yu, Tamara Berg, Mohit Bansal. EMNLP. 2020.

"Learning To Segment Actions From Visual and Language Instructions via Differentiable Weak Sequence Alignment." Yuhan Shen, Lu Wang, Ehsan Elhamifar. CVPR. 2021.

"Look Before You Speak: Visually Contextualized Utterances." Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid. CVPR. 2021.

Video Understanding using Multimodal Deep Learning. Arsha Nagrani. Wolfson College, University of Oxford, Ph.D. Thesis. 2020.

Factors Influencing the Surprising Instability of Word Embeddings

Laura Wendlandt, Jonathan K. Kummerfeld, Rada Mihalcea
NAACL-HLT, 2018

I wrote a blog post (Michigan AI Blog) for the general public about this work.

PDF Code Poster Slides

Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.

@inproceedings{Wendlandt18Surprising,
author = {Wendlandt, Laura and Kummerfeld, Jonathan K. and Mihalcea, Rada},
title = {Factors Influencing the Surprising Instability of Word Embeddings},
pages = "2092--2102",
url = "https://www.aclweb.org/anthology/N18-1190",
doi = "10.18653/v1/N18-1190",
booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies",
year = {2018}
}

Improving Computer Network Operations Through Automated Interpretation of State. Abhishek Dwaraki. UMass Amherst Ph.D. Dissertation in Electrical and Computer Engineering. 2020.

"On the Influence of Coreference Resolution on Word Embeddings in Lexical-semantic Evaluation Tasks." Alexander Henlein, Alexander Mehler. LREC. 2020.

"Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora." Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg. ACL. 2020.

Natural Language Processing Methods for Language Processing. Dávid Márk Nemeskey. Eötvös Loránd University Ph.D. Dissertation in School of Informatics. 2020.

"SAMPO: Unsupervised Knowledge Base Construction for Opinions and Implications." Nikita Bhutani, Aaron Traylor, Chen Chen, Xiaolan Wang, Behzad Golshan, Wang-Chiew Tan. Automated Knowledge Base Construction. 2020.

Context matters: Classifying Swedish texts using BERT's deep bidirectional word embeddings. Daniel Holmer. Linköping University Bachelor's Thesis in Computer and Information Science. 2020.

"Stolen Probability: A Structural Weakness of Neural Language Models." David Demeter, Gregory Kimmel, Doug Downey. ACL. 2020.

"Towards Understanding the Instability of Network Embedding." Chenxu Wang, Wei Rao, Wenna Guo, Pinghui Wang, Jun Liu, Xiaohong Guan. IEEE Transactions on Knowledge and Data Engineering. 2020.

"Automated Event Identification from System Logs Using Natural Language Processing." Abhishek Dwaraki, Shachi Kumary, Tilman Wolf. International Conference on Computing, Networking and Communications (ICNC). 2020.

"Understanding the Downstream Instability of Word Embeddings." Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Re. 3rd MLSys Conference. 2020.

"Learning Variable-Length Representation of Words." Debasis Ganguly. Pattern Recognition. 2020.

Named Entity Recognition and Linking with Knowledge Base. Phan Cong Minh. Diss. Nanyang Technological University, 2019. Web. January 7, 2020.

"Ideological Drifts in the U.S. Constitution: Detecting Areas of Contention with Models of Semantic Change." Abdul Z. Abdulrahim. NeurIPS Joint Workshop on AI for Social Good. 2019.

dish2vec: A Comparison of Word Embedding Methods in an Unsupervised Setting. Guus Verstegen. Erasmus University Rotterdam Master's Thesis in Econometrics and Management Science, Business Analytics and Quantitative Marketing. 2019.

Political Semantics: Methods and Applications in the Study of Meaning for Political Science. Pedro L. Rodriguez Sosa. New York University Dissertation. 2019.

"Weighted posets: Learning surface order from dependency trees." William Dyer. 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest). 2019.

"Tkol, Httt, and r/radiohead: High Affinity Terms in Reddit Communities." Abhinav Bhandari and Caitrin Armstrong. 5th Workshop on Noisy User-generated Text (W-NUT). 2019.

"Low Supervision, Low Corpus size, Low Similarity! Challenges in cross-lingual alignment of word embeddings." Andrew Dyer. Uppsala University Master's Thesis in Language Technologies. 2019.

Embeddings: Reliability & Semantic Change. Johannes Hellrich. IOS Press. August 8, 2019.

"A Metrological Framework for Evaluating Crowd-powered Instruments." Chris Welty, Lora Aroyo, and Praveen Paritosh. The seventh AAAI Conference on Human Computation and Crowdsourcing. 2019.

"Comparing the Performance of Feature Representations for the Categorization of the Easy-to-Read Variety vs Standard Language." Marina Santini, Benjamin Danielsson, and Arne Jönsson. 22nd Nordic Conference on Computational Linguistics. 2019.

"A Framework for Anomaly Detection Using Language Modeling, and its Applications to Finance." Armineh Nourbakhsh and Grace Bang. 2nd KDD Workshop on Anomaly Detection in Finance. 2019.

"Estimating Topic Modeling Performance with Sharma–Mittal Entropy." Sergei Koltcov, Vera Ignatenko, and Olessia Koltsova. Entropy. 2019, 21(7), 660.

"Data Shift in Legal AI Systems" Venkata Nagaraju Buddarapu and Arunprasath Shankar. Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL). 2019.

"Modeling Word Emotion in Historical Language: Quantity Beats Supposed Stability in Seed Word Selection." Johannes Hellrich, Sven Buechel, and Udo Hahn. Workshop on Language Technologies for the Socio-Economic Sciences and Humanities (LaTeCH-CLfL). 2019.

"Investigating the Stability of Concrete Nouns in Word Embeddings." Bénédicte Pierrejean and Ludovic Tanguy. International Conference on Computational Semantics. 2019.

"Can prediction-based distributional semantic models predict typicality?" Tom Heyman and Geert Heyman. Quarterly Journal of Experimental Psychology. 2019.

"Density Matching for Bilingual Word Embedding." Chunting Zhou, Xuezhe Ma, Di Wang, and Graham Neubig. NAACL-HLT. 2019.

"CluWords: Exploiting Semantic Word Clustering Representation for Enhanced Topic Modeling" Felipe Viegas, Sérgio Canuto Christian Gomes, Washington Luis, Thierson Rosa, Sabir Ribas, Leonardo Rocha, and Marcos André Gonçalves. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. 2019.

Computational approaches for German particle verbs: compositionality, sense discrimination and non-literal language. Maximilian Köper. Diss. Universität Stuttgart, 2018. Web. November 26, 2018.

"Transparent, Efficient, and Robust Word Embedding Access with WOMBAT." Mark-Christoph Müller and Michael Strube. COLING: System Demonstrations. 2018.

"What’s in Your Embedding, And How It Predicts Task Performance." Anna Rogers, Shashwatch Hosur Ananthakrishna, and Anna Rumshisky. COLING. 2018.

"Analyzing Hypersensitive AI: Instability in Corporate-Scale Machine Learning." Michaela Regneri, Malte Hoffmann, Jurij Kost, Niklas Pietsch, Timo Schulz and Sabine Stamm. IJCAI-ECAI Workshop on Explainable AI. 2018.

"Subcharacter Information in Japanese Embeddings: When Is It Worth It?" Marzena Karpinska, Bofang Li, Anna Rogers, and Aleksandr Drozd. Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP (RepL4NLP). 2018.

"What’s in Your Embedding, And How It Predicts Task Performance." Anna Rogers, Shashwath Hosur Ananthakrishna, Anna Rumshisky. COLING. 2018.

"Seed-driven Document Ranking for Systematic Reviews in Evidence-Based Medicine." Grace E. Lee, Aixin Sun. SIGIR. 2018.

"Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings." Yadollah Yaghoobzadeh, Katharina Kann, T. J. Hazen, Eneko Agirre, Hinrich Schütze. ACL. 2019.

"Medical Information Extraction in the Age of Deep Learning." Udo Hahn, Michel Oleynik. Yearbook of Medical Informatics. 2020.

"The Influence of Down-Sampling Strategies on SVD Word Embedding Stability." Johannes Hellrich, Bernd Kampe, Udo Hahn. RepEval. 2019.

"Writing habits and telltale neighbors: analyzing clinical concept usage patterns with sublanguage embeddings." Denis Newman-Griffis, Eric Fosler-Lussier. LOUHI. 2019.

"Follow the leader: Documents on the leading edge of semantic change get more citations." Sandeep Soni, Kristina Lerman, Jacob Eisenstein. Journal of the Association for Information Science and Technology. 2020.

"Characterizing News Portrayal of Civil Unrest in Hong Kong, 1998–2020." James Scharf, Arya D. McCarthy, Giovanna Maria Dora Dore. Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE). 2021.

"Qualitative evaluation of word embeddings: investigating the instability in neural-based models." Bénédicte Pierrejean. Doctoral thesis, Université Toulouse le Mirail. 2020.

"Is Wikipedia succeeding in reducing gender bias? Assessing changes in gender bias in Wikipedia using word embeddings." Katja Geertruida Schmahl, Tom Julian Viering, Stavros Makrodimitris, Arman Naseri Jahfari, David Tax, Marco Loog. Workshop on Natural Language Processing and Computational Social Science. 2020.

"Diachronic Embeddings for People in the News." Felix Hennig, Steven Wilson. Workshop on Natural Language Processing and Computational Social Science. 2020.

"Short-term Semantic Shifts and their Relation to Frequency Change." Anna Marakasova, Julia Neidhardt. Probability and Meaning Conference (PaM). 2020.

"Visualizing and Quantifying Vocabulary Learning During Search." Nilavra Bhattacharya, Jacek Gwizdka. CIKM Workshops. 2020.

"Data Movement Is All You Need: A Case Study on Optimizing Transformers." Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler. Machine Learning and Systems (MLSys). 2021.

"Sociolinguistically Driven Approaches for Just Natural Language Processing." Su Lin Blodgett. Doctoral thesis, UMass Amherst 2021.

"Detecting Different Forms of Semantic Shift in Word Embeddings via Paradigmatic and Syntagmatic Association Changes." Anna Wegmann, Florian Lemmerich, Markus Strohmaier. International Semantic Web Conference 2020.

"An Empirical Study of the Downstream Reliability of Pre-Trained Word Embeddings." Anthony Rios, Brandon Lwowski. COLING 2020.

"Revisiting the Context Window for Cross-lingual Word Embeddings." Ryokan Ri, Yoshimasa Tsuruoka. ACL 2020.

"Searching for ‘Austerity’: Using Semantic Shifts in Word Embeddings as Indicators of Changing Ideological Positions." John A. Bateman, Cécile L. Paris. Book Chapter, Multimodal Approaches to Media Discourses 2020.

"Knowledge-Guided Efficient Representation Learning for Biomedical Domain." Kishlay Jha, Guangxu Xun, Nan Du, Aidong Zhang. KDD 2021.

"Managing ML Pipelines: Feature Stores and the Coming Wave of Embedding Ecosystems." Laurel Orr, Atindriyo Sanyal, Xiao Ling, Karan Goel, Megan Leszczynski. PVLDB 2021.

"Microfoundations and Measurement for Ambiguity in Communication with Application to Social Networks." V. Govind Manian. Doctoral thesis, Stanford University 2021.

"Comparing the performance of various Swedish BERT models for classification." Daniel Holmer, Arne Jönsson. Swedish Language Technology Conference (SLTC) 2020.

"Understanding Political Communication with Contextualized Methods from Natural Language Processing." Leslie Huang. Doctoral thesis, New York University 2021.

"Embedding Structured Dictionary Entries." Steven Wilson, Walid Magdy, Barbara McGillivray, Gareth Tyson. Insights from Negative Results in NLP 2020.

"Understanding the stability of medical concept embeddings." Grace E. Lee, Aixin Sun. Journal of the Association for Information Science and Technology (JAIST) 2020.

"Measuring the Semantic Stability of Word Embedding." Zhenhao Huang, Chenxu Wang. International Conference on Natural Language Processing and Chinese Computing 2020.

"Understanding Stability of Medical Concept Embeddings: Analysis and Prediction." Grace E. Lee, Aixin Sun. CoRR 2019.

"Analyzing the Role of Natural Language in Neural-Symbolic Models." David Demeter. Doctoral thesis, Northwestern University 2020.

"Analyzing the Role of Natural Language in Neural-Symbolic Models." David Demeter. Doctoral thesis, Northwestern University 2020.

"GEPC: Global embeddings with PID control." Ning Gong, Nianmin Yao, Ziying Lv, Shibin Wang. Computer Speech & Language 2021.

"TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora." Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser. PMC 2021.

Entity and Event Extraction from Scratch Using Minimal Training Data

Laura Wendlandt, Steve Wilson, Oana Ignat, Charles Welch, Li Zhang, Mingzhe Wang, Jia Deng, Rada Mihalcea
Text Analysis Conference (TAC), 2018

PDF

Understanding current world events in real-time involves sifting through news articles, tweets, photos, and videos from many different perspectives. The goal of the DARPA-funded AIDA project is to automate much of this process, building a knowledge base that can be queried to strategically generate hypotheses about different aspects of an event. We are participating in this project as a TA1 team, and we are building the first step of the overall system. Given raw multimodal input (e.g., text, images, video), our goal is to generate a knowledge graph with entities, events, and relations.

@article{Wendlandt18Entity,
author = {Wendlandt, Laura and Steve Wilson and Oana Ignat and Charles Welch and Li Zhang and Mingzhe Wang and Jia Deng and Rada Mihalcea},
title = {Entity and Event Extraction from Scratch Using Minimal Training Data},
journal = {Text Analysis Conference (TAC)},
year = {2018}
}

Multimodal Analysis and Prediction of Latent User Dimensions

Laura Wendlandt, Rada Mihalcea, Ryan L. Boyd, James W. Pennebaker
SocInfo, 2017

PDF Code Poster Slides

Humans upload over 1.8 billion digital images to the internet each day, yet the relationship between the images that a person shares with others and his/her psychological characteristics remains poorly understood. In the current research, we analyze the relationship between images, captions, and the latent demographic/psychological dimensions of personality and gender. We consider a wide range of automatically extracted visual and textual features of images/captions that are shared by a large sample of individuals (N ~ 1,350). Using correlational methods, we identify several visual and textual properties that show strong relationships with individual differences between participants. Additionally, we explore the task of predicting user attributes using a multimodal approach that simultaneously leverages images and their captions. Results from these experiments suggest that images alone have significant predictive power and, additionally, multimodal methods outperform both visual features and textual features in isolation when attempting to predict individual differences.

@inproceedings{Wendlandt17Multimodal,
author = {Wendlandt, Laura and Rada Mihalcea and Ryan L. Boyd and James W. Pennebaker},
title = {Multimodal Analysis and Prediction of Latent User Dimensions},
booktitle={International Conference on Social Informatics},
pages={323--340},
organization={Springer},
year = {2017}
}

"Inferring Social Media Users’ Mental Health Status from Multimodal Information." Zhentao Xu, Verónica Pérez-Rosas, Rada Mihalcea. LREC. 2020.

"Open Intent Extraction from Natural Language Interactions." Nikhita Vedula, Nedim Lipka, Pranav Maneriker, Srinivasan Parthasarathy. The Web Conference (WWW). 2020.

"Do Machines Replicate Humans? Toward a Unified Understanding of Radicalizing Content on the Open Social Web." Margeret Hall, Michael Logan, Gina S. Ligon, and Douglas C. Derrick. Policy & Internet. 2019.

"Detecting and classifying online dark visual propaganda." Mahdi Hashemi and Margeret Hall. Image and Vision Computing 89 (2019): 95-105.

Author Profiling in Social Media with Multimodal Information. Miguel Ángel Álvarez Carmona. Diss. Instituto Nacional de Astrofísica, Óptica y Electrónica, 2019. Web. March 28, 2019.

"Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts." Julia Kruk, Jonah Lubin, Karan Sikka, Xiao Lin, Dan Jurafsky, Ajay Divakaran. EMNLP. 2019.

"Brown Hands Aren’t Terrorists: Challenges in Image Classification of Violent Extremist Content." Margeret Hall, Christian Haas. HCII. 2021.

Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences

Jacob Abernethy, Cyrus Anderson, Chengyu Dai, John Dryden, Eric Schwartz, Wenbo Shen, Jonathan Stroud, Laura Wendlandt, Sheng Yang, Daniel Zhang
Bloomberg Data for Good Exchange, 2016

PDF

Performing arts organizations aim to enrich their communities through the arts. To do this, they strive to match their performance offerings to the taste of those communities. Success relies on understanding audience preferences and predicting their behavior. Similar to most e-commerce or digital entertainment firms, arts presenters need to recommend the right performance to the right customer at the right time. As part of the Michigan Data Science Team (MDST), we partnered with the University Musical Society (UMS), a non-profit performing arts presenter housed in the University of Michigan, Ann Arbor. We are providing UMS with analysis and business intelligence, utilizing historical individual-level sales data. We built a recommendation system based on collaborative filtering, gaining insights into the artistic preferences of customers, along with the similarities between performances. To better understand audience behavior, we used statistical methods from customer-base analysis. We characterized customer heterogeneity via segmentation, and we modeled customer cohorts to understand and predict ticket purchasing patterns. Finally, we combined statistical modeling with natural language processing (NLP) to explore the impact of wording in program descriptions. These ongoing efforts provide a platform to launch targeted marketing campaigns, helping UMS carry out its mission by allocating its resources more efficiently. Celebrating its 138th season, UMS is a 2014 recipient of the National Medal of Arts, and it continues to enrich communities by connecting world-renowned artists with diverse audiences, especially students in their formative years. We aim to contribute to that mission through data science and customer analytics.
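
For illustration, an item-based collaborative-filtering recommender can be sketched as follows (a minimal sketch of the general approach, not the system built for UMS; the purchase matrix is toy data):

# From a customer-by-performance purchase matrix, compute performance-to-
# performance cosine similarity and recommend performances similar to ones a
# customer has already attended.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# rows = customers, columns = performances; 1 = bought a ticket (toy data)
purchases = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
])

item_sim = cosine_similarity(purchases.T)  # performance-to-performance similarity
scores = purchases @ item_sim              # score every performance per customer
scores[purchases == 1] = -np.inf           # do not re-recommend past purchases

for customer in range(purchases.shape[0]):
    print(f"customer {customer}: recommend performance {scores[customer].argmax()}")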

@inproceedings{Abernethy2016Data,
author = {Abernethy, J. and C. Anderson and C. Dai and J. Dryden and E. Schwartz and W. Shen and J. Stroud and L. Wendlandt and S. Yang and D. Zhang},
title = {Data Science in Service of Performing Arts: Applying Machine Learning to Predicting Audience Preferences},
booktitle = {Bloomberg Data for Good Exchange},
year = {2016},
}

"The Michigan Data Science Team: A Data Science Education Program with Significant Social Impact." Arya Farahi and Jonathan C. Stroud. IEEE Data Science Workshop. 2018.