Projects
Misinformation in the Venezuela-Guyana Border Dispute and Its Impact on Venezuelan Migrants in Guyana
Sentiment Analysis on YouTube Comments | Completed July 2024



I developed this project during a 10-week Research Experiences for Undergraduates (REU) internship in the Storymodelers lab at the Virginia Modeling, Simulation, and Analysis Center, a multidisciplinary research center at Old Dominion University. I was selected as one of 10 students from 142 applicants to participate in the REU program, which is funded by the National Science Foundation.
Throughout the internship, I worked closely with faculty mentors and graduate students to create this project, which was inspired by the work of my faculty advisor, Dr. Erika Frydenlund. Overall, the REU program helped me gain hands-on experience with real-world datasets and analytical tools in the field of data science.
Skills Developed Through This Project:
- Research: Applied research methodologies to create and investigate research questions.
- Data Collection and APIs: Used the YouTube Data API to collect 21,241 comments from 30 YouTube videos.
- Exploratory Data Analysis: Conducted exploratory data analysis to identify key themes in comments and evaluate the impact of misinformation on sentiments towards Venezuelan refugees, cross-referencing findings with official news articles for accuracy.
- Data Cleaning and Preprocessing: Cleaned and preprocessed the collected comments to prepare them for sentiment analysis. This included converting text to lowercase, removing punctuation and special characters, tokenizing text, removing stop words, applying stemming or lemmatization techniques, and translating non-English comments into English using a custom-built translation program with ChatGPT 4.0.
- Natural Language Processing: Performed sentiment analysis using TextBlob to determine the polarity and subjectivity of the comments. Filtered comments using 392 keywords to create a new dataset with 6,201 relevant comments.
- Data Visualization: Created visualizations to represent sentiment distribution and analyze correlations. Used Matplotlib and Seaborn to create bar charts and line charts to visually interpret the analysis results.
- Machine Learning: Explored text classification models for misinformation detection. Did not fully implement a classifier, but gained an understanding of how pre-trained models like RoBERTa can be applied to detect misinformation in text data.
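
To illustrate the cleaning and keyword-filtering steps listed above, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the stop-word list, the keywords, and the function names (`preprocess`, `is_relevant`, `filter_comments`) are hypothetical stand-ins, and stemming or lemmatization is left out for brevity. It assumes comments have already been translated into English.

```python
import re

# Hypothetical stop-word list; the list used in the project is not shown here.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "for"}

# Hypothetical keywords standing in for the 392 used in the project.
KEYWORDS = {"venezuela", "guyana", "migrants", "refugees", "border"}


def preprocess(comment):
    """Lowercase, strip punctuation and special characters, tokenize, drop stop words."""
    text = comment.lower()
    # Replace anything that is not an ASCII letter, digit, or whitespace with a space.
    # This assumes English text; accented characters would also be removed.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


def is_relevant(tokens):
    """Return True if at least one token matches a keyword."""
    return any(t in KEYWORDS for t in tokens)


def filter_comments(comments):
    """Return (original comment, cleaned tokens) pairs for keyword-matching comments."""
    cleaned = ((c, preprocess(c)) for c in comments)
    return [(c, toks) for c, toks in cleaned if is_relevant(toks)]


if __name__ == "__main__":
    sample = ["Refugees need help.", "Nice video!"]
    for original, tokens in filter_comments(sample):
        print(original, tokens)
```

In a pipeline like the one described, the cleaned text of each retained comment could then be scored with TextBlob, for example via `TextBlob(text).sentiment`, which returns a polarity and a subjectivity value.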