The work below was part of my final project for my MS in Computer Science. It visualizes the relationships between individual subreddits, and between groups of subreddits, based on their word use. The visualizations show, or at least strongly suggest, that each of these online social groups uses a distinct vocabulary. The project touches on many of my favorite topics: dimensionality reduction, data visualization, linguistic style, big data, and NLP for mental health.
Research related to this project spanned most of 2018 and part of 2019, and initial versions of this document were completed in March 2019. Related PySpark code, presentations, and other materials can be found on GitHub.