Background of Data

The data for this analysis comes from publicly available news postings hosted on the Smith College website. To communicate efficiently with students, families, faculty members, and the wider community, the College frequently publishes “letters”: briefings on current events on campus or on other topics relevant to the workings of the College. Some of these letters come directly from Smith College President Kathleen McCartney, while others do not name an author and simply convey information. The letters range in length from 56 to 783 words*. They are valuable data because of their centrality to the Smith community: the publication of a letter is typically accompanied by an email to the students, faculty, staff, and alumnx who opt in to notifications about the College. Because the letters are designed to communicate the Smith administration’s position on current events and situations at the College, they are an excellent resource for assessing sentiment at the highest levels of the Smith administrative hierarchy.

  • Word counts exclude “filler” words such as “of”, “the”, “to”, and “in”.
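The counting rule in the footnote can be sketched in a few lines. This is a minimal Python illustration, not the authors' R code: the filler-word set below is a small hypothetical stand-in (the footnote names only “of”, “the”, “to”, and “in”), and the two sample letters are invented.

```python
import re

# Hypothetical filler-word set: the footnote names "of", "the", "to", and "in";
# a real stop-word list (e.g. tidytext's) is much longer.
FILLER_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

# Lowercase words, allowing internal apostrophes such as "college's".
WORD_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def content_word_count(text: str) -> int:
    """Count the words in `text`, ignoring case, punctuation, and filler words."""
    words = WORD_PATTERN.findall(text.lower())
    return sum(1 for word in words if word not in FILLER_WORDS)


def length_range(letters: dict[str, str]) -> tuple[int, int]:
    """Return the (shortest, longest) content-word counts across all letters."""
    counts = [content_word_count(body) for body in letters.values()]
    return min(counts), max(counts)


# Invented sample letters; these are not Smith College texts.
sample_letters = {
    "short": "The College is closed today.",
    "longer": "Dear students, the spring term begins in March and ends in May.",
}
print(length_range(sample_letters))  # (3, 8)
```

Applied to the full set of scraped letters, the same rule would yield the kind of minimum and maximum reported above, although the exact figures depend on which filler-word list is used.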

Data Collection

To collect the data, we used a series of web-scraping techniques in R, relying primarily on the rvest and httr packages. We first scraped every link on the landing page for the community letters. Each link points to a unique letter, so we wrote a function that iterates over the URLs and scrapes the contents of each letter’s page. Once every page had been scraped, we summarized each letter as one row of a data frame, with a single column holding the letter’s full text.

Using the tidytext package, which is designed for text mining tasks such as tokenization and sentiment analysis, we then separated each letter into individual words, placing each word in its own row. The result is a long-format data set with one row for each word of each letter. To perform sentiment analysis, we downloaded three sentiment lexicons (NRC, AFINN, and Bing) and joined each one to the letters data set, producing additional data frames in which each word is paired with its lexicon-specific sentiment rating. These data frames can be manipulated, summarized, and plotted like any other data frame in R, and their contents can be used to assess a variety of sentiment constructs within the letters.
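The lexicon join described above can be sketched as follows. The authors worked in R with tidytext; this is a hedged Python re-expression of the same idea. The three dictionaries are toy lexicons with invented entries, shaped like Bing (positive/negative labels), AFINN (integer scores), and NRC (possibly several emotion labels per word); they do not reproduce the real lexicons' contents. As with an inner join, words missing from a lexicon drop out.

```python
import re
from collections import Counter

# Toy lexicons with invented entries, shaped like the three lexicons named in
# the text. They do NOT reproduce the real Bing, AFINN, or NRC contents.
BING_LIKE = {"beautiful": "positive", "renewal": "positive", "crisis": "negative"}
AFINN_LIKE = {"beautiful": 3, "renewal": 2, "crisis": -3}
NRC_LIKE = {"beautiful": ["joy", "positive"], "crisis": ["fear", "negative"]}

WORD_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def tokenize(text: str) -> list[str]:
    """Lowercase `text` and split it into word tokens, dropping punctuation."""
    return WORD_PATTERN.findall(text.lower())


def inner_join(words: list[str], lexicon: dict) -> list[tuple]:
    """Pair each word with its lexicon value(s), dropping unmatched words.

    A list value (as in the NRC-style lexicon) yields one row per label,
    so a single word can contribute several rows.
    """
    rows = []
    for word in words:
        if word not in lexicon:
            continue  # inner-join semantics: unmatched words disappear
        value = lexicon[word]
        labels = value if isinstance(value, list) else [value]
        rows.extend((word, label) for label in labels)
    return rows


words = tokenize("A beautiful campus, a time of renewal, despite the crisis.")

bing_counts = Counter(label for _, label in inner_join(words, BING_LIKE))
afinn_total = sum(score for _, score in inner_join(words, AFINN_LIKE))
nrc_counts = Counter(label for _, label in inner_join(words, NRC_LIKE))

print(bing_counts)  # 2 positive, 1 negative
print(afinn_total)  # 3 + 2 - 3 = 2
print(nrc_counts)
```

Because an NRC-style word can carry several labels, joining on it can produce more rows than there are matched words, which is worth keeping in mind when counting or averaging sentiment per letter.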

Data Variables

The table below shows a small sample of the complete data set. Each row gives the letter’s ID, its title, link, and publication date, one word from the letter, and that word’s position within the letter (word_number).

ID  title                                    link                                                                               date        word          word_number
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  dear          1
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  students      2
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  staff         3
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  faculty       4
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  spring        5
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  time          6
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  renewal       7
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  anticipation  8
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  beautiful     9
1   Letter to the Community, April 20, 2020  https://www.smith.edu/president-kathleen-mccartney/letters/2019-20/april-20-2020  2020-04-20  campus        10
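A sketch of how rows like those above might be produced from one letter's text, again in Python rather than the authors' R. The stop-word set and the sample letter are hypothetical, and numbering words after filtering is an assumption based on the consecutive word_number values in the sample table.

```python
import re

# Hypothetical stop-word set for illustration; tidytext's stop_words data
# set is far larger.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

WORD_PATTERN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def letter_to_rows(letter_id: int, title: str, link: str, date: str,
                   text: str) -> list[dict]:
    """Expand one letter into rows of (ID, title, link, date, word, word_number).

    Stop words are removed first, so word_number counts kept words only
    (an assumption consistent with the consecutive numbering in the table).
    """
    tokens = WORD_PATTERN.findall(text.lower())
    kept = [token for token in tokens if token not in STOP_WORDS]
    return [
        {"ID": letter_id, "title": title, "link": link, "date": date,
         "word": word, "word_number": position}
        for position, word in enumerate(kept, start=1)
    ]


# Invented example letter; the title, URL, and text are placeholders.
rows = letter_to_rows(
    1,
    "Example Letter",
    "https://example.edu/letters/example",
    "2020-01-01",
    "Dear students, faculty, and staff: welcome back to campus.",
)
for row in rows:
    print(row["ID"], row["word"], row["word_number"])
```

Repeating the title, link, and date on every row duplicates them, as in the sample table, but lets each word be filtered or grouped by letter without a separate lookup.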