Proposal for Digital Archive on Mainframe Computers

Posted on November 22, 2021 by Kai Prenger

I’m proposing the creation of a digital archive for textual and visual material related to mainframe computers. As part of the curation process, we will perform glitches or additional manipulations of the material as a method to interrogate the assumptions we have about business computing and mainframe computers.

Both minicomputers and supercomputers are well understood in academia. Digital humanities emerged in a period when computing costs dropped precipitously through innovations related to the microprocessor. Scientific computing relied on supercomputers focused on complex mathematical calculations. Although their birthplace can be traced to the academy, mainframe computers mostly found application in business contexts. Given this pedigree, mainframe computing is an underexplored topic in digital humanities. Yet concepts made popular by mainframe computing live on in today’s software infrastructure in the form of batch processing, multitenancy in systems, timesharing of compute and storage resources, and transactional databases. The age of Big Iron also continued the contributions of women to information technology, which are often made less visible than those made by men. One example is Grace Hopper’s theory of machine-independent programming languages and her creation of the FLOW-MATIC language, which was extended as COBOL, the principal programming language used to program mainframe computers.

This project will handle subject matter germane to the many facets and specters of digital humanities. Using frameworks like Wax from the minimal computing group ensures some longevity of the archive in question. Mainframe computers continue to power the majority of credit card transaction processing. The prominence of this type of hardware in governmental payment processing came to light when stimulus checks weren’t issued in a timely manner during the pandemic. Maintenance (or lack thereof) of our critical technology is a major concern of digital infrastructure scholarship. Historical research will be critical to reconstruct the cultural particularities of time and place. It would also be advisable to

I am also keen on inviting collaborators with different skill sets. Some may want to work through the issue of cataloging, while others write essays explaining how mainframes function, what work they do or what the outputs are like. If we want to take a deformative approach to the archive, we could have someone coming from a design perspective (Adobe tools, PhotoMosh), another from an informed naif (data bending with Audacity/converted text file), while a third collaborator could deploy an algorithmic approach (software libraries like glitch-this or pixelsort). Regardless of how we approach the archive, I expect curation and critical writing will be instrumental in guiding audiences through the material.
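As a rough illustration of the algorithmic approach, here is a minimal pixel-sorting sketch in plain Python. It does not reproduce how glitch-this or pixelsort work internally; the tiny in-memory "scan", the threshold value, and the function names are all assumptions made up for this example.

```python
def brightness(pixel):
    """Approximate brightness of an (R, G, B) pixel as the channel mean."""
    r, g, b = pixel
    return (r + g + b) / 3


def pixel_sort_rows(image, threshold=60):
    """Sort each run of pixels brighter than `threshold` within every row.

    `image` is a list of rows; each row is a list of (R, G, B) tuples.
    Pixels at or below the threshold act as barriers and stay in place.
    """
    result = []
    for row in image:
        new_row, run = [], []
        for pixel in row:
            if brightness(pixel) > threshold:
                run.append(pixel)
            else:
                # A dark pixel ends the current run: flush it sorted.
                new_row.extend(sorted(run, key=brightness))
                run = []
                new_row.append(pixel)
        new_row.extend(sorted(run, key=brightness))
        result.append(new_row)
    return result


# Hypothetical 1x5 "scan": the black pixel splits the row into two runs.
scan = [[(200, 200, 200), (90, 90, 90), (0, 0, 0), (255, 255, 255), (120, 120, 120)]]
glitched = pixel_sort_rows(scan)
```

Sorting only within bright runs keeps dark outlines in place, which is one way a deformation can stay legible enough to compare against the original scan.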

Grant Writing for DH

Posted on November 18, 2021 by Stephanie Barnes

Grant writing is always a tedious process. I’ve been able to learn about this process through my job at a historical house non-profit. As a small institution, we are grateful for being eligible for small-business grants, but sadly miss out on the bigger ones (typically from government funders), which can often be monetarily larger. Our Director of Development and Community Engagement is our grant writer and since I work closely with her, I’m able to gain some experience in assisting with the grant writing process. We recently received our first government grant and are also currently writing a proposal for NEH. Oddly, this NEH grant is very specifically NOT Digital Humanities. As we are slowly coming out of the pandemic, NEH is looking to fund institutions for in-person humanities projects.

Sample of an NEH grant basic outline template. This is one of many components issued to applicants.

From this experience and with the reading for this week there are a few things I’ve learned about NEH grants.

NEH grants are very time consuming! They contain a lot of components, with multiple sections that can each be at least 10 pages (single-spaced) long. Not only that, NEH looks for academic writing, where most other grant-giving institutions prefer a conversational tone.
They are thorough! Having multiple components will make you (the person or institution writing the grant) stronger. These can include project format, resources, history, project leaders, and of course, a detailed budget.
They are a great skill to learn! Grant writing is very useful and can be beneficial in many job positions, whether you will be writing grants or not. For our class, it’s great for grad students to get funding for their academic projects.

Grant Tracking Resources

Grant writing requires a lot of time and organization. There are many applications out there to stay organized, such as Grant Hub or Airtable. You can also create a spreadsheet to help stay on task.

An example of tracking the components of an NEH grant with Google Sheets. For privacy purposes, I cut off columns to the right that list who the section is assigned to, when it’s due, and status. This is helpful for when you have multiple people working on one grant.

Maintain Significance

The posts we read for today’s class were very informative. Even though at my work we are not currently working on a DH grant, there is still a lot I’ve learned that I can share with my team to help stay on track. One of the things that stuck out the most from the readings was keeping in mind the significance of your project and the criteria for each section.

“Second, as you weave together your prose to craft the narrative and other required documents, keep in mind the six evaluation criteria that peer reviewers will use to evaluate your application (each corresponding to different elements of the application):

1) The intellectual significance and impact of the project for the humanities
(corresponds to narrative sections Enhancing the humanities and Final product and dissemination)
2) The quality of the overall conception, organization, and description of how the proposed work sits within a broader context, and quality of the argument for new (or further) work in this area. (corresponds with the narrative sections Environmental scan and History of the project)
3) The feasibility and appropriateness of the activities, work plan, methodology, and use of technology, and the project’s plans for mitigating risk and addressing accessibility for its intended audiences (corresponds to narrative sections Activities and project team and Final product and dissemination and Attachment 3. Work plan)
4) The qualifications, expertise, and levels of commitment of the project director and key project staff or contributors (corresponds to narrative section Activities and project team and Attachment 4: Biographies)
5) The reasonableness of the proposed budget in relation to the proposed activities, staff compensation, the anticipated results, products, and dissemination (corresponds to narrative section Activities and project team, Attachment 3: Work plan, and the Budget)
6) The quality and appropriateness of project plans for data management and (if applicable) sustainability (corresponds to Attachment 5: Data management plan, and for Level III applicants, Attachment 6: Sustainability plan)”
https://www.neh.gov/blog/planning-your-next-dhag-1-idea-audience-innovation-context

It is helpful to continue to reference these steps to make sure you don’t lose sight of the project’s needs and significance and what makes you stand out from the rest of the applicants.

Text Analysis Project- Teen Vogue Magazine of the Past and Future

Posted on November 18, 2021 by Jean Fischer

After researching proper methods in text analysis and text mining, a new concept for a Digital Humanities grad student like myself, I decided to retire my original idea of using text mining to make the claim that Calpurnia, the Finch family’s maid in Harper Lee’s To Kill A Mockingbird, should be considered more of the voice of reason in the novel’s narrative than Atticus Finch, the narrator’s father. That idea is best saved for another type of project in the near future. Instead, I decided to refocus and research the difference in text between pre-2020 issues of Teen Vogue magazine and its 2021 issues. Teen Vogue, at its core, is a publication dedicated to exposing a younger audience to a worldwide view of fashion, celebrity news, health, and artistry. Of course, generational differences come into play as well, with the magazine also covering technological advances, major current events, inclusivity, psychology, fashion, and trends in the teenage world. I wanted to see whether the new political stance of the current 2021 Teen Vogue marked a stark difference from, or had a connection to, Teen Vogues of the past.

Finding digitally archived pre-2020 Teen Vogue publications was a bit difficult. Although technological resources were plentiful before 2020, archiving print publications on a digital platform for future research wasn’t common practice. I did manage to find a few pre-2020 Teen Vogue publications that were digitally archived. These 2011 Teen Vogue issues, however, contained no real focus on politics or injustice, unlike 2021 Teen Vogue. Instead, the focus is on celebrities, fashion, make-up, dating and horoscopes. The pre-2020 issues are also not as inclusive in exposure and narratives with regard to celebrities, makeup and fashion, representing more white figures.

Teen Vogue Magazine, Edge of Glory: Young Hollywood 2011 Portfolio, 2011

Teen Vogue Magazine, Edge of Glory: Young Hollywood 2011 Portfolio Article, 2011

A few online articles from Teen Vogue’s 2021 issues were easily accessible to comb through. The focus had shifted significantly to reflect current events, with a very strong approach to informing readers of injustice in politics. Teen Vogue’s online archive was well organized and detailed. I used Voyant to help organize and clarify my data to come to this conclusion. I focused on a few current 2020-2021 articles, with most varying from politics, the 2020 Covid-19 Pandemic, and inclusive narratives ranging from race, culture, sex and identity in addition to fashion, celebrity, health and dating.

Teen Vogue Magazine, Trump Did Not Lose in a Landslide Because the U.S. is Racist, 2020

Teen Vogue Magazine, 5 Dominican Women Claiming Space in Music, Fashion, and Women’s Liberation, 2021

In addition to the text mining process, I did some outside research on the extent of the change in the scope of publications and current events, and found a huge conflict of interest in this new change of direction. The change stems from the resolution of a major public relations crisis, in which the Teen Vogue Magazine team took charge of the magazine’s narrative after the former editor in 2020 published inappropriate and racist tweets on Twitter. This discovery has led my own conclusion in a different direction when comparing Teen Vogue of pre-2020 to Teen Vogue of 2020-2021, given the additional context of the firing of Teen Vogue’s editor over racist remarks. Instead of just adding a political aspect to its editorial, Teen Vogue Magazine has now added an inclusive narrative, extending not only to its topics, which relate to and educate a broader audience, but also to its own writers, contributors and subjects, to stand in solidarity, understanding, and pride with its wide range of young readers around the world. This differs very much from the pre-2020 narratives of Teen Vogue Magazine, which did not press upon any of these subject matters in regard to inclusion.

Below are some links to archived pre-2020 Teen Vogue Magazine articles:

Edge of Glory: Young Hollywood 2011 Portfolio
Young Hollywood’s brightest stars have good looks, killer style, and the hottest projects. Are you ready for the new sensations?

Below are archived links to recent articles of the current 2021 Teen Vogue Magazine archive:

5 Dominican Women Claiming Space In Music, Fashion and Women’s Liberation
Teen Vogue talks to Dominican musicians La Perversa, Red, La Moyeta, Rosaly Rubio and Ross Maria

Trump Did Not Lose in a Landslide Because the U.S. Is Racist
The 2020 presidential election results were no landslide for Joe Biden because the United States is ruled by a minority and many voters truly support Donald Trump.

Conjunto’s Vocabulary

Posted on November 12, 2021 by Valeria Alderete

A look at conjunto music lyrics through a text analysis lens.

Some background

Before diving into the focus of my project, I thought it important to provide some context surrounding the project topic. First, I should probably answer, what is conjunto music? While my project initially aimed to address this question, I found that further development of the project is needed to provide a clearer definition to my audience. So, I share with you a description of the music from the organization that I currently work with:

“Conjunto — the traditional music of South Texas — dates back to the end of the 19th century. European settlers moved into the area with their button accordions and began ‘making music’ with Mexican settlers who favored the bajo sexto (a lower sixth bass guitar traditionally used in Norteño music). The results were magical, and became the signature sound of South Texas — particularly San Antonio.”
Conjunto Heritage Taller

The Corpus

Compiling a body of text was one of the most difficult and time-consuming steps in this process. Because conjunto music has widely been taught and passed down via storytelling methods and performed “by ear,” there are many elements of the music that are not widely known or available, especially in digital formats, including lyrics. So, I spent much of the early process for this project performing online searches and collecting transcriptions and lyrics. To my surprise, I compiled a list of 54 songs, more than I thought I would be able to find considering the short window of time I had to do so. While this gave me a decent “bag of words” to work with, I hope to eventually develop this project with several more songs.

The sources were entirely web-based, consisting mostly of the “lyrics” provided directly on Google’s Search result page, but not all song lyrics were available in the first Google search. For quite a few songs, I had to dig deeper into websites such as Genius.com and Musixmatch.com or online forums, where conjunto fans provided the lyrics.

Disclaimers:

- All lyrics/text used in the project are in Spanish – conjunto song lyrics are often in Spanish, with a few English versions, but I used songs with Spanish lyrics only, for a fair comparison (there would be many more Spanish lyrics than English ones, which would produce inaccurate outcomes)
- Many popular conjunto songs are instrumental – this project explores conjunto songs with lyrics that have been transcribed and are available online.

Methodology

Prepare text – I wanted to use Python or R for data prep/cleaning and then load the corpus into Tableau, but I ultimately decided to use Voyant for data prep and cleaning due to time constraints + my beginner level skills with R and Python.
Specify Stop words – common stop words with no particular meaning, including articles, were removed from the analysis so that the words analyzed were meaningful. Below is a list of some of the stop words:

a, al, la, el, ella, el, yo, tu, he, etc.
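The stop-word step can be sketched in a few lines of Python. This is only an illustration of the idea, not how Voyant implements it: the stop-word set is copied from the sample list above, and the lyric fragment and function names are invented for the example.

```python
import re
from collections import Counter

# Stop words taken from the sample list above; a real list would be longer.
STOP_WORDS = {"a", "al", "la", "el", "ella", "yo", "tu", "he"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping accented letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def term_frequencies(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs in the text."""
    return Counter(word for word in tokenize(text) if word not in stop_words)


# Hypothetical lyric fragment, invented for this example.
sample = "Yo te di la vida, tu me diste el amor; amor de mi corazón"
frequencies = term_frequencies(sample)
print(frequencies.most_common(3))
```

The pattern `[^\W\d_]+` matches runs of Unicode letters, so accented words such as "corazón" stay whole instead of being split at the accent.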

Experience/Reflection

The focus of this project is to provide a high level overview of conjunto song lyrics, revealing underlying sentiments and themes, with the primary audience being people who are new to the genre and seeking insight on the style and history of this genre. After exploring the corpus with Voyant’s capabilities, though, I discovered that a deeper sentiment analysis and some form of time-series analysis might be required for the particular end-goals I had in mind (more on this later). With that said, I thought Voyant had some powerful features for other high-level exploration.

Cirrus/Wordcloud

After specifying stop words, I was able to produce the above Cirrus visualization and term table, revealing the most frequently used terms in the corpus. Most of the top 10 terms may appear “positive” in sentiment (translating to “love”, “life”, “desire”, “heart”, “man”, “soul”, “look”, “joy”, “angel”, “God”); however, in many of the songs within the corpus, the sentiment is actually “negative”, surrounding themes of heartache. While this visualization is interesting, more context is needed to provide a more accurate representation to people who are not familiar with the music.

The collocate table below does a slightly better job of revealing more accurate sentiments and themes. Outlining terms that frequently appear in proximity to each other within the body of text gives a bit of context, in some cases allowing for more accurate interpretation. For example, pairs #2 (“angel”, “return it/give back”) and #9 (“love”, “ungrateful”) might reveal some sense of negativity to users. This was probably my favorite Voyant feature, but it is still not quite accurate enough for a true sentiment analysis. Again, more context is needed.
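For readers curious about what a collocate count involves, here is a minimal sketch that counts word pairs appearing within a small window of each other. Voyant's own calculation may differ (for instance in how it treats stop words and window size); the lyric fragment, the `window` default, and the function name are assumptions for this example.

```python
import re
from collections import Counter


def collocates(text, window=2, stop_words=frozenset()):
    """Count unordered word pairs that occur within `window` words of each other."""
    words = [w for w in re.findall(r"[^\W\d_]+", text.lower()) if w not in stop_words]
    pairs = Counter()
    for i, word in enumerate(words):
        # Look only forward so each co-occurrence is counted once.
        for neighbor in words[i + 1 : i + 1 + window]:
            if neighbor != word:
                pairs[tuple(sorted((word, neighbor)))] += 1
    return pairs


# Hypothetical lyric fragment, invented for this example.
sample = "amor ingrato, amor ingrato, devuélveme la vida"
pair_counts = collocates(sample, window=1, stop_words={"la"})
```

Note that this sketch removes stop words before measuring distance, so two words separated only by a stop word count as adjacent; that is a design choice, not a rule.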

Voyant seems to offer a type of sentiment analysis feature, allowing analysts/users to specify “positive” and “negative” categories, and while I was initially excited to use this feature, I quickly realized this too would require more context in order to avoid mis-categorizations and misleading my audience.

Ultimately, it seems as though any form of accurate sentiment analysis is not possible when treating each word within the corpus as just that — a single word, with no context. Perhaps other text analysis tools are capable of approaching sentiment analysis with phrases and context parameters for better results…
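A tiny sketch can show why word-by-word scoring misleads. The lexicon, the negator list, and the lyric line below are all hypothetical and far too small for real use; the point is only to contrast a context-free score with one that looks at the preceding word.

```python
# Hypothetical mini-lexicon; a real project would need a vetted Spanish lexicon.
LEXICON = {"amor": 1, "alegría": 1, "corazón": 1, "ingrato": -1, "dolor": -1}
NEGATORS = {"no", "sin", "nunca"}


def word_level_score(words):
    """Sum the polarity of each word on its own, ignoring context."""
    return sum(LEXICON.get(w, 0) for w in words)


def negation_aware_score(words):
    """Flip the polarity of a word that directly follows a negator."""
    score = 0
    for i, w in enumerate(words):
        polarity = LEXICON.get(w, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


# Invented line: "I live without love and without joy".
line = "vivo sin amor y sin alegría".split()
naive = word_level_score(line)
contextual = negation_aware_score(line)
```

Here the word-level score comes out positive because "amor" and "alegría" are counted as positive words, while the negation-aware score is negative. Even this second version only handles the simplest case, which is why phrase-level tools are worth exploring.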

Other positives:

- Excellent handling of different language grammar – made it super easy to work with accent marks in my corpus, something I initially struggled with when trying to format my text for use in Python.
- The idea behind the DreamScape feature is super cool (mapping the geographic references within the corpus to an actual map for visualization) but it has its drawbacks/quirks – many locations referenced within the corpus were not mapped, such as Tijuana, Minnesota, and Louisiana. If this were more accurate, it could be very insightful to people wanting to learn more about the roots of conjunto music, which are tied to South Texas and Mexico.

Other issues not related to Voyant:

Inconsistencies in the use of accents (some transcriptions used accents while others didn’t)
Small corpus – while I compiled a corpus of 54 songs, the word count was less than 8,000 after specifying stop words. I hope to continue this project, eventually compiling a much larger corpus.
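One possible way to deal with the inconsistent accents is to normalize every transcription to an unaccented form before counting, so that accented and unaccented spellings of the same word are tallied together. The sketch below uses Python's standard `unicodedata` module; the function name and the decision to protect "ñ" are assumptions for this example, not something the project did.

```python
import unicodedata


def strip_accents(text):
    """Remove accent marks so 'corazón' and 'corazon' count as the same word.

    Caveat: naive stripping would also turn 'ñ' into 'n' and merge distinct
    Spanish words, so this sketch protects 'ñ' with placeholder characters.
    """
    # Compose first so 'ñ' is a single character regardless of input form.
    text = unicodedata.normalize("NFC", text)
    protected = text.replace("ñ", "\ue000").replace("Ñ", "\ue001")
    decomposed = unicodedata.normalize("NFD", protected)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.replace("\ue000", "ñ").replace("\ue001", "Ñ")
```

For example, `strip_accents("devuélveme")` gives `"devuelveme"`, while `strip_accents("año")` leaves the word unchanged. The trade-off is that words distinguished only by an accent (such as "el" and "él") collapse into one form.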

Further Developments:

Aside from continuing to compile and clean a larger corpus for this, I think Voyant’s “Trends” timeline/line graph could be very insightful in revealing trends and/or changes in conjunto music lyrics over time. Of course, this would also require organizing the songs in chronological order (which would likely prove to be yet another major issue considering the absence of proper documentation and details regarding these songs online), but it is something for organizations like mine to think about. Currently, the below line graph of “Trends” reveals nothing in terms of timespan because the songs are in random order (basically, in order of my search for them, which had no strategic approach other than the goal of finding lyrics).

Use of Future and Past language in High School Yearbooks from 1919 – 2015

Posted on November 11, 2021 by Caroline Kelly

Origin of the Assignment

With my natural gravitation towards studying what is not traditionally studied, I delved into the world of adolescent mementos in the form of Autograph Albums and Yearbooks.

I came across an old Autograph Book filled with quotes and messages to a “real swell girl” living in 1940’s upstate New York. Autograph Books date back to the 16th century and were eventually supplanted by yearbooks by the 1970s. I was really taken by the difference in language between early 20th century teenagers and my own youth and even the present day. Far different from “H.A.G.S.” (have a good summer) messages, the albums included life advice, hopes for the future, quotes, poems, and other messages from classmates.

Autograph album of Betty Jean Clarke of Clinton, New York

“Best wishes to you for a life of happiness, success and the realization of your ambitions”

“Yours till the sand of the desert grow cold, And the leaves of the judgement book unfold.

P.S. Lots of Luck and Happiness”

But I was also taken by such clout mixed with sentimentality at the critical age when seniors are simultaneously closing the chapter of their childhood and beginning their early adulthood. It is both a time of mourning, nostalgia and one of hope. A moment shining with opportunity and devoid of regret.

Why Yearbooks?

I set out to cast a net through teenage ephemera to observe the changes in the past century in the way youth communicate about their beliefs and values in the past and the future. I also selected a list of years that have been reported by historians as being difficult including: 1918, 1929, 1941, 1962, 1968, 2001, and 2020.

I approached the project both as an exploration and as an experiment seeking to test my hypothesis. My methodology was aimed at reducing as many confounding variables as possible and providing as much data as possible from which to extract statistical relevance.

I opted for yearbooks instead of the richer text of autograph albums for the following reasons:

Controlled sample
1. Yearbooks follow the same format and have not changed. They are relatively the same across state lines, districts, time and advances in technology (we still follow the same format in 2020 as in 1920).
2. Age of “authors” is consistent – they are all teenagers
3. Confounding variables such as location, socio-economic class, and gender could be better controlled due to the large sample available. For this project yearbooks from the same co-ed school were used.

Accessibility
1. Yearbooks have been scanned by ancestry.com or classmate.com websites
2. Available in .txt formats
3. Common item known by most people

Of note
1. Yearbooks have a narrative structure with a clear publishing date
2. Public documents, not personal items
3. A commercialized industry with set standards and formats
4. Not typically studied

Tools

InternetArchive.org website
Voyant
LIWC – Linguistic Inquiry and Word Count
SPSS

Sample

Malden High School, located in Malden, Massachusetts, was selected as the sample as there was a consistent record of yearbooks dating back to 1919. Other schools were eliminated as they did not have co-ed classes in earlier years. Years in the 1920s had 2 books per year titled A and B.

Methods

105 yearbooks dating from 1919 – 2015 were downloaded as txt files.
Voyant platform was used to analyze the documents
A list of stop words was created to exclude common words such as “Street” and proper names such as “John”, and to clean out text from computer scanning like “pxl”.
The most prominent 300 Corpus Terms were downloaded and categorized in the LIWC dictionary
The full text of all 105 yearbooks was also processed through the LIWC platform
In SPSS, yearbooks from the years 1919, 1929, 1941, 1968, and 2001 were marked as critical years; yearbooks published in 1920, 1930, 1941, 1969, and 2002 were coded as the year after a critical year; and yearbooks published in 1918, 1929, 1941, 1967, and 2000 were coded as the year before a critical year.
Preliminary SPSS tests were used to compare LIWC scores between critical years and the years immediately preceding and following them.
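The coding step could be sketched outside SPSS roughly as follows. This is a simplification under stated assumptions: it takes the critical years from the list above and treats "before" and "after" as exactly one year away, which does not reproduce every year in the lists as given. The scores are invented placeholders, not results from this project.

```python
from statistics import mean

# Critical years as listed in the SPSS step above.
CRITICAL_YEARS = {1919, 1929, 1941, 1968, 2001}


def code_year(year, critical_years=CRITICAL_YEARS):
    """Label a yearbook year relative to the critical years.

    Simplifying assumption: 'before' and 'after' mean exactly one year
    away, and 'critical' wins if a year qualifies for more than one label.
    """
    if year in critical_years:
        return "critical"
    if year + 1 in critical_years:
        return "before"
    if year - 1 in critical_years:
        return "after"
    return "other"


def mean_score_by_code(scores_by_year):
    """Average a LIWC-style score within each coding group."""
    groups = {}
    for year, score in scores_by_year.items():
        groups.setdefault(code_year(year), []).append(score)
    return {code: mean(values) for code, values in groups.items()}


# Hypothetical scores, invented for this example (not results from the project).
example_scores = {1928: 2.0, 1929: 4.0, 1930: 3.0, 1950: 1.0, 1951: 3.0}
group_means = mean_score_by_code(example_scores)
```

Group means like these are only a first look; a significance test of the kind run in SPSS would still be needed before drawing conclusions.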

Preliminary Results

Cirrus Word Cloud of top terms across yearbooks

“Future” and “Past” terms throughout Years. Overall the term “Future” is used more than the “Past” for most years.

Overall yearbook text has more language associated with the past than the future.

Across all yearbooks, LIWC analysis found that positively associated text was more common than negatively associated text.

Of the negative emotions, sadness was more prominent than words associated with anxiety or anger.

A comparison of texts in critical years did not show a statistically significant difference in future or past-oriented text when compared to non-critical years. However, a comparison of all LIWC variables found a significant difference between words associated with feelings and death between critical and non-critical years.

Conclusion

Additional analysis utilizing grouped terms would provide additional insight into the relationship between cognitive states, as categorized in LIWC, across yearbook years. Further development of a measure of “critical years” is also needed.

Other steps for this dataset would include combining the files for years in the 1920’s that were split into A and B yearbooks. Also, expanding the pool to other schools would allow for a more diverse dataset and allow for comparison between schools. Similarly, comparing the popularity of words using Ngram may provide an additional frame of reference.

Anti “Critical Race Theory” Bills: A Text Analysis.

Posted on November 11, 2021 by Jose Michell Brito

My initial interest for this Text Analysis project was Critical Race Theory (CRT). I was specifically interested in the language being used in bills passed against it. My main questions were: What is the language being used and how is it based on fear? What do these texts tell us about people’s understanding of what CRT is? What other issues come up that I did not anticipate? I had other interests to explore, but these would require plenty of time and more technical skills to develop. For instance, I would be interested in exploring bills passed vs. those proposed and not passed, and whether there is a difference in the language used; a text analysis of liberal vs. conservative media, looking at how these bills and issues are being spoken about and the language used therein; and how these bills vary state to state, or whether there is a concrete understanding of what in the world is actually happening here. It would also be interesting to see the development of these bills through time.

What the heck is CRT and what are these bills all about anyways?

Texas is just one of a handful of states that have approved legislation against the teaching of Critical Race Theory in grades K-12. Gov. Greg Abbott signed a bill that prescribes how Texas teachers can talk about current events, American history, and racism in the classroom. Other state lawmakers and education policymakers throughout the country have joined in the efforts to make this a nationwide ongoing debate over how to teach this not-so-complicated-to-communicate history and truth about race and racism, but also sexism, equality, and justice.

Critical Race Theory vs. “Critical Race Theory”

CRT is an academic term that dives into how race and racism have impacted social and local structures in the US. A nearly 40-year-old concept, its core idea is that racism isn’t merely the product of individual, interpersonal bias, or prejudice. It asserts that race is a social construct made systematic and embedded in legal systems and policies. It’s as American as…well nothing else, really. Apples are not even indigenous to North America unless you fancy a sour apple pie.

The basic tenets of CRT emerged out of a legal analysis framework of the late 70’s and early 80’s created by legal scholars Derrick Bell and Kimberlé Crenshaw, to name a few. An example where CRT was important was in tackling the issue of redlining, where government officials in the 30’s literally drew lines around areas deemed high financial risks. Often, race was the only factor influencing who was allowed to generate wealth and who was doomed to generational poverty. Banks refused, and were not allowed, to offer mortgages to Black people within the lines drawn. These violent acts still haunt us today. Policies of the past haunt many of us today!

CRT has also influenced other intellectual fields concerned with issues within the humanities, social sciences and teaching like political power, social organization and language.

But CRT is being misunderstood, almost exclusively by conservatives, and it seems to be on purpose (assuming a conscious and highly intelligent strategy is being used). This academic term is being misused and conflated with issues and topics of inequality, anti-racism and social justice. Instead of helping to analyze abstract and almost meta ideas about how society is structured and the implications of that structure, it is being used to speak about a liberal challenge to American ideals of group identity, nationality, pride, and unity. CRT is now cited as the basis of all efforts around diversity and inclusion. Topics around sexism, women’s rights, LGBTQ+ history and justice, the Holocaust, eugenics, etc., are also being lumped into these arguments. Insecurities around anti-government trust and policies as well as conspiracy theories are also included.

Over this past year, GOP leaders have decried the teaching of CRT in public schools. The frenzy started when Trump banned federal employees from participating in trainings discussing “CRT” and white privilege, calling it propaganda, adding it to his conservative bucket of things anti-American. Since then, it has been a downhill spiral into a nonsensical frenzy. It’s truly just sad and depressing.

I googled bills passed and came across a couple of sites that have compiled a list of and mapped states and their status concerning CRT. I used the most up-to-date article from EdWeek.org, which also provided links to each bill and their current standing. I chose to focus on bills passed. A larger project would contain all the bills, passed or not, for an even more in-depth analysis of the language used around banning CRT in classrooms.

Title: Map: Where Critical Race Theory is Under Attack

Source: https://www.edweek.org/policy-politics/map-where-critical-race-theory-is-under-attack/2021/06

Initial thoughts:

I realized that this was going to be a tough project from the start. I did not expect the ridiculous legal jargon and bill text aesthetic to be so annoying, distracting, and ugly. Too many roman numerals, tons of parentheses, and way too much wording for simple ideas and phrases. Seemed like it was rule following over clear communication. But, alas, I needed to stick with it.

I thought of leaving all the other legal clutter because it just exemplifies how non-inclusive legal jargon is. This gets into issues of access, privilege, class, race, and racism. But that analysis will take a larger project to address in full.

The image above did little to highlight the core concerns in the anti-CRT bills.

I was not surprised that key words dominated the texts. Throughout the bills analyzed, the “protected” identity and life-style words were the most frequent. That makes sense considering that the bills tended to repeat these multiple times throughout the text, for some reason.

I decided to clean the texts of these words to better represent the core words and language used in the bills. Of course, these terms are core, but it seemed more like lip service (or text service) than real deep analysis of any of them, a typical legalese performance. Though my focus here is on race and identity, I thought deleting these words from the text would highlight more unexpected or typically expected terms and ideas. I just went with it.

Of course I could have cleaned this up a bit more, but even after doing lots of deleting and inputting words into the StopWords section, I thought it interesting to leave the other clutter. A larger project would be cleaner, I’m sure. But the above image shows me that, after cleaning out these legally required and performative key words, the larger size and prevalence of terms like “inherently,” “school,” “individual,” and “people,” in relation to the smaller terms “racist,” “adverse,” “oppressive,” and “consciously,” seems interesting to me. I wouldn’t want to REACH and make outlandish sounding assumptions yet, but it seems like the typical conservative focus on individualism rings loud enough to understand that what is being challenged by this bill is the default identity based on nationalistic pride and the privilege of being just an American, which has historically been about a type of equality that does not center difference but focuses on assimilation.

In the context of an anti-CRT bill, the most used phrase here was “discrimination in public workplaces and education.” This worries me because it shows that criticality around the meanings and real-life consequences of institutional and systematic racism is being flipped to mark white people, workers and students, as the victims. This is concerning because what it highlights is a lack of critical and deep understanding, an almost inability to understand the world abstractly and meta-ly.

Another interesting highlight is the context in which the word “individual” is used. Above, we can clearly see that it was followed by the phrase like “discrimination against” which signals to me that the focus on the individual being discriminated against is more interpersonal than systemic. In other words, this New Hampshire anti-CRT bill, maybe unconsciously, acknowledges that white students and white workers are not being systematically discriminated against by another group of people. The focus is on interpersonal issues. Of course bills do not have examples to support their demands, but I wonder if those would be individual claims of “racist” discrimination versus a strategic and systemic and generational discrimination. I am sure that if I analyzed texts by systematically and historically discriminated folks, the issues would be more about a deliberate strategy to exclude and oppress than simply an issue of individual prejudice. Great food for thought here.

I decided to also upload the Texas bill, since that seems to be where most of the conversation is happening. Before cleaning the text of these protected identity terms, I uploaded it and found the text interesting. The issue with the Texas bill, and with most bills “banning” CRT, is not a complete ban on anything related to race and the American story of racism, and not so much the absence of an honest history, but the requirement to show “the other side” of these issues. This means that speaking about the horrors of slavery will have to include how beneficial, if not necessary, it was to the creation of this nation and how intelligent racists were for thinking of this idea. The image below shows the Texas text exactly:

https://capitol.texas.gov/tlodocs/87R/billtext/pdf/HB03979F.pdf#navpanes=0

As you can see, the limitations are, as mentioned earlier, on how “CRT” is to be taught, which is still a massive issue. The Holocaust was horrible, but now teachers will have to explain, and not be critical of, the “justifying” reasonings for it. Women fought for rights! Right on! But they were also happy and they did it because they wanted equality because we all deserve it and white men aren’t inherently bad. These conversations sound really scary to have with impressionable children who are developing their sense of right and wrong and learning, hopefully, to be empathetic rather than developing sociopathic behaviors.

I think for a future and more prolonged project, I would only insert the lines in the bills that explicitly state the language to be used and not used by teachers, the specific demands and the examples of texts recommended. I do wonder if conducting distant readings of these bills is even effective. I found myself wanting to read the bills and just take the words that I saw interesting. This project did not necessarily justify what I was looking for. I think I would concentrate on popular phrases next time and have sections where I would note if a phrase or words were being (mis)used and misrepresented.

I did find myself enjoying, and reading into, this project. I think more time and energy spent on this will be more fruitful. I look forward to incorporating distant reading and text analysis in future projects. What a great way to see the bigger picture.

Word Clouds of Presidential Debate Transcripts (1960, 1992, 2020)

Posted on November 11, 2021 by David Leshinski

As someone who has never done any text analysis, it took me a while to come up with exactly what I wanted to do for this assignment. After thinking of all the different categories I could choose from, such as something sports related, movie related, or politics related, I decided I’d stick to something to do with politics. My first (and favorite) idea was to find a compilation of all of Donald Trump’s tweets and see if there were any words or phrases that he used most often. While this would have been a pretty funny and entertaining idea, it didn’t work out. On a more serious note, my next idea, and the one I went with, was to use the transcripts from presidential debates and see if we can spot some of the popular topics that were on the candidates’ minds during each time period. I decided to look at three different presidential debates across 60 years. The debates I chose were Kennedy v Nixon (1960), Bush v Clinton v Perot (1992), and Trump v Biden (2020).

Here’s what I found:

Kennedy v Nixon (1960)

In the 1960 presidential debate, we had the young John Kennedy and the not so young and charming Richard Nixon. Just looking at the word cloud above, we can see words like debate, transcript, and october. I noticed that when I put a link into the site I used for this project, Voyant Tools, it collects just about every word it can find on the webpage, not just what’s in the actual transcript you are interested in. This trend will carry on through the other debate word clouds included in this blog post. Other common words, like mr, Kennedy, Nixon, president, vice, and senator, appear because names and titles are used in the transcript to show who is talking, and they are obviously also said often throughout the debate as the candidates address one another. Looking deeper than that, though, amongst a bunch of random words, we can see words like communists, war, soviet, union, and islands. These debates took place during a time of uncertainty, when the U.S. was locked in a Cold War with the Soviet Union. On top of that, the United States feared Fidel Castro and the possibility of his regime spreading Communism across the Western Hemisphere. In the second debate, the two candidates argued about two islands a few miles off the coast of the Chinese mainland. Kennedy argued that the line of defense should be drawn at Taiwan, while Nixon believed they should draw the line where the West has drawn the line against Communism. Nixon ran with the idea that Kennedy would allow the Communists to take the islands. Kennedy ended up just sliding by Nixon and winning the presidential election, following what became one of the most remembered presidential debates in history.

Bush v Clinton v Perot (1992)

32 years later, the 1992 debate looked a little different. This was the first presidential debate that included three candidates all on the same stage. Questions were formatted differently and would now involve real voters, who were allowed to ask pressing questions. In this word cloud, it isn’t as easy to tell what the topics were as it was for the 1960 debate. Still, three words I do see are tax, people, and jobs. In the debates, Clinton stressed that America has not invested in its people. He said that when people lose their jobs in his state, he’d probably learn their names. He said people now are working harder for less money than they were making ten years ago, and stated that this is the result of 12 years of trickle-down economics. On the topic of taxes, Clinton’s goal was to have the government stimulate the economy and reduce the deficit by 50 percent over his term. Bush’s proposal was for a balanced-budget amendment, a line-item veto, and a taxpayer checkoff rule. Taxes were a heated argument throughout this debate. With Clinton pushing for taxing the rich and leaving the middle class out of it, Bush was wary and “warned” voters to keep an eye on their wallets.

Biden v Trump (2020)

That brings us to our most recent debate, Trump v Biden in the 2020 presidential debate. What a trip this one was. In reality, this debate was more of a contest over who could talk louder and not let the other person answer. Surprisingly, most of the words that made the chart aren’t very relevant. There are, however, three that are important: deal, million, and China. These three words may not seem too crazy, but they encapsulate most of the topics talked about throughout the debates. The Green New “Deal” was brought up often and attacked relentlessly by team Trump. The Green New Deal was a progressive proposal to effectively fight climate change by reducing greenhouse gases, creating higher-paying jobs, and investing in new infrastructure. Trump’s criticism was basically that doing this would cost too much money and hurt the economy (as if letting climate change run rampant would be any better). The word “million” was used a number of times. Pleading to America that the coronavirus is tearing the United States apart, Biden said, “Over 7 million people who have contracted this disease. One in five businesses closed. We’re looking at frontline workers who have been treated like sacrificial workers. We are looking at over 30 million people who in the last several months had to file for unemployment.” Biden’s approach was to level with the people by providing numbers and talking to the camera often. China was another hot topic all through the year. 2020 was a very unusual year, with Covid-19 locking us all away in our homes, and the virus has probably been the most talked about topic across the world since then. With the virus originating in Wuhan, China, Trump often put blame on China for the pandemic and used the term “China Virus” when talking about Covid-19. On top of that, China was also talked about often when it came to trade. Biden said he would make China play by the international rules. He believed Trump was too soft with foreign “thug” leaders like Putin and Xi Jinping and made it clear he wanted to stand up to them. This debate wasn’t about facts a lot of the time and was mostly a screaming match, which means there was a lot of misinformation about China being shared on both sides. Biden falsely stated the deficit had increased with China, and Trump falsely claimed Hunter Biden was given millions of dollars from China.

Eye(s) in Poe’s Short Stories

Posted on November 11, 2021 by Martin Glick

This project was inspired by a quote from a recent article I read on Poe.

Poe’s fetish objects point towards a larger tradition of objectifying the terrors of the soul in Gothic literature. Old stone walls, devices of torture, evil eyes, casks of Amontillado, tufts of hair, purloined letters, and, above all, the ancient entanglement of death and beauty.
https://www.thesmartset.com/poe-boy/

I thought, since I love horror, the grotesque, gothic literature and film, that I’d like to see how body parts are treated in the prose of Poe. With Prof. Allred we thought it best to focus solely on the short stories, the reason being that language is more concise in shorter works of prose, and less functional.

On GitHub I have deposited the txt file I used, which compiles 69 of Poe’s short stories. The reason for including more than just his horror/thriller stories was to ensure I didn’t miss any mention of a body part in the oeuvre. A bit of a brute force method, but at this point I didn’t have a very clear hypothesis, and needed to get some useful fragments of text. Using Voyant Tools, I produced the following frequency list, limited to the terms worthy of note.

Rank : Term : Count
12 : eyes : 295
14 : head : 283
24 : hand : 214
33 : body : 196
34 : feet : 196
38 : mind : 187
50 : death : 167
68 : eye : 151

A combined 446 times for eye/eyes. Eye*, which includes eyelid, eye-glasses, etc., appears 471 times. Deciding to focus on the clear winner here, I exported the sentence fragments which contained “eye” or “eyes”, with a context of 10 words on each side of the term, then sent that back through Voyant to try to pinpoint characteristically grotesque phrases.
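For readers who want to reproduce this kind of export without Voyant, a keyword-in-context extraction can be sketched as follows. This is an assumption-laden illustration rather than Voyant's own method: the function name and tokenizing rule are mine, and the sample sentence is one of the Poe fragments quoted further down.

```python
import re

def kwic(text, terms, window=10):
    """Return (left, keyword, right) tuples with `window` words of context per side."""
    # Simple word tokenizer; hyphenated and apostrophized words stay whole.
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    targets = {t.lower() for t in terms}
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() in targets:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

sample = "their red eyes glaring upon me"
for left, word, right in kwic(sample, ["eye", "eyes"], window=2):
    print(f"{left} [{word}] {right}")
```

Matching on whole tokens means "eye" and "eyes" are caught but "eyelid" is not, which mirrors the choice to set the wider Eye* group aside.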

A semblance of a thesis I started out with was that Poe informed our notion of the grotesque, and I was hoping here to get some meaty adjective-noun pairings, or at least sentence fragments which demonstrated this sensibility. I didn’t find that initially in Voyant. Below is a “link” chart showing the words which most commonly appear next to “eyes” or “eye”.

I was really hoping for “dripping eye”, “groaning eyes”, “disgusting eye”, or “ugly eyes”. It seems Poe isn’t as obvious a grotesque writer as I once hoped and I would have to dig deeper into the sentence to find what I was looking for. Off to AntConc to take a closer look!

“Eyes” rather than “eye” revealed an interesting set of singular appearances

I was hoping to see a repetition of certain phrases; that would certainly cement the stamp of “grotesque” on an author, right? Poe was more subtle than that, and his juicier bites of prose are saved for a select few of the horror works:

“deliberately cut one of its eyes from the socket” – The Black Cat
“They were wild, bold, ravenous—their red eyes glaring upon me” – The Pit and the Pendulum
“deep-set eyes glared with unnatural lustre” – The Gold-Bug
“The face was fearfully discolored, and the eye-balls protruded” – The Murders in the Rue Morgue

Would it have been worth it to have the short stories arranged by publication date? I didn’t double check that when compiling my list, but it is something to keep in mind for the future. I could have easily traced his use of the term over the course of his writing. Another consideration when compiling a corpus is to nest the texts in a meaningful way. I could have done it by genre, or at least “obviously horror/mystery” and “the rest”.

JSTOR Text analyzer suggested I read up on the latest Ophthalmology research.

Interesting to note “Eye irritation” was labeled an identified term, and I take this to mean the exact term was found in the text. Dress hooks are a tool for sewing which utilize an eye closure.

What I’d like to propose is that Poe has not informed our notion of the grotesque in relation to the body in any consistent way. There are scattershot instances of it across his work, and while “eyes” and “eye” appear more than any other body part, this fact can be credited to his detective novels. His apparent influence stems from a few influential stories which loom large in the public eye.

A difficult mode of investigation that I chose from the outset, and which I won’t repeat, was to tie an accepted scholarly term like “grotesque” to text mining. The term reflects a mood or sensibility rather than a string of letters. I may very well have missed notions of the topic in a fragment because it was the whole paragraph which spelled out the mood. The grotesque is found, after all, in vulgar expansions in size or of use, which don’t hinge on an identifiable term.

Last ditch effort! I took from the complete short story collection, phrases which mentioned a body part at all. I identified: arm, face, feet, hand, head, mouth which were statistically significant and thought I’d try to see which grotesque terms show up the most in relation to the set.

What this shows is general use, feet being the least used, and arm being the most.

It has slowly dawned on me that what I was searching for couldn’t be found in repetition or frequency at all, which makes the prospect of using a distant reading mode difficult. I was instead looking for singular instances of the grotesque which are used sporadically and for effect in Poe’s work. At least I was able to pinpoint which body parts were used most often in his work, which might be important for a specific kind of academic study.

After posting edit thoughts: What I wanted to measure was sentiment, which isn’t what these programs identify. Ideally there could be a program which shades sentences or even paragraphs in different colors depending on their intensity or coolness, parochial or transgressive qualities. There are sentiment analysis tools used for decoding Social Media posts, and I wonder if they could have helped me.

The Odyssey: “For the use of those who cannot read the original”?

Posted on November 11, 2021 by Ostap Kin

For this assignment, I decided to focus on the study of several translations. The subtitle of my blog post is actually a reference to a subtitle to one of the translations. Can we understand the translation or more broadly a text if we don’t read the original — that is, can we understand a text if we use distant reading, not close reading?

Voyant Tools helped me explore whether (and how) a target language (i.e., the language into which a work is translated) changes through the centuries. How does the language actually change? And what kind of influence does that have on the translated text? Does the vocabulary of those translated texts differ, and if so, how? What are the actual differences? Is it even possible to ask these questions without closely reading the texts, or is distant reading the way to start thinking about these problems?

To investigate a set of these research questions, I concentrated on English-language translations of Homer’s Odyssey. This should be a good example to try to answer those questions because we have many translations of this classical work into English and, as we know from readings, the richer our dataset is, the more interesting our outcomes might be.

Thanks to Project Gutenberg, I was able to locate as many as seven translations of the Odyssey, and these constitute the core of my project. There are many more translations of this work: a Wikipedia page about the Odyssey, for example, lists published translations of the work, and their number is around one hundred. Thus, it could be material for a large research project.

First, I located the five translations through Project Gutenberg and copied and pasted the texts into different Word files. I did this because the files on the website contain some information produced by Project Gutenberg, translators’ notes on their translations, notes to the text, and various additional materials, i.e., those parts which are not in the original literary work. Because I wanted to focus entirely on the literary text, those had to be excluded. This way the additional materials won’t interfere with or influence my research.

After I uploaded five translations, here are some of my findings. Thanks to Cirrus, it was possible to see a word cloud that visualizes the most frequent words of a corpus, in this particular case of all five translations combined. The top 55 word frequencies are depicted in this word cloud.

I wanted to explore and get more information about words in the whole dataset. A function called Summary came up with the total number of words in all files (611,788) and the number of unique word forms (28,282). In addition to that, one of the features of the summary is that it can list the distinctive words in each of the five texts. This demonstrates the differences among the five translations and underlines the need to study why these texts show so many different distinctive words. The summary provides some interesting information about the whole corpus, which consists of five translations of the same text. We can study the longest texts (word counts included) and the shortest texts. It’s also possible to observe vocabulary densities and the distribution of density across the whole corpus. Finally, there is other interesting information: average words per sentence, frequent words, and distinctive words.
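To make the summary figures more concrete, here is a rough sketch of how totals, unique word forms, and vocabulary density could be computed. It assumes that vocabulary density means unique word forms divided by total words, and it uses a simple tokenizer of my own, so its counts would not necessarily match Voyant's.

```python
import re

def summarize(text):
    """Count total words, unique word forms, and their ratio (vocabulary density)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    unique = len(set(tokens))
    # Assumed definition: density = unique word forms / total words.
    density = unique / total if total else 0.0
    return {"total": total, "unique": unique, "density": density}

# Hypothetical one-line sample, for illustration only.
print(summarize("Tell me, O Muse, tell me"))

# Applying the same ratio to the corpus figures reported above.
print(round(28282 / 611788, 4))
```

Under that assumed definition, the figures reported for the whole corpus work out to a density of roughly 0.046, which is low because the same story is told five times over.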

The next question I delved into was how particular words trended and how these trends could be depicted through a line graph. We already have the five top words in the whole dataset and can see the ways these five words appeared and were used in the respective translations. Curiously, the use of the word “spake” in the translations increased tremendously when you compare the translations produced in 1614-16 and 1726 with those from 1879. I am also quite unsure of what happened with the use of the word “Ulysses” in one of the translations from 1879. Interestingly, both “son” and “shall” were used more or less on the same level.

The most widely used word in all these translations is “Ulysses,” and therefore I became interested in how it is used within certain terms. Microsearch visualizes the frequency and distribution of the word in all five texts—you can view the “map” of this word in all five texts and think about if its frequency changes. If it does change, then what might be the reasons for that?

Conclusions. Without knowing a source language, it’s always tricky to work with translated works. However, if one intends to compare just the translated texts, these results might be of certain interest and might help to pose further research questions which one can solve with the help of traditional close reading. Distant reading can certainly diversify the study of literature.

Troy – Text Mining

Posted on November 11, 2021 by Troy Smith

My research focuses on the impact on the educational landscape of a historically complacent approach to fundamental mathematics education in the U.S. Reinforcing literacy skills in children at an early age has always appeared to supersede reinforcing numeracy and, as a result, English literacy is viewed as a defining characteristic of true Americanism. Numeracy, on the other hand, was for a long time considered more of a supplementary ability than a foundational one. Considering the possibilities of Google Ngram, and the vast corpora of literature available to Google, I want to explore the historical comparison of the mentions of numeracy and literacy in American literature.

Before I started using Ngram, I had to first figure out how it worked. Ngram uses over 8 million books, which contain over half a trillion words (Pechenick et al., 2015). The books have been scanned by Google and, based on the words you enter into the search bar, Ngram tells you, of all the n-grams of that length in the corpus for a given year, what percentage of them match the specific term you entered (Google).

Before I started searching, I had to decide on baseline parameters for my Ngram searches. I decided on the following:

Since my focus is on American history, I would use the American English 2019 corpus, which is defined as “Books predominantly in the English language that were published in the United States” (Google).
I would search without case sensitivity. It makes no difference, for my purposes, if ‘numeracy’ or ‘Numeracy’ is written.
I would use a smoothing of 3 (which essentially averages each year’s value with the three years on either side of it, a 7-year window). Any smoothing smaller than that shows too many fluctuations (I realized from this why they use the term smoothing, because the graphs look really jagged with low smoothing numbers), and since I am looking at the data over such an extended period of time, that is a sufficient range.
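The smoothing parameter can be illustrated with a small centered moving average. This is a sketch under my own assumptions, namely that a smoothing of 3 averages each year with up to three years on either side and that years near the edges simply use whatever neighbours exist; the sample values are invented.

```python
def smooth(values, smoothing=3):
    """Average each point with up to `smoothing` neighbours on either side."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - smoothing)
        hi = min(len(values), i + smoothing + 1)
        window = values[lo:hi]  # shorter near the edges of the series
        out.append(sum(window) / len(window))
    return out

# Hypothetical yearly values with one spike in the middle.
raw = [0, 0, 0, 7, 0, 0, 0]
print(smooth(raw, smoothing=3))
print(smooth(raw, smoothing=0))  # smoothing of 0 returns the raw values
```

A single-year spike gets spread across its neighbours, which is why low smoothing values produce the jumpy lines and higher ones flatten them.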

Now, I recall reading during my research that the terms numerate and numeracy were not widely introduced in America until the 1950s, so I wanted to see how accurately that is reflected in the literature mentions (Cohen, 1982). Based on Google Ngram, this historical note appears to be accurate.

Furthermore, since the term numeracy had not yet been coined early in American history, I found myself looking for components of literacy and analogous components of numeracy to compare, preferably unigrams (so that they are compared against the same corpus). Comparing the terms literacy and numeracy illuminates my point but does not provide much useful data, as the mentions of literacy significantly outpace those of numeracy.

Considering how infrequently numeracy was mentioned during the first two centuries of the American republic, I decided to focus my searches on the following terms.

Literacy: reading, writing, dyslexia

Numeracy: mathematics, arithmetic, algebra, dyscalculia

Early in my text mining, I realized I had a big problem when it came to the mentions of ‘math’ and the mentions of ‘mathematics’. For all intents and purposes, we know those two words to mean the same thing in America and, as such, they are used interchangeably, with ‘math’ generally being used for brevity. Let’s take a look at the mentions of ‘math’ and ‘mathematics’ since 1700. Somehow, around 2013, mentions of ‘math’ exceeded those of ‘mathematics’. I was not able to develop any theory for why that is the case other than that it is shorter to type.

As I mentioned earlier, the mentions of literacy far outpace the mentions of numeracy, so, using one of the advanced usage features of Google Ngram, I set out to compare the ratios. Moreover, I wanted to compare the convergence of the uses of ‘math’ versus ‘mathematics’ in 2013 to ensure the advanced usage feature functioned properly. As such, I used the “/” composition, which “Divides the expression on the left by the expression on the right, and is useful for isolating the behavior of an Ngram with respect to another” (Google). I also used the “+” composition, combined with the “/” composition, to demonstrate two things: how the word ‘math’ has increased in usage compared to ‘mathematics’ since the 1950s and how the word ‘numeracy’ has increased in usage compared to ‘literacy’ since the 1950s.
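What the “/” and “+” compositions do to two frequency series can be sketched in a few lines. This is not Google's code: the helper names are mine and the yearly values are invented purely to show the arithmetic, with a ratio above 1 marking the point where the first term overtakes the second.

```python
def sum_series(a, b):
    """Add two {year: frequency} series over the years they share ('+')."""
    return {y: a[y] + b[y] for y in sorted(set(a) & set(b))}

def ratio_series(numerator, denominator):
    """Divide one {year: frequency} series by another, year by year ('/')."""
    years = sorted(set(numerator) & set(denominator))
    return {y: numerator[y] / denominator[y]
            for y in years if denominator[y] != 0}

# Invented frequencies, for illustration only (not real Ngram data).
math = {1950: 1.0, 2000: 3.0, 2013: 4.0}
mathematics = {1950: 4.0, 2000: 4.0, 2013: 4.0}

print(ratio_series(math, mathematics))  # climbs toward 1 as usage converges
print(ratio_series(math, sum_series(math, mathematics)))  # share of the combined total
```

Dividing by the combined total keeps the result between 0 and 1, which can be easier to read than a raw ratio when one term is far rarer than the other.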

Even considering the learning disabilities associated with numeracy and literacy, dyscalculia and dyslexia, respectively, the latter is easily more recognizable in the American lexicon – pedagogical or otherwise. Let’s see what Ngram says about this.

Lastly, I wanted to take a look at which terms are most closely associated with the word numeracy, so I used the “*” function. The “*” wildcard stands in for a word and returns the most common words that fill that position, here the words that most often follow the word you enter into a search.

Overall, Google Ngram supports my theories around the emphasis of literacy over numeracy in America. Numeracy was hardly mentioned prior to the 1950s. Furthermore, even before that, in the early years of the American republic, mentions of the components of literacy far exceed those for numeracy.


DH 700 Fa21

Introduction to the Digital Humanities