Anti “Critical Race Theory” Bills: A Text Analysis.

Posted on November 11, 2021 by Jose Michell Brito

My initial interest for this text analysis project was Critical Race Theory (CRT). I was specifically interested in the language being used in bills passed against it. My main questions were: What is the language being used, and how is it based on fear? What do these texts tell us about people’s understanding of what CRT is? What other issues come up that I did not anticipate? I had other interests to explore, but these would require plenty of time and more technical skills to develop. For instance, I would be interested in comparing bills passed with those proposed and not passed, to see if there is a difference in the language used. I would also like to run a text analysis of liberal vs. conservative media coverage of these bills and the language used therein, and to look at how these bills vary from state to state, or whether there is any concrete shared understanding of what in the world is actually happening here. It would also be interesting to see how these bills develop over time.

What the heck is CRT and what are these bills all about anyway?

Texas is just one of a handful of states that have approved legislation against the teaching of Critical Race Theory in grades K-12. Gov. Greg Abbott signed a bill that prescribes how Texas teachers can talk about current events, American history, and racism in the classroom. State lawmakers and education policymakers throughout the country have joined the effort, making this an ongoing nationwide debate over how to teach the not-so-complicated history and truth about race and racism, but also sexism, equality, and justice.

Critical Race Theory vs. “Critical Race Theory”

CRT is an academic term that dives into how race and racism have impacted social and local structures in the US. A nearly 40-year-old concept, its core idea is that racism isn’t merely the product of individual, interpersonal bias, or prejudice. It asserts that race is a social construct made systematic and embedded in legal systems and policies. It’s as American as…well nothing else, really. Apples are not even indigenous to North America unless you fancy a sour apple pie.

The basic tenets of CRT emerged out of a legal analysis framework of the late 70’s and early 80’s created by legal scholars Derrick Bell and Kimberlé Crenshaw, to name a few. An example where CRT was important was in tackling the issue of redlining, where government officials in the 30’s literally drew lines around areas deemed high financial risks. Often, race was the only factor influencing who was allowed to generate wealth and who was doomed to generational poverty. Banks refused, and were not allowed, to offer mortgages to Black people within lines drawn. These violent acts still haunt us today. Policies of the past haunt many of us today!

CRT has also influenced other intellectual fields concerned with issues within the humanities, social sciences and teaching like political power, social organization and language.

But CRT is being misunderstood, almost exclusively by conservatives, and it seems to be on purpose (assuming a conscious, high-level strategy). This academic term is being misused and conflated with issues and topics of inequality, anti-racism and social justice. Instead of helping to analyze abstract and almost meta ideas about how society is structured and what that implies, it is being used to speak about a liberal challenge to American ideals of group identity, nationality, pride, and unity. CRT is now cited as the basis of all efforts around diversity and inclusion. Topics around sexism, women’s rights, LGBTQ+ history and justice, the Holocaust, eugenics, etc., are also being lumped into these arguments. Anxieties about trust in government and its policies, as well as conspiracy theories, are also included.

Over this past year, GOP leaders have decried the teaching of CRT in public schools. The frenzy started when Trump banned federal employees from participating in trainings discussing “CRT” and white privilege, calling it propaganda and adding it to his conservative bucket of things anti-American. Since then, it has been a downhill spiral into a nonsensical frenzy. It’s truly just sad and depressing.

I googled bills passed and came across a couple of sites that have compiled lists and maps of states and their status concerning CRT. I used the most up-to-date article from EdWeek.org, which also provided links to each bill and its current standing. I chose to focus on bills passed. A larger project would contain all the bills, passed or not, for an even more in-depth analysis of the language used around banning CRT in classrooms.

Title: Map: Where Critical Race Theory is Under Attack

Source: https://www.edweek.org/policy-politics/map-where-critical-race-theory-is-under-attack/2021/06

Initial thoughts:

I realized that this was going to be a tough project from the start. I did not expect the ridiculous legal jargon and bill text aesthetic to be so annoying, distracting, and ugly. Too many Roman numerals, tons of parentheses, and way too much wording for simple ideas and phrases. It seemed like rule-following over clear communication. But, alas, I needed to stick with it.

I thought of leaving all the other legal clutter because it just exemplifies how non-inclusive legal jargon is. This gets into issues of access, privilege, class, race, and racism. But that analysis will take a larger project to address in full.

The image above did little to highlight the core concerns in the anti-CRT bills.

I was not surprised that key words dominated the texts. Throughout the bills analyzed, the “protected” identity and life-style words were the most frequent. That makes sense considering that the bills tended to repeat these multiple times throughout the text, for some reason.

I decided to clean the texts of these words to better represent the core words and language used in the bills. Of course, these terms are core, but their use seemed more like lip service (or text service) than real, deep analysis of any of them: typical legalese performance. Though my focus here is on race and identity, I thought deleting these words from the text would highlight more unexpected or typically expected terms and ideas. I just went with it.

Of course I could have cleaned this up a bit more, but even after doing lots of deleting and inputting words into the StopWords section, I thought it interesting to leave the other clutter. A larger project would be cleaner, I’m sure. But the above image shows me that, after cleaning out these legally required and performative key words, the larger size and prevalence of terms like “inherently,” “school,” “individual,” and “people,” in relation to the smaller terms “racist,” “adverse,” “oppressive,” and “consciously,” seems interesting. I wouldn’t want to REACH and make outlandish-sounding assumptions yet, but it seems like the typical conservative focus on individualism rings loud enough to understand that what is being challenged by this bill is the default identity based on nationalistic pride and the privilege of being just an American, which has historically been about a type of equality that does not center difference but focuses on assimilation.
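The cleaning step described above (adding terms to a stopword list, then recounting) can be sketched in a few lines of Python. This is only an illustration of the idea, not what Voyant does internally: the stopword sets and the sample sentence are made up for the example, and a real run would load the full bill text and a much longer stoplist.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the repeated "protected identity" terms removed from the bills.
EXTRA_STOPWORDS = {"race", "sex", "religion", "creed", "color", "age"}
# A few ordinary function words; a real stoplist would be much longer.
BASIC_STOPWORDS = {"the", "of", "and", "or", "a", "an", "is", "that",
                   "to", "in", "on", "be", "by"}


def top_terms(text, extra_stopwords=EXTRA_STOPWORDS, n=10):
    """Return the n most frequent words after dropping all stopwords."""
    # Lowercase the text and keep alphabetic words (hyphenated words stay whole).
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    stop = BASIC_STOPWORDS | set(extra_stopwords)
    counts = Counter(w for w in words if w not in stop)
    return counts.most_common(n)


# Made-up sample sentence in the style of bill language, not a quotation.
sample = (
    "An individual is inherently racist or oppressive on the basis of race, sex, "
    "or religion. No individual shall be taught that an individual is inherently superior."
)
print(top_terms(sample, n=2))
```

Changing `EXTRA_STOPWORDS` and rerunning makes it easy to compare the "before" and "after" frequency lists side by side.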

In the context of an anti-CRT bill, the most used phrase here was “discrimination in public workplaces and education.” This worries me because it shows that critical thinking about the meanings and real-life consequences of institutional and systematic racism is being flipped to mark white people, workers and students, as the victims. This is concerning because what it highlights is a lack of critical and deep understanding, an almost complete inability to understand the world abstractly and at a meta level.

Another interesting highlight is the context in which the word “individual” is used. Above, we can clearly see that it was followed by the phrase like “discrimination against” which signals to me that the focus on the individual being discriminated against is more interpersonal than systemic. In other words, this New Hampshire anti-CRT bill, maybe unconsciously, acknowledges that white students and white workers are not being systematically discriminated against by another group of people. The focus is on interpersonal issues. Of course bills do not have examples to support their demands, but I wonder if those would be individual claims of “racist” discrimination versus a strategic and systemic and generational discrimination. I am sure that if I analyzed texts by systematically and historically discriminated folks, the issues would be more about a deliberate strategy to exclude and oppress than simply an issue of individual prejudice. Great food for thought here.

I decided to also upload the Texas bill, since that seems to be where most of the conversation is happening. Before cleaning the text of these protected identity terms, I uploaded it and found the text interesting. Instead of a complete ban on anything related to race and the American story of racism, the issue with the Texas bill, and with most bills “banning” CRT, is not so much the absence of an honest history as the requirement to show “the other side” of these issues. Meaning that speaking about the horrors of slavery will have to include how beneficial, if not necessary, it was to the creation of this nation and how intelligent racists were for thinking of this idea. The image below shows the Texas text exactly:

https://capitol.texas.gov/tlodocs/87R/billtext/pdf/HB03979F.pdf#navpanes=0

As you can see, the limitations are, as mentioned earlier, on how “CRT” is to be taught, which is still a massive issue. The Holocaust was horrible, but now teachers will have to explain and not be critical of the “justifying” reasonings for it. Women fought for rights! Right on! But they were also happy and they did it because they wanted equality because we all deserve it and white men aren’t inherently bad. These conversations sound really scary to have with impressionable children who are developing their sense of right and wrong and learning, hopefully, to be empathetic rather than developing sociopathic behaviors.

I think for a future and more prolonged project, I would only insert the lines in the bills that explicitly state the language to be used and not used by teachers, the specific demands and the examples of texts recommended. I do wonder if conducting distant readings of these bills is even effective. I found myself wanting to read the bills and just take the words that I saw interesting. This project did not necessarily justify what I was looking for. I think I would concentrate on popular phrases next time and have sections where I would note if a phrase or words were being (mis)used and misrepresented.

I did find myself enjoying, and reading into, this project. I think more time and energy spent on this will be more fruitful. I look forward to incorporating distant reading and text analysis in future projects. What a great way to see the bigger picture.

Word Clouds of Presidential Debate Transcripts (1960, 1992, 2020)

Posted on November 11, 2021 by David Leshinski

As someone who has never done any text analysis, it took me a while to come up with exactly what I wanted to do for this assignment. After thinking of all the different categories I could choose from, such as something sports related, movie related, politics related, etc., I decided I’d stick to something to do with politics. My first (and favorite) idea was to find a compilation of all of Donald Trump’s tweets and see if there were any words or phrases that he used most often. While this would have been a pretty funny and entertaining idea, it didn’t work out. On a more serious note, my next idea, and the one I went with, was to use the transcripts from presidential debates and see if we can spot some of the popular topics that were on the candidates’ minds during that time period. I decided to look at three different presidential debates across 60 years. The debates I chose were Kennedy v Nixon (1960), Bush v Clinton v Perot (1992), and Trump v Biden (2020).

Here’s what I found:

Kennedy v Nixon (1960)

In the 1960 presidential debate, we had the young John Kennedy and the not so young and charming Richard Nixon. Just looking at the word cloud above, we can see words like debate, transcript, and october. I noticed that when you put a link into the site I used for this project, Voyant Tools, it collects just about every word it can find on the webpage, not just what’s in the actual transcript you are interested in. This trend will carry on through the other debate word clouds included in this blog post. Other common words like mr, Kennedy, Nixon, president, vice, and senator appear because names and titles are used in the transcript to show who is talking, and they are also obviously said often throughout the debate as the candidates address one another. Looking deeper than that, though, among a bunch of random words, we can see words like communists, war, soviet, union, and islands. These debates took place during a time of uncertainty, when the U.S. was locked in a Cold War with the Soviet Union. On top of that, the United States feared Fidel Castro and the possibility of his regime spreading Communism across the Western Hemisphere. In the second debate, the two candidates argued about two islands a few miles off the coast of the Chinese mainland. Kennedy argued that the line of defense should be drawn at Taiwan, while Nixon believed they should draw the line where the West has drawn the line against Communism. Nixon ran with the idea that Kennedy would allow the Communists to take the islands. Kennedy ended up just sliding by Nixon and winning the presidential election after what became one of the most remembered presidential debates in history.

Bush v Clinton v Perot (1992)

32 years later, the 1992 debate looked a little different. This was the first presidential debate that included three candidates all on the same stage. Questions were formatted differently and would now involve real voters, allowing them to ask pressing questions. In this word cloud, it isn’t as easy to tell what the topics were as in the 1960 debate. Still, three words I do see are tax, people and jobs. In the debates, Clinton stressed that America has not invested in its people. He said that when people lose their jobs in his state, he’d probably learn their names. He said people now are working harder for less money than they were making ten years ago, and that this is the result of 12 years of trickle-down economics. On the topic of taxes, Clinton’s goal was to have the government stimulate the economy and reduce the deficit by 50 percent over his term. Bush’s proposal was for a balanced-budget amendment, a line-item veto and a taxpayer checkoff rule. Taxes were a heated argument throughout this debate. With Clinton pushing for taxing the rich and leaving the middle class out of it, Bush was wary and “warned” voters to keep an eye on their wallets.

Biden v Trump (2020)

That brings us to our most recent debate, Trump v Biden in the 2020 presidential debate. What a trip this one was. In reality, this debate was more of a contest over who could talk louder and keep the other person from answering. Surprisingly, most of the words that made the chart aren’t very relevant. There are three, though, that are important: deal, million and China. These three words may not seem too crazy, but they encapsulate most of the topics talked about throughout the debates. The Green New “Deal” was brought up often and attacked relentlessly by team Trump. The Green New Deal was a progressive proposal to fight climate change by reducing greenhouse gases, creating higher-paying jobs, and investing in new infrastructure. Trump’s criticism was basically that doing this would cost too much money and hurt the economy (as if letting climate change run rampant would be any better). The word “million” was used a number of times. Pleading to America that the coronavirus is tearing the United States apart, Biden said, “Over 7 million people who have contracted this disease. One in five businesses closed. We’re looking at frontline workers who have been treated like sacrificial workers. We are looking at over 30 million people who in the last several months had to file for unemployment.” Biden’s approach was to level with the people by providing numbers and talking to the camera often. China was another hot topic all through the year. 2020 was a very unique year, with Covid-19 locking us all away in our homes, and the virus has probably been the most talked about topic across the world since then. With the virus originating in Wuhan, China, Trump often put blame on China for the pandemic and used the term “China Virus” when talking about Covid-19. On top of that, China was also talked about often when it came to trade. Biden said he would make China play by the international rules. He believed Trump was too soft with foreign “thug” leaders like Putin and Xi Jinping and made it clear he wanted to stand up to them. This debate wasn’t about facts a lot of the time and was mostly a screaming match, which means there was a lot of misinformation about China being shared on both sides. Biden falsely stated the deficit had increased with China and Trump had falsely claimed Hunter Biden was given millions of dollars from China.

Eye(s) in Poe’s Short Stories

Posted on November 11, 2021 by Martin Glick

This project was inspired by a quote from a recent article I read on Poe.

Poe’s fetish objects point towards a larger tradition of objectifying the terrors of the soul in Gothic literature. Old stone walls, devices of torture, evil eyes, casks of Amontillado, tufts of hair, purloined letters, and, above all, the ancient entanglement of death and beauty.
https://www.thesmartset.com/poe-boy/

I thought, since I love horror, the grotesque, gothic literature and film, that I’d like to see how body parts are treated in the prose of Poe. With Prof. Allred we thought it best to focus solely on the short stories, the reason being that language is more concise in shorter works of prose, and less functional.

On GitHub I have deposited the Txt file I used, which compiles 69 of Poe’s short stories. The reason for including more than just his horror/thriller stories was to make sure I didn’t miss any mention of a body part in the oeuvre. A bit of a brute force method, but at this point I didn’t have a very clear hypothesis, and needed to get some useful fragments of text. Using Voyant Tools, I generated the following frequency list, limited to the terms worthy of note.

Rank : Term : Count
12 : eyes : 295
14 : head : 283
24 : hand : 214
33 : body : 196
34 : feet : 196
38 : mind : 187
50 : death : 167
68 : eye : 151

A combined 446 times for eye/eyes (295 + 151). Eye*, which includes eyelid, eye-glasses, etc., appears 471 times. Deciding to focus on the clear winner here, I exported the sentence fragments which contained “eye” or “eyes”, with a context of 10 words on each side of the term, then sent that back through Voyant to try to pinpoint characteristically grotesque phrases.
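For anyone who wants to reproduce that export outside Voyant, a keyword-in-context pass is short to write. The sketch below is an assumption about how such an export could be done in Python, not Voyant's own method; the function name and tokenizing rule are mine, and the sample fragment is one of the Poe lines quoted later in this post.

```python
import re


def keyword_in_context(text, keywords, window=10):
    """Return each exact keyword hit with up to `window` words on either side."""
    # Keep letters, apostrophes and hyphens together, so "eye-balls" is one token.
    tokens = re.findall(r"[A-Za-z'-]+", text)
    targets = {k.lower() for k in keywords}
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() in targets:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


sample = "deliberately cut one of its eyes from the socket"
for left, word, right in keyword_in_context(sample, ["eye", "eyes"], window=3):
    print(f"{left} [{word}] {right}")
```

Because the match is exact, forms like “eye-balls” or “eyelid” are skipped; catching the Eye* family would need a prefix test instead of set membership.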

A semblance of a thesis I started out with was that Poe informed our notion of the grotesque, and I was hoping here to get some meaty adjective-noun pairings, or at least sentence fragments which demonstrated this sensibility. I didn’t find that initially in Voyant. Below is a “link” chart showing the words which most commonly appear next to “eyes” or “eye”.

I was really hoping for “dripping eye”, “groaning eyes”, “disgusting eye”, or “ugly eyes”. It seems Poe isn’t as obvious a grotesque writer as I once hoped and I would have to dig deeper into the sentence to find what I was looking for. Off to AntConc to take a closer look!

“Eyes” rather than “eye” revealed an interesting set of singular appearances

I was hoping to see a repetition of certain phrases; that would certainly cement the stamp of “grotesque” on an author, right? Poe was more subtle than that, and his juicier bites of prose are saved for a select few of the horror works:

“deliberately cut one of its eyes from the socket” – The Black Cat
“They were wild, bold, ravenous—their red eyes glaring upon me” – The Pit and the Pendulum
“deep-set eyes glared with unnatural lustre” – The Gold-Bug
“The face was fearfully discolored, and the eye-balls protruded” – The Murders in the Rue Morgue

Would it have been worth it to have the short stories arranged by publication date? I didn’t double check that when compiling my list, but it is something to keep in mind for the future. I could easily have traced his use of the term over the course of his writing. Another consideration when compiling a corpus is to nest the texts in a meaningful way. I could have done it by genre, or at least “obviously horror/mystery” and “the rest”.

JSTOR Text analyzer suggested I read up on the latest Ophthalmology research.

Interesting to note “Eye irritation” was labeled an identified term, and I take this to mean the exact term was found in the text. Dress hooks are a tool for sewing which utilize an eye closure.

What I’d like to propose is that Poe has not informed our notion of the grotesque in relation to the body in any consistent way. There are scattershot instances of it across his work, and while “eyes” and “eye” appear more than any other body part, this fact can be credited to his detective novels. His apparent influence stems from a few influential stories which loom large in the public eye.

A difficult mode of investigation that I chose from the outset, and which I won’t repeat, was to tie an accepted scholarly term like “grotesque” to text mining. The term reflects a mood or sensibility rather than a string of letters. I may very well have missed notions of the topic in a fragment because it was the whole paragraph which spelled out the mood. The grotesque is found, after all, in vulgar expansions of size or of use, which don’t hinge on an identifiable term.

Last ditch effort! I took from the complete short story collection, phrases which mentioned a body part at all. I identified: arm, face, feet, hand, head, mouth which were statistically significant and thought I’d try to see which grotesque terms show up the most in relation to the set.

What this shows is general use, feet being the least used, and arm being the most.

It has slowly dawned on me that what I was searching for could not be found in repetition or frequency at all, which makes the prospect of using a distant reading mode difficult. I was instead looking for singular instances of the grotesque, which are used sporadically and for effect in Poe’s work. At least I was able to pinpoint which body parts were used most often in his work, which might be important for a specific kind of academic study.

After posting edit thoughts: What I wanted to measure was sentiment, which isn’t what these programs identify. Ideally there could be a program which shades sentences or even paragraphs in different colors depending on their intensity or coolness, parochial or transgressive qualities. There are sentiment analysis tools used for decoding Social Media posts, and I wonder if they could have helped me.

The Odyssey: “For the use of those who cannot read the original”?

Posted on November 11, 2021 by Ostap Kin

For this assignment, I decided to focus on the study of several translations. The subtitle of my blog post is actually a reference to a subtitle to one of the translations. Can we understand the translation or more broadly a text if we don’t read the original — that is, can we understand a text if we use distant reading, not close reading?

Voyant Tools helped me explore whether it’s possible to see if (and how) a target language (i.e., a language into which a work is translated) changes through the centuries. How does the language actually change? And what kind of influence does that have on the translated text? Does the vocabulary of those translated texts differ, and if yes, then how? What are the actual differences? Is it even possible to ask these questions without closely reading the texts, or is distant reading the way to start thinking about these problems?

To investigate a set of these research questions, I concentrated on English-language translations of Homer’s Odyssey. This should be a good example to try to answer those questions because we have many translations of this classical work into English and, as we know from readings, the richer our dataset is, the more interesting our outcomes might be.

Thanks to Project Gutenberg, I was able to locate as many as seven translations of the Odyssey, and these constitute the core of my project. There are many more translations of this work: a Wikipedia page about the Odyssey, for example, lists published translations of the work, and their number is around one hundred. Thus, it could be material for a large research project.

First, I located the five translations through Project Gutenberg and copied and pasted the texts into separate Word files. I did this because the files on the website contain some information produced by Project Gutenberg, translators’ notes, notes to the text, and various additional materials, i.e., parts which are not in the original literary work. Since I wanted to focus entirely on the literary text, those had to be excluded. This way the additional materials won’t interfere with or influence my research.
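Part of this manual cleanup could be scripted. The following is a minimal sketch, assuming the plain-text downloads carry the “*** START OF THE PROJECT GUTENBERG EBOOK ... ***” and “*** END OF ...” marker lines that such files commonly use; the exact wording varies between files, so the patterns here are an assumption to check against each download. The function name and sample string are hypothetical.

```python
import re

# Assumed marker wording; some files say "THIS" instead of "THE".
START_MARKER = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_MARKER = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(raw):
    """Keep only the text between the START and END marker lines, if both exist."""
    start = START_MARKER.search(raw)
    end = END_MARKER.search(raw)
    if not start or not end or end.start() < start.end():
        return raw.strip()  # markers not found: leave the text as it is
    return raw[start.end():end.start()].strip()


# Placeholder content, not a real translation.
sample = (
    "License header\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK THE ODYSSEY ***\n"
    "BOOK I placeholder text\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK THE ODYSSEY ***\n"
    "License footer\n"
)
print(strip_gutenberg_boilerplate(sample))
```

This only removes the license header and footer. Translators’ prefaces and notes sit inside the markers and would still need to be cut by hand.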

After I uploaded the five translations, here are some of my findings. Thanks to Cirrus, it was possible to see a word cloud that visualizes the most frequent words of a corpus, in this particular case of all five translations combined. The top 55 words by frequency are depicted in this word cloud.

I wanted to explore and get more information about words in the whole dataset. A function called Summary came up with the total number of words in all files (611,788) and the number of unique word forms (28,282). In addition to that, one of the features of the summary is that it provides the distinctive words in each of the five texts. This demonstrates the differences among the five translations and underlines the need to study why these texts show so many different distinctive words. The summary provides some interesting information about the whole corpus, which consists of five translations of the same text. We can study the longest texts (word counts included) and the shortest texts. It’s also possible to observe vocabulary densities and the distribution of density across the whole corpus. Finally, there is other interesting information: average words per sentence, frequent words, and distinctive words.
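Vocabulary density can be checked by hand from the two totals above, assuming it is defined as unique word forms divided by total words (my understanding of the measure, worth confirming against the Voyant documentation). A small sketch, with a hypothetical helper for computing the same ratio on any text:

```python
import re

# Corpus totals reported by the Summary tool in this post.
total_words = 611_788
unique_forms = 28_282

corpus_density = unique_forms / total_words
print(f"{corpus_density:.3f}")  # 0.046


def vocabulary_density(text):
    """Unique word forms divided by total words; 0.0 for empty input."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0
```

A value near 0.046 means roughly one new word form for every 22 words of running text. Density generally falls as texts get longer, so comparing translations of different lengths needs some care.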

The next question I delved into was how particular words trended and how these trends could be depicted through a line graph. We already have the five top words in the whole dataset and could see the ways these five words appeared and were used in the respective translations. Curiously, the use of the word “spake” in the translations increased tremendously when you compare the translations produced in 1614-16 and 1726 with those from 1879. Also, I am quite unsure of what happened with the use of the word “Ulysses” in one of the translations from 1879. Interestingly, both “son” and “shall” were used more or less on the same level.

The most widely used word in all these translations is “Ulysses,” and therefore I became interested in how it is used within certain terms. Microsearch visualizes the frequency and distribution of the word in all five texts—you can view the “map” of this word in all five texts and think about if its frequency changes. If it does change, then what might be the reasons for that?

Conclusions. Without knowing a source language, it’s always tricky to work with translated works. However, if one intends to compare just the translated texts, these results might be of certain interest and might help to pose further research questions which one can solve with the help of traditional close reading. Distant reading can certainly diversify the study of literature.

Troy – Text Mining

Posted on November 11, 2021 by Troy Smith

My research focuses on the impact on the educational landscape of a historically complacent approach to fundamental mathematics education in the U.S. Reinforcing literacy skills in children at an early age has always appeared to supersede reinforcing numeracy and, as a result, English literacy is viewed as a defining characteristic of true Americanism. Numeracy, on the other hand, was for a long time considered more of a supplementary ability than a foundational one. Considering the possibilities of Google Ngram, and the vast corpora of literature available to Google, I want to explore the historical comparison of the mentions of numeracy and literacy in American literature.

Before I started using Ngram, I first had to figure out how it worked. Ngram draws on over 8 million books, which contain over half a trillion words (Pechenick et al., 2015). The books have been scanned by Google and, for the words you enter into the search bar, Ngram reports what percentage of all the n-grams of that length in each year’s books match the term you entered (Google).

Before I started searching, I had to decide on baseline parameters for my Ngram searches. I decided on the following:

Since my focus is on American history, I would use the American English 2019 corpus, which is defined as “Books predominantly in the English language that were published in the United States” (Google).
I would search without case sensitivity. It makes no difference, for my purposes, if ‘numeracy’ or ‘Numeracy’ is written.
I would use a smoothing of 3, which averages each year’s value with the three years on either side (a seven-year window). Any smoothing smaller than that shows too many fluctuations (I realized from this why they use the term smoothing, because the graphs look really jagged with low smoothing numbers), and since I am looking at the data over such an extended period of time, that is a sufficient range.
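A smoothing value can be read as a centered moving average: each year's value is averaged with that many neighbouring years on each side, so a smoothing of 3 covers seven data points. The sketch below illustrates the idea in Python. The shorter window at the start and end of the series is my assumption about edge handling, and the function name is hypothetical.

```python
def smooth(values, smoothing=3):
    """Average each point with up to `smoothing` neighbours on either side."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - smoothing)                 # window is cut short at the start
        hi = min(len(values), i + smoothing + 1)   # and at the end of the series
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A single spike of 7 in an otherwise flat series is spread across its window.
spike = [0, 0, 0, 7, 0, 0, 0]
print(smooth(spike, smoothing=3)[3])  # 1.0
```

With `smoothing=0` the series is returned unchanged, which is the jagged raw graph described above.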

Now, I recall reading during my research that the terms numerate and numeracy were not widely introduced in America until the 1950s, so I wanted to see how accurately that is reflected in the literature mentions (Cohen, 1982). Based on Google Ngram, this historical note appears to be accurate.

Furthermore, since the term numeracy had not yet been coined early in American history, I found myself looking for analogous components of literacy and of numeracy to compare, preferably unigrams (so that they are compared against the same corpus). Comparing the terms literacy and numeracy directly illuminates my point, but does not provide much useful data, as the mentions of literacy significantly outpace those of numeracy.

Considering how infrequently numeracy was mentioned during the first two centuries of the American republic, I decided to focus my searches on the following terms.

Literacy: reading, writing, dyslexia

Numeracy: mathematics, arithmetic, algebra, dyscalculia

Early in my text mining, I realized I had a big problem when it came to the mentions of ‘math’ and the mentions of ‘mathematics’. For all intents and purposes, we know those two words to mean the same thing in America and, as such, they are used interchangeably, with ‘math’ generally being used for brevity. Let’s take a look at the mentions of ‘math’ and ‘mathematics’ since 1700. Somehow, around 2013, mentions of ‘math’ exceeded those of ‘mathematics’. I was not able to develop any theory for why that is the case other than that it is shorter to type.

As I mentioned earlier, the mentions of literacy far outpace the mentions of numeracy, so I set out to compare the ratios using one of the advanced usage features of Google Ngram. Moreover, I wanted to check the convergence of the uses of ‘math’ and ‘mathematics’ in 2013 to ensure the advanced usage feature functioned properly. As such, I used the “/” composition, which “Divides the expression on the left by the expression on the right, and is useful for isolating the behavior of an Ngram with respect to another” (Google). I also used the “+” composition, combined with the “/” composition, to demonstrate two things: how the word ‘math’ has increased in usage compared to ‘mathematics’ since the 1950s and how the word ‘numeracy’ has increased in usage compared to ‘literacy’ since the 1950s.
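The arithmetic behind combining “+” and “/” is a share of a total: one term’s frequency divided by the sum of both terms’ frequencies, year by year. I am assuming here that the combined query took a form like (math / (math + mathematics)). The numbers below are invented placeholders, not real Ngram output, and the function name is hypothetical.

```python
def share_of_total(a, b):
    """For each year present in both series, return a / (a + b)."""
    return {
        year: a[year] / (a[year] + b[year])
        for year in sorted(a)
        if year in b and (a[year] + b[year]) > 0
    }


# Placeholder relative frequencies per year; NOT real Ngram data.
math_freq = {1950: 1.0, 2000: 4.0, 2019: 6.0}
mathematics_freq = {1950: 9.0, 2000: 6.0, 2019: 5.0}

for year, share in share_of_total(math_freq, mathematics_freq).items():
    print(year, round(share, 2))
```

On this scale a share of 0.5 marks the year where the two terms are used equally often, so the crossover described above should appear as the line passing 0.5.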

Even considering the learning disabilities associated with numeracy and literacy, dyscalculia and dyslexia, respectively, the latter is easily more recognizable in the American lexicon – pedagogical or otherwise. Let’s see what Ngram says about this.

Lastly, I wanted to take a look at which terms are most closely associated with the word numeracy, so I used the “*” function. The “*” function substitutes the most common words that follow a word you enter into a search.
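The idea behind the wildcard can be approximated on any plain text. The sketch below is only an illustration of "most common words following a target word"; the function name `top_followers` and the sample sentence are hypothetical, and this is not how Google Ngram is implemented.

```python
from collections import Counter
import re


def top_followers(text, target, n=3):
    """Return the n most common words that immediately follow `target`."""
    words = re.findall(r"[a-z']+", text.lower())
    followers = Counter(
        nxt for cur, nxt in zip(words, words[1:]) if cur == target
    )
    return followers.most_common(n)


# Made-up sample text, for illustration only.
sample = (
    "Numeracy skills matter. Numeracy skills are taught late. "
    "Numeracy levels vary, and numeracy skills lag behind literacy."
)
print(top_followers(sample, "numeracy"))
```

Note that this simple version ignores sentence boundaries, so a word that starts the next sentence would still count as a follower.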

Overall, Google Ngram supports my theories around the emphasis of literacy over numeracy in America. Numeracy was hardly mentioned prior to the 1950s. Furthermore, even before that, in the early years of the American republic, mentions of the components of literacy far exceed those for numeracy.


Audio Book Blog Response

Posted on November 11, 2021 by Caroline Kelly

Caroline Kelly

In this era where reading is so ubiquitous, we forget that literacy was at one point the sole province of an established elite. Despite the democratization of literacy, how and why we read is as crucial as what we read. As The Untold Story of the Talking Book demonstrates, the division of reading has become one of the primary methods by which we distinguish the educated from the uneducated.

Tracing the historical roots of the spoken word, or audio book, Matthew Rubery succinctly demonstrates the artificial divide that arose when the printed word became more readily available. Whereas spoken word pieces were once the norm and accessible to all within hearing distance, the printed word allowed the construction of artificial barriers and gatekeepers. The rise of both the industrial and information ages required the opening of the gates and of particular note is this dismissal of the audio book as being somehow less.

This is inherently a bias of neurotypical individuals, as Rubery points out. As the book of John so aptly put it, “in the beginning was the word”. Spoken language was the first abstraction, and so of course eventually manifested in the spoken word narrative. It is such a critical part of our comprehension that it even manifests itself in how we perceive the world. The phenomenon of onomatopoeia is fascinating, but the truth is that language goes beyond even this literal abstraction. We say a word like bitter, and just in saying the word our mouths mechanically need to almost pucker. Did the etymology evolve as a consequence of this movement, or was the movement itself shaped by the definition? There would seem to be an inherent relationship between sound and language, which shape and define one another. Such is our natural bias towards the spoken word that Rubery says that deaf children have a far more difficult time learning how to read than do the blind.

Even “silent reading” incorporates an element of the audio, as many people read by hearing the disembodied narrator’s voice in their heads. We are all in essence listening to an audio book if we have an inner monologue. The difference between the reader providing the voice versus that of an artificial narrator in an audio book may lie less in the audio book format and more in the subtle nuances of meaning and who provides it. Whether the audio book changes the authorial intent is irrelevant, as the reader will always alter the writer’s meaning to fit their own interpretation. It is no more difficult for a listener to alter an audio book’s meaning than it is for a reader to alter the writer’s meaning, if this is our intent. Arguments that suggest audio books are somehow more prone to this than the written word fail to account for how even video footage can be altered in meaning, and even in literal content, by a viewer’s memory. This bias may arise from the fact that audio seems so much more ephemeral because it has no physical presence, but it is in fact no less verifiable than a written note.

Rubery goes on to cite how neurological evidence even seems to confirm the lack of distinction between listening to audio books and reading the printed word. To dismiss the audio book then as lesser is to demonstrate a desire to delineate valid from invalid. There is little basis to do so beyond a need to divide. As we begin to understand reading as a process of abstraction and informational transmission and less as the literal act of silent phonetic recitation, we approach the realm of Digital Humanities. This middle ground retains an appreciation for both the end goal and the process by which that goal is met.

A Cursory Exploration of Fielding’s Preoccupation with Violence

Posted on November 11, 2021 by Kai Prenger

Premise

A couple years ago, I read Henry Fielding’s The History of Tom Jones for an 18th century British novel class. One aspect of that work that interested me was how the protagonist is repeatedly on the edge of entering a duel or engaging in a fistfight, mostly as a result of his philandering. The framing of this dynamic in most scholarship revolves around Fielding’s abhorrence of the honor culture amongst the aristocracy in Britain, and its deleterious effect on the English soul by way of escapades abroad that included the colonization of lands and wars. That said, I found that most of his considerations about violence centered on lawlessness within England, where Highwaymen could pillage as they pleased. Fielding’s founding of the Bow Street Runners, what eventually became the Metropolitan Police Service in London, also highlighted for me what seemed an unusual preoccupation with public violence in 18th century Britain.

For this assignment, I wanted to begin interrogating my intuitions about his interest in addressing crime by performing a distant reading of Henry Fielding in comparison to a small corpus of 18th century British literature.

Tools

While I played with the Natural Language Toolkit in Python, seeing how I could remove stop words, convert everything to lower case, and remove punctuation, I felt I’d get tangled too much in tweaking things to get on with the analysis in an initial exploration like this assignment. Using Voyant Tools not only allowed for quicker distant reading, but also serves as an excellent place to develop research questions that could be handled in detail with more flexible tools at another time. I also used n-gram as a gut check against a larger corpus than I could put together by hand.
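For readers curious what those three preprocessing steps look like, here is a minimal sketch in plain Python. It is not the code I experimented with: the tiny `STOP_WORDS` set is a hand-picked stand-in (NLTK ships a much longer list), and the sample sentence is invented.

```python
import string

# A tiny hand-picked stop list for illustration; NLTK's own list is much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "was", "he", "his"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [w for w in no_punct.split() if w not in STOP_WORDS]


# Invented sample sentence, not a quotation from Fielding.
print(preprocess("The violence of the duel, and his Joy!"))
```

Even in a sketch this small, the tweaking starts immediately: which words count as stop words, and whether curly quotes and dashes in a Project Gutenberg file need separate handling, are both judgment calls.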

I chose line graphs and stacked bar charts for visualizations, as they are reasonably easy to reason about 😉 .

Analyses

Tom Jones

My first stop was to add a text file version of Tom Jones via Project Gutenberg, a ready at hand source to compile corpora for Voyant Tools to consume.

Here’s what I gleaned from viewing Tom Jones in Voyant Tools.

Searching with the wildcard violen* and comparing it to happiness, happy and joy reveals the prevalence of violence mentioned throughout Tom Jones. From memory, I can imagine the highest peaks of violence revolving around the moment when Tom saves Jenny Jones (Mrs. Waters) from Northerton’s attempted rape. As each of these search strings represents about 1% of unique words in the text (13,019 unique words in total), comparing the relative frequency of these terms seemed useful, without major distortions in scaling. It seems as though violen* features heavily in the text, though one might assume that it is about as well represented as happiness/joy once words with like sentiments are combined. The idea of symmetry between these terms dovetails well with Fielding’s love of symmetry in plotting.

Violen* vs. Happy, Happiness and Joy in *Tom Jones*
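The kind of trend line shown above can be approximated outside Voyant. The sketch below is an assumption-laden illustration, not Voyant's algorithm: it treats the wildcard violen* as the regular expression `violen.*`, splits the text into equal-sized segments, and reports the share of tokens in each segment that match. The function name `relative_frequency` is hypothetical.

```python
import re


def relative_frequency(text, pattern, segments=10):
    """Share of tokens fully matching `pattern` in each of `segments` slices.

    Tokens left over after dividing evenly are ignored in this simple version.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    size = max(1, len(tokens) // segments)
    matcher = re.compile(pattern)
    result = []
    for start in range(0, size * segments, size):
        chunk = tokens[start:start + size]
        hits = sum(1 for tok in chunk if matcher.fullmatch(tok))
        result.append(hits / len(chunk) if chunk else 0.0)
    return result


# Toy input; a real run would read the Project Gutenberg text file instead.
print(relative_frequency("violent joy happy violence", r"violen.*", segments=2))
```

Running the same function with a pattern such as `happ.*|joy` over the same segments would give the second line to plot against the first.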

Joseph Andrews

I have also read Joseph Andrews, another novel by Fielding, and outside of one particular scene where Highwaymen rob a carriage Joseph is on, and strip him naked, I couldn’t remember there being too many threats of violence.

Here’s what I gleaned from viewing Joseph Andrews in Voyant

Two points to make when viewing Joseph Andrews word frequency for similar strings. One is that each of these search strings makes up less of the unique words in this text vs. Tom Jones (each less than 0.3% of the 9,454 unique words). The other is that we do see the same sort of symmetry, with the terms connoting joy and violence showing similar parabolic shapes in the middle of the document segments.

Violen* vs. Happy, Happy* in *Joseph Andrews*, by Henry Fielding

Corpus of other 18th Century British Novels

At this point, I wanted to see these terms in relationship to each other in a larger corpus. I created a small corpus of other 18th century British novels I’ve read over the past few years, including, in this order:

Clarissa and Pamela by Samuel Richardson
Evelina and Camilla by Fanny Burney
Tristram Shandy by Laurence Sterne
Moll Flanders and Robinson Crusoe by Daniel Defoe
The Female Quixote by Charlotte Lennox
The Monk by Matthew Lewis
The Mysteries of Udolpho by Ann Radcliffe
Gulliver’s Travels by Jonathan Swift

Here’s what I gleaned from viewing this corpus in Voyant Tools

This corpus represents a broad cross section of published novels during this time, from the scandalous (Lewis) to the popular (Burney and Richardson) to the less canonical (Lennox and Radcliffe). It also delivers an order of magnitude higher number of words when compared to the individual Fielding novels (millions vs. hundreds of thousands). I found the line graph of violen* and the happy keywords to be a little harder to read in this instance, though it certainly seems like violence hovers below the peaks we saw in Fielding’s novels, and the trend is downward over even the later part of the corpus, which includes some of the more grift spectrum of the corpus (the order of the texts matters in this context). If I had to generalize, Fielding looks more concerned with violence than with happier words.

Our keywords from a larger corpus of 18th century British novels

Stacked Bar charts of Fielding Novels

Viewing these keywords across reading time in Tom Jones offered an insight into the plot structure of comedy, though I suppose this is not something that distant reading alone has uncovered about comedy as a genre.

Violence is overrun by happiness over time in *Tom Jones*

What was more surprising is the persistence of violen* in Joseph Andrews. I wonder if this is because this novel revolves around Joseph trying to “preserve his chastity” through a series of encounters with desirous women. In the case of Tom Jones, the protagonist is having sex throughout the novel, with only occasional spikes in violence during his expulsion from his adoptive father’s estate, and the incident with Northerton and other encounters with violent men.

Violence persists throughout Joseph Andrews

N-Gram Viewer

As a further point of comparison and perspective, I looked at my keywords in Google Books N-Gram Viewer in British English from 1600-1900. I’m struck by the increase in happy, happiness and violence starting in the mid-1750s (around the time Fielding was writing). Given this larger corpus, I find it hard to dismiss the null hypothesis: that Fielding wasn’t more preoccupied by violence than his peers.

Reflections

I will share three reflections having completed this exercise:

It is definitely the case that even a cursory exploration as shown above can generate more specific research questions. For instance, does the relationship between sex-seeking (Tom Jones) vs. sex-averting (Joseph Andrews) account for the differences seen temporally in each of those texts? How is that related to prescribed gender expectations of men and women in 18th century England? Some of the critiques of distant reading straw man the method as an unreflecting and totalizing effort. But there are a number of sophisticated text analysis methods to perform, not contemplated here, that could be enlightening in conjunction with closer readings of each text.
Some of Voyant’s tools are helpful, while others seem to represent spaces for further analysis. One example: the context table is hard to make sense of in the tool as opposed to the visualizations. Though I suppose Voyant Tools is aware that they aren’t the last word in text analysis.
Call me old fashioned, but a purely distant reading of any string of characters is missing the point of what a text is. As Roland Barthes put it in the opening of S/Z:

There are said to be certain Buddhists whose ascetic practices enable them to see a whole landscape in a bean. Precisely what the first analysts of narrative were attempting: to see all the world’s stories (and there have been ever so many) within a single structure: we shall, they thought, extract from each tale its model, then out of these models we shall make a great narrative structure, which we shall reapply (for verification) to any one narrative: a task as exhausting (ninety-nine percent perspiration, as the saying goes) as it is ultimately undesirable, for the text thereby loses its difference.

And this is the explicit task of distant reading: to lose the difference of individual texts and align them in larger sociologies of literature. Like Barthes, I view a text as a product of the interaction of a reader with the work presented. I certainly couldn’t have reasoned about the “temporal” changes in the keywords in Joseph Andrews or Tom Jones having not experienced these books through reading.

Olivia Maccioni / Text Mining with Pandemic Food Writing

Posted on November 11, 2021 by Olivia Maccioni

Project Introduction

As someone who works in the restaurant industry, I am always thinking about food and dining. The COVID Pandemic had, and continues to have, a major impact on the industry, so I was excited to dig deeper into its effects, particularly in New York City, through this project.

Coming out every week, The New Yorker “Tables for Two” restaurant reviews have remained a staple in New York City food writing, even through the pandemic. While it might not be the most robust writing on the state of food more generally, I thought it would be a good place to start for analyzing trends in dining. It also has a dramatic impact on restaurants, and is sometimes responsible for huge booms in visitation.

As I’ve shared in class, I have trouble coming up with research questions, but always know the sorts of topics I’m interested in. I shared this with Filipa and she noted that sometimes with text mining, it’s best to simply upload the corpus in question and see what comes up. I chose to work with all of the “Tables for Two” reviews of 2019 and 2020 in order to have a bigger corpus to work from, and to be able to compare the before and after effects of the pandemic more specifically. I knew I wanted to focus more deeply on the corpus rather than learning a new software/tool for the assignment, so I chose Voyant for its ease of uploading and working with texts.

Starting with Voyant

Getting started, I tried simply uploading the links to all of the reviews, but came across a mess of words. Each New Yorker digital review contains links to other articles published in said edition, along with dozens of links to various New Yorker digital features and common website lingo. After inputting the webpages for analysis, the most common words that appeared in the word cloud were not very helpful:

While words from the site and related articles could certainly prove interesting to explore (and definitely heightened the use of the word pandemic), it proved too time consuming to try and remove each repetitive word using Voyant’s “Define” option. I also found that said option was not very successful, and often kept words and variants of words I hoped to remove in the cloud. In turn, I went the old fashioned way of cleaning my data, and copied and pasted only the text of each article into a Word Document. Of course, this still required some cleaning, so after some more massaging to remove words like “new,” “restaurant,” “food,” and “yorker,” I got started exploring my texts.

(As a quick aside, I took some time to really think about what it means to remove words from a textual analysis. Every text comes with context, and it felt a little like cheating to be removing the reviews from their origin point, especially if I wanted to compare the state of dining to the rest of the world in 2020. That said, it ultimately felt like doing so would create a different project altogether, or maybe could be something to focus on for my final project.)

Before getting started, I also wanted to learn a bit more about the different features that Voyant offered as a new user, so I watched a few really helpful YouTube videos. I posted them below for anyone interested:

Working with Voyant

Most commonly used words in 2020 “Tables for Two” reviews

Most commonly used words in 2019 “Tables for Two” reviews

I wanted to start easy: what words have become more common in restaurant reviews this year than in the previous year? Unsurprisingly, words like “pandemic,” “home,” “frozen,” “closed,” “cooking,” and “takeout” came to the top in 2020. By contrast, “pandemic,” “frozen,” and “closed” were not featured in any 2019 reviews, and “home” was only used in relation to discussing a chef or restaurateur’s “hometown”.
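The year-over-year comparison can also be done outside Voyant. The sketch below is a hedged illustration rather than what I actually ran: `new_words` is a hypothetical helper that lists words appearing in one text but never in another, and the two sample strings are invented stand-ins for the 2019 and 2020 review corpora.

```python
from collections import Counter
import re


def word_counts(text):
    """Count lowercase alphabetic tokens in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def new_words(before, after):
    """Words that appear in `after` but never in `before`, most frequent first."""
    old, new = word_counts(before), word_counts(after)
    only_new = {w: c for w, c in new.items() if w not in old}
    return sorted(only_new.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented stand-ins for the two corpora, for illustration only.
reviews_2019 = "The chef serves chicken from her hometown."
reviews_2020 = "Takeout chicken at home during the pandemic; pandemic takeout again."
print(new_words(reviews_2019, reviews_2020))
```

Because the comparison works on whole tokens, “home” and “hometown” are treated as different words, which is exactly the distinction that mattered when reading the 2019 results.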

The popularity of the word “chicken” in 2020 was a surprise, so I did a bit more research, and came upon articles on the chicken shortage in America during COVID-19. Here, I was able to see popularity of an item during COVID that correlated to a larger food shortage in the country. Interesting! The popularity of the word “people” in 2020 also caught my eye, so I looked to compare its use with 2019 using the “Context” feature:

I’m not sure if you would call this a “sentiment” analysis, but you can certainly see the growth in relating the word “people” to more complex issues in 2020. In other words, concepts around “people” in dining and restaurants have expanded beyond the world of food in 2020 into conversations of equity and need. Seemed like a plus!
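A keyword-in-context view like the one above is simple to approximate. This is a minimal sketch of the general technique, not Voyant's implementation; the function name `kwic`, the window size, and the sample text are all assumptions made for illustration.

```python
import re


def kwic(text, keyword, window=3):
    """List each occurrence of `keyword` with `window` words on either side."""
    words = re.findall(r"[\w']+", text.lower())
    lines = []
    for i, word in enumerate(words):
        if word == keyword:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append((left, word, right))
    return lines


# Invented sample text, not taken from any review.
sample = "Many people lost work. The kitchen now feeds people in need nearby."
for left, word, right in kwic(sample, "people"):
    print(f"{left:>25} | {word} | {right}")
```

Lining the keyword up in a column makes it easier to scan the neighboring words, which is where the shift toward equity and need shows up.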

That said, I was surprised to see the lack of conversations on racial equity in particular, given the BLM protests in 2020 that sparked discussions of white supremacy in the industry. Here is where I wished I would have done things differently, but kept my mistake for the sake of learning:

I was hoping to see if there was a trend in speaking about black-owned restaurants during the BLM protests that did not continue into the rest of 2020. As we’ve discussed in class, that summer often resulted in lip service to black populations, rather than actual moves towards equity. Since I did not categorize my reviews by month (which would have required separate Word Documents per month), I was only able to analyze trends as a whole in 2020. This made me realize that Voyant is really a tool used best when comparing different texts as whole units rather than comparing a single text as a unit. Since I didn’t go the time route, I looked at how the word “black” was used in the reviews in 2020 vs. 2019:

The screenshots are unclear since Voyant could not seem to finish loading this analysis, but they show that in 2020 the word “black” was used with the words “entrepreneur” and “lives matter” vs. 2019 with “tart,” “avocado,” and “pepper”. Of course, these results don’t look so good for The New Yorker, and I’m not surprised.

As a final exploration, I went to another corpus, Whetstone Magazine, and their 2020 digital articles on food during the pandemic. Whetstone Magazine is a black-led publication on food by Stephen Satterfield. Whetstone’s 2020 article word cloud did not even contain words like “pandemic” or “takeout” but rather words like “family,” “father,” “women,” and “love”. This reminded me of conversations that we’ve also had in class around what types of content are shared by communities facing trauma, and where words like “joy” and “love” fit in. That said, of course it’s important to also mention that Whetstone’s 2020 articles range in content beyond just restaurant reviews, but it shows a different sort of focus on eating during a global crisis.

Where to go from here

Overall, I struggled with this project in that I felt the tool really just helped to prove assumptions I had about texts, rather than surprise me with new learnings. Of course there is always a use case for proving yourself right! Next time I use a tool like Voyant, I would try to focus on further categorizing texts before I upload them for analysis by things like time, genre or author, in order to get some more nuanced readings of subjects through comparison.

If I were to continue this project for my final project in the course, I would be interested in comparing reviews from either different publications, cities, or topics rather than years…asking research questions like:

Which cities saw the biggest changes in approaches to dining out in 2020?
What publications most holistically reviewed the impacts of the BLM movement on restaurant equity in 2020?
How did changes in dining out compare to other service industries like theatre, film or hospitality more generally?

Workshop review

Posted on November 6, 2021 by Ostap Kin

Very often one sees the words “games” or “gaming” in phrases along with the words “learning” and “education.” Gaming is no longer a synonym for entertainment (or at least, no longer only that) but an area that has become a significant tool in (digital) pedagogy. Teachers, scholars, and practitioners of gaming who use it as a scholarly instrument typically are affiliated with English departments, Media studies, Communication studies, or Journalism. (Apparently, one can also get a Ph.D. in gaming now, too.) It’s fascinating to observe how and which games might be used in the classroom, library, or school programs. Useful books are written on this topic including, for instance, the recently published volume Learning, Education and Games (2019), edited by Karen Schrier. It’s already the third book in the series and is available in open access here.

With that in mind, I registered for the workshop “Intro to Educational Game Design” facilitated by Zachary Loyd and offered by GC Digital Initiatives at CUNY. The workshop had three main goals: (1) discuss how games can lead to new learning outcomes; (2) explore some of the foundational concepts of game design for educational purposes and its implementation; and (3) provide an overview of the game design landscape, meaning the tools and software used for this.

One of the first questions discussed was gaming’s relation to education and entertainment. Apparently, even when one plays for entertainment, one is still learning to do things, i.e., learning how to play and also developing a sense that allows one to retry, keep making efforts, and not give up immediately (almost like the famous Samuel Beckett motto: “Ever tried. Ever failed. No matter. Try again. Fail again. Fail better”). Another important component is that one is also learning to navigate and use skills in one system (for instance, video games) that can have a beneficial effect on one’s learning in other areas, like history, literature, or science.

When discussing the approaches to educational game design, two significant areas were pointed out: gamification and game-based learning. The key conceptual differences between the two are the following. Gamification is about adding game-like elements to a scenario that can be called a non-game; the elements are there to improve a lesson. Game-based learning, by contrast, makes use of games to construct the course from the very beginning. In other words, with gamification, as summed up by Michael J. Cripps elsewhere, one establishes Experience Points (XPs), badges/levels, and leaderboards, whereas game-based learning usually embeds learning with game-like structures. While teaching and choosing a particular direction, both approaches might be considered and could be fruitful; things to think about in advance include one’s interest in the students’ specific learning outcomes as well as how an assessment can be incorporated. Not to forget accessibility: are all students able to find tools, access them, and work with them?

In 2013, James Gee published a piece “Good Video Games and Good Learning.” In it, the author pointed out sixteen principles of game-based learning, including: identity; interaction; production; risk taking; customization; agency; well-ordered problems; “just in time” or “on demand”; situated meanings; pleasantly frustrating; explore, think laterally, rethink goals; smart tools and distributed knowledge; cross-functional teams; and performance before competence.

While talking about tools and software, the following are suggested for interactive stories (both have pros and cons):

— Ren’Py

— Twine

As for virtual spaces:

— Mozilla Hubs

— Second Life

Computer games:

— GameMaker Studio

— Unity

— PyGame

Sound Studies GCDI group

Posted on November 5, 2021 by Jeff Allred

I’m finalizing our readings in Sound Studies/DH for next week, and I just received (as did many of you, I’m sure) a list of upcoming meetings of groups hosted by the GCDI. Among them is a “sound studies” group that may interest some of you, so I’m throwing it out there:

Sound Studies + DARC

Working on a Sound or Digital Archiving project? Come to the upcoming joint Sound Studies + Digital Archives Working Group meeting on Wednesday Nov 10 at 12pm! This is an opportunity to build community and learn together about designing a variety of different kinds of projects including audio production, oral history recording, podcasting, digital archiving and curation, web design, database management, and so much more! Please join the Sound Studies and Digital Archives commons group to find out more and get the zoom link for the meeting.

DH 700 Fa21

Introduction to the Digital Humanities

Monthly Archives: November 2021