
WHAT IS THE BECHDEL TEST?

THE BECHDEL TEST AND FEMALE CHARACTER ACTUALIZATION

This project builds off of David Bamman’s natural language processing pipeline, BookNLP (https://github.com/dbamman/book-nlp).

When I was first thinking about this project, my mind kept drifting back to the quote from Nathaniel Hawthorne that has now become this project’s banner. I first encountered the quote as an undergraduate, and I remember how shocking it was at the time. It made me angry, and as I entered graduate school, I determined that I would do something to address it. For a few years now, I’ve been following the work of David Bamman, who has developed an incredible tool called BookNLP. BookNLP takes a few well-loved tools of digital humanists and puts them all in one place, allowing things like character names and dialogue attribution to be retrieved from a single document rather than many.

As a scholar interested in women’s writing and female characters, I wanted to explore how I could use BookNLP in conjunction with my own code to highlight the voices of women that BookNLP is able to parse out. After running through many potential methods for doing this, I eventually settled on the Bechdel Test.

While the Bechdel Test is fairly flawed in that it does not account for the various intersectional power relationships influenced by (among other things) race, sexuality, and ability, I thought that the test was a good baseline for beginning to unpack so much of what bothers me about Nathaniel Hawthorne’s quote.

It has long been accepted that many women wrote in nineteenth-century America. However, women’s writing has long been devalued through labels like “sentimental,” or has been published under “Anonymous” rather than under any name that would allow researchers today to track it down. Since so many of these works of fiction are difficult to locate, and it would be impossible to read every single work of fiction suspected to have been written by a woman in the nineteenth century, I wanted to find a way to quickly determine whether a work of fiction has any women in it at all.

For those who don’t know, the Bechdel Test is a series of three questions developed by Alison Bechdel as a way of measuring the representation of women in fiction. The questions are as follows:

  1. Is there a woman?

  2. Does she speak to another woman?

  3. Is the topic of their conversation something other than a man?
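
Taken together, the three questions amount to a simple classification rule, which is what makes automating the test plausible. Below is a minimal, illustrative sketch of the test in Python. Everything in it is an invented stand-in: the set of female names, the two-speaker conversation tuples, and the keyword check are far simpler than the character and dialogue information the tutorial later extracts with BookNLP.

```python
# A toy sketch of the Bechdel Test as code. All data structures here are
# simplified stand-ins; real character and dialogue data comes from
# BookNLP, and "about a man" detection is far subtler than a word list.
MALE_WORDS = {"he", "him", "his", "man", "men", "mr"}

def passes_bechdel(women, conversations):
    """women: set of character names identified as female.
    conversations: list of (speaker_a, speaker_b, text) tuples."""
    if not women:                                       # 1. Is there a woman?
        return False
    for speaker_a, speaker_b, text in conversations:
        if speaker_a in women and speaker_b in women:   # 2. Two women speaking?
            words = set(text.lower().split())
            if not words & MALE_WORDS:                  # 3. Topic other than a man?
                return True
    return False

print(passes_bechdel({"Jane", "Elizabeth"},
                     [("Jane", "Elizabeth", "the garden is lovely today")]))
```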

The Bechdel Test draws inspiration from Virginia Woolf’s A Room of One’s Own in which she writes:

“All these relationships between women, I thought, rapidly recalling the splendid gallery of fictitious women, are too simple. … And I tried to remember any case in the course of my reading where two women are represented as friends. … They are now and then mothers and daughters. But almost without exception they are shown in their relation to men. It was strange to think that all the great women of fiction were, until Jane Austen’s day, not only seen by the other sex, but seen only in relation to the other sex. And how small a part of a woman’s life is that.”


METHODOLOGY AND TOOLS


BOOKNLP

BookNLP is a natural language processing pipeline developed for book-length texts by David Bamman, Ted Underwood, and Noah Smith. The output of BookNLP is a .CSV file, an .HTML file, and a .JSON file. For this tutorial, we will only be using the .JSON file, which contains all of the character features that BookNLP could discern.
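
As a preview, here is a minimal sketch of how that .JSON output can be opened in Python. The field names used below (“characters,” “names,” “speaking”) follow the dbamman/book-nlp output format as I understand it, but they may differ across versions, so inspect your own file before relying on them.

```python
import json

# Load the JSON file that BookNLP produced for a novel. The filename
# here is a placeholder -- use the path to your own output file.
with open("my_novel.book") as f:
    book = json.load(f)

# Walk the character list. Each entry groups the name variants BookNLP
# clustered together and the dialogue it attributed to that character.
for character in book.get("characters", []):
    names = [n["n"] for n in character.get("names", [])]
    quotes = [q["w"] for q in character.get("speaking", [])]
    if names:
        print(names[0], "has", len(quotes), "attributed lines of dialogue")
```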


PYTHON

Python is an interpreted, general-purpose programming language. While running BookNLP requires some use of the command line, the rest of the tutorial will be done in Python. The IDE that we will be using is called Jupyter Notebook. For an introductory tutorial on using Python, I recommend Codecademy (https://www.codecademy.com/learn/learn-python).


MACHINE LEARNING

We'll go into more detail later, but machine learning is a computational technique that focuses on providing computers with data that they can learn from and act on themselves. While this tutorial will only make use of a very small section of the larger methodology, machine learning (and what it is capable of) is much more vast.

This section goes into more detail about the methodology and tools selected for this tutorial. It is not necessary for completing the tutorial, but may be read as a supplement.

Although I began this project a year ago, with the recent controversy regarding machine vision and ImageNet, the questions that I asked myself a year ago, and that I now direct towards you, feel more relevant than ever. The primary question that I hoped to address in beginning this project was whether or not a classifying test (the Bechdel Test) could successfully classify a nineteenth-century novel in an “unbiased” way if I distanced the human involvement as much as possible (machine learning). In short, my question was: in a test of gender classification of nineteenth-century novels, who fails?


I think that when we consider classifying tests such as the Bechdel Test, we tend to think about them in terms of whether or not the object being tested passes or fails. A movie can pass or fail the test based on its script. A book can pass or fail the test based on its text. I wanted to use this project as a way to expand blame outward. If my code determines that a book I provide fails the test, using categorizing data that I fed the code, who is failing in this situation? Is it my code? Is it the book? Or is it me?


Because there are so many variables at play and so many instances in which failure is a very real possibility, I now extend my code outward and invite anyone reading this to improve upon the work that I have done here. In a way, my goal for this project was for the website to become, in itself, a site of “training data” that can improve and be improved upon by many programmers rather than just one. Below, I will explain why this became my goal.


Machine learning is a subset of Artificial Intelligence (AI) that automates decisions based on data provided by a programmer. Machine learning focuses on the idea that computers are capable of observing data and making decisions with minimal human interference. Originally, machine learning methods were designed to recognize patterns, but more recently, models have been developed to learn from many different kinds of data. Whether we know it or not, we see machine learning all around us and we hear about it every day. For example, machine learning methodology is what runs self-driving cars, fraud detection software, and even online recommendations on sites like YouTube or Amazon. As data becomes more quickly available and as it becomes easier to learn computational methods, machine learning has skyrocketed in popularity. However, this ease of use and popularity is not without risks.


Kate Crawford and Trevor Paglen argue in a recent piece entitled “Excavating AI: The Politics of Images in Machine Learning Training Sets” (https://www.excavating.ai/) that “Understanding the politics within AI systems matters more than ever, as they are quickly moving into the architecture of social institutions: deciding whom to interview for a job, which students are paying attention in class, which suspects to arrest, and much else.” Training data, or the human-compiled data that is provided to an AI system from which to learn, can introduce (intended and unintended) bias into a machine learning methodology. As Crawford and Paglen illustrate by excavating the training data of an image-based project called ImageNet, “Despite the common mythos that AI and the data it draws on are objectively and scientifically classifying the world, everywhere there is politics, ideology, prejudices, and all of the subjective stuff of history. When we survey the most widely used training sets, we find that this is the rule rather than the exception.” Every project--computational or otherwise--that involves classifying words, images, or anything else open to interpretation involves the programmer subjecting their data to some form of ideology. If I believe that all fruit is yellow, and my training set reflects this belief, then the way that my machine learning model understands fruit will be impacted by this belief.


You might be wondering what the point is of a digital humanities project that employs a methodology which can so easily be biased. The truth is, we live in a world where the limits of computation are forever shifting and evolving. Machine learning is a powerful methodology that can answer some pretty complex questions; however, it is important to acknowledge the risks.


PYTHON


Although there are many programming languages that can be used for building machine learning projects, Python is perfect for a beginner because it is designed to be both minimalistic and intuitive. In addition to being one of the most popular programming languages out there, Python makes working with natural language and machine learning fairly simple. There are two primary categories into which machine learning methodologies can be separated: unsupervised and supervised learning. Unsupervised learning means that the computer is left to determine characteristics on its own. Supervised learning means that the computer is provided a dataset with categories already labeled; the computer studies these labeled categories and learns to identify the characteristics of new data, as in the sketch below.
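
For concreteness, here is a minimal sketch of supervised learning in Python. The library (scikit-learn), the toy training names, and the character-bigram features are all my own illustrative choices, not this project’s actual pipeline; the point is only that the labels we supply are the “supervision” the model learns from.

```python
# A minimal supervised-learning sketch: classify names by gender label.
# Library choice (scikit-learn) and training data are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Labeled examples: the labels are the "supervision" we provide.
names = ["Elizabeth", "Jane", "Charles", "Henry", "Margaret", "Thomas"]
labels = ["F", "F", "M", "M", "F", "M"]

# Represent each name by its character bigrams (e.g., "el", "li").
vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 2))
X = vectorizer.fit_transform(names)

# Fit a simple Naive Bayes classifier on the labeled examples.
model = MultinomialNB()
model.fit(X, labels)

# The trained model can now guess a label for a name it has never seen.
print(model.predict(vectorizer.transform(["Harriet"])))
```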


Thankfully, data analysis and machine learning are just a small part of what Python is capable of as a programming language.


MACHINE LEARNING AND BIAS


In today’s internet culture, most popular platforms (Google, Amazon, Facebook, etc.) have some form of machine learning embedded in their code. Essentially, this means that these platforms are collecting data in a variety of ways and using that data to improve the platform. However, while this process seems pretty benign and while it may appear that machine learning operates with as little human interference as possible, there is still a risk that a machine learning algorithm will absorb the programmer’s bias. Thus, while the point of this tutorial is to teach users (and particularly women) how a pretty rudimentary version of the Bechdel Test can be automated to work on works of nineteenth-century literature, this tutorial is also designed to make users more aware of how these systems, which are so deeply embedded in our technology, operate.


In a 2016 study of human reporting bias, Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, and Ross Girshick discuss the susceptibility of supervised learning algorithms to bias. They write that “Images annotated with human-written tags or captions focus on the most important or salient information in an image, as judged implicitly by the annotator. These annotations lack information on minor objects or information that may be deemed unimportant, a phenomenon known as reporting bias” (2). That is, when humans are responsible for developing categorical tags for a dataset, an element of reporting bias is always going to be present. This is because by giving a human the choice about which label to use (in our case, gender labels on names) or which data to include in a set, you are asking them to apply their own subjective knowledge to the task. While this tutorial primarily draws names and genders from the US census, “working with data that someone else has acquired presents additional problems related to provenance and contextualisation. It may not always be possible to determine the criteria applied during the creation process” (Read More Here, page 5).


Removing bias within machine learning means teaching the algorithm what that bias is and eventually training the code to steer away from it. While some feminists, such as Caroline Sinders, have responded to the problem of bias in machine learning algorithms by developing a data set of feminist words and phrases to feed machine learning algorithms, this methodology still depends on a level of recognition of bias. Unfortunately, there are many, many machine learning algorithms in which bias goes unrecognized. Sometimes, this bias can even present itself in the form of violence or harm. For example, in October 2018, Amazon announced that they were ditching an AI recruiting tool that they had developed to sift through resumes and CVs because it favored men for technical jobs. Ultimately, this is because the dataset that the developers at Amazon had used to train the AI was a set of resumes that had been submitted to the company over a ten-year period. Most of these resumes belonged to men, and so the machine learning algorithm taught itself that resumes submitted by male applicants were preferable. As a result, the algorithm penalized resumes that included references to women’s organizations as well as graduates of women’s colleges (Read More Here). This is just one example of how AI is increasingly becoming a regular part of the world we live in, and the decisions that AI makes about our daily life experiences can often be influenced by human bias. Without this bias being checked, these systems can perpetuate and reassert structures of oppression and violence. However, the ease with which bias is introduced to AI shouldn’t be seen as the result of a “broken” methodology. Rather, machine learning works just the way it is intended to; the flaws are in society in general.


Some projects, such as The Feminist Internet (https://feministinternet.com/), work to prevent the ways in which bias can influence AI by making the internet a more democratic space. However, there is still much work to be done in making clear how these systems influence our lives. In a recent interview with Charlotte Webb, Dazed reported that according to a feminist chatbot--a product designed by Webb in order to educate users on AI bias--only 22% of people building AI are women (Read More Here). In an April 2019 study detailing the results of a year-long survey of literature, the researchers found that only 18% of authors at leading AI conferences are women and that women make up only roughly 10-15% of AI research staff at Google and Facebook. Only 2.5% of the AI staff at Google are Black.


As I hope I have illustrated, machine learning is a feminist issue in much the same way that social bias, more generally, is a feminist issue. While these technologies seem to act with little interference from humans, they are directly shaped by, and act on, the biases of the humans who program them. These biases often directly impact the lives (both online and offline) of marginalized communities. From women in tech being passed over for hiring due to gender bias, to a Google image search for “unprofessional haircuts” primarily returning images of African American hairstyles, bias in machine learning systems is impacting the lives of people all over the globe. The current focus on getting women interested in tech is much too narrow, privileging white women over women of other identities. As a result, I have intentionally chosen to be as transparent about my methodologies and code as possible. AI systems don’t become less biased if they remain opaque and difficult to understand. In creating this tutorial, I am on one level trying to develop an automated Bechdel Test, but mostly I am trying to address a more central concern: how can knowing how machine learning works help marginalized communities spot when AI bias is working against them?

GENDER IN NINETEENTH-CENTURY LITERATURE

The nineteenth century is ripe for study by feminist digital humanists. In addition to the availability of digital editions of nineteenth-century texts, the nineteenth century is an era in which gender divides ruled. Most women were heavily discouraged from pursuing writing, and the limited number of women who did write were often subject to having their work diminished through labels such as “sentimental” or “domestic fiction.” The nineteenth century is also a period in which women were largely unable to publicly earn their own money, according to laws in both the Americas and Europe. Some of the only legitimate ways in which a woman could write and earn money were through publishing domestic manuals and other texts operating from the domestic sphere. All this is to say that the number of women writing fictional narratives in the nineteenth century was extremely limited. Because of these limitations, one of the only ways in which women readers could see themselves represented in a text was through the male interpretation of women. These characters, if they exist at all, typically occupy a marginal space in the narrative and are rarely seen as actualized characters outside of their investments in domesticity and romantic pursuits. However, recent studies in the digital humanities have revealed the margins of a text to be significant, and the ability to quantify narrative elements like dialogue and character to be crucial to the study of women’s roles in nineteenth-century fiction.


By examining complex interactions among objects within a shared social space, side characters are no longer lost in the larger scheme of the narrative, and their influence on narrative structure becomes less muddled by the viewpoint of the narrating character. This method has been explored in recent years by the Stanford Literary Lab in their study of character centrality in Shakespeare’s Hamlet; they concluded that Horatio, though a “flat” character, was so central to the plot that, if removed, the entire social network of the play fell apart. In terms of gender theory, when applied to works of fiction, this type of quantified analysis allows for the mapping of slowly developing social networks that are likely to be overshadowed by more male-centered developments in the plot. With respect to nineteenth-century female characters, quantified analysis removes preconceived notions of feminine presence and allows the researcher to focus on a view of the world of the text that is often “off-center.” That is, rather than focusing only on the main character and the main character’s world, methodologies such as natural language processing, which include dialogue tagging and gender attribution, allow for a flattening of hierarchies in the text. In this view of the text, the maid with two lines of dialogue is considered on the same playing field as the prince whose romance plot occupies the most readerly attention.
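
To make the network idea concrete, here is a minimal sketch using the networkx library. The characters and edges are invented for illustration; a real analysis would build the edge list from BookNLP’s dialogue attributions rather than by hand.

```python
# A toy character network: an edge means two characters exchange dialogue.
# The names and edges are invented; networkx is my illustrative choice.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Hamlet", "Horatio"), ("Hamlet", "Ophelia"),
    ("Horatio", "Ophelia"), ("Horatio", "Gertrude"),
    ("Gertrude", "Claudius"),
])

# Degree centrality: how connected each character is, normalized.
print(nx.degree_centrality(G))

# Removing a structurally central character can fragment the network,
# echoing the Literary Lab's finding about Horatio in Hamlet.
G.remove_node("Horatio")
print(nx.number_connected_components(G))  # the toy network splits in two
```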


The nineteenth century is marked both by an increasing number of female writers and by a fixation on the idea that women were meant to occupy limited social spheres. As Barbara Welter states in her study of nineteenth-century feminine behavior, women were seen as merely passive responders to men. As a result, women were meant to remain silent and contribute to a social construct that promoted the lives of women as a series of suppressed emotions. These ideas, in conjunction with my own readings of feminist and literary theorists, have informed my understanding of the experience of womanhood--particularly in the nineteenth century--as having largely been constructed and enforced by patriarchal structures. These structures contribute to an envisioning of the female body as a script onto which passivity and a male-centered construction of identity have been written. When applied to literature, the feminine body becomes additionally influenced by the inherently patriarchal construction of language, a limited class of female writers, and the codified social norms which limit the space characters are allowed to occupy within a narrative. These ideas have led this project to focus on examining the fields of social interaction in which nineteenth-century women participated, as well as how these social fields can potentially exhibit themselves in literature, given the computational analysis of a data set of novels by both male and female writers.


Given that a book is a self-contained entity constructed by the author and put into motion by the reader, the question becomes whether female writers model the same limited social spheres and levels of passivity that male writers do, or whether they resist these social structures in subtle ways that computation can bring to the surface. I hope to study how the application of social network analysis to a selection of nineteenth-century novels can begin to reveal not only whether a female character is considered passive, but whether this perceived passivity affects the ways in which she interacts with the other characters with whom she shares a network, and whether these connections can provide insight into the perceptions and beliefs of their participants. This analysis may furthermore inform our understanding of how the feminine narrative body is positioned and under what circumstances defiance of social norms is possible.


The application of techniques such as social network analysis and gender classification allows for an analysis of the social worlds of women in works of fiction, and the Bechdel Test’s investment in questions of female actualization beyond the pursuit of marriage and romance allows for an examination of whether a nineteenth-century fictional woman can be actualized outside of the influence of men. This analysis attempts to reveal whether a novel’s social world can be composed of women who speak off the topic of men, or whether the novel as a form collapses under these constraints. Some of the work to which this project owes its inspiration appears in projects such as the 2016 article “Understanding Gender and Character Agency in the 19th Century Novel” by Matthew Jockers and Gabi Kirilloff. This study took a selection of over 3,000 nineteenth-century novels and evaluated them according to gendered personal pronouns and verbs in order to determine the passivity of female characters in comparison to male characters. Through this study, they were able to tentatively propose that agency, understood in terms of the kinds of verbs associated with particular pronouns, is an important element of characterization, and that these findings support notions of gender propriety. In an effort to contribute to this crucial body of work, I chose to focus on the role of spoken dialogue alone in supporting gender propriety.
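
As a rough illustration of the pronoun-verb approach behind Jockers and Kirilloff’s study, the sketch below counts which verbs take “he” or “she” as their grammatical subject. The use of spaCy and the sample sentences are my own assumptions; the original study used its own pipeline and corpus.

```python
# Count the verbs governed by "he" vs. "she" using spaCy's dependency
# parse. Requires the en_core_web_sm model (python -m spacy download ...).
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She paused at the door. He strode into the room. She waited.")

verbs_by_pronoun = {"he": Counter(), "she": Counter()}
for token in doc:
    # "nsubj" marks the syntactic subject; token.head is the verb it governs.
    if token.dep_ == "nsubj" and token.text.lower() in verbs_by_pronoun:
        verbs_by_pronoun[token.text.lower()][token.head.lemma_] += 1

print(verbs_by_pronoun)
# e.g. {'he': Counter({'stride': 1}), 'she': Counter({'pause': 1, 'wait': 1})}
```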


