
Question Three

Are they talking about something other than a man?

Getting Started


We have arrived at the final question of the Bechdel Test: are two named women speaking about something other than a man?

 

There are many computational ways to go about asking this question, and I would like to preface this portion of the tutorial by stating that (like other parts of this tutorial) my particular method for identifying conversational topics is not the only method, and likely not even the best. Initially, I considered using topic modeling to identify the topics of dialogue between two women, and that methodology is still very much open to anyone wishing to modify this code. I eventually decided, however, that topic modeling would not be the most effective methodology for getting at the question the Bechdel Test asks because of the unstable nature of the topics produced by topic models: small changes to the input data can change the topics drastically, which means the topics provided by the code would most likely change on every run. Additionally, topic modeling works best on large sets of data, and given that the set of nineteenth-century fictional dialogue between women is likely to be fairly small in any given novel, I determined that topic modeling was not the most effective methodology given my current understanding of how it works. You can read more about some of the drawbacks of topic modeling here and some of the strategies for improving topic modeling stability here.

 

Therefore, I determined that the most straightforward way to answer this third Bechdel Test question is to modify the code from question two to run in reverse. In question two, we filtered the dialogue that we identified as occurring between two or more characters so that we only kept dialogue containing female pronouns, female occupations, or names we had determined to be female. For question three, we are additionally going to filter out any instances where a male name, male pronoun, or male occupation is mentioned. Effectively, we are going to change the bag of words that we developed in question two to reflect maleness.
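In other words, the filter from question two is inverted: instead of keeping lines because they contain a "female" term, we will now discard lines because they contain a "male" term. The minimal sketch below illustrates the idea on a toy list of dialogue strings; the names toy_male_terms, sample_dialogue, and kept are placeholders for illustration only and are not part of the tutorial's code.

# A toy bag of words for "maleness" (the full lists appear later in this section)
toy_male_terms = ["he", "He", "him", "Him", "his", "His", "father", "Father"]

sample_dialogue = ["She sat by the window.",
                   "He left for London yesterday.",
                   "My sister will visit soon."]

# Keep only the lines of dialogue that contain none of the male terms.
# Splitting on whitespace is a rough, word-level check for illustration only.
kept = [line for line in sample_dialogue
        if not any(word.strip('.,;!?') in toy_male_terms for word in line.split())]

print(kept)
# ['She sat by the window.', 'My sister will visit soon.']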

 

One limitation of this methodology is that we will miss almost every instance of inferred dialogue. This methodology only extracts instances where a female character directly mentions another female character by name, and it only filters out instances of dialogue where that character is speaking directly about a male character. However, while this method certainly has its limitations, it does provide a more definitive solution in the sense that it will accurately determine whether a female character is speaking to another off the topic of men (unless they are speaking indirectly, or unless there has been a mistake in gendering the name of a character).

 

Additionally, just as there were many instances in question two where it was crucial to check our own biases as programmers, when constructing our bag of words to represent maleness we must make sure that we are not recreating gender bias. Especially when adding occupations to our list of words, be careful to pick only occupations you can say with some certainty are male. For instance, we can say with certainty that the word “Master” is gendered male because of its female counterpart, “Mistress.” Similarly, we can say with some certainty that the word “butler” is male because of its counterpart, “maid.” Other terms, like “chef,” “firefighter,” or “police officer,” are much more ambiguous, though we can certainly make the argument that in the nineteenth century these occupations were predominantly male. However, in making this argument, we are erasing the possibility that there might be some nineteenth-century novel in which a woman is also a police officer, a chef, or a firefighter.

 

The line-by-line explanations of the code below will be relatively brief, given that much of the code comes from question two.

THE CODE

We are going to start by importing a couple of useful libraries. In Python, libraries are just collections of code that allow you to perform actions without having to write that code yourself.


LIBRARIES

JSON

Json is a Python library that allows a .JSON file to be read as a Python dictionary object. This library allows users to work directly with a .JSON file rather than having to convert it manually.
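As a quick illustration (the filename dialogue.json is a placeholder, not a file from this tutorial), loading a .JSON file into a dictionary looks like this:

import json

# Read a .JSON file directly into a Python dictionary (filename is a placeholder)
with open("dialogue.json", "r") as f:
    data = json.load(f)

print(type(data))  # typically <class 'dict'>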

Six

Six is a library that allows for a smooth transition of code from Python 2 to Python 3. This library allows us to use iteritems, which will iterate over the items of a large dictionary.
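A minimal sketch of iteritems in use (the dictionary here is throwaway example data):

from six import iteritems

speakers = {"speaker_one": 3, "speaker_two": 5}

# iteritems yields (key, value) pairs, using the memory-friendly iterator on Python 2
for name, count in iteritems(speakers):
    print(name, count)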

Default Dict

Defaultdict is a way of preventing a KeyError if we iterate over a dictionary and provide a key that doesn't exist. Rather than raising an error, defaultdict will return a default value for the missing key.
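For example, a defaultdict whose default is a list quietly creates an empty list for a key it has never seen instead of raising a KeyError (a small sketch, not code from the tutorial):

from collections import defaultdict

lines_by_speaker = defaultdict(list)

# Appending to a brand-new key simply creates an empty list for it first
lines_by_speaker["speaker_one"].append("a line of dialogue")
print(lines_by_speaker["speaker_two"])  # [] rather than a KeyError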


import json
from six import iteritems

from collections import defaultdict

This part of the tutorial continues directly from question two. Although not included on this page, it depends on the variables and genders that were determined in the previous section, so maintain those variable names in order to get this code working out of the box.
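For reference, and judging only from how the code below uses them, the variables carried over from question two have roughly the following shapes; this is a hedged reminder of the assumed structure, not a substitute for running the code in question two:

# Assumed to exist from question two (shapes inferred from how they are used below):
# keys          - list of keys into the attributes dictionary
# attributes    - dictionary in which attributes[key][0] is a character's name and
#                 attributes[key][2] is a dictionary mapping labels to lists of dialogue strings
# female_names  - list of names gendered female in question two
# male_names    - list of names gendered male in question two
# to_women      - the bag of words representing "femaleness" built in question two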

Much like in part two, we are going to declare a variable, "male_terms", with the type list (this is indicated by the empty brackets []).

This variable is going to hold the bag of words that we construct to represent "maleness." Again, similarly to how we addressed this problem in the last part, I'm presenting two different options for constructing the bag of words (though neither approach is comprehensive).

male_terms = []

Below are two options for constructing our terms list. Neither option is comprehensive; they merely represent a looser and a more conservative approach to this problem.

Choosing a conservative approach might mean that some instances of gender in dialogue will not be recognized by the code.

The first approach is the more conservative one. This means that we are going to make zero assumptions regarding gender based on the perceived gender of specific roles. For this approach, we will construct a list of male pronouns as well as a limited list of roles that are guaranteed to be male, such as 'brother' or 'father.'

The second approach is looser. This means that we are going to make some assumptions regarding particular roles using an educated guess. In addition to obviously gendered roles such as 'brother' or 'father,' we will also include words such as 'butler,' 'chef,' etc.


Since the roles available to men in the nineteenth century were much more expansive, this list can be nearly endless. So, try to consider roles that fit the particular set of novels you are interested in. For example, if you are primarily interested in domestic novels, you might consider whether words such as "adventurer" or "pirate" would be useful to you or not.

Choosing a looser approach might mean that some of your educated guesses regarding gender roles are incorrect. In this case, you might pass over instances that should be kept, or include instances that shouldn't be.

male_terms = ["father", "Father", "brother", "Brother", "husband", "Husband", "Son", "son", "he", "He",
              "his", "His", "him", "Him", "Sir", "sir", "papa", "Papa"]

male_terms = ["father", "Father", "brother", "Brother", "husband", "Husband", "Son", "son", "he", "He",
              "his", "His", "him", "Him", "Sir", "sir", "papa", "Papa," "master," "Master," "policeman," "Policeman," "chef," "Chef," "Steward," "steward," "officer," "Officer,"]


about_men = male_names + male_terms

BIAS WARNING


At this point in the tutorial, you should be asking yourself whether your decisions regarding gender are being informed by your own expectations regarding gender. How will you ensure that you limit the amount of gender bias present in your code?

Now, we are going to actually tell the code to filter through the two lists we've constructed ('to_women' and 'about_men'). For your own reference, I am choosing to follow the more conservative approach and to make limited assumptions regarding the words I have included in my lists.


For this bit of the code, we are entering a loop that says: "while we have not yet finished looking over all dialogue, and while we have not yet reached the end of the 'about_men' list, if the item in the list we are currently on matches any word in the dialogue we have pulled out (by matching on the 'key' variable), remove that dialogue."

i = 0
while i < len(keys):
    q = 0
    while q < len(about_men):
        # For each character's dialogue dictionary, remove any line of dialogue that
        # mentions the current "about_men" term. Iterating over a copy (list(v))
        # avoids skipping items while removing from the list we are looping over.
        res = {k: [v.remove(x) for x in list(v) if about_men[q] in x]
               for k, v in attributes[keys[i]][2].items()}
        q = q + 1
    i = i + 1
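As a side note, the same filtering can be expressed without removing items in place, by rebuilding each list of dialogue instead; some readers may find this easier to follow. This is an optional sketch that assumes the same attributes structure described above, not a replacement for the tutorial's code:

# Optional alternative: rebuild each dialogue list, keeping only the lines that
# mention none of the "about_men" terms
for key in keys:
    dialogue = attributes[key][2]
    for k, v in dialogue.items():
        dialogue[k] = [line for line in v
                       if not any(term in line for term in about_men)]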

Next, we are going to run essentially the same loop again, with minor changes.


We have fully removed certain instances of dialogue from our attributes variable in the previous step. Now, we are going to loop through the attributes variable again, but this time, if any of the words in the dialogue match a word in our to_women list, we are going to take that instance and save it to a new list called "final" along with its key (which, remember, is the name of the character). Essentially, the 'final' variable will contain the names of female characters and any instances of dialogue associated with those characters that pass the Bechdel Test.

final = []

i = 0
while i < len(keys):
    q = 0
    while q < len(to_women):
        # If the current "to_women" term appears in a line of dialogue that survived the
        # previous filter, save that line to "final" along with the character's name.
        res = {k: [final.append({str(attributes[keys[i]][0]): x}) for x in v if to_women[q] in x]
               for k, v in attributes[keys[i]][2].items()}
        q = q + 1
    i = i + 1

print(final)
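Finally, if you would like a quick summary of what the code found, the defaultdict and iteritems imported above can tally the number of passing lines per character. This is an optional sketch that assumes 'final' has the structure produced by the loop above:

# Optional: count how many passing lines of dialogue each character has
counts = defaultdict(int)
for entry in final:
    for name in entry:          # each entry is a one-item {character_name: dialogue} dictionary
        counts[name] += 1

for name, total in iteritems(counts):
    print(name, total)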

The Results page contains the results of running the code on a few novels, as well as some network visualizations for those novels. That page also includes the results of the code on Wuthering Heights, our test case.
