We recently had the time to catch up with WLC-er Brody Dingel. Brody’s involved all over campus, including being President of the Linguistics Club, former President and Treasurer of the Spanish Club, tutoring in both English and Spanish, and acting as a research assistant for four different professors here on campus. One more thing we know is that Brody sure does a heck of a lot with data! We talked with him about his experience with language and data analysis for this winter’s newsletter. Read on to learn more.
WLC: Can you talk about what you study at Iowa State and how that would fit in with the theme?
Brody Dingel: I’m a double major in Spanish and Linguistics. I take a lot of language classes: a lot of Spanish, and a little bit of a couple of other languages. I’m really interested in linguistics, the study of language as a whole rather than individual languages. This semester, I’m taking a lot of classes focused on data and how to use computers to work with data to study language. I’m taking my first graduate course in Python right now to learn how to manipulate data with that language.
WLC: And what about the research that you work on, or have worked on in the past?
BD: I’ve been working as a research assistant for the past four years at Iowa State. I’ve worked for Evgeny Chukharev-Hudilainen [in the Linguistics program] the most. I’ve done a lot of stuff with both computers and corpora with him.
WLC: Could you expand on that a little?
BD: A corpus is a very, very large body of authentic text. Some of my work on his project, CyWrite (a tool developed in the Linguistics program to help second language learners eventually become better writers; see video below), uses a corpus of English as a Second Language (ESL) learner texts. Those texts were collected by having the students sit at a computer with an eye tracker (to watch where they were looking on the screen) and write essays in English. We compile those essays and then move on to the next phase. We’ve also had native English speakers who are learning Spanish complete the same task in both their first (English) and second (Spanish) languages, and then we compare the two for areas of improvement.
WLC: Interesting! What is that next step for that specific data use?
BD: After data collection, we move on to annotating (marking up) data sets on a computer for grammatical features. It would be something like marking whether a sentence has a specific type of error or not, and then doing that a few thousand more times. Then, annotator reliability is checked with another person to make sure that we’ve found the same errors in the same sentences.
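The reliability check Brody describes can be sketched in a few lines of Python. The interview does not say which agreement statistic the project uses; Cohen's kappa is assumed here only because it is a common choice for two annotators. The function name `cohen_kappa` and the two short label lists are made up for illustration and are not project data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of sentences.")
    n = len(labels_a)

    # Share of sentences where both annotators made the same call.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from how often each annotator uses each label.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations: True means "this sentence contains the error type".
annotator_1 = [True, True, False, False, True, False, False, True]
annotator_2 = [True, False, False, False, True, False, True, True]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value near 1 means the two annotators almost always agree, while a value near 0 means their agreement is about what chance alone would produce.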
WLC: Sounds like a lot of work! You’ve worked in our department as well, right?
BD: Yes, for Dr. Pardo-Ballester. With her, I was doing more data compilation. [My job] was compiling a series of text messages between students so that she could analyze them more easily. I also took all of that data and compiled it into a website to help people who were thinking about traveling abroad to Valencia know what to expect and what other people had experienced in their time abroad (check out more on the Valencia program here or see Brody’s website here). I also assisted with her Spanish-English Translation course.
WLC: How would you say data has shaped what you want to do after graduation in May?
BD: I actually just submitted my application for the Master’s program here at Iowa State. I’m hoping to start my Master’s here in the Fall. I’ve recently developed a really deep interest in computers and how we can use them to help us study language and develop tools to facilitate language learning. Data is a very big part of that: there’s a big emphasis on corpus linguistics, which is analyzing and drawing patterns from a dataset of millions, even billions, of words. That can then be used to shape how we want to teach, what we want to teach, when we want to teach it, etc.
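As a rough illustration of what "drawing patterns" from a corpus can look like, here is a minimal Python sketch that counts word and word-pair (bigram) frequencies. The three sample sentences, the `tokenize` helper, and the simple letters-only tokenization are all assumptions for illustration; a real corpus would hold millions of words and would need more careful text processing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (including Spanish accented letters)."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


# A tiny made-up "corpus"; real corpora are far larger.
sample_texts = [
    "The students write essays in English.",
    "The students write essays in Spanish.",
    "We compare the essays.",
]

word_counts = Counter()
bigram_counts = Counter()
for text in sample_texts:
    tokens = tokenize(text)
    word_counts.update(tokens)
    # Pair each word with the word that follows it in the same text.
    bigram_counts.update(zip(tokens, tokens[1:]))

print(word_counts.most_common(3))
print(bigram_counts.most_common(3))
```

Counts like these are the raw material for the kind of decisions Brody mentions, such as which words or word combinations learners meet most often.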
WLC: Could you give an example of how that might work?
BD: A good example is one of my recent research projects. I evaluated four of the textbooks from our entry-level Spanish courses, and I analyzed all of the lists of nouns in those textbooks for how related the words in each list are. It could be a list of random items like “tree, hammer, clock,” which are somehow incorporated into a story but are not very related, or it could be a list of fruits. Research out there very firmly suggests that having a list of related words is not best for learning because it kind of overloads that area of the brain. I found that a lot of the textbooks that we use here, and I would imagine this is the case in most second-language textbooks at the beginning stages, focus on related sets of words. This is a very, very small dataset, just four textbooks, but it shows me that there is more research to be done in that area and that there are some trends I’d be interested in looking at further. It also opened up some other ideas to me about what other data exists out there that I would like to study. It’s all very interesting to me.
WLC: So then long-term, do you want to be in academia doing research, or is there something else you’d rather be doing with computers and data?
BD: I’m not really sure what I want to be doing long-term just yet. One idea is to continue down this road and become a professor and do research in corpus and computational linguistics. Another idea is to get my Master’s and teach at a university. I love teaching, especially Spanish and linguistics, and I would still get to be around all of the research-focused people. Of course, there’s also getting out into the workforce and applying what I will hopefully learn in my Master’s programs to the puzzles that corporations might have for me with their datasets.
WLC: Sounds like you’ve got some pretty great options ahead of you! Any final words of wisdom?
BD: I would recommend that anyone who is interested in computers, logic, data, or language in general get started with a little bit of programming. I’m just starting with programming, but I’m already solving real-world problems with it. It’s really, really satisfying when a program works and it analyzes your data for you automatically so you can study it further. It’s such a different field from language study in and of itself, but it’s also so connected. They really go hand in hand, computers and language. I’m learning how well they connect, and it’s really fascinating to apply.