I was browsing Twitter this summer when I came across a tweet by Adeline Koh (who is very much in the know about all things digital humanities) that read, “sweet new app by @melissaterras’s team: free text analysis on ur smartphone! http://www.textal.org/ #dhpocoss #dhpoco”
Hmm, that sounded intriguing. An app that Koh was excited about. I was already following Melissa Terras on Twitter and decided it might be wise to follow Textal on it, too. Melissa Terras was kind enough to take some time for a discussion about Textal and its text analysis capabilities.
First of all, Melissa, I want to thank you for agreeing to this interview. Given the rather breathless tone of your recent tweets, it sounds like you have had an exciting, busy last few days.
I must admit, I am a newbie when it comes to text analysis, so I found the info on your links page very helpful. It says there, “Text Analysis is used in authorship attribution, to try and identify unknown authors of text, and used to consider whether writings can be grouped by their stylistic attributes. Content-based analysis attempts to discover patterns in texts, identifying clusters and common usage of words.”
So, a few questions. Is text analysis what you do after you do text mining? And is text mining a subset of data mining?
They are horses from the same stable, really – and almost the same horse. Text mining is the use of computational techniques to analyze texts, and to find any patterns or underlying meaning in the data. I guess it’s the new term for text analysis – but it’s also used in a much more “big data” sense, in that the texts that are mined or analyzed are usually a large collection. Text analysis is the term that has been used for the past 50 years or so. And both are a subset of data mining, yes: when the data is text. Perhaps we should have called Textal a “text mining app” and been a bit more fashionable!
How does text analysis relate to distant reading?
Distant Reading is understanding textual content not by studying particular texts (in the traditional manner – “close reading,” reading them yourself and paying attention to the nitty-gritty), but by aggregating and analyzing massive amounts of text-based data, computationally. So text mining, or text analysis, is the process, and distant reading is a name given to the overall activity, in some quarters (particularly in literary analysis): you would do distant reading by undertaking text analysis, or text mining. At the end of the day, text analysis, or distant reading, allows you another set of tools to see what themes and topics emerge from a body of texts.
I am curious why Textal is a smartphone app. Why did you not start with a program for desktops? Somehow I don’t think of scholars engaged in pretty arcane scholarship as doing major mining on a smartphone. A tablet, maybe, and Textal works with the iPad.
You say at Textal’s About page, “We also want to explore the opportunities available in mobile computing. To determine the potential audience for this type of service and to understand more about the kind of texts people want to analyse, we are conducting a reception study into Textal’s uptake, which we expect to be of great interest to the wider digital humanities audience.” Can you tell us about the study? Who are the subjects?
I wanted to make an app, is the short answer! I had found myself using my phone a lot – even though I am trained in programming for the web, and for browser-based setups. I wondered what affordances a smartphone would have for digital humanities – there didn’t seem to be many people doing work in this area – and I thought it should be explored. I was aware that when people who don’t know much about digital humanities saw me fiddling with my phone, they would say, “so show us something,” and I thought, well, there is a need for an entry-level DH tool for smartphones – something that will explain the concepts of basic data manipulation, and allow people to explore information in a different way.
There already are web-based tools for this type of text analysis (see Voyant) but nothing “in your pocket”. I thought the pinch and stretch affordances of smartphones would lend themselves well to text analysis, and so the idea for Textal was born. We’ve made something that is “traditionally” digital humanities – text analysis is a technique that stretches back to the 1950s, so conceptually we’ve not done something new there – but using an increasingly common platform. We like to think of it as a gateway drug into digital humanities, providing text analysis in your pocket.
Would Textal have applications outside the digital humanities? Would journalists use it? How does it relate to the buzzword of the day, “big data?”
First, it’s important to say that this stems from working with another two people: Rudolf Ammann, our designer at large at UCLDH, but also Steven Gray, who is the programming genius behind Textal. Steve is a researcher and a part time Ph.D. student at UCL, in the Bartlett Centre for Advanced Spatial Analysis. His Ph.D. studies are on Big Data – see the Big Data Toolkit for more information – and it became clear to him that part of doing Big Data analysis is doing text analysis – a lot of the big data he is dealing with is text, such as large-scale Twitter datasets. So the techniques behind Textal were designed to help with the Big Data Toolkit, and will be used and integrated into other things through our API. That’s the practical answer to the question.
But the more philosophical answer to the question – “big data” is about the analysis of information that is too large for any human to realistically do the analysis themselves in a realistic time frame. It’s also about – well, the interesting bits of it are about – mashing up different sources and seeing what patterns emerge that you couldn’t otherwise see. So Textal does provide a tool for a certain kind of “big data” analysis – particularly in its visualization of Twitter searches. Yesterday I ran a search on a hashtag, and within 5 minutes Textal had told me the most common 250 words in a 218,000 word corpus of individual tweets containing that hashtag. That’s information I couldn’t have gathered, or analysed, as easily or efficiently.
Do other people use it? We know people are visualizing politicians’ speeches, or academic papers, or newspaper articles. It’s fascinating to see how people are using it and – going back to your earlier question – we are just at the start of analysing who is using this, and for what. We sent this out to the open world – a digital humanities tool on smartphone – without any preconceptions regarding who would use it, or what it would be used for. Over the next six months or so we’ll be gathering data, and doing a “Reception Study” as to how it has been received, and by whom. So we’re trying different things, we’re still developing behind the scenes and tweaking things, we’re promoting it in different ways, and we’re watching the stats and the uptake. It’s early days yet to say who our audiences are and what the usage is: in many ways, that’s the whole research question!
While we are at it, can you define for us what the digital humanities consists of? How do you “do” the digital humanities?
I’m smiling at that. I have co-edited a 400+ page book that is coming out in October called “Defining Digital Humanities: A Reader”. Many, many people have tried to define DH, and say what it means to them – and it’s fascinating to look at the many different ways in which that question has been answered (we’re on Twitter at @DefiningDH at the moment, if people are interested).
For me, digital humanities is about the use of computational techniques to undertake research in the Humanities that would otherwise be impossible. The important bit is the last part of that, for me. There are plenty of tools and techniques which allow you to do the same research as before, just better! Faster! More! However, what I’m personally interested in is the stuff that pushes the boundaries on what is possible, and the things that you can do that you couldn’t do without that technique.
I was having a hard time visualizing how Textal works. Luckily, there is a page called “How it works”. There, many potential users will be delighted to read, are the words, “First, you’ll need to download the free Textal App to your iPhone or iPad.” Free—cool. Please thank your funders for me! How did you pitch the idea to them? How does Textal jibe with their missions as organizations?
Well. We don’t have a huge amount of financial backing, but we do have a lot of institutional resources (such as the servers) and we do have a lot of institutional support. I’ve already explained how this fits in with Steve’s PhD, and his line manager agreed that he could do this as part of his doctoral work, so his time on this has been donated by his funders (NCRM, and thanks to them!). I did manage to get a small amount of funding – called Bridging the Gaps from the EPSRC – which was internally dispersed around college to try and make people work in an interdisciplinary manner, on pilot projects.
We pitched for that, and received £11,333. The whole thing has been done on that small budget. I’ve given my time free, and Rudolf, our designer, has also given a lot of time free to the project too. It’s been an experiment on how much you can do with very little fiscal input: but as I say, we do have huge institutional backing and very supportive colleagues who have encouraged us to “play” with this. It’s a low-risk venture, really, aside from us investing our time into it.
We also read there, “To choose a text to analyse, you can enter the URL of a website, search Twitter from within Textal, paste in your own text, or use one of the hundred classic works of literature which come pre-loaded within the app.”
Let’s start with this: “…enter the URL of a website.” What would be examples of websites I could mine with Textal, and what might I be looking for on them? Could I search The Guardian’s website, for example, and try to find any mentions of the word “Snowden”? Can you search any website on the open web? What websites would Textal not work with?
You can put in any webpage that has text on it! I’d search around, find the URL (website) you want to analyze, and then copy and paste the URL in. So that can be individual webpages, such as a newspaper report on Snowden. But you might also put together a text file of 30 different newspaper reports on Snowden – copying and pasting them together – and copy and paste the whole thing into Textal. It’s very flexible, really!
And can you be more specific here: “…search Twitter from within Textal.” What is a term I might be looking for? Can I search all of Twitter, and how far back would that search go?
It’s limited under the usual Twitter search conditions. So, if you put in a search in the Twitter search box, you will get the same results from Textal. The search goes back about 30 days, but Textal is programmed to return around 250,000 words of tweets maximum. So if you do a popular, trending topic, the search might only go back an hour or so. If you do a less used hashtag, it might go back 30 days or so. We adhere very closely to the Twitter terms and conditions. But this is why, for Textal to work, people have to sign in with their Twitter ID.
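To make the word cap concrete, here is a minimal sketch of how a limit of roughly 250,000 words would decide how far back a search reaches. It is an illustration only, not Textal’s actual code: the function name `collect_within_budget`, the newest-first ordering and the whitespace word count are all assumptions.

```python
def collect_within_budget(tweets, max_words=250_000):
    """Keep tweets, newest first, until the word budget would be exceeded.

    `tweets` is assumed to be a list of tweet texts ordered newest first.
    Words are counted naively by splitting on whitespace (an assumption).
    """
    kept = []
    total = 0
    for text in tweets:
        n = len(text.split())
        if total + n > max_words:
            break  # budget used up: older tweets are never reached
        kept.append(text)
        total += n
    return kept, total


# Hypothetical usage with a tiny budget so the cut-off is visible.
recent_first = ["so much news today", "more news", "an older tweet about news"]
kept, total = collect_within_budget(recent_first, max_words=6)
print(kept, total)
```

With a busy hashtag the budget fills up after an hour’s worth of tweets; with a quiet one the loop runs out of tweets before it runs out of budget, which matches the behaviour described above.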
You say users can, “…share the text cloud via Twitter, use the statistics in reports, theses or papers; take screenshots of Textal visualizations and drop them into documents.” Could you elaborate a bit about what kind of statistics could be generated? I have read, for example, arguments from both sides on the matter of whether President Obama uses the words “my” and “I” far more than any of his predecessors. How could a journalist or political scientist use Textal to make the case one way or the other?
There’s a great website of all major presidential speeches by all the U.S. presidents. You could use Textal easily with this – pointing it at each speech, generating the stats, and then looking at each one. For Obama, it’s a simple case of clicking – in Textal – on “my” or on “I” to get the list of how many times he uses those terms, and you can email yourself any resulting stats. But even better, we give the collocates – that is the words either side of “my” or “I” to show how he uses these words in context. So, for each speech, you could see both how many times he uses those words, and also how he uses these words. And take the resulting visualization, and stats, and do what you would like with it – we’re giving this stuff away.
People have used it for this kind of thing. You can see how Textal was used to visualize Obama’s speech on Trayvon Martin. If you wanted to visualize any more Obama speeches, you just point Textal at obamaspeeches.com.
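For readers who want to see what the two statistics mentioned here amount to, the following is a minimal sketch of a word count and a collocate count (the words on either side of a target word). It is not Textal’s implementation: the function names, the simple lowercase tokenizer and the one-word window are assumptions, and the sample sentence is invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


def collocates(text, target, window=1):
    """Count the words found within `window` positions either side of `target`."""
    words = tokenize(text)
    target = target.lower()
    neighbours = Counter()
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - window)
            neighbours.update(words[start:i])                  # words before
            neighbours.update(words[i + 1:i + 1 + window])     # words after
    return neighbours


# Invented sample text, not a real speech.
sample = "My fellow citizens, I believe my country is strong. I know my duty."
print(word_counts(sample)["my"], word_counts(sample)["i"])
print(collocates(sample, "my").most_common(3))
```

Running the same two functions over each speech in turn would give the per-speech counts and contexts that the comparison described above relies on.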
Also, are satirists a potential user group? I read an article, last night, that made fun of Caroline Kennedy for saying, “You know” excessively. Could I use Textal to mine video or audio clips someday to determine that?
You could certainly use Textal to look at the instances of repetition in the transcript of speeches. We don’t do audio to text conversion – there is some other software available that does that, that can generate text from spoken words, so if you had a transcript then Textal could look at it.
Do you find it ironic that, in an era in which the death of print is often announced, you have built a tool that makes texts far more revealing than we have heretofore considered them to be?
Ha! I don’t believe in the death of print. Text still remains the most efficient way of conveying information in our current digital phase – it’s relatively low cost in terms of data storage and is highly efficient – so I see text being around for a long time as a means of communication! The print industry may be suffering changes, but that’s an entirely different story. Textal is in the business of analyzing text, and there is more and more of that in the digital age we live in.
Would Textal be a useful tool for those in the marketing biz? Or do they already possess tools that enable them to ferret out every conceivable mention of whatever they are trying to sell us and are already minutely examining the demographic data on who has said what, where and how often about the latest and greatest such and such?
There are a variety of tools for people in marketing to use to see how their brand is being talked about, etc. – but Textal will be very useful for those people (particularly looking at brand management and searches on Twitter content). We are not only promoting Textal to the digital humanities community – that is where we started, for sure, but we will be actively marketing Textal across a variety of sectors over the next few months, and the marketing industry is one of those we will be targeting, in due course.
Do you worry at all that word clouds might already be passé, in the same way that QR codes seem to be and that infographics might be heading, or are you confident that word clouds will always be useful for those who need a quick summary of the frequency of certain words?
Word clouds have certain issues. It’s hard to tell what the mechanism is for generating them, and what relation the size of words is to the stats underneath. We believe Textal has helped solve that problem, though, in using clickable word clouds for text analysis. We think we are tapping into a market and a need, using the popularity of word clouds combined with a more transparent and rigorous approach to the stats that underlie them. As for whether they are cool or not? The fact is, they are a well known format, now – but a format that needed revisiting to make it work better. We’ve done that.
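As an illustration of what a transparent link between word size and the underlying stats could look like, here is a minimal sketch that maps word counts to font sizes on a straight linear scale. The linear rule, the point sizes and the function name `font_sizes` are assumptions made for this example; the interview does not say how Textal scales its clouds.

```python
def font_sizes(counts, min_pt=10, max_pt=48):
    """Map each word's count to a font size, linearly between min_pt and max_pt.

    `counts` is a dict of word -> frequency. The least frequent word gets
    min_pt and the most frequent gets max_pt (an assumed, illustrative rule).
    """
    if not counts:
        return {}
    lowest = min(counts.values())
    highest = max(counts.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_pt + (count - lowest) * (max_pt - min_pt) / span
        for word, count in counts.items()
    }


# Hypothetical counts for three words.
print(font_sizes({"text": 5, "analysis": 3, "cloud": 1}))
```

Because the rule is explicit, a reader can work back from any word’s size to its count, which is the kind of transparency the answer above is asking of word clouds.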
Can Textal be used in languages other than English or with non-Roman alphabets? Could it be used to look for musical notation or computer code? Numbers?
At the moment we have native English, Spanish, German, Italian, Dutch, and French, in Roman alphabets. But we are looking to expand that into Arabic, next… and others. Computer code wouldn’t be a problem – that tends to be ASCII text, for the most part, but things like musical notation are not what it has been built for.
Is Textal basically, and will it always be, an app for those new to text analysis who want to use it to do simple word frequency searches and to share the results in a graphic form via Twitter or to stick graphics into reports or academic papers? Or is it way more sophisticated than that and the first of its kind in the digital humanities?
At the moment, that sums up Textal – an app for those new to text analysis who want to use it to do simple word frequency searches – but we have plenty of ideas for further development, and we may take it further. There are other apps in DH, but those so far have been geographically based apps that do things like organize historical walks, or geotag sculpture in parks, or do augmented reality on historical sites. We think Textal is the first app in DH to do processing on the fly, and to allow people to do analysis as well as just access some kind of pre-organized visualization, or tag some existing thing. So I think it is a first, yes. We’re using it to take digital humanities processes to the smartphoned masses!
What do you hope to discover in your uptake stats? When might we see the first scholarly fruits of your work on Textal?
We’ll be monitoring things closely, and promoting widely for the first six months or so, then it will be write, write, write… We should be presenting at conferences next year (2014).
Any plans for a video or screencast showing in detail how Textal works?
Yes! We have plans to do this. Partly our plans are determined by the fact that we all do other things in our day jobs, but we have plans to do this.
What have been the reactions so far?
We’ve been very pleased so far! We’ve had a few really good reviews from app reviewing places (see Appstore Arcade, Cool iPhone/iPad Apps, Carlispina.wordpress.com, and The Chronicle of Higher Education) and we have also had good feedback via Twitter, and emails to the team.
At time of writing, 63,000,000 words have been submitted for processing, which isn’t too shabby. I’m not too sure what I expected after the launch, but to have that many words go through in a little over a month, well, it shows that people are, indeed, using it!
What do you find particularly cool about Textal, and is it what you hoped for back in your April 2012 blog post?
It’s been great fun. I’ve learned so much along the way – about the app market, and how maps are made, and user-focused design. I’ve learnt a lot from both Steve and Rudolf, who bring their own expertise to the team. And I’m so proud we made a working, useful, app, and brought it to market (well, even if it is free). It is what it is – an introductory app to text analysis – but to see that it has been so used and talked about, that’s great. I’m pleased that our hunch was proved right!
What would you like Textal to be used for a year from now and what kind of use of it would you be most proud of?
We are still in development, so we have a few other tricks up our sleeves that we want to try over the next six months, as we actively promote it. I’d be most proud of seeing people use Textal’s outputs in their own writing, I suppose – but that has always been the difficulty of making digital resources in the humanities, understanding and knowing how they are used and how the outputs affect people’s thinking. So if it continues to be used, and used by people I don’t know, then that’s the main thing.
What scholar in the past would have killed for a tool like Textal? Whose work 75 or 100 or 250 years ago would have been greatly facilitated by it? Who will most love it now? That is, which fields? Lexicographers? Literary historians? Biographers?
Interesting question. I think that text analysis is becoming a tool for those interested in history, literature, languages, but also the social sciences. Part of our plan with Textal is to introduce this tool to people in a non-academic format – it behaves like a commercial iPhone app – so we see it as a public engagement activity. We hope that more and more people will add text analysis to their set of tools they are using to understand their own domain, and so we think that it will be useful to a variety of people, not just in academia. But seeing how it is used is our next task…
Thank you for your time.
No problem, a pleasure.