Links

Textal is an intuitive, easy way to start experimenting with text analysis, but there are many more advanced approaches to analyzing text with digital techniques.

Monks were manually counting, reorganizing, and manipulating texts as early as the 13th century, but since the 1940s computers have been used to carry out such analysis: an early forerunner was Father Roberto Busa, who used a mainframe to analyze the complete works of Thomas Aquinas. Since then, text analysis has become a common technique in the field now known as Digital Humanities. It is used in authorship attribution, to try to identify the unknown authors of texts, and to consider whether writings can be grouped by their stylistic attributes. Content-based analysis attempts to discover patterns in texts, identifying clusters and common usage of words.
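As a rough illustration of the simplest kind of content-based analysis, the sketch below counts how often each word appears in a short passage. It is a minimal example, not the method used by Textal or any tool named on this page; the function name `word_frequencies`, the sample sentence, and the very simple tokenizing rule (lowercase letters and apostrophes only) are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs in text (hypothetical helper)."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    # This is a deliberately crude tokenizer, assumed for illustration only.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Invented sample text, not taken from any real corpus.
sample = "The cat sat on the mat. The dog sat too."
freqs = word_frequencies(sample)

# The two most frequent words and their counts.
top_words = freqs.most_common(2)
print(top_words)
```

Real projects usually go further, for example by removing very common "stop words" or by comparing frequencies between texts, but counts like these are the starting point for many of the approaches described above.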

If you would like to learn more about text analysis, there is a basic introduction provided by Stanford University, and the Text Analysis Developers Alliance has an overview of “What is Text Analysis?”. Ted Underwood has provided a good introduction to “Where to Start with Text Mining”.

Tools

Many people will be familiar with Wordle, which is a simple visualisation tool, but many are frustrated by its limitations.

Google has provided the Google Ngrams tool, which allows searches for phrases of up to five words across 5 million digitized books, but again, the user cannot be fully in control of the texts or methods used.
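For readers unfamiliar with the term, an "n-gram" is simply a run of n consecutive words. The sketch below shows one way to extract and count them from a text you control. It does not reflect how Google's tool works internally; the function name `ngrams`, the sample phrase, and the whitespace-based word splitting are assumptions made for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text (hypothetical helper)."""
    # Split on whitespace after lowercasing; punctuation handling is omitted
    # to keep the sketch short.
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented sample phrase used only to demonstrate the idea.
phrase = "to be or not to be"
bigrams = ngrams(phrase, 2)
bigram_counts = Counter(bigrams)

print(bigrams)
print(bigram_counts["to be"])
```

Changing the second argument to any value from 1 to 5 gives the same range of phrase lengths that the text above describes.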

Voyant and Tapor both offer browser interfaces for performing the most commonly used approaches to text analysis, and are a good place to start if you would like to manipulate text further than you have done in Textal.

MALLET is a more advanced program for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Even more advanced, R is a free software environment for statistical computing and graphics that is useful for managing large datasets, including texts.

Haven’t found what you need? There are many more text analysis tools listed at Project Bamboo.

Further Reading

Gregory Crane, “What Do You Do With a Million Books?”. D-Lib Magazine, March 2006. Volume 12, Number 3. http://www.dlib.org/dlib/march06/crane/03crane.html

Shlomo Argamon, Mark Olsen, “Words, Patterns and Documents: Experiments in Machine Learning and Text Analysis”. Digital Humanities Quarterly, 2009. Volume 3, Number 2. http://www.digitalhumanities.org/dhq/vol/3/2/000041/000041.html

Hugh Craig, “Stylistic Analysis and Authorship Studies”. In “A Companion to Digital Humanities”, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.

John Burrows, “Text Analysis”. In “A Companion to Digital Humanities”, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.