Pathways to understanding 16th century Mesoamerica /digging-ecm/2019/07/pathways-to-understanding-16th-century-mesoamerica/ Mon, 29 Jul 2019 09:25:53 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=2783

Two of our team members, Raquel Liceras-Garrido and Katherine Bellamy, have recently completed the project ‘Pathways to understanding 16th century Mesoamerican geographies’, funded by the Lancaster University Department of History.

This spin-off project has used story maps, combining interactive text, images and maps in a series of online learning resources on the history, archaeology and geography of the Mesoamerican Postclassical and Colonial periods of Central Mexico, from the 14th century through to the mid-16th century. These resources are divided into three main areas:

The first of the story maps explores the history of the Mexica people, beginning with their journey to the foundation of Tenochtitlan in 1325, which would become (alongside its neighbour city to the north, Tlatelolco) the heart of the Triple Alliance. Following this, the story map shows how the Mexica began to expand, featuring the lists of conquered settlements as recorded in the Codex Mendoza. This leads up to the arrival of the Spanish, and the ultimate meeting of Moctezuma II and Hernán Cortés in 1519. It then proceeds to describe how Cortés, with considerable assistance from his indigenous allies, conquered Tenochtitlan. This story map concludes with a look at the beginning of the colonial era, exploring how the Spanish began to impose their own institutions across ‘New Spain’, with varying success due to the continuing influence of indigenous institutions across Mesoamerica.

This story map explores the nature of historic place-names across what is currently Mexico, introducing the importance of place-names and language as tools of colonisation and empire. It shows how this tool was used not only by the Spanish, but also by the Mexica and the Triple Alliance (not to mention other indigenous groups), as part of their systematic colonisation of conquered settlements and people. It goes on to trace how indigenous place-names continued in use despite the processes of colonisation at the hands of both the Triple Alliance and the Spanish. In addition, it examines the meaning of Nahua toponymy in particular, demonstrating the use of suffixes such as -tepec (which means ‘inhabited place’) and showing the distribution of some of these examples. Following this are case studies of individual place-names, explaining their meanings and how they have been depicted in the historical record. The story map concludes with a brief overview of colonial naming, and how indigenous influences have continued.

The final story map discusses depictions of geographic space and place. It starts by explaining why this discussion matters, with particular reference to, and problematisation of, the use of Geographic Information Systems for representing historical geographies. It then introduces representations of space which may be unfamiliar to the modern reader, and explores the various types of pre-Hispanic Nahuatl documents, including those which represented geographies. After an introduction to the state of Spanish cartography in the sixteenth century, it discusses how the Spanish and Nahua traditions of depicting geography began to merge during the conquest of Mexico, a merging for which there is considerable evidence throughout the historical record, illustrated with two specific examples.

Subaltern Recogito: Annotating the sixteenth-century maps of the Geographic Reports of New Spain /digging-ecm/2019/06/subaltern-recogito/ Tue, 11 Jun 2019 12:50:43 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=2242

We’re very pleased to have been awarded a Resource Development Grant to explore the annotation of a series of historic maps using . Our corpus of maps includes those produced in the sixteenth-century for the Relaciones Geográficas de Nueva Españaacross the area which is currently Mexico.

On Monday 17th June, in collaboration with our colleagues at the Escuela Nacional de Antropología e Historia (ENAH), the Universidad Nacional Autónoma de México (UNAM), and the Instituto Nacional de Antropología e Historia (INAH), we will be delivering an online workshop providing training on Recogito for the annotation of the sixteenth-century maps of the Relaciones Geográficas.

We will be working with 27 scholars, delivering training on Recogito and presenting an introduction to the Spatial Humanities and the use of these technologies. From here, this will evolve into a citizen science project, where we will meet online every week to take part in ‘mappathons’ with all our participants, completing the annotation of our full corpus of sixteenth-century maps.

Sixteenth-century map of Cempoala featuring annotations created using Recogito

Map of Cempoala. Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

We will be annotating the full corpus of the Relaciones Geográficas (RGs), as well as a number of maps held in other collections. These maps are a unique reflection of sixteenth-century settlements in Mexico, drawn using a combination of indigenous and European techniques and ideas. This interplay of indigenous and European voices is a key part of these maps’ significance, offering a unique insight into multiple perceptions of space and place during this crucial period in Mexico’s history. The maps of the RGs contain a great variety of information, both textual and pictographic, which offers invaluable insight into the historical and geographical contexts in which these maps were produced. This information includes proper names in both traditions: logographic Mesoamerican toponyms and personal names, and European alphabetic glosses.

Glyphs and Glosses

Digitally annotating these maps using Recogito gives us a promising opportunity to analyse this corpus, which is not heavily text-based, but features text alongside pictographic depictions of space and place. Annotating both logographic toponyms and alphabetic descriptions and place-names will enable us to better understand the different ways in which Mesoamerican indigenous spatial knowledge and portrayals changed over time, and the processes through which these became ‘subaltern’ to European thinking.

Excerpt of the map of Cempoala showing a toponym glyph representation of the place name

Close-up view of a toponym glyph and alphabetic gloss for Cempoala
Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

Excerpt of the map of Teguantepec showing a toponym glyph representation of the place name, cerro de tigre

Close-up view of a logographic depiction of the ‘Cerro de tigre’ with alphabetic gloss
Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

Excerpt of the map of Jojupango showing a toponym glyph representation of the place name, Amiztlan

Close-up view of a toponym glyph and alphabetic gloss for Amiztlan
Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

You can read the announcement to find out more about the Small Grant Awards and other awardees!

Workshop – Exploring AI for Humanities Research /digging-ecm/2019/02/exploring-ai-workshop/ Tue, 19 Feb 2019 15:54:15 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1943

The week before last, we organised a two-day workshop, ‘Exploring Artificial Intelligence for Humanities Research’, hosted by the Lancaster Digital Humanities Hub in collaboration with tagtog, and funded by the ESRC IAA Business Boost.

Tagtog is an online Artificial Intelligence platform that uses Natural Language Processing and Machine Learning for the automated annotation of documents. The idea for this workshop stemmed from a collaboration between the TAP-ESRC project ‘Digging into Early Colonial Mexico’ (DECM) and tagtog, where tagtog is being used to assist in the annotation and extraction of information from historical documents which predominantly feature sixteenth-century Spanish, but also include indigenous languages such as Nahuatl, Mixtec, and variants of Mayan. Natural Language Processing and Machine Learning are continually evolving fields, and humanities research which employs tools from these disciplines presents new and interesting challenges.

The workshop brought together experts from numerous fields in both the humanities and computer sciences, with the aim of addressing the questions and problems that we encounter in Digital Humanities research, exploring the ways in which we can try to resolve these issues through collaborative working.

Photo showing the opening workshop presentation by Dr Patricia Murrieta Flores

Our first day featured a variety of case study presentations by humanities researchers at Lancaster University:

Patricia Murrieta Flores

Towards the identification, extraction and analysis of information from 16th century colonial sources

In this talk, Patricia explored the ways in which we are identifying, extracting, and analysing information in the Digging into Early Colonial Mexico project. This project is creating and developing novel computational approaches for the semi-automated exploration of thousands of pages of sixteenth-century colonial sources. These sources, known as the Relaciones Geográficas de la Nueva España, are a series of geographic reports which contain a great variety of information about local areas across New Spain, and this project will enable new ways of accessing and analysing the data within. You can read more about how this project has been using tagtog for corpus annotation here.

Find out more about Patricia’s work:

Clare Egan

Using the Records of Early Modern Libel for Spatial Analysis

Clare gave us an introduction to the world of medieval and early modern defamation, with a focus on verse libels. These libels contain a great deal of information, including many spatial references which, with computational methods, could be identified automatically. The records of libel are not yet digitised; however, work is underway to photograph and transcribe the handwritten sources. The aim of transcribing this handwritten material is to convert it into machine-readable text, which will then allow the process of computational analysis to begin. Extracting data from these sources would enable new analyses of, and new ways of spatially representing, the rich information contained within them.

Find out more about Clare’s work:

Anna Mackenzie

TagTogging Time Lords: using AI and computational methods in developing the first annotated Doctor Who corpus

In her talk, Anna demonstrated how she has started the process of annotating episode scripts of Doctor Who, with the aim of developing the first annotated Doctor Who corpus. As a science-fiction corpus, these scripts feature references to unique locations, items, species, and concepts, some of which exist only in the Doctor Who universe. As such, the annotation and subsequent analysis of these scripts present unique challenges to methods in computational text analysis. With over 750 episodes’ worth of material, computational analysis of this expansive corpus could offer new insights into how various themes and concepts have been portrayed across the decades over which the series has been running.

Find out more about Anna’s work:

James Butler

The Intent, Content, and Context Narratives of Literary Namescapes: Mapping spatial inference

James’ talk gave us an introduction to the Lancaster University research project Chronotopic Cartographies, which is investigating ways of using digital tools to analyse, map, and visualise the spaces of literary texts. References to fictional spaces which cannot be geographically located pose interesting challenges for computational analysis of text, and James and the Chronotopic Cartographies team are exploring ways of tackling this problem. James is also working on refining name roles in order to better contextualise their usage within fiction, which will enable more complex understandings and analyses of these texts.

Find out more about James’ work:

Raquel Liceras Garrido

Archaeological Reports: The case of Numantia

Raquel presented the potential of computational text analysis for extracting information from historic archaeological reports, with specific reference to Numantia, a site of major archaeological significance in north-central Spain. A series of excavations between 1906 and 1923 produced reports containing crucial spatial, stratigraphic and material information. Automatic extraction of the information contained within these reports would enable new exploration of the spatial distributions, stratigraphy and materials of this site.

Find out more about Raquel’s work:


Deborah Sutton

Mapping the Eighteenth-century Carnatic through Digitised Texts

In this talk, Deborah introduced us to cartographies of the eighteenth-century Carnatic (southern India), and some contemporaneous English texts produced in relation to military campaigns, alliances and conquests. These texts make spatial references both in terms of topography and in relation to the value of lands seized through conquest. Computational analysis would allow these texts to be mapped onto living landscapes, and the relationship between English texts and Indian nomenclatures to be explored.

Find out more about Deborah’s work:

James Taylor

Money talks: the language of finance in the nineteenth-century press

James presented the case of analysing financial columns in the nineteenth-century press, exploring the great variety of information which could be extracted from these texts. Whilst these newspapers have been digitised, automatically isolating the sections of text which feature the financial columns will be the first challenge in extracting the relevant data. Once extracted, these texts could offer new insights into the way financial information was presented, as well as how the columns referred to broader news and themes.

Find out more about James’ work:

Ian Gregory

Geographical Text Analysis

In this final talk, Ian explained the processes used for computational text analysis of a corpus of Lake District writing, which was employed during a five-year project at Lancaster University from 2012 to 2016: Spatial Humanities: Texts, GIS & Places. The corpus consisted of 80 texts published from 1622 to 1900, amounting to 1.5 million words. The text was annotated using an XML schema, and place names were extracted and geoparsed, producing a Geographic Information System which could then be used to visualise aspects of the data contained within the text: for example, the co-occurrence rate of the word ‘beautiful’ with the identified place-names. This approach enabled a great deal of information to be extracted and analysed, but there is still progress to be made with these computational methods.
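As a rough illustration of that co-occurrence step, here is a minimal sketch; it is not the project's actual pipeline, and the window size, tokenisation and sample sentences are our own assumptions:

```python
import re
from collections import Counter

def cooccurrence_counts(text, place_names, keyword="beautiful", window=3):
    """Count how often `keyword` appears within `window` tokens of each
    place-name: a toy stand-in for the geoparsed-corpus analysis above."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    places = {p.lower() for p in place_names}
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in places:
            context = tokens[max(0, i - window): i + window + 1]
            if keyword in context:
                counts[tok] += 1
    return counts

text = ("The beautiful vale of Grasmere lay before us. "
        "From Keswick the fells rose dark and bare.")
print(cooccurrence_counts(text, ["Grasmere", "Keswick"]))
# Counter({'grasmere': 1})
```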

Find out more about Ian’s work:


The second day was hosted by Juan Miguel Cejuela and Jorge Campos of tagtog, with a presentation exploring Machine Learning and Natural Language Processing; the slides from this presentation are available online. This was followed by a hands-on session which introduced participants to using the tagtog platform for automated annotation of documents, exploring the ways in which this approach could aid humanities research.

If you’re interested in using tagtog but not sure where to start, they have some resources on their website which give just a few examples of the ways in which their tool can be used to computationally analyse and extract data from textual information.

Find out more about tagtog:

These two days were a fantastic opportunity to get together with researchers in the humanities and computer sciences, exploring the different ways in which we can work together. We heard about some fascinating Digital Humanities projects and learned a great deal from Juan Miguel and Jorge at tagtog about how Machine Learning and Natural Language Processing work, as well as how best to utilise their wonderful annotation platform.

We hope to put together some more opportunities for workshops like this – keep an eye out for Lancaster Digital Humanities Hub updates!

DECM at the Spatial Humanities Conference /digging-ecm/2018/10/decm-at-shum/ Tue, 02 Oct 2018 09:06:42 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1855

A couple of weeks ago, the Spatial Humanities conference was held at Lancaster University (20th–21st September). This conference focused on exploring what geospatial technologies such as Geographic Information Systems (GIS) have to contribute to humanities research. The main aim was to explore and demonstrate the contributions to knowledge enabled by these technologies, approaches and methods within and beyond the digital humanities. To read a little more about how the conference progressed, take a look at the post published on the Lancaster University History Department website, and find further discussions on Twitter.

Photo showing the opening presentation at the Spatial Humanities Conference

At this conference, we presented two papers related to the Digging into Early Colonial Mexico project. On the first day, Bruno Martins (University of Lisbon) presented ‘Exploring the challenges of Named Entity Recognition in an historical multilingual corpus: Digging into Early Colonial Mexico’. This paper focused on two of our project’s key aims: the creation of the first digital 16th century Spanish-Nahuatl Colonial Gazetteer, as well as a comprehensive Geographic Information System of New Spain. As we have mentioned in a previous post, our corpus presents a key challenge for Natural Language Processing (NLP) – how do we accurately perform Named Entity Recognition tasks in a multilingual corpus, and particularly one with a combination of European and non-European languages? Bruno’s presentation addressed these challenges and explored ways in which they can be resolved, as well as showing some preliminary results of the NLP experiments carried out so far in our project.

On the second and final day of the conference, a member of our team at Lancaster University presented ‘Development of an Historical Place-Name Gazetteer for the Viceroyalty of New Spain’. This paper described the main principles behind the development of our gazetteer, the process used for collecting and integrating data from multiple sources, the resulting software for managing and exporting the data (available as open-source software), and lessons learned from our efforts that could be useful for the development of similar gazetteers. Our gazetteer has adopted an existing data model which already considers the association of places with multiple alternative place-names, feature types, detailed geographical extents, quality and provenance information, and temporal ranges for all the aforementioned elements. For a visual overview of how these tables fit together in our project, see our website.
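As a rough picture of what such a data model implies, here is a simplified sketch of a single gazetteer record; the field names are illustrative rather than our actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PlaceName:
    name: str                             # attested form, e.g. 'Papaloticpac'
    language: str = "unknown"             # e.g. 'Nahuatl', 'Spanish'
    first_attested: Optional[int] = None  # temporal range for this name
    last_attested: Optional[int] = None

@dataclass
class Place:
    place_id: str
    names: list = field(default_factory=list)  # multiple alternative place-names
    feature_type: str = "settlement"           # e.g. 'settlement', 'river', 'cerro'
    latitude: Optional[float] = None           # None until fully disambiguated
    longitude: Optional[float] = None
    provenance: str = ""                       # which source the record came from

# One place, two attested names (variant spellings would be further entries).
coyula = Place(
    place_id="decm-0001",
    names=[PlaceName("Coyula"), PlaceName("Yenynguia")],
    provenance="Relaciones Geográficas",
)
```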

In addition to these two presentations on our project, our Principal Investigator, Patricia Murrieta-Flores, presented the opening keynote of the conference, ‘Subaltern Spatial Thinking: Reflections on the technological integration of non-western and non-cartographic thinking in Humanities research’. Exploring how GIS has been adopted in Humanities research and the problems this might bring, Paty spoke about how cartography can be seen as a tool of colonial hegemony and power, and how the Humanities need a critical view on the adoption of this or any technology. Advocating for a decolonial approach to technology, this paper considered examples of Mesoamerican and colonial spatial thinking which may be unfamiliar to a modern and western-centric gaze. The talk included a walk through some of the maps of the 16th century Relaciones Geográficas.

This opening keynote set a valuable tone for the conference, highlighting non-western and non-cartographic thinking in Spatial Humanities research, and the importance of recognising alternative methods of representing and analysing space and place in historical research.

Corpus Annotation with Tagtog /digging-ecm/2018/07/annotating-with-tagtog/ Mon, 30 Jul 2018 12:26:07 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1660

A key element of our research on the Relaciones Geográficas is the analysis of the textual information contained within these sixteenth-century reports. To do this, we will be drawing on techniques from Computer Science, namely Natural Language Processing (NLP) and Machine Learning. Whilst these areas have been studied a great deal, the vast majority of this research has been conducted using modern languages, and largely in English.

Our corpus is, of course, neither modern nor English. The Relaciones Geográficas that we are currently studying were written in the sixteenth century by Spanish officials and contributed to by indigenous people across Mexico. The mix of Spanish and indigenous languages throughout the Relaciones poses a challenge to these computational methods which have, for the most part, been trained on modern text. We are therefore faced with the task of training our own NLP system which takes into account the unique challenges presented by the Relaciones Geográficas.

Corpus Annotation

We have recently established a partnership with tagtog, an NLP-tech company that has developed an online text annotation tool with the capacity to train models to annotate large quantities of textual information. Tagtog offers a free version which allows a single user to work with up to 100 documents and to make use of its Machine Learning automatic annotation capabilities. For more information on their free and paid plans, head on over to their website.

So far, we have been using tagtog to annotate a few excerpts from our corpus. Annotation essentially means assigning metadata about specific terms or phrases in order to train the machine to recognise key words. For example, in the excerpt below, we have tagged “Yenynguia” as a place-name (you may note that this place is also known as Coyula, which can be recorded through the use of dictionaries – as explained later in this post).

an excerpt from the Relacion de Papaloticpac (in Antequera) which shows some annotation of our corpus using the tagtog interface
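Conceptually, each annotation simply attaches metadata to a span of text. Here is a rough illustration of the kind of record involved; the field names and offsets are invented for illustration, not tagtog’s export format:

```python
# A single annotation: a span of text, its position in the document,
# and the entity class assigned to it.
annotation = {
    "text": "Yenynguia",           # the surface form as it appears in the Relación
    "start": 112,                  # character offsets invented for illustration
    "end": 121,
    "entity_class": "place-name",  # one of our annotation categories
    "normalized_to": "Coyula",     # canonical name, resolved via a dictionary
}
```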

Before annotating anything, it is important to define the types of entity that need identifying within the text. We started with a few key categories (such as place-names, institutions and geographic features), and have since expanded to include around forty categories which reflect the diverse nature of information contained within the Relaciones. Whilst this is a considerable number of categories to be using, tagtog is coping wonderfully so far!

Below is an excerpt from the Relacion de Papaloticpac (Antequera) which gives an indication of the sort of information we have been annotating with tagtog.

a screenshot showing an excerpt from the Relacion de Papaloticpac (in Antequera) which shows the tagtog interface

Within the first 800 words or so of this Relación, we already have useful information highlighting numerous pueblos in the area and their location in relation to one another, the relevant “ilustre” and “muy excelente” señores involved in the production of this report, as well as some indication of geographic features in the area: cerros (hills), sierras (mountain ranges) and quebradas (ravines). This is all valuable information which we want to be identifiable for textual analysis.

Dictionaries

As mentioned earlier, in cases where we have alternate names for one place (Yenynguia = Coyula), it is possible to use dictionaries to tell the machine that these entities are one and the same. With the inconsistencies of spelling throughout the Relaciones Geográficas, normalization of entities is essential. You may be able to see that, within the first few lines of the excerpt above, we are given three different spellings of the named pueblo. After ‘Papaloticpac’, we have ‘Papaloticpaque’ as well as ‘Papalotiquipaque’. Of course, this is referring to the same place, but the machine needs to be told this explicitly. In tagtog, this is made possible through the use of dictionaries which enable the normalization of entities. So, in the case of ‘Papaloticpac’, we would include each spelling in the ‘dictionary’ as follows:

an example to show how to format a dictionary entry in tagtog

(note that the all-caps instance of the word also has to be included for the machine to recognise this as a match)
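Since the screenshot does not reproduce the format here, the normalisation behaviour such a dictionary encodes can be sketched in plain Python; this illustrates the logic, not tagtog’s actual file format:

```python
# Every attested variant (including the all-caps form) maps to one
# canonical entry, so the machine treats them all as the same place.
VARIANTS = {
    "Papaloticpac": "Papaloticpac",
    "Papaloticpaque": "Papaloticpac",
    "Papalotiquipaque": "Papaloticpac",
    "PAPALOTICPAC": "Papaloticpac",
    "Yenynguia": "Coyula",
}

def normalize(name: str) -> str:
    """Return the canonical form of a place-name, or the name unchanged."""
    return VARIANTS.get(name, name)

assert normalize("Papalotiquipaque") == "Papaloticpac"
assert normalize("Yenynguia") == "Coyula"
```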

Our next steps, once we have annotated some more of the corpus, will be to train a model using the annotations we have created. To do this, we will import some ‘raw’, un-annotated text, which the machine will annotate automatically given what it has learned from our manual annotations and dictionaries. Of course, this will not produce 100% accuracy. We will proceed to manually correct any errors, repeating this process until we produce a high level of accuracy. Training a model which can accurately produce automatic annotations will enable far more intuitive interaction with our multilingual corpus of over 3 million words.
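The shape of that annotate, correct, retrain cycle can be sketched with a deliberately naive toy, in which a set of known names stands in for tagtog’s actual machine learning:

```python
def train_model(annotations):
    """Toy 'model': simply remembers which forms were tagged as place-names."""
    return {text for text, label in annotations if label == "place-name"}

def annotate(model, tokens):
    """Automatic annotation pass over raw, un-annotated text."""
    return [(t, "place-name" if t in model else "O") for t in tokens]

# Round 1: train on manual annotations, then annotate raw text.
manual = [("Papaloticpac", "place-name"), ("pueblo", "O")]
model = train_model(manual)
raw = ["Papaloticpaque", "es", "pueblo", "de", "Papaloticpac"]
print(annotate(model, raw))   # misses the variant spelling 'Papaloticpaque'

# Round 2: a human corrects the miss and the correction is fed back in.
manual.append(("Papaloticpaque", "place-name"))
model = train_model(manual)
print(annotate(model, raw))   # now both spellings are caught
```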

Location, Location, Location /digging-ecm/2018/06/location-location-location/ Wed, 13 Jun 2018 08:16:08 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1623

Following our last post, in which I mentioned the problems we face in the automated identification of place-names, I thought it would be worthwhile to take a look at the toponyms we are working with, and why using computational approaches will allow us to further our understanding of the Relaciones Geográficas.

One of our first, and ongoing, challenges with this project is the identification of thousands of place-names across Mesoamerica. The source materials for the gazetteer we are currently compiling include works by:

Rene Acuña (his editions of the Relaciones Geográficas)

Mercedes de la Garza

Alejandra Moreno Toscano

Francisco del Paso y Troncoso

Peter Gerhard

Our first task was cleaning and converting each of these sources into a computer-readable format, allowing us to extract data more easily. OCR was (sometimes) our friend for this part of the process. We were then able to extract all the place-names listed in the indexes of these works (correcting OCR mistakes along the way), leaving us with a list of almost 14,500 toponyms. Of course, many of these are duplicates or alternate spellings of the same place. We are currently disambiguating these place-names to ensure we are referring to the correct location. (I described this process in a previous post if you’d like a little more detail.)
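As a rough illustration of the index-extraction step, assuming index lines of the form ‘name, page numbers’ (the real formats vary between sources, and the lines below are invented):

```python
import re

def toponyms_from_index(ocr_lines):
    """Pull candidate toponyms from OCR'd index lines such as
    'Tlacotepec, 214, 378', stripping the trailing page references."""
    names = set()
    for line in ocr_lines:
        entry = re.sub(r"[,;]?\s*\d[\d,\s-]*$", "", line).strip()
        if entry:
            names.add(entry)   # the set removes exact duplicates
    return sorted(names)

index = ["Tlacotepec, 214, 378", "Tlaxcala, 33", "Acámbaro, 102-104"]
print(toponyms_from_index(index))
# ['Acámbaro', 'Tlacotepec', 'Tlaxcala']
```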

The wordcloud below was created from the full list of toponyms listed in Rene Acuña’s editions of the Relaciones Geográficas, excluding alternate spellings for the same place. If I had included the alternate spellings, the list would have been over 6,200 names. As it was, I inputted a list of around 4,900 toponyms.

wordcloud comprised of toponyms mentioned in Rene Acuña's editions of the Relaciones Geograficas

The influence of the Spanish language is clear, though not surprising, with names of saints featuring prominently alongside common descriptors such as Río, Valle and Laguna. However, indigenous toponyms remain prevalent, with frequent mentions of specific locations such as Acámbaro, Tlaxcala and Ixtlahuacan. Yucu, a Mixtec word meaning ‘hill’, appears 33 times, no less frequently than Valle. The occurrence of Yucu in this source material was also exclusively within the region of Antequera (currently Oaxaca), which is explained by the convergence of numerous mountain chains in that region, known as the Complejo Oaxaqueño (Oaxaca Complex).
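Counts like these fall out of simple frequency queries over the toponym list; a minimal sketch, using an invented toy list in place of the 4,900 real names:

```python
from collections import Counter

toponyms = ["Yucu Yta", "Yucu Cuii", "Yucunduchi", "Valle de Oaxaca", "Río Grande"]

# How many toponyms contain the element 'Yucu'?
print(sum(1 for t in toponyms if t.startswith("Yucu")))   # 3

# Rank the most frequent words across the whole list.
words = Counter(w for t in toponyms for w in t.split())
print(words.most_common(3))   # [('Yucu', 2), ...]
```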

Disambiguating the thousands of place-names which are mentioned in the Relaciones Geográficas will allow us to interact effectively with the source material using computational methods. Using techniques such as Collocation Analysis in conjunction with our gazetteer will open up opportunities for analysing the text in innovative ways, such as identifying associations between locations, entities, topics and so on. For example, it should be possible to search for Tlacotepec and determine whether this place has any relationship to another place, person, or concept (see the sketch after the map below). Furthermore, it will be possible to search for the specific Tlacotepec which you may be interested in, and any associated alternate names/spellings for that particular place. As the map below demonstrates, place-names are often repeated across, and within, regions. This is why disambiguating our corpus is so important!

Map displaying multiple occurrences of the toponym Tlacotepec across central Mexico
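A toy sketch of such a gazetteer-backed collocation query; the sentence and the gazetteer fragment are invented for illustration:

```python
import re
from collections import Counter

# Gazetteer fragment: variant spellings resolved to one canonical entry.
GAZETTEER = {"tlacotepec": "Tlacotepec", "tlacotepeque": "Tlacotepec"}

def collocates(text, place, window=5):
    """Terms appearing within `window` tokens of any spelling of `place`."""
    tokens = re.findall(r"\w+", text.lower())
    hits = Counter()
    for i, tok in enumerate(tokens):
        if GAZETTEER.get(tok) == place:
            for w in tokens[max(0, i - window): i + window + 1]:
                if GAZETTEER.get(w) != place:
                    hits[w] += 1
    return hits

text = "El pueblo de Tlacotepeque tiene minas de plata cerca de Tlacotepec"
print(collocates(text, "Tlacotepec").most_common(4))
# [('de', 4), ('minas', 2), ('plata', 2), ('cerca', 2)]
```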

At present, we have a total of 3,650 fully disambiguated place-names – meaning that we have definite coordinates assigned to these names. You can see a sample of some of these locations on our website.

We have a fair few more toponyms which are partially located (i.e. we have identified the region in which they lie), and thousands more awaiting disambiguation. We’re approaching the halfway point…just over the next yucu!
