DECM Project: Digging into Early Colonial Mexico

Pathways to understanding 16th century Mesoamerica /digging-ecm/2019/07/pathways-to-understanding-16th-century-mesoamerica/ Mon, 29 Jul 2019 09:25:53 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=2783

Two of our team members, Raquel Liceras-Garrido and Katherine Bellamy, have recently completed the project ‘Pathways to understanding 16th century Mesoamerican geographies’, funded by the Lancaster University Department of History.

This spin-off project has used story maps, combining interactive texts, images and maps in a series of online learning resources on the history, archaeology and geography of Central Mexico during the Postclassic and Colonial periods, from the 14th century through to the mid-16th century. These resources are divided into three main areas:

The first of the story maps explores the history of the Mexica people, beginning with their journey to the foundation of Tenochtitlan in 1325, which would become (alongside its neighbour city to the north, Tlatelolco) the heart of the Triple Alliance. Following this, the story map shows how the Mexica began to expand, featuring the lists of conquered settlements as recorded in the Codex Mendoza. This leads up to the arrival of the Spanish, and the ultimate meeting of Moctezuma II and Hernán Cortés in 1519. It then proceeds to describe how Cortés, with considerable assistance from his indigenous allies, conquered Tenochtitlan. This story map concludes with a look at the beginning of the colonial era, exploring how the Spanish began to impose their own institutions across ‘New Spain’, with varying success due to the continuing influence of indigenous institutions across Mesoamerica.

This story map explores the nature of historic place-names across what is currently Mexico, introducing the importance of place-names and language as a tool of colonisation and empire. The story map explores how this tool was used not only by the Spanish, but also by the Mexica and the Triple Alliance (not to mention other indigenous groups), as part of their systematic colonisation of conquered settlements and people. The story map goes on to explore how indigenous place-names continued to be used, despite the processes of colonisation at the hands of both the Triple Alliance and the Spanish. In addition, it explores the meaning of Nahua toponymy in particular – demonstrating the use of suffixes such as -tepec (which means ‘inhabited place’) and showing the distribution of some of these examples. Following this are some case studies of individual place-names, explaining their meaning and how they have been depicted in the historical record. The story map concludes by giving a brief overview of colonial naming, and how indigenous influences have continued.

The final story map discusses depictions of geographic space and place. This starts with an explanation of why this is an important discussion, with particular reference to, and problematisation of, the use of Geographic Information Systems for representing historical geographies. Following this, the story map introduces the idea of representations of space, which may be unfamiliar to the modern reader, and explores the various types of pre-Hispanic Nahuatl documents, including those which represented geographies. The story map gives an introduction to the state of Spanish cartography in the sixteenth century, before going on to discuss how the Spanish and Nahua traditions of depicting geography began to merge during the conquest of Mexico. There is considerable evidence of this merging of traditions throughout the historical record, which the story map explains, giving two specific examples.

Subaltern Recogito: Annotating the sixteenth-century maps of the Geographic Reports of New Spain /digging-ecm/2019/06/subaltern-recogito/ Tue, 11 Jun 2019 12:50:43 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=2242

We’re very pleased to have been awarded a Resource Development Grant to explore the annotation of a series of historic maps using Recogito. Our corpus of maps includes those produced in the sixteenth century for the Relaciones Geográficas de Nueva España, across the area which is currently Mexico.

On Monday 17th June, in collaboration with our colleagues at Lancaster University, the Escuela Nacional de Antropología e Historia (ENAH), the Universidad Nacional Autónoma de México (UNAM), the Instituto Nacional de Antropología e Historia (INAH), and others, we will be delivering an online workshop providing training on Recogito for the annotation of the sixteenth-century maps of the Relaciones Geográficas.

We will be working with 27 scholars, delivering training on Recogito and presenting an introduction to the Spatial Humanities and the use of these technologies. From here, this will evolve into a citizen science project, where we will meet online every week to take part in ‘mappathons’ with all our participants, completing the annotation of our full corpus of sixteenth-century maps.

Sixteenth-century map of Cempoala featuring annotations created using Recogito

Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

We will be annotating the full corpus of the maps of the Relaciones Geográficas (RGs), as well as a number of maps from other collections. These maps are a unique reflection of sixteenth-century settlements in Mexico, drawn using a combination of indigenous and European techniques and ideas. This interplay of indigenous and European voices is a key part of these maps’ significance, offering a unique insight into multiple perceptions of space and place during this crucial period in Mexico’s history. The maps of the RGs contain a great variety of information, both textual and pictographic, which offers invaluable insight into the historical and geographical contexts in which these maps were produced. This information includes proper names in both traditions: logographic Mesoamerican toponyms and personal names, alongside European alphabetic glosses.

Glyphs and Glosses

Digitally annotating these maps using Recogito gives us a promising opportunity to analyse this corpus, which is not heavily text-based, but features text alongside pictographic depictions of space and place. Annotating both logographic toponyms and alphabetic descriptions and place-names will enable us to better understand the different ways in which Mesoamerican indigenous spatial knowledge and portrayals changed over time, and the processes through which these became ‘subaltern’ to European thinking.

Excerpt of the map of Cempoala showing a toponym glyph representation of the place name

Close-up view of a toponym glyph and alphabetic gloss for Cempoala
Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

Excerpt of the map of Teguantepec showing a toponym glyph representation of the place name, cerro de tigre

Close-up view of a logographic depiction of the ‘Cerro de tigre’ with alphabetic gloss
Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

Excerpt of the map of Jojupango showing a toponym glyph representation of the place name, Amiztlan

Close-up view of a toponym glyph and alphabetic gloss for Amiztlan
Relaciones Geográficas de México y Guatemala, 1577-1585. Joaquín García Icazbalceta Manuscript Collection, University of Texas.

You can read the announcement to find out more about the Small Grant Awards and the other awardees!

Workshop – Exploring AI for Humanities Research /digging-ecm/2019/02/exploring-ai-workshop/ Tue, 19 Feb 2019 15:54:15 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1943

The week before last, we organised a two-day workshop, ‘Exploring Artificial Intelligence for Humanities Research’, hosted at Lancaster University in collaboration with tagtog, and funded by the ESRC IAA Business Boost.

Tagtog is an online Artificial Intelligence platform that uses Natural Language Processing and Machine Learning for the automated annotation of documents. The idea for this workshop stemmed from a collaboration between the TAP-ESRC project ‘Digging into Early Colonial Mexico’ (DECM) and tagtog, where the platform is being used to assist in the annotation and extraction of information from historical documents which predominantly feature sixteenth-century Spanish, but also include indigenous languages such as Nahuatl, Mixtec, and variants of Mayan. Natural Language Processing and Machine Learning are continually evolving fields, and humanities research which employs tools from these disciplines presents new and interesting challenges.

The workshop brought together experts from numerous fields in both the humanities and computer sciences, with the aim of addressing the questions and problems that we encounter in Digital Humanities research, exploring the ways in which we can try to resolve these issues through collaborative working.

Photo showing the opening workshop presentation by Dr Patricia Murrieta Flores

Our first day featured a variety of case study presentations by humanities researchers at Lancaster University:

Patricia Murrieta Flores

Towards the identification, extraction and analysis of information from 16th century colonial sources

In this talk, Patricia explored the ways in which we are identifying, extracting, and analysing information in the Digging into Early Colonial Mexico project. This project is creating and developing novel computational approaches for the semi-automated exploration of thousands of pages of sixteenth-century colonial sources. These sources, known as the Relaciones Geográficas de la Nueva España, are a series of geographic reports which contain a great variety of information about local areas across New Spain, and this project will enable new ways of accessing and analysing the data within. You can read more about how this project has been using tagtog for corpus annotation here.

Find out more about Patricia’s work:

Clare Egan

Using the Records of Early Modern Libel for Spatial Analysis

Clare gave us an introduction to the world of medieval and early modern defamation, with a focus on verse libels. These libels contain a great deal of information, including many spatial references which, with computational methods, could be identified automatically. The records of libel have not been digitised; however, work is underway to photograph and transcribe the handwritten sources. The aim of transcribing this handwritten material is to convert it into machine-readable text, which will then allow computational analysis to begin. Extracting data from these sources would enable new analyses of, and new ways of spatially representing, the rich information contained within them.

Find out more about Clare’s work:

Anna Mackenzie

TagTogging Time Lords: using AI and computational methods in developing the first annotated Doctor Who corpus

In her talk, Anna demonstrated how she has started the process of annotating episode scripts of Doctor Who, with the aim of developing the first annotated Doctor Who corpus. As a science-fiction corpus, these scripts feature references to unique locations, items, species, and concepts, some of which exist only in the Doctor Who universe. As such, the annotation and subsequent analysis of these scripts present unique challenges for methods in computational text analysis. With over 750 episodes’ worth of material, computational analysis of this expansive corpus could offer new insights into how various themes and concepts have been portrayed over the seven decades the series has been running.

Find out more about Anna’s work:

James Butler

The Intent, Content, and Context Narratives of Literary Namescapes: Mapping spatial inference

James’ talk gave us an introduction to the ways in which the Lancaster University research project, Chronotopic Cartographies, is investigating ways of using digital tools to analyse, map, and visualise the spaces of literary texts. References to fictional spaces which cannot be geographically located pose interesting challenges for computational analysis of text, and James, with the Chronotopic Cartographies team, is exploring ways of tackling this problem. James is also working on refining name roles in order to better contextualise their usage within fiction, which will enable more complex understandings and analyses of these texts.

Find out more about James’ work:

Raquel Liceras Garrido

Archaeological Reports: The case of Numantia

Raquel presented the potential of computational text analysis for extracting information from historic archaeological reports, with specific reference to the case of Numantia, a site of great archaeological significance in north-central Spain. Excavations between 1906 and 1923 produced a series of reports containing crucial spatial, stratigraphic and material information in textual form. Automatic extraction of the information contained within these reports would enable new exploration of the spatial distributions, stratigraphy and materials of this site.

Find out more about Raquel’s work:

Lancaster University

Deborah Sutton

Mapping the Eighteenth-century Carnatic through Digitised Texts

In this talk, Deborah introduced us to cartographies of the eighteenth-century Carnatic (southern India), and some contemporaneous English texts produced in relation to military campaigns, alliances and conquests. These texts make spatial references both in terms of topography and in relation to the value of lands seized through conquest. Computational analysis would allow these texts to be mapped and related to living landscapes, and the relationship between English texts and Indian nomenclatures to be explored.

Find out more about Deborah’s work:

James Taylor

Money talks: the language of finance in the nineteenth-century press

James presented the case of analysing financial columns in the nineteenth-century press, exploring the great variety of information which could be extracted from these texts. Whilst these newspapers have been digitised, automatically isolating the specific sections of the text which feature the financial columns will be the first challenge for extracting the relevant data. Once extracted, analysing these texts could offer the potential for new insights into the way financial information was presented in these nineteenth-century financial columns, as well as how they referred to broader news and themes.

Find out more about James’ work:

Ian Gregory

Geographical Text Analysis

In this final talk, Ian explained the processes used for computational text analysis of a corpus of Lake District writing, which was employed during a five-year project at Lancaster University (2012-2016): Spatial Humanities: Texts, GIS & Places. The corpus consisted of 80 texts published from 1622 to 1900, amounting to 1.5 million words. The text was annotated using an XML schema, and place-names were extracted and geoparsed, producing a Geographic Information System which could then be used to visualise aspects of the data contained within the text. For example, it could display the co-occurrence rate of the word ‘beautiful’ with the identified place-names. This approach enabled a great deal of information to be extracted and analysed, but there is still progress to be made with these computational methods.
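The co-occurrence idea Ian described can be illustrated with a short sketch in Python. Everything here (the sample sentence, the window size, the place-name list) is invented for demonstration; the actual project worked on an XML-annotated, geoparsed corpus rather than raw strings.

```python
# Illustrative sketch only: count how often a word appears within a
# fixed token window of any known place-name. Sample data is invented.
import re
from collections import Counter

PLACES = {"Keswick", "Windermere", "Grasmere"}
WINDOW = 5  # tokens either side of a place-name

text = ("We walked from Keswick through a beautiful valley, "
        "and the shore of Windermere was beautiful in the evening light.")

tokens = re.findall(r"[A-Za-z']+", text)
cooc = Counter()
for i, tok in enumerate(tokens):
    if tok in PLACES:
        lo, hi = max(0, i - WINDOW), i + WINDOW + 1
        # count every word in the window around this place-name
        for neighbour in tokens[lo:i] + tokens[i + 1:hi]:
            cooc[(tok, neighbour.lower())] += 1

print(cooc[("Windermere", "beautiful")])  # 1
```

Aggregating such counts over a whole corpus, and normalising by how often each place-name occurs, gives the kind of co-occurrence rates that can then be mapped in a GIS.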

Find out more about Ian’s work:

Lancaster University

The second day was hosted by Juan Miguel Cejuela and Jorge Campos of tagtog, with a presentation exploring Machine Learning and Natural Language Processing; the slides from this presentation are available online. This was followed by a hands-on session which introduced participants to using the tagtog platform for automated annotation of documents, exploring the ways in which this approach could aid humanities research.

If you’re interested in using tagtog but not sure where to start, their website gives just a few examples of the ways in which the tool can be used to computationally analyse and extract data from textual information.

Find out more about tagtog:

These two days were a fantastic opportunity to get together with researchers in the humanities and computer sciences, exploring the different ways in which we can work together. We heard about some fascinating Digital Humanities projects and learned a great deal from Juan Miguel and Jorge at tagtog about how Machine Learning and Natural Language Processing work, as well as how best to utilise their wonderful annotation platform.

We hope to put together more opportunities for workshops like this, so keep an eye out for Lancaster University Digital Humanities Hub updates!

DECM at the Spatial Humanities Conference /digging-ecm/2018/10/decm-at-shum/ Tue, 02 Oct 2018 09:06:42 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1855

A couple of weeks ago, the Spatial Humanities conference was held at Lancaster University (20th-21st September). This conference focused on exploring what geospatial technologies such as Geographic Information Systems (GIS) have to contribute to humanities research. The main aim was to explore and demonstrate the contributions to knowledge enabled by these technologies, approaches and methods within and beyond the digital humanities. To read a little more about how the conference progressed, take a look at the write-up published on the Lancaster University History Department website, and find further discussion on Twitter.

Photo showing the opening presentation at the Spatial Humanities Conference

At this conference, we presented two papers related to the Digging into Early Colonial Mexico project. On the first day, Bruno Martins (University of Lisbon) presented ‘Exploring the challenges of Named Entity Recognition in an historical multilingual corpus: Digging into Early Colonial Mexico’. This paper focused on two of our project’s key aims: the creation of the first digital 16th century Spanish-Nahuatl Colonial Gazetteer, as well as a comprehensive Geographic Information System of New Spain. As we have mentioned in a previous post, our corpus presents a key challenge for Natural Language Processing (NLP): how do we accurately perform Named Entity Recognition in a multilingual corpus, and particularly one with a combination of European and non-European languages? Bruno’s presentation addressed these challenges and explored ways in which they can be resolved, as well as showing some preliminary results of the NLP experiments carried out so far in our project.

On the second and final day of the conference, a member of our team at Lancaster University presented ‘Development of an Historical Place-Name Gazetteer for the Viceroyalty of New Spain’. This paper described the main principles behind the development of our gazetteer, the process used for collecting and integrating data from multiple sources, the resulting software for managing and exporting the data (available as open source), and lessons learned from our efforts that could be useful for the development of similar gazetteers. Our gazetteer adopts a data model which already considers the association of places with multiple alternative place-names, feature-types, detailed geographical extents, quality and provenance information, and temporal ranges for all the aforementioned elements.

In addition to these two presentations on our project, our Principal Investigator, Patricia Murrieta-Flores, presented the opening keynote of the conference, ‘Subaltern Spatial Thinking: Reflections on the technological integration of non-western and non-cartographic thinking in Humanities research’. Exploring how GIS has been adopted in Humanities research and the problems this might bring, Paty spoke about how cartography can be seen as a tool of colonial hegemony and power, and argued that the Humanities need a critical view on the adoption of this or any technology. Advocating a decolonial approach to technology, the paper considered examples of Mesoamerican and colonial spatial thinking which may be unfamiliar to a modern, western-centric gaze. The talk included a walk through some of the maps of the 16th century Relaciones Geográficas.

This opening keynote set a valuable tone for the conference, highlighting non-western and non-cartographic thinking in Spatial Humanities research, and the importance of recognising alternative methods of representing and analysing space and place in historical research.

Corpus Annotation with Tagtog /digging-ecm/2018/07/annotating-with-tagtog/ Mon, 30 Jul 2018 12:26:07 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1660

A key element of our research on the Relaciones Geográficas is the analysis of the textual information contained within these sixteenth-century reports. To do this, we will be drawing on techniques from Computer Science, namely Natural Language Processing (NLP) and Machine Learning. Whilst these areas have been studied a great deal, the vast majority of this research has been conducted using modern languages, and largely in English.

Our corpus is, of course, neither modern nor English. The Relaciones Geográficas that we are currently studying were written in the sixteenth century by Spanish officials and contributed to by indigenous people across Mexico. The mix of Spanish and indigenous languages throughout the Relaciones poses a challenge to these computational methods which have, for the most part, been trained on modern text. We are therefore faced with the task of training our own NLP system which takes into account the unique challenges presented by the Relaciones Geográficas.

Corpus Annotation

We have recently established a partnership with tagtog, an NLP-tech company that has developed an online text annotation tool with the capacity to train models to annotate large quantities of textual information. Tagtog offers a free version which allows a single user to work with up to 100 documents and to make use of its Machine Learning automatic annotation capabilities. For more information on their free and paid plans, head over to their website.

So far, we have been using tagtog to annotate a few excerpts from our corpus. Annotation essentially means assigning metadata about specific terms or phrases in order to train the machine to recognise key words. For example, in the excerpt below, we have tagged “Yenynguia” as a place-name (you may note that this place is also known as Coyula, which can be recorded through the use of dictionaries – as explained later in this post).

an excerpt from the Relacion de Papaloticpac (in Antequera) which shows some annotation of our corpus using the tagtog interface

Before annotating anything, it is important to define the types of entity that need identifying within the text. We started with a few key categories (such as place-names, institutions and geographic features), and have since expanded to include around forty categories which reflect the diverse nature of information contained within the Relaciones. Whilst this is a considerable number of categories to be using, tagtog is coping wonderfully so far!

Below is an excerpt from the Relacion de Papaloticpac (Antequera) which gives an indication of the sort of information we have been annotating with tagtog.

a screenshot showing an excerpt from the Relacion de Papaloticpac (in Antequera) which shows the tagtog interface

Within the first 800 words or so of this relación, we already have useful information highlighting numerous pueblos in the area and their location in relation to one another, the relevant “ilustre” and “muy excelente” señores involved in the production of this report, as well as some indication of geographic features in the area: cerros, sierras and quebradas. This is all valuable information which we want to be identifiable for textual analysis.

Dictionaries

As mentioned earlier, in cases where we have alternate names for one place (Yenynguia = Coyula), it is possible to use dictionaries to tell the machine that these entities are one and the same. With the inconsistencies of spelling throughout the Relaciones Geográficas, normalization of entities is essential. You may be able to see that, within the first few lines of the excerpt above, we are given three different spellings of the named pueblo. After ‘Papaloticpac’, we have ‘Papaloticpaque’ as well as ‘Papalotiquipaque’. Of course, this is referring to the same place, but the machine needs to be told this explicitly. In tagtog, this is made possible through the use of dictionaries which enable the normalization of entities. So, in the case of ‘Papaloticpac’, we would include each spelling in the ‘dictionary’ as follows:

an example to show how to format a dictionary entry in tagtog

(note that the all-caps instance of the word also has to be included for the machine to recognise this as a match)
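The principle behind these dictionaries can be sketched in a few lines of Python. This is purely illustrative and does not reproduce tagtog's actual dictionary format or matching behaviour; note that because this sketch lowercases everything before matching, it handles the all-caps case automatically, whereas the exact-match dictionaries described above need the all-caps spelling listed explicitly.

```python
# Sketch of entity normalization via a spelling-variant dictionary
# (illustrative only; not tagtog's real format or matching rules).

VARIANTS = {
    "Papaloticpac": ["Papaloticpac", "Papaloticpaque", "Papalotiquipaque"],
    "Coyula": ["Coyula", "Yenynguia"],
}

# Invert to a lookup table; lowercased keys make matching
# case-insensitive, so 'PAPALOTICPAC' also resolves correctly.
lookup = {variant.lower(): canonical
          for canonical, variants in VARIANTS.items()
          for variant in variants}

def normalize(token: str) -> str:
    """Return the canonical place-name for a token, or the token unchanged."""
    return lookup.get(token.lower(), token)

print(normalize("Papalotiquipaque"))  # Papaloticpac
print(normalize("YENYNGUIA"))         # Coyula
```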

Our next steps, once we have annotated some more of the corpus, will be to train a model using the annotations we have created. To do this, we will import some ‘raw’, un-annotated text, which the machine will annotate automatically given what it has learned from our manual annotations and dictionaries. Of course, this will not produce 100% accuracy. We will proceed to manually correct any errors, repeating this process until we produce a high level of accuracy. Training a model which can accurately produce automatic annotations will enable far more intuitive interaction with our multilingual corpus of over 3 million words.

Location, Location, Location /digging-ecm/2018/06/location-location-location/ Wed, 13 Jun 2018 08:16:08 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1623

Following our last post, in which I mentioned the problems we face in the automated identification of place-names, I thought it would be worthwhile to take a look at the toponyms we are working with, and why using computational approaches will allow us to further our understanding of the Relaciones Geográficas.

One of our first, and ongoing, challenges with this project is the identification of thousands of place-names across Mesoamerica. The source materials for the gazetteer we are currently compiling include:

Rene Acuña’s

Mercedes de la Garza’s

Alejandra Moreno Toscano’s

Francisco del Paso y Troncoso’s

Peter Gerhard’s and

Our first task was cleaning and converting each of these sources into a computer-readable format, allowing us to extract data more easily. OCR was (sometimes) our friend for this part of the process. We were then able to extract all the place-names listed in the indexes of these works (correcting OCR mistakes along the way), leaving us with a list of almost 14,500 toponyms. Of course, many of these are duplicates or alternate spellings of the same place. We are currently disambiguating these place-names to ensure we are referring to the correct locations. (I described this process in our previous post if you’d like a little more detail.)
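A rough idea of the duplicate-collapsing involved can be sketched in Python: case-folding and accent-stripping before comparison catches many obvious duplicates, though the real disambiguation work described above is far more involved. The folding rules and sample list here are our own illustration, not the project's actual pipeline.

```python
# Sketch of collapsing obvious duplicates in an extracted toponym list.
# Sample names are real places, but the list itself is invented.
import unicodedata

def fold(name: str) -> str:
    """Lowercase, trim, and strip diacritics so spelling variants compare equal."""
    stripped = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in stripped if not unicodedata.combining(c))
    return stripped.lower().strip()

raw = ["Acámbaro", "Acambaro", "ACAMBARO", "Tlaxcala", "Tlaxcala "]
unique = {}
for name in raw:
    unique.setdefault(fold(name), name)  # keep the first spelling seen

print(sorted(unique.values()))  # ['Acámbaro', 'Tlaxcala']
```

Folding like this only merges trivial variants; genuinely different spellings (Papaloticpac vs Papalotiquipaque) still need the manual dictionary work described elsewhere on this blog.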

The wordcloud below was created from the full list of toponyms listed in Rene Acuña’s editions of the Relaciones Geográficas, excluding alternate spellings for the same place. If I had included the alternate spellings, the list would have been over 6,200 names. As it was, I inputted a list of around 4,900 toponyms.

wordcloud comprised of toponyms mentioned in Rene Acuña's editions of the Relaciones Geograficas

The influence of the Spanish language is clear, though not surprising, with names of saints featuring prominently alongside common descriptors such as Río, Valle and Laguna. However, indigenous toponyms remain prevalent, with frequent mentions of specific locations such as Acámbaro, Tlaxcala and Ixtlahuacan. Yucu, a Mixtec word meaning ‘hill’, appears 33 times, no less frequently than Valle. The occurrence of Yucu in this source material was also exclusively within the region of Antequera (currently Oaxaca), explained by the region being home to the convergence of numerous mountain chains, known as the Complejo Oaxaqueño (Oaxaca Complex).

Disambiguating the thousands of place-names which are mentioned in the Relaciones Geográficas will allow us to effectively interact with the source material using computational methods. Using techniques such as Collocation Analysis in conjunction with our gazetteer will open up opportunities for analysing the text in innovative ways, such as identifying associations between locations, entities, topics etc. For example, it should be possible to search for Tlacotepec and determine whether this place has any relationship to another place, person, or concept. Furthermore, it will be possible to search for the specific Tlacotepec which you may be interested in, and any associated alternate names/spellings for that particular place. As the map below demonstrates, place-names are often repeated across, and within, regions. This is why disambiguating our corpus is so important!

Map displaying multiple occurrences of the toponym Tlacotepec across central Mexico
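To illustrate why one name can need disambiguating, here is a toy gazetteer lookup in Python. The data model, the regions assigned to each Tlacotepec, and the variant spelling are all invented for this example and do not reflect our actual gazetteer.

```python
# Toy gazetteer: one name can map to several candidate places, each
# with a region and alternate spellings. All data here is invented.
from dataclasses import dataclass, field

@dataclass
class Place:
    canonical: str
    region: str
    variants: set = field(default_factory=set)

GAZETTEER = [
    Place("Tlacotepec", "Puebla", {"Tlacotepeque"}),
    Place("Tlacotepec", "Antequera"),
    Place("Tlaxcala", "Tlaxcala"),
]

def candidates(name, region=None):
    """Entries matching a name or variant, optionally filtered by region."""
    hits = [p for p in GAZETTEER
            if name == p.canonical or name in p.variants]
    if region is not None:
        hits = [p for p in hits if p.region == region]
    return hits

print(len(candidates("Tlacotepec")))  # 2 candidates: ambiguous!
print(candidates("Tlacotepec", region="Puebla")[0].region)  # Puebla
```

A bare name query returns multiple candidates; only extra context (here, a region) narrows it to one, which is exactly the disambiguation problem at corpus scale.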

At present, we have a total of 3,650 fully disambiguated place-names, meaning that we have definite coordinates assigned to these names. You can see a sample of some of these locations on our website.

We have a fair few more toponyms which are partially located (i.e. we have identified the region in which they lie), and thousands more awaiting disambiguation. We’re approaching the halfway point…just over the next yucu!

Extracting and creating data from the Geographic Relations /digging-ecm/2018/05/extracting-and-creating-data-from-the-geographic-relations/ Thu, 03 May 2018 15:29:49 +0000 http://www.lancaster.ac.uk/digging-ecm/?p=1581

Over the past few months, our team have been laying the groundwork for our research into the Relaciones Geográficas and getting to grips with our source material. Here’s a sneak-peek of our exponentially expanding GIS place-name layers:

The sheer size and non-standardised format of the Relaciones has meant that studying these documents has previously relied on close-reading of the texts, limiting the scope of research. Approaching this study from an interdisciplinary perspective offers us the chance to engage with innovative computational methodologies to create new opportunities for the exploration and study of these historic documents, improving accessibility and broadening the scope for research.

Some of the key problems we aim to address include: the capability of computational methods in dealing with multilingual corpora, the ambiguous nature of many place-names mentioned within the Relaciones, and the general inaccessibility of historical texts as large and complex as this.

We will be tackling these problems collaboratively as an interdisciplinary team, ensuring that our research contributes to the advancements of each of our fields of study. Each team brings their own expertise to the project, and by working collaboratively we are better equipped to tackle the problems posed by large historical source materials such as the Relaciones ұDzáھ.

One of the key challenges we face with the Relaciones Geográficas is linguistic. This multilingual corpus features a combination of Spanish and a number of indigenous languages (predominantly Nahuatl) throughout. The excerpt below demonstrates one of the linguistic issues we face in dealing with these historic documents: “Hun 4at” and “Oxi 4ahol” are the indigenous names in Mayan Quiche for two volcanoes referenced in the Relación de Santiago Atitlán.

Natural Language Processing systems are usually trained on modern news text, so they would be unable to recognise and tag words in an indigenous language such as Quiche, especially given the unfamiliar use of a numerical character in a place-name. Computational methods for the analysis of language are continually improving, though their use in the analysis of historical texts and non-English languages still presents many challenges. Our project aims to address these problems and improve methods for the analysis of complex historical documents such as the Relaciones Geográficas.

Feel free to leave your comments. If you would like to read more about individual members of our team, please see our team page.

Historical GIS /digging-ecm/2018/05/historical-gis/ Wed, 02 May 2018 16:13:09 +0000

A considerable amount of our time on the project so far has been dedicated to GIS, and to the creation of maps which reflect geographic boundaries as they existed in sixteenth-century Middle America. The work of Peter Gerhard and others has been an invaluable source of information, providing detailed explorations of the historical geography of New Spain in the years following the Spanish arrival.

The maps Gerhard produced focused largely on the administrative boundaries imposed by the Spanish in the sixteenth century. We have used these maps as a starting point, creating GIS layers which reflect the boundaries as Gerhard depicted them.

Whilst some of these administrative units were built upon pre-existing and well-established indigenous systems of governance, they ultimately reflect a Spanish administrative geography, divided into dioceses, audiencias and provincias, with the identified place-names being the sites of Spanish alcaldías mayores and corregimientos.

Of course, Middle America’s historical geography did not begin with the establishment of Spanish administrative units, and we aim to produce a more representative image of Middle America’s sixteenth-century geography. The identification of historic place-names will be key to our understanding of indigenous and historic geographies, and we are currently creating GIS layers which locate these place-names.

The process for doing this has been straightforward but very time-consuming, as it could only be semi-automated. We began by digitising the various geographic indexes included in the works of Peter Gerhard, Mercedes de la Garza, René Acuña, Francisco del Paso y Troncoso and Alejandra Moreno Toscano. These indexes contain thousands of place-names referenced in the Relaciones Geográficas. With these lists compiled, we then had the task of cross-referencing, or joining, these tables with existing location data, derived from a number of sources including historical secondary sources.
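The join step can be sketched as below; the place-names, coordinates and function names are hypothetical illustrations of the semi-automated workflow, not our actual data or code:

```python
# Hypothetical sample data: place-names digitised from the printed indexes,
# and a small gazetteer of settlements that already have coordinates.
index_names = ["Tepeaca", "Cholula", "Iztaquimaxtitlan"]
located = {"tepeaca": (18.96, -97.90), "cholula": (19.06, -98.30)}

def join_on_name(names, gazetteer):
    """Exact (case-insensitive) join; anything unmatched is set aside
    for manual resolution, as in the semi-automated workflow."""
    matched, unmatched = {}, []
    for name in names:
        coords = gazetteer.get(name.lower())
        if coords is not None:
            matched[name] = coords
        else:
            unmatched.append(name)
    return matched, unmatched

matched, unmatched = join_on_name(index_names, located)
```

It is the `unmatched` list that generates the painstaking manual work described next: every name without an exact counterpart has to be resolved by a researcher.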

Of course, changes over time and variations in spelling mean that joining these sets of data cannot rely solely on a computer recognising identical matches. Our UK Research Associate, Dr Raquel Liceras-Garrido, in collaboration with our Mexico Research Associate, Mariana Favila-Vázquez, has been spearheading the painstaking task of locating the thousands of place-names for which there was no match in the existing geographic data. Already a monumental undertaking, the process is further complicated by the fact that there are often numerous names for the same place (and these names often have numerous alternative spellings!). For example, throughout the historical record, present-day Ixtacamaxtitlán in Puebla state is known variously as San Francisco Iztaquimaxtitlan, S Francisco Iztaquimaxtitlan, Istac-ymachtitlan, Estacquimestitlan, Itztaquimitztitlan and Castilblanco.
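One way to surface candidate matches among variant spellings like these is approximate string similarity. This sketch uses Python's standard-library difflib; the threshold and helper names are our own illustration, not the project's method, and real disambiguation still needs a historian's judgement:

```python
from difflib import SequenceMatcher

# Attested historical spellings of present-day Ixtacamaxtitlán.
VARIANTS = [
    "San Francisco Iztaquimaxtitlan",
    "S Francisco Iztaquimaxtitlan",
    "Istac-ymachtitlan",
    "Estacquimestitlan",
    "Itztaquimitztitlan",
]

def similarity(a, b):
    # Ratio in [0, 1]; a crude stand-in for the judgement a researcher
    # applies when an exact join fails.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(name, candidates, threshold=0.6):
    """Return the candidate most similar to `name`, or None if nothing
    clears the (arbitrarily chosen) threshold."""
    top = max(candidates, key=lambda c: similarity(name, c))
    return top if similarity(name, top) >= threshold else None

match = best_match("Ixtacamaxtitlan", VARIANTS)
```

Note that a purely string-based score can never link "Castilblanco" to "Ixtacamaxtitlan"; that equivalence only exists in the historical record, which is why the computational and archival work must go hand in hand.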

Place-names are often repeated across the Americas (don’t get us started on San Juan), and there are numerous cases where it is not possible to determine the exact location of a place-name. The expertise of our colleagues in Mexico, Mariana Favila-Vázquez and Dr Diego Jimenez-Badillo, has been invaluable in disambiguating hundreds of these place-names. For place-names which we have been unable to locate, our Mexico team have undertaken historical research in order to assign them coordinates. For those place-names which have managed to evade all our investigations (for now!), our Mexico team have so far been able to at least determine the region in which they lie.

This research will also prove invaluable in the experiments we are conducting in collaboration with the Portuguese team on automatic multilingual Named Entity Recognition and on the linguistic and geographic disambiguation of place-names, which is the next stage in our research.
