Coordinating Multi-language Translation Projects
I’ve been asked to give three presentations today and tomorrow, and I think the best contribution I can make is to share our practical experience from Berzin Archives on how to deal with organizing these topics – multi-language projects, terminology, and cooperation among translators – since perhaps our experience can suggest organizational methods applicable to this larger project that we are undertaking here. Also, I’m really not familiar with what’s available in Polish or Portuguese, and so on, so I can’t really report on that, let alone Arabic or Urdu. Our online project can perhaps offer a model, since I feel that the most appropriate medium for the Tengyur project is online, but supplemented with printed versions as well. Also, I have to apologize for not having any PowerPoint material here to supplement my talks, because my being able to attend this conference was uncertain up until the last minute and so I had no time to prepare any.
In any case, today I’d like to speak about dealing with multi-language translations and how to coordinate them. Nowadays we are working actively with ten languages. We have nine of them already online: English, German, French, Spanish, Portuguese, Russian, Polish, Arabic, and Urdu. And the tenth one, Chinese, we hope to be able to get that online this year.
Now how do you coordinate and try to bring some order into this possible chaos of dealing with all these languages? First of all, we have certain guidelines which are shared in common by all languages; chief among them is a standard format, so that it is possible to integrate everything into one online system in a manner that is highly automated for uploading and formatting. Also we have a list of priorities of what can be done in a new language section and what needs to be done in the first round, and the second round, and so on. Of course in the case of this Seventeen Pandits project, that priority list would be dependent on what’s already been translated into these languages, how those translations are evaluated, and especially whether the copyright is available – this might not be the case and we might not be able to use them. And of course, as Luis Gomez pointed out, in certain languages, certain things might have higher priority than others.
Each language section that we have has a manager, a chief editor and a group of editors, a group of translators, and copy-editors. And to manage all of that, we have a menu database tool of all the items on the website with their priority numbers. And so here in our case of the Tengyur Project, we’d need to have a database of each of the materials that we’re going to translate. And, in each language section, we have fields within this for the status of the work: a field for translation, for editing, for copy-editing, and for going online. And each of these has options that can be filled in, as choices in a box: what needs to be done; what has been sent for translation or editing; to whom; the date it’s been sent and the date that it’s returned. And this is very useful because then you can filter in order to assign new tasks when a particular worker has finished; we filter to see what’s still available in their area.
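As a minimal sketch of the kind of menu tool described here, the status fields and the filtering step could look something like the following. This is not the Berzin Archives implementation; the names `MenuItem`, `StageStatus`, `assign`, `mark_returned`, and `available_for`, the three state labels, and the sample titles and date are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# The four status fields mentioned in the talk.
STAGES = ("translation", "editing", "copy_editing", "online")


@dataclass
class StageStatus:
    state: str = "to_do"                 # assumed labels: "to_do", "sent", "returned"
    assigned_to: Optional[str] = None    # to whom the work was sent
    sent_on: Optional[date] = None
    returned_on: Optional[date] = None


@dataclass
class MenuItem:
    title: str
    priority: int                        # lower number = higher priority (assumption)
    stages: dict = field(
        default_factory=lambda: {s: StageStatus() for s in STAGES}
    )


def assign(item, stage, worker, sent_on):
    """Record that one stage of an item has been sent to a worker."""
    status = item.stages[stage]
    status.state = "sent"
    status.assigned_to = worker
    status.sent_on = sent_on


def mark_returned(item, stage, returned_on):
    """Record that the worker has returned the finished stage."""
    status = item.stages[stage]
    status.state = "returned"
    status.returned_on = returned_on


def available_for(items, stage):
    """Filter: items not yet sent out for this stage, highest priority first."""
    open_items = [i for i in items if i.stages[stage].state == "to_do"]
    return sorted(open_items, key=lambda i: i.priority)


# Example: after one text is sent out, list what is still open for translation.
items = [MenuItem("Text A", priority=2), MenuItem("Text B", priority=1)]
assign(items[1], "translation", "translator_1", date(2024, 1, 5))
open_titles = [i.title for i in available_for(items, "translation")]
```

A real tool would presumably sit on a shared database rather than in memory, but the same fields and the same filter would carry over.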
I think for our project here, we have to add another field which indicates which source language the translator works from: whether it’s Tibetan or Sanskrit or both. And what I think will undoubtedly have to be the case is that, if we can’t find qualified translators in all these various languages, we’ll have to plan who can deal with these original sources. The others may have to translate from other languages – whether it’s English, Russian, or whatever – and then indicate which language their source used. To give an example: if they’re only working from Tibetan – since many of us have experienced that Tibetan has difficulty conveying all the different verb tenses from Sanskrit, and sometimes the ablative and dative cases get confused, and so on – things need to be referred back to a Sanskrit scholar, if the Sanskrit is available, to modify the tenses so that the translation is more accurate to the original.
We have a manager for each language section – usually that’s the editor-in-chief, although it could be somebody else – that assigns the tasks and keeps all the records in the online menu tool, which we keep on a network drive which is accessible only to the language section managers. We also have a personnel database, which I think is important to have, with the relevant information of the qualifications of each person and what tasks each of them can do, their availability of time, and so on. Also on a network drive, we keep the current and old versions of each work, at each stage of the work that it’s been done, for security purposes; and these are sorted, of course, according to whether it’s the rough translation, edited, copy-edited, etc.
When things are ready to go online then it goes to our technical people. We have people who do the uploading and the formatting. And because we have a standard format, then we can automate a great deal of this process, which makes it much, much easier. We have two people doing this now and each of them has familiarity with a number of different languages, and that’s very important so that they’re not afraid of dealing with the language. And also when various changes and emendations need to be made (like spelling mistakes, which inevitably are there), they can easily locate them with search tools, and deal with endings and so on without any problem.
Also what we do, which I think I’d recommend be done in this project as well, is that the manager for each language section makes a weekly progress report, for each section, of what’s been done. And the project manager (myself) puts that all together and we publish this online each week. And this is particularly to keep the donors and patrons happy, to demonstrate that we’re actually doing something every week. And this is something that I would recommend very much.
Also online, you have the option to be able to look at other language versions of the same article. And this is very, very helpful for many of our readers. For instance, if you are reading in Polish, you might have more of your Dharma background from reading in English. And although you appreciate the Polish version, you might want to check back with the English to see what’s been done there. Or, for instance, Spanish and Portuguese; the Portuguese readers are very often consulting Spanish as well.
So these are some of our experiences. And although, as I said, I’ve been asked to give three presentations, what I want to present doesn’t fit very conveniently into each of these three: they will slightly overlap. But this is the first part.
The Importance of Glossary Tools
I’d like to continue what I started yesterday by sharing practical advice from the multi-language work I’ve been doing with Berzin Archives. In the Berzin Archives, we do have standardized terminology in each of the language sections, based on my English; but sometimes, of course, for some terms we’ll have two or three variants that are used. And the main reason for that is the benefit of others, the benefit of the readers: so that everything in the website is searchable – both in our internal search engines and also in Google – so they can find things. Now we can’t expect that in each language section of the Tengyur Project we’re going to have standardization. But if we extrapolate from this, I think it would be extremely beneficial if each translator is consistent in all the works that he or she does, in terms of the terms that they use, and if they also limit themselves to two or three variants at the most for one term.
So the question is how to help bring order to this potential chaos for the reader, with such a variety of terms. The way that we have organized and managed this, which I would suggest here as well, is with the glossary tools that we’ve created both for the benefit of ourselves as the translators and for the benefit of others, the readers. And this is for each of our language sections. Each of them has three glossaries: one of technical terms, one of text titles, and one of proper names – spelling of proper names. This is all on our internal network. For the Tengyur Project, I think it’s extremely important that – at least in terms of names of persons, places, and so on – the spelling be standardized within one language section. Otherwise, again, it’s impossible to search on the names because the spelling is quite different in different languages. And then each translator, of course, would have their own terms and their own way of translating text titles. But if the text that they are translating quotes a text that has been translated by another translator then, again, for searchability, I think they need to use the translation of that title that appears in our corpus from the other translator.
Now for us, once we have our basic terms in our glossary tool put in by the chief editor for a section with the Tibetan and Sanskrit and English, if also it’s from another language – let’s say German – each translator then compiles an Excel file of new terms that have not been dealt with before in other works in that section, and then periodically it’s sent to the glossary manager and can be automatically imported into the glossary tool without any trouble. The translators themselves only have reader access to the tools. And just as we have a mega-glossary for all the languages, in which each language has a section, I think for the Tengyur Project we could have each language within that structure have a subsection for each translator. And each translator would compile their own Excel file of terms – with the Sanskrit and Tibetan and their own language – and periodically import them into the tool. We’ve also imported Jeffrey Hopkins’s glossary into this, so that we have Hopkins’s equivalents for all of our terms as well.
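To make the import step concrete, here is a rough sketch of how one translator’s term file could be merged into a per-language, per-translator subsection of a shared glossary. It assumes the spreadsheet has been exported to CSV with the columns `sanskrit`, `tibetan`, and `term`; those column names, the functions `new_glossary` and `import_terms`, and the sample row are illustrative assumptions, not the actual tool or its data.

```python
import csv
import io
from collections import defaultdict


def new_glossary():
    """Nested store: glossary[language][translator][(sanskrit, tibetan)] -> variants."""
    return defaultdict(lambda: defaultdict(dict))


def import_terms(glossary, language, translator, csv_text):
    """Merge one translator's term list into their own subsection.

    Returns the number of renderings that were newly added, so that
    re-importing the same file does not create duplicates.
    """
    section = glossary[language][translator]
    added = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        rendering = row["term"].strip()
        if not rendering:
            continue  # skip rows the translator has not filled in yet
        key = (row["sanskrit"].strip(), row["tibetan"].strip())
        variants = section.setdefault(key, [])
        if rendering not in variants:
            variants.append(rendering)
            added += 1
    return added


# Illustrative file contents only; not taken from any real glossary.
sample = "sanskrit,tibetan,term\ncitta,sems,Geist\n"
glossary = new_glossary()
import_terms(glossary, "de", "translator_a", sample)
```

Keeping a list of variants per source term leaves room for the two or three variants per term that a translator might use.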
For us, within the tools, each term or text title, and so on, has a page, arranged according to English – but it can also be arranged according to any other language as the primary language – and then, in a sidebar, it shows the equivalent translations for all the other languages, and also Jeffrey Hopkins’s terminology, since this sometimes helps to understand the terms. So for the Tengyur Project, within each language, each term could be arranged according to any translator as the primary sort and then, on the sidebar, there would be the equivalent translations of the other team members, so that you can correlate and see what they’ve been doing. And there would also be links to check the terms in other languages, since this is very helpful – let’s say between Spanish and Portuguese.
We also have in the glossary tools the definitions in English, and then one translator in each language section has the task to translate these definitions into the language of their section. And, of course, several terms have several definitions. And I think that we’re going to need something similar in the Tengyur glossary tool. Also we indicate, in an option here, if the term is to go into online glossaries or is just for internal use by translators. For instance, how do we deal with parama and uttama and shri and these sorts of terms, where it would be nice to be consistent but which are not of much use to the reader?
We also have the option to change the way that we translate a term. And if we do so, then there’s a box to indicate whether the status of these changes has been integrated into all the works within the website. Only the chief editor has access to make these changes within the glossary, in order to minimize several people trying to edit the thing at the same time. It always gets locked if somebody is actually editing, so that it’s always saved in a full manner.
By organizing our terminology in this way, it’s now possible to import glossary features automatically into the website from this tool for the benefit of the readers. And this I think needs to be our primary concern. It’s not so much benefit for ourselves as translators, but benefit for the readers. What we produce has to be useable and searchable.
So the biggest problem readers face now is correlating what they read in one text by one translator with what they read in another by another translator, especially if they don’t know Tibetan and Sanskrit. So our system generates online glossaries automatically with, let’s say, English, Sanskrit, Tibetan, and the definitions. Or if it’s in another language section, let’s say German, together with the English, Sanskrit, Tibetan, and the German definition. And it also automatically generates pop-up windows that appear when you put your cursor over a technical term in any text or document. A pop-up window comes up with the definition and also the equivalent in Jeffrey Hopkins’s terminology, so that if people are more familiar with that then they can correlate with what they’ve read elsewhere. Eventually we’ll take the Tibetan and Sanskrit out of the text, where now it’s present in parentheses, and also add it to the pop-up window so it doesn’t clutter the text. For the Tengyur Project, I would recommend doing the same with pop-up windows, but being selective in how many variants you include so you don’t flood the reader with too much information and it turns them off. If a term has several definitions, when you input the material you could tag in the glossary tool which definition’s applicable in which text, so only that definition will pop up.
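The idea of tagging which definition applies in which text could be sketched roughly as follows. The `GlossaryEntry` structure, the `definition_for_text` mapping, and the `popup_content` function are hypothetical names chosen for illustration, and the entry contents are placeholders rather than real glossary data.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    term: str
    definitions: list                       # all definitions of the term
    # Hypothetical tagging: text id -> index of the definition that applies there.
    definition_for_text: dict = field(default_factory=dict)
    hopkins_equivalent: str = ""            # equivalent from the imported glossary


def popup_content(entry, text_id):
    """Choose what the pop-up shows for a term in a given text.

    If the text has been tagged, only the tagged definition is shown;
    otherwise all definitions are shown as a fallback.
    """
    index = entry.definition_for_text.get(text_id)
    shown = entry.definitions if index is None else [entry.definitions[index]]
    return {
        "term": entry.term,
        "definitions": shown,
        "hopkins": entry.hopkins_equivalent,
    }


# Placeholder entry: two definitions, the second tagged for "text_1".
entry = GlossaryEntry(
    term="example term",
    definitions=["definition A", "definition B"],
    definition_for_text={"text_1": 1},
    hopkins_equivalent="example equivalent",
)
popup = popup_content(entry, "text_1")
```

Falling back to all definitions for untagged texts is one possible choice; showing nothing until a text is tagged would be the more selective alternative.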
For general reference and for study, I think it’s important to develop a tool that can be used for this project which would be a modification of Google translation tools, but fed from the glossary. So you would have two fields, one with source translation and one with target translation terms, with a pop-up of which ones are available. So you could input a term from one translator in the source field and get the equivalent in the target field of another translator’s terminology.
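Leaving aside any particular translation service, the core of such a lookup can be sketched as a two-step mapping through the shared source term: from one translator’s rendering to the Sanskrit or Tibetan term, and from there to the other translator’s renderings. The function names and the sample entries below are assumptions for illustration only.

```python
def build_index(entries):
    """Index glossary rows of the form (translator, source_term, rendering)."""
    by_rendering = {}   # (translator, rendering in lower case) -> source term
    by_source = {}      # (translator, source term) -> list of renderings
    for translator, source, rendering in entries:
        by_rendering[(translator, rendering.lower())] = source
        by_source.setdefault((translator, source), []).append(rendering)
    return by_rendering, by_source


def equivalent(by_rendering, by_source, term, from_translator, to_translator):
    """Return the target translator's renderings for a source translator's term."""
    source = by_rendering.get((from_translator, term.lower()))
    if source is None:
        return []       # the term is not in the source translator's glossary
    return by_source.get((to_translator, source), [])


# Illustrative entries only; the renderings are not attributed to any real translator.
entries = [
    ("translator_a", "sems", "mind"),
    ("translator_b", "sems", "awareness"),
    ("translator_b", "sems", "mental activity"),
]
by_rendering, by_source = build_index(entries)
matches = equivalent(by_rendering, by_source, "Mind", "translator_a", "translator_b")
```

Because the mapping goes through the source term, it only works as well as the glossary entries are filled in with their Sanskrit and Tibetan.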
Also, I think it would be helpful to develop an option to switch from some terminology in a text into another system of terminology that the reader is more familiar with. It would require some sophisticated programming in the case of languages with case and gender inflection so that adjectives, nouns, and verbs in the whole sentence would agree with the change of the term. Also, I think it would be helpful to have an option, like we have in our online texts, to switch to another language version of the same text so that multilingual readers can improve their understanding. For instance, between Spanish and Portuguese; or between their own language and the English version.
And eventually – last point – I think it’s important to develop a search engine for the entire corpus. So that you could enter a term and indicate who was the translator for this term, and then the search engine would search on the basis of the source Sanskrit and Tibetan term that is being translated here and it would give all references to the term in all translation variants, highlighting the term in the passages, so that you can recognize what it is.
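A very small sketch of this kind of search, under the assumption that the glossary is available as rows of (translator, source term, rendering) and the corpus as (text id, passage) pairs, might look like this. The function `search_corpus`, the square-bracket highlighting, and the sample data are all illustrative choices, not an existing tool.

```python
import re


def search_corpus(glossary, corpus, term, translator):
    """Find every passage that uses any variant translation of the same source term.

    glossary: iterable of (translator, source_term, rendering)
    corpus:   iterable of (text_id, passage)
    Returns (text_id, passage with matches wrapped in square brackets).
    """
    glossary = list(glossary)
    # Step 1: which source term(s) does this translator's term render?
    sources = {
        s for t, s, r in glossary
        if t == translator and r.lower() == term.lower()
    }
    # Step 2: collect all renderings of those source terms by any translator.
    variants = {r for t, s, r in glossary if s in sources}
    if not variants:
        return []
    # Longest variants first, so multi-word renderings are matched whole.
    ordered = sorted(variants, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in ordered) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for text_id, passage in corpus:
        if pattern.search(passage):
            highlighted = pattern.sub(lambda m: "[" + m.group(0) + "]", passage)
            hits.append((text_id, highlighted))
    return hits


# Illustrative data only.
glossary_rows = [
    ("translator_a", "sems", "mind"),
    ("translator_b", "sems", "awareness"),
]
corpus_rows = [
    ("text_1", "The mind is clear."),
    ("text_2", "Awareness arises."),
    ("text_3", "Nothing relevant here."),
]
results = search_corpus(glossary_rows, corpus_rows, "mind", "translator_a")
```

Simple whole-word matching like this would not cope with inflected languages; for those, the search would need stemming or a list of inflected forms for each term.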
I think by trying to adopt some of these methods that we’ve developed, it could help the reader very, very much in dealing with the inevitable variety of translation terms that are going to be used in the project.
Collaborating through a Wiki
I’d like to continue my presentation of the tools that we’ve developed with the Berzin Archives, specifically in this context: how we facilitate cooperation among our teams and within each team too. What we have created is an internal wiki, a Wikipedia type of tool, on our network drive. It’s intended for the technical team, the audio team, and the translation teams.
So for the technical team, for instance, it’s just organizational: how you put a project like this together. It has the full instructions for how to format and upload articles into the system, so new people can be trained to do that. It has full documentation of the various functions and tools that our really brilliant German and Russian teams have developed, so we can deal with the technical side. It also has full instructions on how to edit the wiki, and a full instruction manual for using all our menu and glossary tools.
For the audio teams, then, we have instructions here on how to edit audio files; how to make pure English versions from our bilingual audio versions; how to transcribe single language and bilingual audio files; how to link to tools to do this. So that all the workers on the project have access to how to do things; and organizing that, so that they don’t have to figure it out themselves.
For the translation teams, we have a general section – by the way, everybody who works on the project then gets access to this wiki – we have a general section relevant to all language sections, and then the sections for each language. What’s common to everybody is the general format for all documents and material that is going to be submitted for the project.
And then one feature – which I think would be very, very useful to have – concerns questions on particular texts. So whenever there’s a text from my archives that the translators have questions about, we create a folder for that in the wiki. And they ask questions, and then I answer them. They ask about the meaning of a text – what this means, and so on – and then I post the question and my answer in this section in the wiki, in the folder for that text. And often a translator will ask maybe twenty or thirty questions about a text. And by having it in the wiki, then, when translators in another language section come to translate the same text, they can see if their questions have already been asked and answered here, and they can also check their understanding. In the case of the Tengyur Project, we could post a question about a text and the translators could be notified by email, like it’s done in the Lotsawa forum, or it could just be implemented from the Lotsawa forum, and then the questions and answers could be posted here for use in translations of the same text into other languages, the way that we do that.
We haven’t implemented this yet, but what we would like to do is that the editors of the language section who ask the question could flag in the menu tool that the text has questions and answers in the wiki, so that when the text gets assigned in another language section, the translator and editor can be notified beforehand that there is material in the wiki concerning that text – of problematic passages.
Then another feature in this common section. Some languages, like German and Russian, are allergic to vague references: this and that. And even if you don’t fill in what this or that refers to, you still need a gender for the this and that. And for the Sanskrit gender of the this and that, the references are not always obvious; and the Tibetan doesn’t help at all in terms of that. So if people ask questions about that also – in the same manner, as just general questions about a text – then these issues can be filled in and then used by the people who are working on this text in another language. And if the questions and answers are in a language that the people don’t know, they could always use translation tools, like Google and Babylon, to get a rough translation of what the questions and answers are.
Also, some languages have difficulty expressing intransitive verb structures for certain verbs; verbs that can only be used in the transitive and they need an object. This is particularly the case, for instance, in Russian and Chinese, where you have several verbs that can be used for one English verb or some Sanskrit verb. Depending on what the object is, you would have a different verb that’s used. And, again, these types of questions could be dealt with and that information placed in the wiki.
Then, for specific languages, one of the big problems is how you’re going to transliterate Sanskrit and Tibetan names. For instance, like “bodhisattva.” Well, the aspirated ‘d’. In Arabic, you don’t do aspirated letters. In Urdu you add in a “ha” into that. In Tamil you don’t have ‘d’ and ‘t’; you only have one for all four of these things. So all these issues were dealt with. I developed a way of transliterating Chinese, Tibetan, and Sanskrit names into Arabic and Urdu, since there weren’t any systems for that. So you put that there, so that the people who work in that section can use it consistently.
Then also there are general translation styles that can also be dealt with in this wiki type of fashion. For instance, some languages don’t have all the verb tenses of English, and they certainly don’t have all the verb tenses of Sanskrit. Tibetan is a good example of that, or Chinese is an even stronger example. So you have an area for discussion of how you are going to handle the distinctions of the different tenses in your language. And eventually, through a discussion that can be held on the wiki so everybody can read it, you might come up with a guideline based on a general agreement, or several options to use, if there’s no consensus. Or you can discuss how you’re going to break up yat-tat types of complex sentences, which in some languages, like Urdu, for example, you simply can’t replicate.
Sanskrit likes to repeat words in poetical forms. Tibetan is okay with this, but this is considered extremely bad style in some languages, for instance, Russian. So my Russian translator’s always complaining about that in verses, and they want to change it. And so then how are you going to deal with this? You can have a discussion.
There’s also the big problem of classical versus colloquial language – in regard to style, in regard to grammar, in regard to terminology. This is especially relevant with Chinese and Japanese. How much classical Chinese are you going to throw into your translation? And we’re not just talking about technical terms; we’re talking about little words like “and” or the genitive indicator. These sorts of things can be dealt with in a classical or in a colloquial way.
Another issue that can be resolved here or discussed in this wiki thing is how much Sanskrit to leave in, or how you’re going to deal with poetry, in terms of meter. Are there suggestions for how you might deal with shloka meter in your language? And this can also be a very convenient place for discussion of terminological issues that don’t fit into just one term in the glossary. Like how you’re going to deal with sems, shes-pa, rig-pa, rnam-shes, gtso-sems, sems-byung, etc., all of these sorts of terms. Because some languages might have many more terms that might be available for use; but then other languages, like Spanish, might not have so many. And so again these are things that can be discussed and then used by the internal translators.
Also the multilingual website itself – of having many different versions of the text – can be used as a tool for cooperation. Because by having these links to all the different language versions of the same texts then, if a text has quotations from the sutra or from these other things and it’s already been figured out in one language, then people in other language sections don’t have to replicate that research; they can just find them. And also it can help with how you deal with translation issues. For instance, the Portuguese section translators of my website are always looking at the Spanish to get some suggestions and ideas for difficult cases.
So, like this, I think there are many ways we can use electronic tools to facilitate cooperation among various workers on the project.