Advice for the Tengyur Translation Project

Coordinating Multi-Language Translation Projects

I’ve been asked to give three presentations today and tomorrow, and I think the best contribution I can make is to share our practical experience from Berzin Archives on how to deal with organizing these topics – multi-language projects, terminology and cooperation among translators – since perhaps our experience can suggest organizational methods applicable to this larger project that we are undertaking here. Also, I’m really not familiar with what’s available in Polish or Portuguese, and so on, so I can’t really report on that, let alone Arabic or Urdu. Our online project can perhaps offer a model, since I feel that the most appropriate medium for the Tengyur project is online, but supplemented with printed versions as well.

Today I’d like to speak about dealing with multi-language translations and how to coordinate them. Nowadays, we are working actively with ten languages. We have nine of them already online: English, German, French, Spanish, Portuguese, Russian, Polish, Arabic and Urdu. The tenth one, Chinese, we hope to be able to get online this year.

Now, how do we coordinate and try to bring some order into this possible chaos of dealing with all these languages? First of all, we have certain guidelines that are shared by all languages, chiefly a standard format, so that it is possible to integrate everything into one online system in a manner that is highly automated for uploading and formatting. Also, we have a list of priorities of what can be done in a new language section and what needs to be done in the first round, the second round, and so on. Of course, in the case of this Seventeen Pandits project, that priority list would be dependent on what’s already been translated into these languages, how those translations are evaluated, and, especially, whether the copyright is available; this might not be the case, so we might not be able to use them. Of course, as Luis Gomez pointed out, in certain languages, certain things might have higher priority than others.

Each language section that we have has a manager, a chief editor and a group of editors, a group of translators and copy-editors. To manage all of that, we have a menu database tool of all the items on the website with their priority numbers. Here, in our case of the Tengyur Project, we’d need to have a database of each of the materials that we’re going to translate. In each language section, we have fields within this for the status of the work: a field for translation, for editing, for copy-editing and for going online. Each of these has options that can be chosen from a box. They indicate what needs to be done; what has been sent for translation or editing and to whom; the date it’s been sent and the date that it’s returned. This is very useful because, when a particular worker has finished, we can filter to see what’s still available in their area and assign new tasks.

I think for our project here, we have to add another field that indicates what source language the translator has worked from: whether it’s Tibetan or Sanskrit or both. What I think will undoubtedly be the case is that we won’t be able to find qualified translators in all these various languages who can deal with these original sources. They may have to translate from other languages – whether it’s English, Russian, or whatever – and then indicate which language they used as their source. I’ll give an example of why this matters: many of us have experienced that Tibetan has difficulty conveying all the different verb tenses from Sanskrit, and sometimes the ablative and dative cases get confused, and so on. So if translators are only working from Tibetan, things need to be referred back to a Sanskrit scholar, if the Sanskrit is available, in order to modify the tenses and make the translation more accurate to the original.
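The menu database and the source-language field described above could be modeled roughly as follows. This is only a minimal sketch under stated assumptions, not the actual Berzin Archives tool: the names `WorkItem`, `StageStatus`, `assign`, `available` and `needs_sanskrit_check` are hypothetical, as are the three status values and the convention that priority 1 means "first round".

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional

# The four status fields mentioned in the talk.
STAGES = ("translation", "editing", "copy-editing", "online")


@dataclass
class StageStatus:
    state: str = "to do"                  # assumed options: "to do", "sent", "returned"
    assigned_to: Optional[str] = None
    date_sent: Optional[date] = None
    date_returned: Optional[date] = None


def _fresh_stages() -> Dict[str, StageStatus]:
    return {stage: StageStatus() for stage in STAGES}


@dataclass
class WorkItem:
    title: str
    language: str          # language section
    priority: int          # assumption: 1 = first round, 2 = second round, ...
    source_language: str   # "Tibetan", "Sanskrit", "both", or an intermediary such as "English"
    stages: Dict[str, StageStatus] = field(default_factory=_fresh_stages)


def assign(item: WorkItem, stage: str, worker: str, when: date) -> None:
    """Record that a stage has been sent to a worker."""
    status = item.stages[stage]
    status.state, status.assigned_to, status.date_sent = "sent", worker, when


def mark_returned(item: WorkItem, stage: str, when: date) -> None:
    """Record that the worker has sent the stage back."""
    status = item.stages[stage]
    status.state, status.date_returned = "returned", when


def available(items, language: str, stage: str):
    """Items in one language section whose stage is still unassigned, by priority."""
    todo = [i for i in items if i.language == language and i.stages[stage].state == "to do"]
    return sorted(todo, key=lambda i: i.priority)


def needs_sanskrit_check(items):
    """Items translated from Tibetan only, candidates for referral to a Sanskrit scholar."""
    return [i for i in items if i.source_language == "Tibetan"]


# Placeholder data, for illustration only.
items = [
    WorkItem("Text A", "German", priority=2, source_language="Tibetan"),
    WorkItem("Text B", "German", priority=1, source_language="English"),
    WorkItem("Text C", "Polish", priority=1, source_language="both"),
]
assign(items[1], "translation", "translator-1", date.today())
print([i.title for i in available(items, "German", "translation")])
```

Filtering on `state` and `language` corresponds to the manager’s filter for assigning new tasks; the `source_language` field makes it possible to list which translations still need to go back to someone who reads the Sanskrit.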

We have a manager for each language section – usually, that’s the editor-in-chief, although it could be somebody else – who assigns the tasks and keeps all the records in the online menu tool, which we keep on a network drive that is accessible only to the language section managers. We also have a personnel database, which I think is important to have, with the relevant information about the qualifications of each person and what tasks each of them can do, their availability of time, and so on. Also, on a network drive, we keep the current and old versions of each work at each stage of the work, for security purposes; these are sorted, of course, according to whether it’s the rough translation, edited, copy-edited, etc.

When things are ready to go online, they go to our technical people. We have people who do the uploading and the formatting. Because we have a standard format, we can automate a great deal of this process, which makes it much, much easier. We have two people doing this now, and each of them has familiarity with a number of different languages, and that’s very important so that they’re not afraid of dealing with the language. Also, when various changes and emendations are needed (like correcting spelling mistakes, which inevitably are there), they can easily locate them with search tools, and deal with endings and so on without any problem.

Also, what we do, which I think I’d recommend be done in this project as well, is that the manager for each language section makes a weekly progress report for each section of what’s been done. The project manager (myself) puts that all together, and we publish this online each week. This is particularly to keep the donors and patrons happy, to demonstrate that we’re actually doing something every week; this is something that I would recommend very much.

Also, online, we have the option to look at other language versions of the same article, and this is very, very helpful for many of our readers. For instance, if we are reading in Polish, we might have more of our Dharma background from reading in English. Although we appreciate the Polish version, we might want to check back with the English to see what’s been done there. Or take Spanish and Portuguese: the Portuguese readers very often consult the Spanish as well.

These are some of our experiences. Although, as I said, I’ve been asked to give three presentations, what I want to present doesn’t fit very conveniently into each of these three; they will slightly overlap, but this is the first part.

Thank you.

The Importance of Glossary Tools

I’d like to continue what I started yesterday by sharing practical advice from the multi-language work we’ve been doing with Berzin Archives. In Berzin Archives, we do have standardized terminology in each of the language sections, based on my English; sometimes, of course, for some terms, we’ll have two or three variants that are used. The main reason for that is for the benefit of others, benefit of the readers, so that everything in the website is searchable – both in our internal search engines and also in Google – so that they can find things. Now, we can’t expect that in each language section of the Tengyur Project that we’re going to have standardization. However, if we extrapolate from this, I think it would be extremely beneficial if each translator is consistent in all the works that he or she does, in terms of the terms that they use, and they also limit themselves to two or three variants at the most for one term.

The question is how to help bring order to this potential chaos for the reader, with such a variety of terms. The way that we have done this, which I would suggest here as well, is with the glossary tools that we’ve created both for the benefit of ourselves as the translators and for the benefit of others, the readers. This is for each of our language sections. Each of them has three glossaries: one of technical terms, one of text titles and one of proper names, giving the spelling of proper names. This is all on our internal network. For the Tengyur Project, I think it’s extremely important that, at least in terms of names of persons, places, and so on, the spelling be standardized within one language section. Otherwise, again, it’s impossible to search for the names, because how we spell them is quite different in different languages. Then, each translator, of course, would have their own terms and their own way of translating text titles. For this, if the text that they are translating quotes a text that has been translated by another translator then, again, for searchability, I think they need to use the translation of that title that appears in our corpus from the other translator.

Now for us, once we have our basic terms in our glossary tool put in by the chief editor for a section with the Tibetan and Sanskrit and English, if then another language – let’s say German – has additional terms that are problematic to translate in their language, the translators in that language section compile an Excel file with these terms, and then periodically, they’re sent to the glossary manager and can be automatically imported into the glossary tool without any trouble. The translators themselves only have read access to the tools. Just as we have a mega-glossary for all the languages, in which each language has a section, I think for the Tengyur Project, we could have each language within that structure have a subsection for each translator. Each translator would compile their own Excel file of terms – with the Sanskrit and Tibetan and their own language – and periodically import them into the tool. We’ve also imported Jeffrey Hopkins’s glossary into this so that we have Hopkins’s equivalents for all of our terms as well.
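The periodic import of each translator’s term file into a per-language, per-translator subsection might look something like this. It is a rough sketch only: a CSV export stands in for the Excel file, the column names `sanskrit`, `tibetan` and `translation`, the semicolon separator for variants, and the function names are all assumptions, and the sample rows are illustrative.

```python
import csv
import io
from collections import defaultdict


def new_glossary():
    # language -> (sanskrit, tibetan) -> translator -> list of variants
    return defaultdict(lambda: defaultdict(dict))


def import_terms(glossary, language, translator, csv_file, max_variants=3):
    """Load one translator's term list; return the source terms that were skipped.

    A row is skipped when it has no translation or more than `max_variants`
    variants, following the suggestion to keep to two or three per term.
    """
    skipped = []
    for row in csv.DictReader(csv_file):
        key = (row["sanskrit"].strip(), row["tibetan"].strip())
        variants = [v.strip() for v in row["translation"].split(";") if v.strip()]
        if not variants or len(variants) > max_variants:
            skipped.append(key)
            continue
        glossary[language][key][translator] = variants
    return skipped


# Illustrative rows only; the last one is a placeholder with too many variants.
SAMPLE = (
    "sanskrit,tibetan,translation\n"
    "citta,sems,Geist\n"
    "vijñāna,rnam-shes,Bewusstsein; Gewahrsein\n"
    "placeholder-skt,placeholder-tib,v1; v2; v3; v4\n"
)

glossary = new_glossary()
skipped = import_terms(glossary, "German", "translator_a", io.StringIO(SAMPLE))
print(glossary["German"][("citta", "sems")])
print(skipped)
```

Keying each entry on the Sanskrit and Tibetan pair is what later allows one translator’s terms to be correlated with another’s.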

For us, within the tools, each term, text title, and so on has a page, arranged according to English – but it can also be arranged according to any other language as the primary language – and then on the side, in a bar, it shows the equivalent translations for all other languages, and also Jeffrey Hopkins’s terminology, since this sometimes helps to understand the terms. For the Tengyur Project, within each language, the terms could be arranged according to any translator as the primary sort and then, on the sidebar, we would have the equivalent translations of the other team members so that we can correlate and see what they’ve been doing. Also, there are links to check with other languages for the terms, since this is very helpful – let’s say between Spanish and Portuguese.

We also have in the glossary tools the definitions in English, and then one translator in each language section has the task to translate these definitions into the language of their section. Of course, some terms have several definitions. I think that we’re going to need something similar in the Tengyur glossary tool. Also, we indicate, in an option here, whether the term is to go into the online glossaries or is just for internal use by translators. For instance, how do we deal with parama and uttama and shri and these sorts of terms, which it would be nice to translate consistently but which are not of much use to the reader?

We also have the option to change the way that we translate a term. If we do so, then there’s a box to indicate whether these changes have been integrated into all the works within the website. Only the chief editor has access to make these changes within the glossary, in order to avoid several people trying to edit the same thing at the same time. The entry always gets locked while somebody is editing it so that it’s always saved in full.

By organizing our terminology in this way, it’s now possible to import glossary features automatically into the website from this tool for the benefit of the readers. I think this needs to be our primary concern. It’s not so much benefit for ourselves as translators, but benefit for the readers. What we produce has to be useable and searchable.

The biggest problem readers face now is correlating what they read in one text by one translator with what they read in another by another translator, especially if they don’t know Tibetan and Sanskrit. Our system generates online glossaries automatically with, let’s say, English, Sanskrit, Tibetan and the definitions, or if it’s in another language section, let’s say, German, together with the English, Sanskrit, Tibetan and the German definition. It also automatically generates pop-up windows that appear when we put our cursor over a technical term in any text or document. A pop-up window comes up with the definition and also the equivalent in Jeffrey Hopkins’s terminology so that if people are more familiar with that, then they can correlate with what they’ve read elsewhere. Eventually, we’ll take the Tibetan and Sanskrit out of the text, where now it’s present in parentheses, and also add it to the pop-up window so that it doesn’t clutter the text. For the Tengyur Project, I would recommend doing the same with pop-up windows, but being selective in how many variants we include so that we don’t flood the reader with too much information and turn them off. If a term has several definitions, when we input the material, we could tag in the glossary tool which definition’s applicable in which text, so only that definition will pop up.
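The idea of tagging which definition applies in which text, and of limiting how many variants the pop-up shows, can be sketched as below. The class and function names, the `max_equivalents` limit and the fallback to the first definition are assumptions made for illustration; the definitions and variants are placeholders, not real glossary content.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Definition:
    text: str
    applies_to: Set[str] = field(default_factory=set)   # ids of texts tagged for this definition


@dataclass
class GlossaryEntry:
    term: str
    sanskrit: str
    tibetan: str
    definitions: List[Definition]
    equivalents: Dict[str, str] = field(default_factory=dict)  # other translators' renderings


def popup_for(entry: GlossaryEntry, text_id: str, max_equivalents: int = 2) -> dict:
    """Build the pop-up content for one term as it occurs in one text."""
    tagged = [d for d in entry.definitions if text_id in d.applies_to]
    # Assumed fallback: if no definition is tagged for this text, show the first one.
    chosen = tagged[0] if tagged else entry.definitions[0]
    # Show only a few variants so the reader is not flooded with information.
    shown = dict(list(entry.equivalents.items())[:max_equivalents])
    return {
        "term": entry.term,
        "sanskrit": entry.sanskrit,
        "tibetan": entry.tibetan,
        "definition": chosen.text,
        "equivalents": shown,
    }


entry = GlossaryEntry(
    term="mind",
    sanskrit="citta",
    tibetan="sems",
    definitions=[
        Definition("placeholder definition 1", {"text-1"}),
        Definition("placeholder definition 2", {"text-2"}),
    ],
    equivalents={"translator_b": "variant B", "translator_c": "variant C", "translator_d": "variant D"},
)
print(popup_for(entry, "text-2"))
```

Putting the Sanskrit and Tibetan in the pop-up data rather than in the running text matches the plan of eventually moving them out of the parentheses.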

For general reference and for study, I think it’s important to develop a tool that can be used for this project, which would be a modification of Google translation tools, but fed from the glossary. We would have two fields, one with source translation and one with target translation terms, with a pop-up of which ones are available. We could input a term from one translator in the source field and get the equivalent in the target field of another translator’s terminology.
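The lookup behind such a two-field tool could be as simple as the following sketch. It does not use any Google tool; it only shows the glossary-fed part, and `TermConverter`, its method names and the sample renderings are hypothetical. It also assumes that one translator uses a given rendering for only one source term, which will not always hold.

```python
from collections import defaultdict


class TermConverter:
    """Maps one translator's rendering of a term to another translator's rendering."""

    def __init__(self):
        # source term (Sanskrit or Tibetan) -> translator -> list of renderings
        self._by_source = defaultdict(lambda: defaultdict(list))
        # (translator, lowercased rendering) -> source term
        self._lookup = {}

    def add(self, source_term, translator, rendering):
        self._by_source[source_term][translator].append(rendering)
        self._lookup[(translator, rendering.lower())] = source_term

    def available_targets(self, rendering, source_translator):
        """Translators for whom an equivalent exists (the 'pop-up of which ones are available')."""
        source_term = self._lookup.get((source_translator, rendering.lower()))
        if source_term is None:
            return []
        return sorted(t for t in self._by_source[source_term] if t != source_translator)

    def convert(self, rendering, source_translator, target_translator):
        """Equivalent renderings in the target translator's terminology, if any."""
        source_term = self._lookup.get((source_translator, rendering.lower()))
        if source_term is None:
            return []
        return list(self._by_source[source_term].get(target_translator, []))


converter = TermConverter()
converter.add("sems", "translator_a", "mind")                 # illustrative renderings
converter.add("sems", "translator_b", "mental activity")
converter.add("rnam-shes", "translator_a", "consciousness")

print(converter.available_targets("mind", "translator_a"))
print(converter.convert("Mind", "translator_a", "translator_b"))
```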

Also, I think it would be helpful to develop an option to switch from some terminology in a text into another system of terminology that the reader is more familiar with. It would require some sophisticated programming in the case of languages with case and gender inflection so that adjectives, nouns and verbs in the whole sentence would agree with the change of the term. Also, I think it would be helpful to have an option, like we have in our online texts, to switch to another language version of the same text so that multilingual readers can improve their understanding: for instance, between Spanish and Portuguese, or between their own language and the English version.

Eventually – the last point – I think it’s important to develop a search engine for the entire corpus so that we could enter a term and indicate who was the translator for this term, and then the search engine would search on the basis of the source Sanskrit and Tibetan term that is being translated here; it would give all references to the term in all translation variants, highlighting the term in the passages so that we can recognize what it is.
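A search of this kind, going from one translator’s term back to the source term and then out to every translator’s variants, might be sketched as follows. Everything here is an assumption for illustration: the glossary layout, the function names, the `[[...]]` highlighting, and the made-up sample passages. Plain word matching like this would not catch inflected forms, so real languages with case endings would need more.

```python
import re

# source term (Sanskrit, Tibetan) -> translator -> renderings; illustrative data only
GLOSSARY = {
    ("citta", "sems"): {
        "translator_a": ["mind"],
        "translator_b": ["mental activity"],
    },
}

# Made-up passages standing in for the translated corpus.
CORPUS = [
    {"text_id": "text-1", "translator": "translator_a", "text": "The mind is clear."},
    {"text_id": "text-2", "translator": "translator_b", "text": "Mental activity is clear."},
    {"text_id": "text-3", "translator": "translator_b", "text": "Nothing relevant here."},
]


def find_source_key(term, translator, glossary):
    """Find the Sanskrit/Tibetan term that this translator renders as `term`."""
    for key, by_translator in glossary.items():
        if term.lower() in (v.lower() for v in by_translator.get(translator, [])):
            return key
    return None


def search(term, translator, glossary, corpus):
    """Return every passage containing any translator's variant of the same source term."""
    key = find_source_key(term, translator, glossary)
    if key is None:
        return []
    hits = []
    for passage in corpus:
        variants = glossary[key].get(passage["translator"], [])
        if not variants:
            continue
        # Longest variants first so a longer phrase wins over a shorter one inside it.
        alternatives = "|".join(re.escape(v) for v in sorted(variants, key=len, reverse=True))
        pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
        if pattern.search(passage["text"]):
            highlighted = pattern.sub(r"[[\1]]", passage["text"])
            hits.append({**passage, "text": highlighted})
    return hits


for hit in search("mind", "translator_a", GLOSSARY, CORPUS):
    print(hit["text_id"], hit["translator"], hit["text"])
```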

I think by trying to adopt some of these methods that we’ve developed, it could help the reader very, very much in dealing with the inevitable variety of translation terms that are going to be used in the project.

Collaborating through a Wiki

I’d like to continue my presentation of the tools that we’ve developed with the Berzin Archives, specifically in this context: how we facilitate cooperation among our teams and within each team too. What we have created is an internal wiki, a Wikipedia type of tool, on our network drive. It’s intended for the technical team, the audio team and the translation teams.

For the technical team, for instance, it’s just organizational: how we put a project like this together. It has the full instructions for how to format and upload articles into the system, so new people can be trained to do that. It has full documentation of the various functions and tools that our really brilliant German and Russian teams have developed, so we can deal with the technical side. It also has full instructions on how to edit the wiki and a full instruction manual for using all our menu and glossary tools.

For the audio teams, we have instructions here on how to edit audio files, how to make pure English versions from our bilingual audio versions, how to transcribe single language and bilingual audio files, and how to link to tools to do this. The point is that all the workers on the project have access to organized instructions on how to do things, so that they don’t have to figure it out themselves.

For the translation teams, we have a general section – by the way, everybody who works on the project then gets access to this wiki – we have a general section relevant to all language sections, and then the sections for each language. What’s common to everybody is the general format for all documents and material that is going to be submitted for the project.

Then, one feature – which I think would be very, very useful to have – concerns questions on particular texts. Whenever there’s a text from my archives that the translators have questions about, we create a folder for that in the wiki. They ask questions, and then I answer them. They ask about the meaning of a text – what this means, and so on – and then I post the question and my answer in this section in the wiki, in the folder for that text. Often a translator will ask maybe 20 or 30 questions about a text. By having it in the wiki, then, when translators in another language section come to translate the same text, they can see if their questions have already been asked and answered here, and they can also check their understanding. In the case of the Tengyur Project, we could post a question about a text, and the translators can be notified by email, like it’s done in the Lotsawa forum, or it could just be implemented from the Lotsawa forum, and then the questions and answers could be posted here for use of translations of the same text into other languages, the way that we do that.

We haven’t implemented this yet, but what we would like to do is have the editors of the language section who ask the questions flag in the menu tool that the text has questions and answers in the wiki. Then, when the text gets assigned in another language section, the translator and editor can be notified beforehand that there is material in the wiki concerning problematic passages of that text.
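Since this flag has not been implemented, the following is purely a hypothetical sketch of how it could work: a set of flagged titles in the menu tool, checked whenever a text is assigned in another language section. The function names and the wording of the notice are invented for illustration, and actually sending email is left out.

```python
# Titles of texts that have a questions-and-answers folder in the wiki (hypothetical store).
wiki_qa_flags = set()


def flag_questions(title):
    """Called by the editors of the language section that asked the questions."""
    wiki_qa_flags.add(title)


def assign_text(title, language, translator, editor):
    """Assign a text in a language section and return the notices that should be sent."""
    notices = []
    if title in wiki_qa_flags:
        for person in (translator, editor):
            notices.append(
                f"{person}: the wiki has questions and answers on problematic "
                f"passages of '{title}' (assigned in the {language} section)"
            )
    return notices


flag_questions("Text A")
for notice in assign_text("Text A", "Polish", "translator-2", "editor-2"):
    print(notice)
```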

Then, another feature in this common tool concerns vague references. Some languages, like German and Russian, are allergic to vague references: “this” and “that.” Even if we don’t fill in what “this” or “that” refers to, we still need a gender for them. As for the gender of “this” and “that” in the Sanskrit, the references are not always obvious, and the Tibetan doesn’t help at all in terms of that. If people ask questions about that also – in the same manner as just general questions about a text – then these issues can be filled in and then used by the people who are working on this text in another language. If the questions and answers are in a language that the people don’t know, they could always use translation tools, like Google and Babylon, to get a rough translation of what the questions and answers are.

Also, some languages have difficulty expressing intransitive verb structures for certain verbs, verbs that can only be used in the transitive and need an object. This is particularly the case, for instance, in Russian and Chinese, where we have several verbs that can be used for one English verb or some Sanskrit verb. Depending on what the object is, we would have a different verb that’s used. Again, these types of questions could be dealt with and that information placed in the wiki.

Then, for specific languages, one of the big problems is how we’re going to transliterate Sanskrit and Tibetan names, for instance, “bodhisattva.” Take the aspirated ‘d’: in Arabic, we don’t do aspirated letters, and in Urdu, we add a “ha” to it. In Tamil, we don’t have a separate ‘d’ and ‘t’ or aspirated versions of them; we only have one letter for all four of these things. So, all of these issues were dealt with. I developed a way of transliterating Chinese, Tibetan and Sanskrit names into Arabic and Urdu since there weren’t any systems for that. We put that there so that the people who work in that section can use it consistently.

Also, there are general translation styles that can be dealt with in this wiki type of fashion. For instance, some languages don’t have all the verb tenses of English, and they certainly don’t have all the verb tenses of Sanskrit. Tibetan is a good example of that, or Chinese is an even stronger example, so we have an area for discussion of how we are going to handle the distinctions of the different tenses in your language. Eventually, through a discussion that can be held on the wiki so that everybody can read it, we might come up with a guideline based on a general agreement or several options to use if there’s no consensus.

Another topic is how we’re going to break up yat-tat type complex sentences, which we simply can’t replicate in some languages, like Urdu, for example. Sanskrit likes to repeat words in poetical forms. Tibetan is okay with this, but this is considered extremely bad style in some languages, for instance, Russian. Our Russian translators are always complaining about that in verses, and they want to change it. Then, how are we going to deal with this? We can have a discussion.

There’s also the big problem of classical versus colloquial language: in regard to style, in regard to grammar, in regard to terminology. This is especially relevant with Chinese and Japanese. How much classical Chinese are we going to throw into our translation? We’re not just talking about technical terms; we’re talking about little words like “and” or the genitive indicator. These sorts of things can be dealt with in a classical or in a colloquial way. Another issue that can be discussed or resolved in this wiki system is how much Sanskrit to leave in, or how you’re going to deal with poetry in terms of meter. Are there suggestions for how you might deal with shloka meter in your language? This can also be a very convenient place for discussion of terminological issues that don’t fit into just one term in the glossary, like how you’re going to deal with sems, shes-pa, rig-pa, rnam-shes, gtso-sems, sems-byung, etc., all these sorts of terms. Some languages might have many more terms available for use, while other languages, like Spanish, might not have so many. Again, these are things that can be discussed and then used by the internal translators.

Also, the multilingual website itself, with its many different language versions of each text, can be used as a tool for cooperation. Because we have these links to all the different language versions of the same texts, if a text has quotations from the sutras or from other sources, and the references have already been figured out in one language, then people in other language sections don’t have to replicate that research; they can just find them. Also, it can help with how you deal with translation issues. For instance, the Portuguese section translators of my website are always looking at the Spanish to get some suggestions and ideas for difficult cases.

Like this, I think there are many ways we can use electronic tools to facilitate cooperation among various workers on the project.

Thank you.
