One of the problems faced when translating legal documents produced in MS-Word into other languages with Computer Assisted Translation (CAT) tools is that the translated document needs to be re-edited and re-formatted to fix its layout.
This problem is particularly evident when translating across languages with different character sets, e.g. English to Chinese or English to Arabic. The layout issues would not be a deal breaker if the documents were intended just for online publication. However, in most cases these documents are translated for print production, and in some cases they are time sensitive and need to be printed and published shortly after translation.
Documents authored in MS-Word mix presentation with content, which is why the layout also gets “translated” when the text is translated, and we end up with a document that does not look as we expected.
I have seen cases where translated documents had to be reformatted over and over again. During the production workflow, the original English document changed significantly in content and had to be translated again at multiple intermediate print publication steps (e.g. 1st reading, 2nd reading, committee review), and at each of these steps the translated document had to be reformatted all over again.
For example, below is an English “Provisional Agenda” document; note the styling information that is part of the content (marked and named in violet):
And see below the same document translated into Arabic:
Notice how the Arabic translation of the document has subtle differences (apart from the obvious right-to-left rendering of text) in font faces, font sizes, alignments, etc. Such changes are not applied automatically when the English document is translated into Arabic; they have to be applied to the translated document by editing it manually.
All this work and re-work is required just to produce one output format for printing. There are multiple mediums for publication nowadays (mobile, web, tablet devices, other external services and data consumers), and supporting each of them requires further re-engineering and conversion from Word to the target platform. Typically, so much effort goes into producing the correct print format in so many different languages that there are no resources or energy left to do these other formats any justice.
What if the legal documents were drafted in Akoma Ntoso XML instead of MS-Word? Could that work better with CAT systems?
How Akoma Ntoso helps here
If the English legal document had been drafted in Akoma Ntoso format, manual reformatting of the Arabic translation would not have been required.
This is because the Akoma Ntoso XML legal document does not contain any styling or page-dimension-dependent information such as font faces, font sizes, margins, header and footer sizes, or page numbers. These are defined and encoded in a separate file, an XSLT stylesheet, which when applied to the XML document produces the desired output:
These rules are applied in an automated manner, so the output is always predictable and consistent. Any formatting issues can be addressed centrally in the XSLT rules and all the target documents can be reproduced with the correct formatting literally at the press of a button.
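The separation described above can be illustrated with a small sketch. It is written in Python rather than XSLT, so it is a stand-in for the idea and not an actual stylesheet. The element names (`doc`, `docTitle`, `item`) are a simplified, hypothetical markup and not schema-valid Akoma Ntoso, and the fonts and sizes in `STYLES` are assumptions chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup: real Akoma Ntoso uses a namespace and a
# much richer schema. Note that neither document carries any styling.
ENGLISH = (
    '<doc lang="en"><docTitle>Provisional Agenda</docTitle>'
    "<item>Opening of the session</item></doc>"
)
ARABIC = (
    '<doc lang="ar"><docTitle>جدول الأعمال المؤقت</docTitle>'
    "<item>افتتاح الدورة</item></doc>"
)

# Presentation rules kept outside the documents (the role the XSLT plays).
# Font names and sizes are illustrative assumptions.
STYLES = {
    "en": {"dir": "ltr", "font": "Times New Roman", "title_size": "14pt"},
    "ar": {"dir": "rtl", "font": "Traditional Arabic", "title_size": "16pt"},
}


def render_html(xml_text: str) -> str:
    """Apply the central style rules for the document's language."""
    doc = ET.fromstring(xml_text)
    style = STYLES[doc.get("lang")]
    body = ET.Element(
        "div", {"dir": style["dir"], "style": f"font-family: {style['font']}"}
    )
    for child in doc:
        if child.tag == "docTitle":
            out = ET.SubElement(
                body, "h1", {"style": f"font-size: {style['title_size']}"}
            )
        else:
            out = ET.SubElement(body, "p")
        out.text = child.text
    return ET.tostring(body, encoding="unicode")
```

Calling `render_html(ENGLISH)` and `render_html(ARABIC)` formats both documents from the same rule table. Changing a font or size in `STYLES` changes every rendered document in that language, with no edits to the translated text itself.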
Print Ready outputs
For producing documents specifically for printing, the XSL family of standards includes XSL-FO (XSL Formatting Objects). An XSLT stylesheet transforms the XML into XSL-FO, and an FO processor then renders it as a print-ready PDF document with precise layouts, page numbering, margins, and support for footnotes, tables of contents, etc.
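To give a feel for what XSL-FO describes, here is a minimal sketch that builds a one-page-master FO document with margins, a page-number footer, and a writing direction. It is assembled in Python for illustration; in practice this output would normally come from an XSLT stylesheet. The function name `build_fo`, the page size, and the margin values are assumptions. Rendering the result to PDF needs a separate FO processor (Apache FOP is one example), and how a particular processor handles `writing-mode` should be checked against its documentation.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def build_fo(title: str, paragraphs: list, rtl: bool = False) -> str:
    """Build a minimal XSL-FO document (hypothetical helper, assumed layout)."""
    root = ET.Element(fo("root"))

    # Page geometry: the kind of information kept out of the legal XML itself.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-width": "210mm", "page-height": "297mm",
        "margin-top": "20mm", "margin-bottom": "20mm",
        "margin-left": "25mm", "margin-right": "25mm",
        "writing-mode": "rl-tb" if rtl else "lr-tb",
    })
    ET.SubElement(page, fo("region-body"), {"margin-bottom": "15mm"})
    ET.SubElement(page, fo("region-after"), {"extent": "10mm"})

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})

    # Footer repeated on every page, carrying the page number.
    footer = ET.SubElement(
        seq, fo("static-content"), {"flow-name": "xsl-region-after"}
    )
    number = ET.SubElement(footer, fo("block"), {"text-align": "center"})
    ET.SubElement(number, fo("page-number"))

    # Main content flow: a heading block followed by one block per paragraph.
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    heading = ET.SubElement(
        flow, fo("block"), {"font-size": "14pt", "font-weight": "bold"}
    )
    heading.text = title
    for text in paragraphs:
        ET.SubElement(flow, fo("block"), {"space-before": "6pt"}).text = text

    return ET.tostring(root, encoding="unicode")
```

For example, `build_fo("Provisional Agenda", ["Opening of the session"])` yields a left-to-right layout, while passing `rtl=True` with the Arabic text yields the same page geometry with a right-to-left writing mode.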