Designing a Bilingual Website: A Quick Case-Study

Earlier this year we produced the Ingenious Ireland education site to promote Irish science and technology. I thought I’d write a bit about how we developed for both the English and the Irish parts of the site.

A little bit of research

The W3C’s Internationalization pages are a great place to start. Particularly useful is the articles section, where common questions are answered in straightforward sections. As we knew which languages we were dealing with (not always the case when developing multilingual sites), it was worth finding out a bit about the character sets. Pennsylvania State University provide an excellent resource, saddled with the unfortunate name Computing with Foreign Symbols. Technical information about many languages can be found there, such as character sets, encodings and language codes (more about these later).

Which encoding

Declaring the correct character encoding is important. The character encoding tells the browser how to translate the bytes in the HTML file into characters on the screen. Declare the wrong encoding and some of the characters will turn to gibberish. Both the English and Irish parts of the site can be encoded with either ISO-8859-1 (Latin-1) or UTF-8. We chose to use ISO-8859-1, so our Content-Type meta tag in the head of each page looks like this:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

For XHTML, we could also declare the encoding in the xml declaration at the top of the page:

<?xml version="1.0" encoding="iso-8859-1"?>

However, as including this prologue can lead to unpredictable results by throwing browsers like Internet Explorer into quirks mode, we left it out. This may not be a problem in your situation.
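
Had we gone with UTF-8 instead, only the charset value would change. A minimal sketch, assuming the files themselves are saved as UTF-8 and the server isn’t sending a different charset in its HTTP Content-Type header (which would take precedence over the meta tag):

```html
<!-- Alternative to the ISO-8859-1 declaration; the file itself must be saved as UTF-8 -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
```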

Language codes

Now we have the correct character encoding, we need to declare the base language for our documents. This is done using language codes in the html tag of the document. Here is the complete tag in the case of the Irish pages:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ga-IE" lang="ga-IE">

We declare the language using the lang and xml:lang attributes. For XHTML we need to include both these attributes.

The first part of the value, the primary sub-tag, is the language code. It can be either a two- or three-letter code based on the ISO 639 standard. If the language you are specifying has both a two- and a three-letter code, use the two-letter one. In the case of Irish, we use the “ga” language code.

The second part of the value is an optional sub-tag that helps refine the language selection to a specific country, region or dialect. We use the ISO 3166 two-letter country code for Ireland, “IE”.

The html tag for the English part of the site looks like this:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">

So the language code for English is “en”, and the country code that refines it to British English is “GB” (the ISO 3166 code for the United Kingdom).

We also declared the base language in the meta tag for content-language in the head of the document:

<meta http-equiv="content-language" content="en-gb" />
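
Putting the encoding and language declarations together, the top of an Irish page might look something like the sketch below. The XHTML 1.0 Strict doctype, the “ga-ie” content-language value (by analogy with the English one), the title and the body text are all assumptions for illustration, not copied from the live site.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Base language declared twice: lang for HTML parsers, xml:lang for XML parsers -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ga-IE" lang="ga-IE">
<head>
  <!-- The declared encoding must match how the file is actually saved -->
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <meta http-equiv="content-language" content="ga-ie" />
  <title>Ingenious Ireland</title> <!-- placeholder title -->
</head>
<body>
  <p>Fáilte</p> <!-- placeholder content -->
</body>
</html>
```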

Over-riding the base language

Some parts of the site have English and Irish on the same page. For example, the language selection menu at the top of each page shows each language’s name written in that language, regardless of what the base language for the page is.

Fig 1. Language selector in English

On the English pages, the word “Gaeilge” needs to be declared as Irish (Fig. 1). Likewise, on the Irish pages, the word “English” needs to be declared as English. Again, we used the same lang and xml:lang attributes to achieve this.

English pages:

<div id="languageSelection"> LANGUAGE: ENGLISH | <a href="index_ga-IE.htm" lang="ga-IE" xml:lang="ga-IE">GAEILGE</a></div>

Irish pages:

<div id="languageSelection"> TEANGA: <a href="index.htm" lang="en-GB" xml:lang="en-GB">ENGLISH</a> | GAEILGE </div>
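
The same pair of attributes works on any element, not just links. A hypothetical example (the sentence is made up for illustration and doesn’t appear on the site) that marks a single Irish word inside English body text:

```html
<!-- On a page whose base language is en-GB -->
<p>The Irish word for welcome is
  <span lang="ga-IE" xml:lang="ga-IE">fáilte</span>.</p>
```

Tools that honour language declarations, such as screen readers, can then treat just that word as Irish.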

Styling bilingual sites

Now we have the site marked up with the proper language declarations, we can go about styling it with CSS.

Different banner images for the English and Irish versions of Ingenious Ireland

Fig 2. Ingenious Ireland banners (reduced by 50%)

The banner at the top of each page uses a background image that contains the name of the site in English or Irish. To minimise the CSS, the header is first specified for the English site as the default, and the rules for the Irish banner are added afterwards to over-ride it. Because we’ve used the lang attribute in the html tag, we can use the CSS2.1 :lang pseudo-class selector in the CSS:

/* English default */
#header {
  width: 694px;
  height: 100px;
  background: transparent url(../furniture/banner.png) no-repeat top;
}

/* Irish over-ride */
html:lang(ga-IE) #header {
  background-image: url(../furniture/banner_ga.png);
}

Predictably, not all browsers support this pseudo-class. Safari and Internet Explorer will ignore it. It also looks like IE7 won’t support the :lang pseudo-class either.

To remedy this situation we used a hack, which sucks, so we tried to keep it as unobtrusive in the markup as possible. Each Irish language page contains a special class in the body tag:

<body class="gaIE">

As we’re going to use browser hacks to specify CSS rules for Safari and IE, we’re going to keep them separated as much as possible from the rest of the CSS by including them as separate files at the top of the main CSS file.

/* import for Safari - must come first so subsequent rules can over-ride it */
@import "safari.css";
/* import for IE6 */
@import "ie6win.css";

It is especially important that the Safari file is included first because subsequent rules will over-ride those set there.

In safari.css:

body.gaIE #header {
  background-image: url(../furniture/banner_ga.png);
}

/* Note the hash (pound sign) after the semi-colon. */
body.gaIE #header {
  background-image: none;#
}

First we set the rule we want Safari to follow, then we reset that rule to its default for all other browsers. In the example above, Safari will ignore the second rule because the hash (pound sign to you Americans) following the semi-colon will stop Safari from processing all rules in that selector. It’s a pretty dirty hack, and the CSS won’t validate (which is another reason to keep it in a separate file).

In ie6win.css:

* html body.gaIE #header{ background-image: url(../furniture/banner_ga.png); }

This rule will only be seen by Internet Explorer because it misinterprets the universal * html selector (theoretically * html shouldn’t work because html shouldn’t have a parent node). Unfortunately, IE7 will fix this bug, so the hack won’t work even though :lang() is still not going to be supported. The IE team suggest the use of conditional comments in the HTML, to specify IE7-only rules.
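
As a rough sketch of that suggestion, a conditional comment in the head of each page could pull in a stylesheet that only IE7 reads. The file name ie7.css is hypothetical; it would hold the same body.gaIE rule as ie6win.css, minus the * html prefix.

```html
<!-- Other browsers treat this whole block as an ordinary HTML comment -->
<!--[if IE 7]>
<link rel="stylesheet" type="text/css" href="ie7.css" />
<![endif]-->
```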

Word lengths between languages

Average word lengths differ between languages. Where possible it’s important to allow flexibility in the design so that it can cope with differing word lengths between the language versions, especially in navigation elements. Also try to get your main navigation labels defined and translated as early as possible so there aren’t any nasty gotchas lurking when it comes to word length.

Different versions of page navigation

Fig 3. Page navigation rendered at different widths

During the course of developing page navigation for Ingenious Ireland, it became obvious that the Irish version wouldn’t fit in the space defined for it in the CSS. Once again, using the :lang() pseudo-class selector, we could target the Irish version to give it slightly more room.

.pageSelecta {
  float: right;
  display: block;
  width: 100px;
  text-align: center;
}

html:lang(ga-IE) .pageSelecta {
  width: 125px;
}

It would probably have been better to allow the navigation to resize automatically, especially in situations where the user wants to resize the text, but for a quick fix, this worked well.
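
One way that might have looked, as a sketch only (we haven’t tested it against the real layout): drop the fixed width so the floated box shrinks to fit its label, and use em-based padding so the spacing scales with the user’s text size. The padding value is an assumption.

```html
<style type="text/css">
/* No fixed width: a float with width:auto shrinks to fit its content,
   so the English and Irish labels each get the room they need. */
.pageSelecta {
  float: right;
  display: block;
  padding: 0 1em; /* assumed value; scales with text size */
  text-align: center;
}
</style>
```

With this approach the html:lang(ga-IE) width over-ride, and the browser hacks that go with it, would no longer be needed for the navigation.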