I’ve come across people who do not think that CSS is related to internationalisation at all, but if you think about it, internationalisation is more than translating the content on your site into multiple languages and calling it a day. There are various nuances to the presentation of that content which affect the experience of a native speaker using your site.
There is no single canonical definition for internationalisation but the W3C offers the following guidance:
Internationalisation is the design and development of a product, application or document that enables easy localisation for target audiences that vary in culture, region, or language.
This is a lot of ground to cover, from use of Unicode and character encodings, to the technical implementations of serving translated content, as well as the presentation of said content. Today I’m only going to cover the CSS-related aspects of multilingual support.
CSS is used to describe the presentation of a web page by telling the browser how elements on the page ought to be styled and laid out. There are several methods we can use to apply different styles to different languages on a multilingual page with CSS.
In addition, there are CSS properties that provide layout and typographic capabilities for scripts and writing systems beyond the Latin-based horizontal top-to-bottom ones that are predominantly seen on the web today.
So buckle up, because this might end up being a pretty lengthy article. ¯\_(ツ)_/¯
Have you ever wondered how Chrome knows to ask you if you’d like a web page’s content to be translated? No? Okay, maybe it’s just me then. But it’s because of the lang attribute on the <html> element.
The lang attribute is a pretty important one because it identifies the language of textual content on the web, and this information is used in many places: the aforementioned built-in translation in Chrome, search engines looking for language-specific content, and screen readers.
Ah hah, maybe the screen reader bit didn’t occur to you, but if you’re not a screen reader user and don’t know folks who are, it probably isn’t top of mind. Screen readers make use of language information so they can read out the content with the appropriate accent and correct pronunciation.
The key to language-related styling lies in appropriate usage of the lang attribute in your page markup. The lang attribute takes language tags whose primary language subtags come from the ISO 639 language codes.
For most cases, you would use a two-letter code like zh for Chinese, but Chinese (among other languages like Arabic) is considered a macrolanguage that consists of a number of languages with more specific primary language subtags.
Refer to Language tags in HTML and XML for an in-depth explanation on how to construct language tags.
The general guidance is that the html element must always have a lang attribute set, which is then inherited by all other elements.
It is not uncommon to see content of different languages on the same page. In this case, you would wrap that content with a <span> or a <div> and apply the correct lang attribute onto that wrapping element.
<p>The fourth animal in the Chinese Zodiac is Rabbit (<span lang="zh">兔子</span>).</p>
Now that we’ve got that sorted, the following techniques will bear the assumption that lang attributes have been implemented responsibly.
:lang() pseudo-class selector
Turns out the :lang() pseudo-class selector is not that well-known.
But this pseudo-class selector is pretty cool because it recognises the language of the content even if the language is declared outside the element.
For example, a line of markup with two languages like so:
<p>We use <em>italics</em> to emphasise words in English, <span lang="zh">但是中文则是用<em>着重号</em></span>.</p>
Can be styled with the following:
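One possible version of such styles, as a minimal sketch. It assumes the intent is to drop the italics on Chinese emphasis and mark each character with a dot instead; the exact declarations are an assumption, not a definitive rule set.

```css
/* Matches an <em> whose content language is Chinese, even though
   lang="zh" is declared on the parent <span> and not on the <em> */
em:lang(zh) {
  font-style: normal;          /* no italics for Chinese text */
  -webkit-text-emphasis: dot;  /* prefixed form for engines that need it */
  text-emphasis: dot;          /* one emphasis dot per character */
}
```

The English <em> is left alone, because its content language is inherited from the rest of the page rather than from the Chinese <span>.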
If your browser supports the text-emphasis CSS property, you should be able to see emphasis marks (a typographic symbol traditionally used for emphasis on a run of East Asian text) added to each Chinese character within the <em>. Chrome needs the -webkit- prefix, boo.
We use italics to emphasise words in English, 但是中文则是用着重号.
But the point is, the lang attribute was not applied on the <em> element, but on its parent. The pseudo-class still works. If we used the more commonly known attribute selectors, e.g. [lang="zh"], this attribute must be on the <em> element to take effect.
Using attribute selectors
Which brings me to our next technique, using attribute selectors. These let us select elements with certain attributes or attributes of a certain value. (shameless plug time, for more on attribute selectors, try this Codrops CSS reference entry written by yours truly)
There are seven ways to match attribute selectors, but I’ll only talk about those I think are more relevant to matching the lang attribute. All my examples will use Chinese as the targeted language, so zh and its variants.
Update: Amelia Bellamy-Royds pointed out that my examples make it seem like attribute selectors are necessary to do partial language tag matching, but the :lang() pseudo-class covers that use-case already.
First, we can match the lang attribute value exactly using this syntax:
/* will match only zh */
[lang="zh"] { }
I mentioned earlier that Chinese is considered a macrolanguage, which means its language tag can be composed with additional specifics, e.g. the script subtag Hant (W3C says only use script subtags if they are necessary to make a distinction you need, otherwise don’t), the region subtag TW and so on.
The point is, language tags can be longer than just two letters. But the most generalised category always comes first, so to target attribute values that start with a particular string, we use this syntax involving the ^:
/* will match zh, zh-HK, zh-Hans, zhong, zh123…
 * basically anything with zh as the first 2 characters */
[lang^="zh"] { }
There is another syntax involving the |, which will match the exact value in your selector or a value that starts with your value immediately followed by a -. This seems like it was made just for language subcode matching, no?
/* will match zh, zh-HK, zh-Hans, zh-amazing, zh-123 */
[lang|="zh"] { }
Do remember that for attribute selectors, the attribute has to be on the element you want styled; it won’t work if it’s on a parent or ancestor. Note that the examples I’ve come up with for partial language tag matching can already be done with the :lang() pseudo-class. In other words, :lang(en) will match lang="en-GB" and so on, in addition to lang="en". I will update the examples when I can come up with better ones. Meanwhile, go for the :lang() pseudo-class.
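To make the difference concrete, here is a rough side-by-side sketch. The font-style declaration is only a placeholder style, and the element choice is an assumption for illustration.

```css
/* Attribute selector: only matches when the lang attribute sits on
   the <em> itself, with a value of zh or one starting with zh- */
em[lang|="zh"] {
  font-style: normal;
}

/* :lang() matches zh, zh-HK, zh-Hant-TW and so on, and it also
   matches when the language is declared on a parent or ancestor */
em:lang(zh) {
  font-style: normal;
}
```

With markup like the earlier example, where lang="zh" is on the wrapping <span>, only the second rule would apply to the nested <em>.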
How about normal classes or ids?
Yes. You can use normal classes or ids. Though you’d no longer be making use of the convenience of what is already on your element. (Again, my assumption is the lang attribute is being applied correctly and responsibly) But sure, go ahead and give your elements class names for applying specific language-related styling if you really want to, nobody will stop you.
Okay, selectors covered. Let’s talk about the styles we want applied to elements that match those selectors.
The default value for writing-mode is horizontal-tb. Perfectly logical because the web was born at CERN, where the official languages are English and French. Moreover, most web technologies were pioneered in English-speaking countries anyway (I think).
But the wondrous-ness of humanity gave us more than 3000 written languages with scripts and writing directions beyond a horizontal top-to-bottom orientation.
The traditional Mongolian script runs vertically from left-to-right, while East Asian languages like Japanese, Chinese and Korean, when written vertically, run from right-to-left. The writing-mode values that let you do that are vertical-lr and vertical-rl respectively.
ᠬᠦᠮᠦᠨ ᠪᠦᠷ ᠲᠥᠷᠥᠵᠦ ᠮᠡᠨᠳᠡᠯᠡᠬᠦ ᠡᠷᠬᠡ ᠴᠢᠯᠥᠭᠡ ᠲᠡᠢ᠂ ᠠᠳᠠᠯᠢᠬᠠᠨ ᠨᠡᠷᠡ ᠲᠥᠷᠥ ᠲᠡᠢ᠂ ᠢᠵᠢᠯ ᠡᠷᠬᠡ ᠲᠡᠢ ᠪᠠᠢᠠᠭ᠃ ᠣᠶᠤᠨ ᠤᠬᠠᠭᠠᠨ᠂ ᠨᠠᠨᠳᠢᠨ ᠴᠢᠨᠠᠷ ᠵᠠᠶᠠᠭᠠᠰᠠᠨ ᠬᠦᠮᠦᠨ ᠬᠡᠭᠴᠢ ᠥᠭᠡᠷᠡ ᠬᠣᠭᠣᠷᠣᠨᠳᠣᠨ ᠠᠬᠠᠨ ᠳᠡᠭᠦᠦ ᠢᠨ ᠦᠵᠢᠯ ᠰᠠᠨᠠᠭᠠ ᠥᠠᠷ ᠬᠠᠷᠢᠴᠠᠬᠥ ᠤᠴᠢᠷ ᠲᠠᠢ᠃
すべての人間は、生まれながらにして自由であり、かつ、尊厳と権利とについて平等である。人間は、理性と良心とを授けられており、互いに同胞の精神をもって行動しなければならない。
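A rough sketch of how the vertical-lr and vertical-rl values of writing-mode might be applied to text like the two samples above. It assumes the Mongolian text is tagged lang="mn-Mong" and the Japanese text lang="ja"; those tags and the .vertical class are illustrative assumptions.

```css
/* Traditional Mongolian: vertical lines, stacked from left to right */
:lang(mn-Mong) {
  writing-mode: vertical-lr;
}

/* Japanese set vertically: vertical lines, stacked from right to left.
   The hypothetical .vertical class keeps horizontal Japanese unaffected. */
.vertical:lang(ja) {
  writing-mode: vertical-rl;
}
```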
There are also the values of sideways-lr and sideways-rl, which rotate the glyphs sideways. Every Unicode character has a vertical orientation property that informs rendering engines how the glyph should be oriented by default.
We can change the character’s orientation with the text-orientation property. This usually comes into play when you have vertically typeset East Asian text interspersed with Latin-based words or characters. For abbreviations, you have the option of using text-combine-upright to squeeze the letters into one character space.
國家籃球協會（英語：National Basketball Association，縮寫：NBA）是北美的男子職業籃球聯盟。
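As a sketch of how this could be wired up for a sentence like the one above, assuming it sits in a container with a hypothetical .vertical class and that the Latin runs are wrapped in spans with hypothetical .upright and .tcy classes:

```css
/* Vertical Chinese text; text-orientation defaults to mixed,
   so Latin words are rotated sideways */
.vertical {
  writing-mode: vertical-rl;
}

/* Stand each Latin letter upright instead of rotating the word */
.vertical .upright {
  text-orientation: upright;
}

/* Squeeze a short abbreviation such as NBA into one character space */
.vertical .tcy {
  text-combine-upright: all;
}
```

text-combine-upright only has an effect in vertical writing modes and is best kept to short runs of two to four characters.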
Some of you might wonder about right-to-left languages like Arabic, Hebrew or Farsi (just to name a few), and whether CSS is applicable for those scripts as well. The short answer is CSS should not be used for bi-directional styling. Guidance from the W3C is as follows:
Because directionality is an integral part of the document structure, markup should be used to set the directionality for a document or chunk of information, or to identify places in the text where the Unicode bidirectional algorithm alone is insufficient to achieve desired directionality.
This is because styling applied via CSS has the possibility of being turned off, being overridden, going unrecognised, or being changed/replaced in different contexts. Instead, use of the dir attribute to set the base direction of text for display is the recommended approach.
I highly recommend referring to Structural markup and right-to-left text in HTML, CSS vs. markup for bidi support and Inline markup and bidirectional text in HTML for more detailed explanations and implementation details.
Everything on a web page is a box, and CSS has always used the physical directions of top, bottom, left and right to indicate which side of the box we’re targeting. But when writing modes are not oriented in the default horizontal top-to-bottom direction, these values tend to get confusing.
Update: David Baron pointed out I was using the old syntax in the previous version of the specification and the syntax implemented in browsers is actually the one in the Editor’s Draft. Table has been updated accordingly.
The matrix of writing directions and their corresponding values for a box’s physical sides and logical sides for positioning are as follows (the table has been lifted from the specification as of time of writing):
(Table: writing-mode / direction against the physical and logical sides of a box.)
The logical top of a container uses inset-block-start, while the logical bottom of a container uses inset-block-end. The logical left of a container uses inset-inline-start, while the logical right of a container uses inset-inline-end.
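A small sketch of the logical inset properties in use, assuming a hypothetical .badge element positioned inside a relatively positioned container:

```css
.badge {
  position: absolute;
  /* horizontal-tb, ltr: behaves like top: 1em
     vertical-rl:        behaves like right: 1em */
  inset-block-start: 1em;
  /* horizontal-tb, ltr: behaves like right: 1em
     vertical-rl, ltr:   behaves like bottom: 1em */
  inset-inline-end: 1em;
}
```

The same rule keeps the badge at the start of the block axis and the end of the inline axis whichever writing mode the container uses.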
There are also corresponding mappings for borders, margins and paddings, which are:
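As an illustration of those mappings, here is a sketch for a hypothetical .card element. The physical equivalents in the comments assume a horizontal-tb, left-to-right context.

```css
.card {
  margin-block-start: 1em;                      /* margin-top */
  margin-block-end: 1em;                        /* margin-bottom */
  padding-inline-start: 2em;                    /* padding-left */
  padding-inline-end: 1em;                      /* padding-right */
  border-inline-start: 4px solid currentColor;  /* border-left */
}
```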
And the mappings for sizing are as follows:
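A sketch of the sizing properties on a hypothetical .panel element, again with the physical equivalents for a horizontal-tb context noted in the comments:

```css
.panel {
  inline-size: 20em;      /* width */
  max-inline-size: 100%;  /* max-width */
  block-size: auto;       /* height */
  min-block-size: 10em;   /* min-height */
}
```

In a vertical writing mode, inline-size would control the height of the box and block-size its width.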
Lists and counters
Numeral systems are writing systems for expressing numbers, and even though the most commonly used system of numerals is the Hindu-Arabic numeral system (0, 1, 2, 3 and so on), CSS allows us to display ordered lists with other numeral systems as well.
Predefined counter styles can be used with the list-style-type property, which covers 174 numeral systems, urdu among them. You can check out the full list at MDN.
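For example, a sketch that switches the numbering of ordered lists based on content language. The pairing of languages to counter styles here is an illustrative assumption; the counter style names themselves are predefined ones.

```css
/* Arabic-Indic digits for lists in Arabic */
ol:lang(ar) {
  list-style-type: arabic-indic;
}

/* Thai digits for lists in Thai */
ol:lang(th) {
  list-style-type: thai;
}

/* Informal Chinese numerals for Traditional Chinese (zh-Hant, zh-Hant-TW…) */
ol:lang(zh-Hant) {
  list-style-type: trad-chinese-informal;
}
```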
If you’re interested in CSS counters, I wrote about them some time last year where I explore the “Heavenly-stem” and “Earthly-branch” numeral system used in traditional Chinese contexts (as well as a fizzbuzz implementation in CSS because why not).
As mentioned earlier, East Asian languages do not have the concept of italics. Instead, we have emphasis dots. They can be placed above or beneath characters to emphasise the text, strengthen the tone of voice or avoid ambiguity.
When Chinese is written in horizontal writing mode, these dots are placed underneath the characters, and when written in vertical writing mode, these dots are placed to the right of the characters.
Japanese, on the other hand, places emphasis dots above the characters in horizontal writing mode. In order to make the CSS property more generalised, the text-emphasis properties, including text-emphasis-color, were introduced in CSS Text Decoration Module Level 3.
You can use different symbols other than dots, like triangle or even a single character as a string. Position and colour can also be tweaked with their respective properties.
Colours? Random shapes? Sure, why not?
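A sketch of the longhand properties, following the placement conventions described above. The .fancy class, the star character and the colour are arbitrary choices for illustration.

```css
/* Chinese: dots underneath in horizontal text, to the right in vertical text */
em:lang(zh) {
  font-style: normal;
  text-emphasis-style: dot;
  text-emphasis-position: under right;
}

/* Japanese: dots above in horizontal text */
em:lang(ja) {
  font-style: normal;
  text-emphasis-style: dot;
  text-emphasis-position: over right;
}

/* Any single character can serve as the mark, in any colour */
.fancy {
  text-emphasis-style: "★";
  text-emphasis-color: tomato;
}
```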
Line decoration is also covered in the same specification, which provides developers more granular control of underlines and overlines (in level 4 of the spec). This is especially useful for scripts that have ascenders or descenders that regularly spill over the baseline.
text-decoration-skip, covered in CSS Text Decoration Module Level 4, controls how overlines and underlines are drawn when they cross over a glyph. Again, this is something that happens less frequently for languages like English, but it greatly affects aesthetics for scripts like Burmese, for example.
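A hedged sketch of what this can look like in practice. It uses text-decoration-skip-ink, a longhand from the same Level 4 module, and text-underline-position from Level 3; choosing these two properties (rather than text-decoration-skip itself) and targeting Burmese links via :lang(my) are assumptions made for illustration.

```css
a:lang(my) {
  text-decoration-line: underline;
  /* interrupt the line wherever a glyph would cross it */
  text-decoration-skip-ink: auto;
  /* place the underline below the descenders instead of at the alphabetic baseline */
  text-underline-position: under;
}

/* Opt out and draw the line straight through the glyphs */
.solid-underline {
  text-decoration-skip-ink: none;
}
```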
There are two categories of CSS properties for accessing OpenType features, high-level properties and low-level properties. The specification recommends use of high-level properties whenever possible. This is mostly predicated on browser support.
font-variant-east-asian allows for control over glyph forms for characters that have variants, like Simplified Chinese glyphs versus Traditional Chinese glyphs. It is the same character, but it can be written differently.
There is also font-variant-ligatures, which provides numerous pre-defined options for ligatures and contextual forms.
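A sketch of the high-level properties, assuming the font in use actually ships the relevant glyph variants and ligatures. The .body-copy class is hypothetical.

```css
/* Ask for Traditional or Simplified glyph forms based on the script subtag */
:lang(zh-Hant) {
  font-variant-east-asian: traditional;
}

:lang(zh-Hans) {
  font-variant-east-asian: simplified;
}

/* Enable common ligatures and contextual alternates for running text */
.body-copy {
  font-variant-ligatures: common-ligatures contextual;
}
```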
The low-level properties are accessed via font-feature-settings where you would use the 4-letter OpenType tags to toggle the features you want (this does depend on whether your font has those features to begin with, but let’s assume it does).
There are 141 feature tags from Alternative Fractions to Justification Alternates to Ruby Notation Forms to Slashed Zero. These CSS properties are closely related to features within the font file itself, so there is that external dependency that lies upon your choice of font. Something to keep in mind.
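A sketch of the low-level syntax using a few of the tags named above, on the assumption that the chosen font implements them. The .figures class is hypothetical.

```css
/* "zero" = Slashed Zero, "afrc" = Alternative Fractions */
.figures {
  font-feature-settings: "zero" 1, "afrc" 1;
}

/* "ruby" = Ruby Notation Forms, applied to ruby annotation text */
rt {
  font-feature-settings: "ruby" 1;
}
```

Where a high-level equivalent exists, such as font-variant-numeric: slashed-zero for the first rule, it is the preferred route.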
This post got really long, so I’ll have a second part where I go into more specifics on how we could build up a layout using the selectors we covered to make sure our layout is robust even if the language changes. Modern layout properties like Flexbox and Grid are well suited for use cases like this.
One of the things I find most interesting about CSS is how we can combine its properties in different ways to achieve a myriad of outcomes, and with more than 500 CSS properties in existence, that’s a lot of possibilities. I’m not saying anything goes, because often, there are numerous ways to reach the same result, and some ways are more appropriate than others.
However, it is up to us to make an informed decision regarding which is the most appropriate method to use for our context, by understanding the mechanics behind each technique, its pros and cons, and being aware of why we chose to do things a certain way.
I still believe that, after more than three decades, the web is still an informational medium, where content is key. Hence, the presentation of that content should be optimised regardless of what language or script it is in. And I’m glad that CSS is continually developing to provide developers a means to do just that.
Anyway, stay tuned for part 2.