The uTalk Guide to Mandarin Pronunciation

If you’ve started learning Mandarin Chinese and don’t have any knowledge of any other East Asian languages, you might have noticed that there’s a bit of a learning curve when it comes to good pronunciation. In fact, at first glance, it might not be clear what’s going on at all! Read this post, the first in uTalk’s series of longer guides to target the nitty-gritty of language learning, to find out more about how to get your Mandarin pronunciation 很厉害! (hěn lìhai – amazing)

Our app teaches you Mandarin as it’s spoken in China and, although we don’t ever claim to be a source for learning Chinese characters, we do have those, as well as a useful romanisation system: pinyin.

Romanisation: the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so.

1. What is Pinyin?

Simply, pinyin is the standard system used to romanise Standard Chinese. It is used all across mainland China and also in parts of Taiwan. If you’ve seen the name of someone from mainland China written in English, then you’ve almost certainly seen pinyin before, albeit without the tones.*

*Don’t worry, we’ll get to those shortly!

Some famous Chinese names (you’ve probably seen pinyin and didn’t even realise it!) include: Ai Weiwei (artist and activist); Li Lianjie (a.k.a. Jet Li, actor, martial artist); Yao Ming (professional basketball player); and Zhang Ziyi (actress). Not every romanised Chinese name is pinyin, though: Chan Kong-sang (a.k.a. Jackie Chan, martial artist, actor, stuntman… the list is pretty long!) is a Cantonese romanisation, and in pinyin it would be Chén Gǎngshēng.

The pinyin system was developed in the 1950s by a large group of linguists – including Zhou Youguang, the ‘father of pinyin’ – and was based on earlier forms of Chinese romanisation, including Gwoyeu Romatzyh (1928), Latinxua Sin Wenz (1931) and the diacritic markings from zhuyin (bopomofo). Before this, the Wade-Giles system was widely used, and you might still notice older spellings when it comes to certain places or people. An example is Sichuan (a province in China) versus Szechuan: Sichuan is the pinyin form, while Szechuan is an older, pre-pinyin spelling. There have been romanised forms of Chinese since the publication of Xizi Qiji in 1605 by a Jesuit missionary, but the earlier forms were intended more for Western audiences than for the Chinese population.

In 1958, the Chinese government published the pinyin system, though it has been revised several times since. This was part of a drive to improve literacy: Chinese characters don’t reliably show how a word is pronounced, so a phonetic system could be used as a learning aid. There have been attempts to make pinyin the standard system in Taiwan, too, but in practice several rival systems are still in use there, and bopomofo, which is a phonetic script rather than a romanisation, remains the usual pronunciation aid.

Despite the fact that pinyin has been the standard form of romanising Standard Chinese in China since 1958, it didn’t come into use in Western publications until the 1980s – after the normalisation of diplomatic relations between the USA and the People’s Republic of China in 1979. If you learn Mandarin Chinese now, you will almost certainly begin by learning it through the medium of pinyin.

拼音 (pīnyīn) is how ‘pinyin’ is written in Chinese characters; it literally translates as ‘spelt sounds’.

1.1 Initials

So, how does pinyin work? Each Mandarin syllable is written as a cluster of letters made up of an initial (the sound at the beginning) and a final (the sound at the end). Initials are usually consonants and finals are usually made up of vowels. Every syllable has exactly one final, optionally preceded by one initial: initials can never occur alone, but finals can!

The only exception to this pattern is the trailing -r, the same sound as the special final -er, which can be added to the end of a syllable. This -r changes the vowel that comes before it in a way that also occurs in English accents that pronounce the ‘r’ (think ‘farm’ or ‘bird’). It is mainly a feature of northern dialects, especially Beijing’s, and is not often used in official publications; e.g. 哪里 (nǎli) ‘where?’ in Standard Chinese is often said as 哪儿 (nǎr) ‘where?’ in the Beijing dialect.

These are the initials in pinyin (in their conventional order):

b-, p-, m-, f-, d-, t-, n-, l-, g-, k-, h-, j-, q-, x-, zh-, ch-, sh-, r-, z-, c-, s-

1.2 Finals

Finals come – surprise, surprise – after initials. The vast majority of finals are vowels or diphthongs (e.g. -a, -e, -i, -o, -u; -ai, -ei, -ou), though three consonant sounds can also end a final: -n, -ng, and -r. Chinese syllables that end with any other consonant are either from non-Mandarin languages or indicate the use of a non-pinyin romanisation system (some final consonants are used in these systems to represent the tones).

Diphthong: a sound formed by the combination of two vowels in a single syllable, in which the sound begins as one vowel but moves to the other (as in: coin or loud).

There are six simple finals: -a, -o, -e, -i, -u, and -ü. Although we’ve written -ü with the umlaut here, you will only see the umlaut in the wild (as it were) after an initial that can also combine with plain -u, which in practice means n- and l-.

Therefore, as nü and nu exist, the first one needs the umlaut in pinyin; yu, on the other hand, is only ever y- + -ü, so is written as yu.
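If it helps to see the spelling rule as a procedure, here is a minimal Python sketch (our own illustration, not an official pinyin tool). It assumes the syllable has already been split into an initial and a final, and it also folds in j-, q-, and x-, which, like y-, are only ever followed by ü and never by plain u. The function name `spell_u_umlaut` is hypothetical.

```python
def spell_u_umlaut(initial: str, final: str) -> str:
    """Join an initial and a final, dropping the dots on ü where pinyin does.

    After j-, q-, x- and y- only ü (never plain u) can follow, so the dots
    are redundant and are left off. After n- and l- both u and ü occur
    (nu/nü, lu/lü), so the dots must stay.
    """
    if final.startswith("ü") and initial in ("j", "q", "x", "y"):
        return initial + "u" + final[1:]
    return initial + final


# A few example (initial, final) pairs.
for initial, final in [("n", "ü"), ("n", "u"), ("y", "ü"), ("j", "üe"), ("x", "ün")]:
    print(initial, "+", final, "=", spell_u_umlaut(initial, final))
```

The dots are only kept where leaving them off would make two different syllables look identical.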

Pinyin then features nine compound finals (or diphthongs) – compound, as they are made of more than one simple vowel: -ai, -ei, -ao, -ou, -iu, -ui, -ie, -üe, and -er.

Finally, there are nine nasal finals: -an, -en, -in, -un, -ün, -ang, -eng, -ing, and -ong. As the name suggests, these all end in a nasal sound, either -n or -ng.

So, if we look at some Chinese cities:

Beijing (北京 běijīng) – B (initial) + ei (final) / j (initial) + ing (final)

Shanghai (上海 shànghǎi) – Sh (initial) + ang (final) / h (initial) + ai (final)

Chengdu (成都 chéngdū) – Ch (initial) + eng (final) / d (initial) + u (final)
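The splitting shown above is mechanical enough to sketch in a few lines of Python. This is a rough illustration under our own assumptions: it takes a lowercase syllable without tone marks, uses only the initials listed earlier, and, because y- and w- are not in that list, leaves syllables such as yi or wo whole. The name `split_syllable` is hypothetical.

```python
# The initials listed above, in their conventional order.
INITIALS = ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
            "j", "q", "x", "zh", "ch", "sh", "r", "z", "c", "s"]


def split_syllable(syllable: str) -> tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final).

    Two-letter initials are tried first so that "shang" splits as
    sh + ang rather than s + hang. An initial must be followed by a
    final, because initials never occur alone.
    """
    s = syllable.lower()
    for initial in sorted(INITIALS, key=len, reverse=True):
        if s.startswith(initial) and len(s) > len(initial):
            return initial, s[len(initial):]
    # No listed initial matched: treat the whole syllable as the final.
    return "", s


for syllable in ["bei", "jing", "shang", "hai", "cheng", "du", "ai"]:
    print(syllable, "->", split_syllable(syllable))
```

Trying the two-letter initials first is the key design choice here; without it, zh-, ch-, and sh- would be misread as z-, c-, and s- followed by an h.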

1.3 Which of these are similar to English?

There are plenty of sounds represented by pinyin that are similar to sounds in English. These are not exactly the same, of course, and you should make sure to try and practise the correct pronunciation wherever possible, but these are the ones you may need to worry about less at the beginning of your Mandarin journey.

b- – actually, this may sometimes sound a little more like a ‘p’ to English speakers, but you will most likely still be understood if you pronounce ‘b’ the way you do in English.

p- – sounds like English ‘p’.

m- – sounds like English ‘m’.

f- – sounds like English ‘f’.

d- – like ‘b’, this is very close to an English ‘d’ sound but it may also sound like a ‘t’ to you sometimes. You can use ‘d’ as you start out.

t- – sounds like English ‘t’.

n- – sounds like English ‘n’.

l- – sounds like English ‘l’.

s- – sounds like English ‘s’.

g- – joins our new ‘b’ and ‘d’; this can sound like it’s somewhere between ‘k’ and ‘g’. Just use ‘g’ for now.

k- – sounds like English ‘k’.

h- – sometimes this sounds like a regular ‘h’ but it can also sound like the throatier ‘h’ you might hear when a Scottish person says loch. However, this is a natural variation so you can use whichever is more comfortable for you.

1.4 Common difficulties in pronunciation

Those are the easy ones. Some of the initials, however, and most of the finals, are quite different to what you might expect if you look at the pinyin and try to read it like English.

Let’s start with the initials first.

1.4.1 ‘c’ and ‘z’

These can be difficult for some learners.

c- – this is a ‘ts’ sound. Think rats or cats or mats. Where this gets difficult for some learners is that in Mandarin this sound always comes at the beginning of a syllable, whereas in English it usually comes in the middle or at the end of a word. So, be prepared to practise!

z- – this is similar to c- and sounds a lot like the English ‘dz’ sound. Think roads and loads and codes (since we don’t pronounce that ‘e’!). Like c-, z- always comes at the start of a syllable in Mandarin, whereas in English this sound usually comes in the middle or at the end of a word.

1.4.2 ‘ch’, ‘sh’, and ‘zh’

These are all similar sounds in Mandarin, which is why it’s a good idea to learn them together (because you’ll learn how to differentiate between them!). 

Also, good news? They’re actually not that different to saying ‘ch’, ‘sh’ or ‘j’ in English. Yep, that ‘zh’ is pronounced a lot like ‘j’. Pretty easy, right?

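What is interesting about these initials is how they change the final -i when it follows them. Most of the time, -i is pronounced ‘ee’, but after ch-, sh-, and zh- it is not; instead, it becomes a short vowel like the ones in the examples below. (The same is true after r-, z-, c-, and s-: -i is not ‘ee’ there either.)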

chi – try saying ‘chirp’ but stop as you get to the ‘r’ sound.

shi – try saying ‘shirt’ but stop as you get to the ‘r’ sound.

zhi – try saying ‘jerk’ but stop as you get to the ‘r’ sound.
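To make the rule concrete, here is a small Python sketch that decides whether a consonant-plus-bare-i syllable uses the ‘ee’ vowel. It is our own illustration: the function name is hypothetical, and besides zh-, ch-, and sh- it also lists r-, z-, c-, and s-, after which -i is likewise not pronounced ‘ee’.

```python
# Initials after which a bare -i final is NOT the 'ee' vowel.
NON_EE_INITIALS = ("zh", "ch", "sh", "r", "z", "c", "s")


def i_sounds_like_ee(syllable: str) -> bool:
    """For an initial + bare -i syllable (e.g. "bi", "shi"), say whether -i is 'ee'."""
    s = syllable.lower()
    initial = s[:-1]
    # Only handle consonant + bare -i; finals such as -ai or -ui are rejected.
    if not s.endswith("i") or not initial or initial[-1] in "aeiouü":
        raise ValueError(f"{syllable!r} is not an initial followed by a bare -i")
    return initial not in NON_EE_INITIALS


for syllable in ["bi", "ji", "xi", "yi", "zhi", "chi", "shi", "zi"]:
    print(syllable, "ee" if i_sounds_like_ee(syllable) else "not ee")
```

Syllables like yi, ji, and xi come out as ‘ee’, while zhi, chi, and shi do not.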

1.4.3 ‘r’

The Mandarin r- does not exist in English but, oddly, the way you pronounce it is very similar to those ch-, sh- and zh- sounds. That’s because it is what’s called a retroflex sound – when you say it, the tip of your tongue should be pointed up toward the roof of your mouth and also be quite far back.

Make the sound in the middle of leisure or pleasure and try pulling the tip of your tongue further towards the back of your mouth. As you move further back, it will start to sound more like an ‘r’ – and that’s it, you’ve got it!

Again, there’s an acceptable range on the pronunciation here – no one expects you to be perfect!

1.4.4 ‘j’, ‘q’, and ‘x’

These ones don’t exist in English at all! That’s fun, right, three new sounds for you to learn?

x- – try and make a ‘sh’ sound while the tip of your tongue is down below your lower front teeth. The middle of your tongue should rise to the roof of your mouth to make the sound. If you can smile comfortably while saying it, you’re there or close – it’s more difficult to smile while making the ‘sh’ sound.

q- – try and make a ‘ch’ sound while the tip of your tongue is down below your lower front teeth and, again, the middle of your tongue should lift.

j- – just like for x- and q-, keep the tip of your tongue behind your lower front teeth and try making that ‘j’ sound.

Note: the initials j-, q-, and x- never combine directly with -a, -o, or -u; they only combine with finals that begin with -i or -ü. That means whenever you see ju, qu, or xu, all those u sounds are actually ü!
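Reading in the other direction, a short Python sketch can put the hidden dots back. It assumes a lowercase, toneless syllable, and it includes y- as well, since yu is always y- + -ü. The function name `restore_u_umlaut` is our own, hypothetical choice.

```python
# Initials after which a written u is always really ü.
HIDDEN_U_UMLAUT_INITIALS = ("j", "q", "x", "y")


def restore_u_umlaut(syllable: str) -> str:
    """Return a toneless syllable with ü restored where pinyin drops the dots."""
    s = syllable.lower()
    if s[:1] in HIDDEN_U_UMLAUT_INITIALS and s[1:2] == "u":
        return s[0] + "ü" + s[2:]
    return s


for written in ["ju", "que", "xun", "yuan", "zhu", "jiu"]:
    print(written, "->", restore_u_umlaut(written))
```

Only a u that directly follows the initial is changed, so zhu and jiu are left alone.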

1.4.5 Vowels

-a – sounds like ‘a’ in ‘father’.

-o – can sound like ‘oh’ but often is a kind of ‘oo-uh’ sound.

-e – sounds like English ‘duh’ or ‘uhhh’.

-i – sounds like ‘ee’ (except after zh-, ch-, and sh-, as covered above); the syllable yi sounds just the same, because y- as an initial generally adds no extra sound.

-u – sounds like English ‘oo’.

-ü – make an ‘ee’ sound and then slowly round your lips to get this sound – your tongue needs to stay tense but your lips must be rounded.

-ai – sounds like ‘ai’ in ‘aisle’.

-ei – sounds like ‘ei’ in ‘eight’.

-ui – technically, this is -uei, but after an initial it is always written -ui. Sounds like ‘ay’ in ‘way’.

-ao – sounds like ‘ao’ in ‘tao’ (unsurprisingly, as it was borrowed from Chinese!) or ‘ow’ in ‘how’.

-ou – sounds like English ‘oh’.

-iu – sounds like ‘ee’ followed by the ‘oh’ in ‘low’ or ‘go’; this is actually a combination of -i and -ou (-iou), and the ‘o’ is simply omitted when it’s written down like this.

-ie – the ‘e’ here is like ‘eh’, so ye (y- + -ie = ye) is pronounced like the ‘ye’ from ‘yes’ in English.

-üe – add ‘eh’ to the original ‘ü’ sound.

-er – with your tongue in the position for ‘e’, curl the tip up to make ‘er’; it sounds a lot like ‘are’ in English.

-an – sounds like ‘on’ or sometimes like the vowel in ‘can’; this isn’t a perfect match, but it’s close.

-en – sounds like the -e sound followed by an ‘n’. A little like ‘un’ in ‘sun’.

-in – the ‘ee’ sound for ‘i’ and then just an ‘n’ on the end.

-un – ‘u’ plus an ‘n’ sound.

-ün – ‘ü’ plus an ‘n’ sound.

-ang – similar to ‘ong’ in ‘kong’ because of the nasalised -ng ending.

-eng – similar to ‘ung’ in ‘sung’ because of the nasalised -ng ending.

-ing – this can vary depending on where you are in China; in southern China, it is pronounced just like yin, but with a final -ng instead of -n. In northern China, the -ing sound is yi followed by -eng, so sounds kind of like ‘ee-ung’ in English.

-ong – the ‘oh’ sound for the ‘o’ and then the nasalised ‘ng’ on the end.
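Two of the finals above are written in a shortened form. As a rough illustration, this Python sketch maps a written syllable to the fuller form that hints at how it is pronounced. The dictionary and function names are hypothetical, and only the two abbreviations described above (-ui for -uei and -iu for -iou) are covered.

```python
# Shortened spellings described above, mapped to their fuller forms.
ABBREVIATED_FINALS = {"ui": "uei", "iu": "iou"}


def expand_final(syllable: str) -> str:
    """Rewrite a toneless syllable ending in -ui or -iu with its fuller final."""
    for short, full in ABBREVIATED_FINALS.items():
        if syllable.endswith(short):
            return syllable[: -len(short)] + full
    return syllable


for written in ["gui", "shui", "liu", "jiu", "hao"]:
    print(written, "->", expand_final(written))
```

Seeing gui as guei, or liu as liou, can be a handy reminder of the vowel you might otherwise skip.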

2. What are Tones?

Like all varieties of Chinese, Standard Chinese is tonal. So, you don’t just have to worry about the initials and finals – words are also distinguished from each other by their pitch contour. 

While this can be difficult for a learner to master, tones matter as much in Mandarin as vowels do in English: pronouncing a word with the wrong tone in Chinese is like saying ‘bud’ in English when you mean ‘not good’ (bad) or ‘the thing I sleep in’ (bed).

But don’t let that put you off! With some practice, you can get the hang of them and fortunately, in Mandarin, there are only four (well, maybe five) tones to master!

2.1 Chinese tones

When you start learning Mandarin and read pinyin, you’ll notice that there are marks over almost every syllable, so words of two or more syllables often carry two or more marks. These are tone markings, there to help you learn which tone is used when.

Initially, you will learn these tones on finals, then on single syllables (e.g. mā, má, mǎ, mà), and then you should practise multisyllabic words and sentences. This is because, although each syllable has its own tone, that tone can sometimes change depending on the syllables around it.

Still, let’s introduce the tones first, shall we?

2.1.1 The first tone: high

The first tone is called the high tone. It is a steady, high pitch; think of it as being sung rather than spoken.

An example: 一 (yī) – one

2.1.2 The second tone: rising

Like the name suggests, this is a tone that rises from middle to high pitch. Think of it like the English exclamation, “What?!” 

An example: 人 (rén) – person

2.1.3 The third tone: low / dipping

This tone descends from mid-low to low and then goes back up again, though this rise is normally only heard if the tone appears at the end of a sentence or before a pause. Without the rise, it is sometimes called a half third tone. A fun way to practise this is to dip your chin as you go down in pitch, then lift up again as you raise your tone.

An example: 你 (nǐ) – you

2.1.4 The fourth tone: falling

This tone is a sharp fall from high to low – like you might hear in curt commands in English. If it is followed by another fourth-tone syllable, this fall may not be as sharp on the first syllable.

An example: 四 (sì) – four

2.1.5 Neutral tone

If you are reading pinyin and see a syllable without a tone mark, then it is probably said with a neutral tone. This is sometimes considered a lack of tone, and its pitch is determined by the tone of the preceding syllable.

Many linguists do not consider the neutral tone a fully fledged tone; rather, it may result from a ‘spreading out’ of the tone on the syllable that came before it. This means that it is something you will generally pick up through repeated exposure to different words and phrases in Chinese.
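Many learners type pinyin with tone numbers (hao3 rather than hǎo). As a minimal Python sketch, assuming that 5 or a missing digit means the neutral tone and that v stands in for ü (a common keyboard convention), here is one way to turn numbers into marks. The placement rule in the comments (a or e first, then the o of ou, otherwise the last vowel) is the standard convention, which this guide does not otherwise cover; the function name `mark_tone` is hypothetical.

```python
import unicodedata

# Combining accents for tones 1-4; tone 5 (or no digit) is the neutral tone.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}


def mark_tone(numbered: str) -> str:
    """Convert one numbered syllable, e.g. "hao3", into "hǎo"."""
    if numbered[-1:].isdigit():
        base, tone = numbered[:-1], int(numbered[-1])
    else:
        base, tone = numbered, 5
    base = base.replace("v", "ü")  # typing stand-in for ü
    if tone not in TONE_MARKS:
        return base  # neutral tone: no mark
    # Standard placement: a or e takes the mark; in "ou" the o does;
    # otherwise the last vowel in the syllable does.
    if "a" in base:
        pos = base.index("a")
    elif "e" in base:
        pos = base.index("e")
    elif "ou" in base:
        pos = base.index("o")
    else:
        pos = max(base.rfind(v) for v in "iouü")
    if pos == -1:
        return base  # no vowel to mark
    marked = base[: pos + 1] + TONE_MARKS[tone] + base[pos + 1:]
    # NFC merges each vowel and its accent into one character, e.g. "ǎ".
    return unicodedata.normalize("NFC", marked)


print(" ".join(mark_tone(s) for s in ["ni3", "hao3", "ma1", "ma5", "lv4", "gui4"]))
```

Note that the marks are placed by spelling alone; the function does not apply any of the tone changes described in the next section.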

2.2 Tone sandhi

Tone sandhi is when a syllable’s tone changes because of the tones around it in a word or phrase. It matters most for the third tone, though it does occur with other tones as well.

The main rule with third tone sandhi is that when there are two third-tone syllables in a row, the first one changes and is pronounced with a second tone. 

Reminder: the third tone is the low or dipping tone; it descends from mid-low to low and then goes back up again, though this rise is normally only heard if the tone appears at the end of a sentence or before a pause.

You will learn third tone sandhi when you first start learning Chinese: 你 (nǐ) and 好 (hǎo) both have the third tone on their own, but when they come together to make ‘hello’, you pronounce 你好 as níhǎo (even though it is still written nǐhǎo).

When there are more than two third tones in a row, the situation becomes a bit more complicated, but general rules are:

  • If the first word is two syllables and the second word is one syllable, then the first two syllables become second tones.

    • 保管好 bǎoguǎn hǎo – to take good care of – takes the pronunciation báoguán hǎo

  • If the first word is one syllable and the second word is two syllables, the first syllable of the two-syllable word becomes second tone, but the one-syllable word remains third tone.

    • 老保管 lǎo bǎoguǎn – to take care of all the time – takes the pronunciation lǎo báoguǎn
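The third-tone rules in this section can be expressed as a small procedure over tone numbers. This Python sketch is a simplification of our own: it applies the two-in-a-row rule inside each word first, then across word boundaries working from right to left, which reproduces the examples above (你好, 保管好, 老保管). Real speech also depends on rhythm and emphasis, and the function name is hypothetical.

```python
def third_tone_sandhi(words: list[list[int]]) -> list[int]:
    """Apply third-tone sandhi to a phrase given as words of tone numbers.

    Returns the spoken tone of every syllable, in order.
    """
    words = [list(word) for word in words]  # copy so the input is untouched
    # Step 1: inside each word, a 3rd tone before a 3rd tone becomes a 2nd.
    for word in words:
        for i in range(len(word) - 2, -1, -1):
            if word[i] == 3 and word[i + 1] == 3:
                word[i] = 2
    # Step 2: at each word boundary, right to left, compare the end of one
    # word with the (already adjusted) start of the next.
    for k in range(len(words) - 2, -1, -1):
        if words[k][-1] == 3 and words[k + 1][0] == 3:
            words[k][-1] = 2
    return [tone for word in words for tone in word]


print(third_tone_sandhi([[3, 3]]))       # 你好: [2, 3]
print(third_tone_sandhi([[3, 3], [3]]))  # 保管 好: [2, 2, 3]
print(third_tone_sandhi([[3], [3, 3]]))  # 老 保管: [3, 2, 3]
```

Grouping the syllables into words is what makes 保管好 and 老保管 come out differently, even though both are three third tones in a row.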

不 bù is also a special case – this means ‘not’, so it is especially important to know about it as it will come up a lot.

When followed by another fourth tone, bù becomes bú; so ‘to not be’ is 不是 búshì. It can also be neutral in tone in the particular instance of being between words in certain question forms, e.g. 是不是 shìbushì – is / is not.
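The 不 rule is simple enough to write as a function. In this Python sketch, tones are numbered 1 to 4 with 5 for the neutral tone (a common convention we are assuming), and the name `bu_tone` and the flag `a_not_a_question` are our own, hypothetical labels; the flag marks the ‘is / is not’ question pattern described above.

```python
from typing import Optional

NEUTRAL = 5  # tone number commonly used for the neutral tone


def bu_tone(next_tone: Optional[int], a_not_a_question: bool = False) -> int:
    """Spoken tone of 不, given the dictionary tone of the next syllable.

    `next_tone` is None when nothing follows 不.
    """
    if a_not_a_question:
        return NEUTRAL  # 是不是 shìbushì
    if next_tone == 4:
        return 2        # 不是 búshì
    return 4            # otherwise 不 keeps its own fourth tone


print(bu_tone(4), bu_tone(1), bu_tone(4, a_not_a_question=True))
```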

一 yī, one, has several rules, since it is so often combined with other words and syllables.

When followed by a fourth tone, it is pronounced with a second tone. It is pronounced with a fourth tone when it comes before a first, second, or third tone syllable.

When it is the final part of a sentence or comes at the end of a multisyllabic word (regardless of the first tone of the next word), it is pronounced with the first tone. It can also become neutral in tone when it is used between two copies of a reduplicated verb, e.g. 看一看 kànyikàn, to take a look at.
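The rules for 一 can likewise be sketched as a Python function. This is our own simplified illustration: the flags `word_final` and `between_reduplicated` are hypothetical names for the situations described above, and the caller is assumed to know which one applies. Tones are numbered 1 to 4, with 5 for the neutral tone.

```python
from typing import Optional

NEUTRAL = 5  # tone number commonly used for the neutral tone


def yi_tone(next_tone: Optional[int], word_final: bool = False,
            between_reduplicated: bool = False) -> int:
    """Spoken tone of 一 under the rules above.

    `next_tone` is the dictionary tone (1-4) of the following syllable,
    or None at the end of a sentence.
    """
    if between_reduplicated:
        return NEUTRAL  # 看一看 kànyikàn
    if next_tone is None or word_final:
        return 1        # sentence-final, or the end of a word such as 统一 tǒngyī
    if next_tone == 4:
        return 2        # before a fourth tone
    if next_tone in (1, 2, 3):
        return 4        # before a first, second, or third tone
    raise ValueError("pass the next syllable's dictionary tone, 1-4")


print(yi_tone(4), yi_tone(1), yi_tone(None), yi_tone(4, word_final=True))
```

The word-final and sentence-final checks come before the next syllable’s tone is looked at, matching ‘regardless of the first tone of the next word’.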

2.3 Why are tones important?

Like we said before, tones in Mandarin are considered as important as vowels in English. This is especially true once you progress past the ‘first words’ stage and become able to discuss a wider variety of ideas – if your tones are off at first, but you only know how to order a coffee and say hello, then it’s likely whoever you’re speaking to will be able to guess where you went wrong. However, once you get to more intermediate or advanced speaking, you really want to be able to get things right.

There are two key things to remember about Chinese tones:

  • They are not like English tone of voice; they don’t convey your attitude or emotions. They’re attached to words and carry meaning.

  • If you get them wrong, that meaning is lost, so make sure to practise!

Here’s the evergreen example, covering all four tones and the neutral tone, of how the same syllable can have different meanings depending on the tone:

妈 – mā – mother

麻 – má – hemp

马 – mǎ – horse

骂 – mà – scold

吗 – ma – (a word that makes a sentence into a question)

So, learn your tones, or you might get your mothers and your horses all mixed up!

3. How to Improve Your Pronunciation

Much like every other aspect of language learning, practice is key! But there are some things you can do to make it easier for yourself (and more fun, too!).

3.1 Master the basics

Whether you’re learning alone or you’re in a class, you’ll probably learn the bare bones of pronunciation and then not touch on it again for a while – if at all. So make sure you go back to it! Practise those tables, practise syllables, and more importantly, practise tones and syllables in longer words and sentences. 

There are good tables and drills online if you’re just looking to get the pronunciation of a specific combination down well: Sinosplice has tone pair drills and there are also plenty of pinyin charts that provide audio for every combination of initials and finals.

And then find something where you can practise sentences! Our app is good for that, as there are so many phrases in it, and this is easier, sometimes, than trying to learn sounds in isolation. The more you practise words in context, the more automatic that correct pronunciation becomes.

3.2 Input, input, input!

One of the best things you can do for your language learning – at any stage – is to get masses of comprehensible input. This means: apps, music, podcasts, TV, whatever and wherever you can expose yourself to spoken Chinese, make sure you do so. 

Again, the more you hear those tones and syllables in context, the more automatic and accurate your pronunciation will become. Taking TV, for instance, there will be words or phrases that are said over and over again so that, by the time you get to the end of a show, you will have the way they were pronounced stuck in your brain.

3.3 Practise output

Of course, that doesn’t mean you shouldn’t speak at all. Find a native speaker, if you’d like, or talk to a teacher, if you happen to have one. Try just talking to yourself (or your pets!), or parroting what people are saying on the radio. Sing along to your favourite songs! Record yourself in our app and compare yourself to a native speaker. 

All of these are good ways to get your mouth physically used to the production of these new sounds so that they’ll be rolling smoothly off your tongue in no time!

3.4 Learn from your mistakes

This means either working with someone who can pull you up when you’ve gone wrong or recording yourself and listening to it carefully, preferably in comparison to native audio.

That’s because, as you speak, there are times when you’re going to think you’re saying everything right – but once you listen back to it, you might notice a mistake here and there. Work out which mistakes are occurring most often for you and use the tone drills and pinyin charts and everything else to iron them out again.


We hope you’ve enjoyed our long guide to Mandarin pronunciation! If there’s anything you think we’re missing, drop a comment and we’ll see what we can do – and 好运 (hǎoyùn) with your Chinese learning!
