When I do a pronunciation assessment, I look at three key areas. The first two are the ones that come to mind when most people think about pronunciation. These are the individual sounds – also called “phonemes” or “segmentals” – the vowels and consonants. The other aspect I analyze is what I like to refer to as the “music of the language.” This is called “prosody” or “suprasegmentals” and includes a variety of dynamics very similar to what a musician incorporates to make the music expressive. This blog post discusses the first two areas – the individual sounds. Our next post will discuss “prosody,” the music of the language.


The vowels we use in English depend upon where we come from. They differ considerably between native English accents and are a key sign of where we have learned to speak the language. I like to break vowels down into three groups – those with no movement, those with movement and the “lazy vowel.”

When we explore vowels, remember we are looking at pronunciation, not spelling. We have 5 vowel letters – A, E, I, O and U – and Y is also sometimes considered a vowel. But these are letters. We have many more vowel sounds than letters in English.

In a neutral North American accent, examples of vowels with no movement include the A sound in BLACK, or the E sound in RED. We can also call these “simple vowels.” While I may pronounce them somewhat differently depending upon where I was raised, they are still pronounced as one sound.

Examples of vowels with movement – also called “diphthongs” or “complex vowels” – are the O in ROSE and the A in GRAY (also spelt GREY). The ROSE vowel is really OW – you can see the lips round at the end of the vowel. You don’t see the W in the spelling (as you can in the word LOW) but the sound is there. The GRAY vowel sound is really EY – you can feel the back of the tongue rise into the Y sound. In this case, you do see the Y but often it is not there in the spelling, as in the word GRADE. Some people don’t move enough on these vowels, which creates a lazy sound and makes it difficult to tell which vowel is being used. Often simply increasing the movement of these diphthongs will make the speech considerably clearer.

Vowels are important because, in English, we stress vowels rather than consonants. And there is always one primary stress in a word. If that word is stressed, then the vowel carrying the primary stress needs to be pronounced clearly for clear diction. But when a vowel has no stress, we often replace it with the lazy vowel – technically called the “schwa,” the reduced vowel – which is the most common vowel sound in English. When we don’t stress a simple vowel, and sometimes a complex vowel, it is reduced – its pronunciation is not so important, and reducing it is what creates a more natural English rhythm. (I have created a 5-part video series on the schwa, so stay tuned.)

It can be difficult to illustrate the schwa in writing. It is often written as UH. For example, the word PHOTOGRAPH can be annotated as FOW-TUH-GRAF. In this case, the vowel in the first syllable is stressed and therefore clear (as in ROSE) and the final syllable also has some stress and is pronounced like the A in BLACK. But when we add the suffix Y (PHOTOGRAPHY), the stress shifts and the second syllable gets the stress, annotated as FUH-TAA-GRUH-FEE. You can see there are now 2 schwas in the word, which replace the full vowel sounds used in the first example.

Learning what to reduce and how to do it can be difficult for people whose native languages are syllable-timed rather than stress-timed. (More on this in the upcoming video series.) But once mastered, your English will sound more local and flow more easily.


You can break consonants into two major categories – those that can go on for as long as physically possible and those that stop the sound. If I say the V sound or the S sound, for example, I can continue indefinitely – VVVVV or SSSSS. We call these sounds “continuants” because the sound can continue as long as we want it to. Vowels are also continuants and we lengthen vowel sounds when we stress them. In English, generally we don’t lengthen consonants but sometimes we do, for example MMMMM when something is delicious. 🙂

The other category is the opposite of continuants. These are called “stops” because they stop the sound. Examples of these are P, D and K. If one of these sounds appears in a word, the sound will briefly stop at that point. An example of this is the difference in the pronunciation of ROSE vs ROADS. We will often have stops in words that are heard simply because they stop the sound rather than being pronounced clearly. The difference between the sound SH and CH is that CH starts with a stop (often written as TSH).

We can also categorize consonants based on other characteristics. Does the sound go through the nose, as with N or M? Does it create friction – vibration caused by contact between the air and some part of the mouth – such as with Z or F? Does it come very close to a point of articulation without creating friction or going through the nose, such as L or Y?


While pronouncing the individual sounds is important, learning to put everything together and communicate effectively – and not like a robot – is a work of creation, a work of art. How I link the sounds, adjust the volume, lengthen the vowel, vary the speed, group the words, etc. – these are all elements of “prosody.” Read next month’s post for information about these musical aspects of English. And sign up for the newsletter to receive notice of upcoming articles and posts.