The vowels we use in English depend upon where we come from. They differ considerably between native English accents and are a key sign of where we have learned to speak the language. I like to break vowels down into three groups – those with no movement, those with movement and the “lazy vowel.”

When we explore vowels, remember we are looking at pronunciation, not spelling. We have 5 vowel letters – A, E, I, O and U – and Y is also sometimes considered a vowel. But these are letters. We have many more vowel sounds than letters in English.

In a neutral North American accent, examples of vowels with no movement include the A sound in BLACK, or the E sound in RED. We can also call these “simple vowels.” While I may pronounce them somewhat differently depending upon where I was raised, they are still pronounced as one sound.

Examples of vowels with movement – also called “diphthongs” or “complex vowels” – are the O in ROSE and the A in GRAY (also spelt GREY). The ROSE vowel is really OW – you can see the lips round at the end of the vowel. You don’t see the W in the spelling (as you can in the word LOW) but the sound is there. The GRAY vowel sound is really EY – you can feel the back of the tongue rise into the Y sound. In this case, you do see the Y, but often it is not there in the spelling, as in the word GRADE. Some people don’t move enough on these vowels, which creates a lazy sound and makes it difficult to tell which vowel is being used. Often simply increasing the movement of these diphthongs will make the speech considerably clearer.

Vowels are important because, in English, we stress vowels rather than consonants. And there is always one primary stress in a word. If that word is stressed, then the vowel carrying the primary stress needs to be pronounced clearly for clear diction. But when a vowel has no stress, we often replace it with the lazy vowel – technically called the “schwa,” the reduced vowel – which is the most common vowel sound in English. When we don’t stress a simple vowel, and sometimes a complex vowel, it is reduced – its pronunciation is not so important, and reducing it is what creates a more natural English rhythm.

It can be difficult to illustrate the schwa in writing. It is often written as UH. For example, the word PHOTOGRAPH can be annotated as FOW-TUH-GRAF. In this case, the vowel in the first syllable is stressed and therefore clear (as in ROSE) and the final syllable also has some stress and is pronounced like the A in BLACK. But when we add the suffix Y (PHOTOGRAPHY), the stress shifts to the second syllable, annotated as FUH-TAA-GRUH-FEE. You can see there are now two schwas in the word, which replace the full vowel sounds used in the first example.

Learning what to reduce and how to do it can be difficult for people who have native languages that are syllable-based rather than stress-based. (More on this in the upcoming video series.) But once mastered, your English will sound more local and flow more easily.



You can break consonants into two major categories – those that can go on for as long as physically possible and those that stop the sound. If I say the V sound or the S sound, for example, I can continue indefinitely – VVVVV or SSSSS. We call these sounds “continuants” because the sound can continue as long as we want it to. Vowels are also continuants and we lengthen vowel sounds when we stress them. In English, generally we don’t lengthen consonants but sometimes we do, for example MMMMM when something is delicious. 

The other category is the opposite of continuants. These are called “stops” because they stop the sound. Examples of these are P, D and K. If one of these sounds appears in a word, the sound will briefly stop at that point. An example of this is the difference in the pronunciation of ROSE vs ROADS: in ROADS, the D briefly stops the sound before the final Z sound, while in ROSE the sound flows straight through. Often a stop in a word is heard simply because it interrupts the sound, rather than because it is pronounced clearly. The difference between the sounds SH and CH is that CH starts with a stop (often written as TSH).

We can also categorize consonants based on other characteristics. Does the sound go through the nose, as with N or M? Does it create friction – vibration caused by contact between the air and some part of the mouth – such as with Z or F? Does it come very close to a point of articulation without creating friction or going through the nose, such as L or Y?



While pronouncing the individual sounds is important, learning to put everything together and communicate effectively – and not like a robot – is a work of creation, a work of art. How I link the sounds, adjust the volume, lengthen the vowel, vary the speed, group the words, etc. – these are all elements of “prosody.”

I call prosody the “music of the language” because language uses so many musical elements to give expression to what we say. Without these, we would sound like a robot – everything even and flat. Instead, we vary the volume and speed at which we speak, the pitch (the height of the sound) and the phrasing (what we group together and how we use intonation to indicate the phrase).



Let’s start by looking at stress. I’m not talking about the stress we feel when we are overworked or nervous about something. Stress in speech refers to emphasis on particular words and phrases. But what do we do to the sound to create this emphasis? In three words, we make the sound louder, longer and clearer.

English is a stress-based language. Not all languages are – some, for example Spanish, Turkish and Cantonese, are syllable-based. This does not mean they don’t have stress but rather that syllables tend to be the same length and the vowels tend to have the same clarity. In English, as in other languages such as German, Russian and Farsi, this is not true. While we make the stressed syllables (really the vowel in the stressed syllable) louder, longer and clearer, we make the unstressed vowels quieter, shorter and lazier – less clear.

An example of this is the word PHOTOGRAPH. We stress the first syllable PHO – it has the primary stress – and we stress the last syllable GRAPH less, but it still has some stress – we call it secondary stress. But the TO is completely unstressed. So the O in TO becomes a schwa, the lazy vowel, written with the symbol “ə”. When we change the stress in PHOTOGRAPHY, the O in PHO and the A in GRAPH are both reduced and become schwas.

That’s a quick explanation of stress and reduction. So what are the other aspects of prosody, the other aspects that allow us to express ourselves and our meaning through musical dynamics?


Let’s look at intonation. Intonation refers to the upward and downward movement of the pitch, the height of the sound. Some people confuse this with volume, the loudness. But pitch change refers to the melody of the language.

We tend to use a wave-like intonation pattern to indicate a phrase, a group of words that have meaning in and of themselves. We often raise the pitch on the first stressed word in the phrase and then step down on each stressed word that follows. Another example is an intonation pattern called list intonation. When we are listing things, we tend to raise the pitch at the end of each item in the list and lower it on the final item.

Phrasing is another important aspect of prosody. A phrase is a group of words that have meaning in and of themselves. For example, let’s look at this sentence:

I decided to stay home, work from home and make sure that the kids were doing their schoolwork.

We can break this into thought groups as follows:

I decided to stay home, / work from home / and make sure / that the kids / were doing their schoolwork. /

Each of these phrases has meaning on its own. Compare that with breaking the sentence up in the following manner:

I decided to / stay home, work / from home and make / sure that the / kids were doing their / schoolwork. /

If you are reading aloud, it is useful preparation to mark the phrasing so that what you read makes sense to the listeners.

Prosody is a large topic, as are pronunciation and clear, effective speech in general. But hopefully this article has given you an idea of what is involved in speaking clearly and effectively in English.


If you are interested in getting an assessment of your current level of English pronunciation, please contact us at info@voicetoword.ca.