Why Vocal Musicality for Podcasters Is the Real Key to Audience Engagement

Your content is good. Your listeners are bored to tears.

Published April 30, 2026

Your content is good, but your listeners are bored to tears.

That is the uncomfortable truth Joseph Lewin unpacks in this episode. Following the standard advice to "just be yourself" on audio or video does not protect your audience. It loses them. The fix is vocal musicality: the deliberate variation of speed, pitch, and tonality that separates a flat conversational voice from one that actually holds attention. One voiceover tweak took a video from a few thousand views to 15 million on TikTok alone.

💡
Authenticity is not a performance strategy. If your natural speaking voice is monotone, being "yourself" on a podcast is the fastest way to lose your audience before you ever deliver value.

What should podcasters and content creators know about vocal musicality?

Vocal musicality is the single most underused tool in audio and video content. Here are the core principles from this episode.

  • Authenticity can be a trap for creators: Being yourself in a normal conversation is fine, but audio and video formats demand higher engagement levels than casual conversation delivers. Listeners need more stimulus to stay focused.
  • The entertainer mindset is non-negotiable: When you run a podcast, you step out of the role of a normal person and into the role of a performer. Even educational content benefits from the techniques radio hosts and entertainers use to hold attention.
  • Speed variation creates forward momentum: Alternating between rapid delivery and slow, methodical emphasis gives your voice movement. This keeps listeners oriented and signals which points matter most.
  • Pitch and tonality carry meaning: Raising your voice to a higher pitch and then dropping it to a lower register communicates emphasis and contrast. Without those shifts, every point lands with the same weight, which means nothing stands out.
  • Vocal movement mirrors good writing structure: Just as a mix of short and long sentences makes prose easier to read, a mix of fast and slow, high and low delivery makes audio easier to follow.
  • The results are measurable: A single hook delivered with high vocal musicality has drawn close to 100 million views across social channels and several iterations, and one YouTube video using it brought in about 35,000 followers.

Why is "being authentic" making your podcast boring?

The problem is not your content. The problem is your delivery.

Most creators hear the advice to be authentic and interpret it as permission to speak on camera or microphone exactly the way they would in a coffee shop conversation. That is a mistake. Normal conversational speech is calibrated for an environment where the listener can read your face, respond in real time, and ask you to repeat yourself. Audio strips all of that away. Video reduces it significantly.

"When you're on video, or it's even more so when you're just on audio, if you talk in the same way that you do in normal conversations, you're going to lose people's attention." - Joseph Lewin

The monotone that creeps into most people's voices when they are explaining something they know well is not a sign of confidence. It is a signal to the listener's brain that nothing is changing and nothing requires attention. The brain responds by tuning out. Authenticity, in this context, is not a virtue. It is an excuse for staying comfortable while your audience disengages.

What is vocal musicality, and how do you use it in practice?

Vocal musicality is the intentional variation of three elements: speed, pitch, and tonality.

Speed means deliberately accelerating through certain passages and then slowing down for the points you want to land. The contrast itself creates emphasis. You do not need to slow down everything important. You need to slow down relative to what came before it.

Pitch means moving your voice up and down in register throughout a sentence or a section. A voice that stays at one pitch throughout a recording is functionally monotone even if the words are varied. The up-and-down movement gives the listener's ear something to track.

Tonality is the broader color of your voice, the warmth, urgency, or weight you bring to a specific moment. Tonality shifts signal to the listener that the emotional context of what you are saying has changed.

"One of the simplest, I won't say easiest tweaks you can make, is to have more musicality in your voice." - Joseph Lewin

Think of it the way you think about sentence structure in writing. A page of sentences that are all the same length and shape is exhausting to read. The same is true for audio. Varied vocal rhythm gives the listener's attention somewhere to go.

How does vocal musicality change content performance on YouTube Shorts and TikTok?

The data from this episode is specific and hard to ignore.

Joseph posted a video on a personal channel, nothing related to B2B content. The hook was strong, but the first version tanked with only a couple thousand views. He then watched creator Jenny Hoyos on Jay Clouse's show, where she talked about YouTube Shorts and how to go viral, imitated some of her techniques, and re-recorded the voiceover with significantly more vocal musicality. The delivery was, by his own description, over the top.

The result: by Joseph's recollection, about 15 million views on TikTok for that re-recorded version. Across several iterations and social channels, the hook has since accumulated close to 100 million total views. On YouTube, that one video brought in about 35,000 followers.

The content did not change. The information was identical. The only variable was how the voiceover was delivered.

"The only change wasn't the actual content itself, it was my voiceover that I did. And I added way more musicality." - Joseph Lewin

This is the core argument of the episode. Iteration on delivery, not iteration on ideas, is what unlocks performance in short-form content. If a piece of content is underperforming, the first question to ask is not "Is the topic wrong?" It is "Is the delivery flat?"

FAQ

What is vocal musicality for podcasters?

Vocal musicality is the deliberate variation of speed, pitch, and tonality while speaking on audio or video. It prevents monotone delivery and creates the kind of vocal movement that keeps listeners engaged. Think of it as the audio equivalent of varied sentence structure in writing.

Why does being authentic hurt my podcast performance?

Authentic conversational speech is calibrated for in-person settings where facial expressions and real-time feedback fill in the gaps. On audio, those cues are gone. A natural conversational tone often defaults to a low-variation monotone that gives listeners no reason to stay focused. The entertainer mindset, not the conversational mindset, is what audio and video formats actually reward.

How do I add more musicality to my voice without sounding fake?

Start with speed variation. Pick a point you want to emphasize and slow down into it after a faster passage. Then experiment with pitch, raising your voice slightly at the start of a new idea and dropping it as you land the conclusion. It will feel exaggerated at first. That feeling is the point. Record yourself, play it back, and compare engagement. The data will tell you whether it is working.

Can vocal delivery really change how many views a video gets?

Yes, and the numbers from this episode are specific. The same hook, re-recorded with higher vocal musicality, went from a couple thousand views to 15 million on TikTok. The same video generated 35,000 YouTube followers. The content was unchanged. The delivery was the variable.

Who should study to improve vocal performance on podcasts and short-form video?

Joseph Lewin points to Jenny Hoyos, a creator known for high-performing YouTube Shorts, as a direct model. Studying how she structures hooks and delivers voiceovers was the specific input that led to the viral result described in this episode. Radio hosts are also a strong reference point for vocal range and pacing.

The thesis of this episode is simple and uncomfortable. If your podcast is not performing, the content is probably not the problem. The delivery is. Vocal musicality, specifically the variation of speed, pitch, and tonality, is the mechanism that separates a recording people finish from one they abandon in the first 60 seconds. Being authentic is fine in a conversation. On audio or video, it is a liability unless your authentic voice already has range, movement, and energy. If it does not, the work is to build those skills deliberately. Treat the microphone like a stage, not a phone call, and your audience will stay.

About the host

Joseph Lewin


Host of B2B On Air · The Podcast Launch Guy | 45 B2B Podcasts Launched | Hosts I’ve worked with have closed over $17M in revenue | 100 Million Views On My Personal Social Video

Transcript


Joseph Lewin [0:00]

Your content is good, but your listeners are bored to tears. And the problem is, if you listen to the common advice about be yourself and just be authentic, it’s going to make the problem worse because the problem is actually you. Welcome to B2B On Air. I’m your host, Joseph Lewin, and in today’s episode, I’m gonna share with you one simple tweak you can make to be less authentic that’s actually gonna help you to engage your audience better.

So when you’re having a normal conversation with somebody, you do wanna be yourself and you wanna use your normal tone and cadence. There’s some things that you might wanna change, but that natural way of having a conversation can be very engaging for people. When you’re on video, or it’s even more so when you’re just on audio, if you talk in the same way that you do in normal conversations, you’re going to lose people’s attention. So you need to understand that if you’re running a podcast, you’re stepping out of the realm of being a normal human being and you’re stepping into being an entertainer. Even if your podcast is focused on something educational, if you’re able to do a few things that entertainers do, that radio hosts do, you’re able to pull in your audience, keep them engaged way more effectively.

And one of the easiest tweaks that you can make, or let me rephrase that, one of the simplest, I won’t say easiest tweaks you can make, is to have more musicality in your voice. So what do I mean by that? If you talk to me in a normal conversation, I’m probably not going to sound the way that I do right now. I’m going to be talking to you. I’m going to say, hey, look, if you want to be better at podcasting, you really need to think through how you’re communicating and talk more effectively. And I tend to drop into what you just heard there, a very monotone voice. And it’s not that cool Eastern European affect where you’re kind of like ultra monotone and you don’t change at all. It’s worse than that. It’s this monotone that just barely shifts up and down a little bit. Whereas if you listen to radio hosts, you listen to people who are on air all the time, they have a lot of changes in their voice, and it’s something that I’m still working on. So if you’re going, hey Joe, you sound over the top, I don’t wanna sound like you while I’m working on it, but trust me, people are way more engaged when I add extra musicality to my voice.

So what does that look like in practice? It means having some things where you want to emphasize something, and so you’re going to talk a lot faster than you normally would and go really quickly, and then you’re going to emphasize a different point by slowing down and being a little bit more methodical about what you’re saying. So you have the speed aspect, which is what I’m sharing with you there, but then you also have the tonality. And so if you raise your voice up to a higher pitch at certain points and then you lower it down to a lower note, you’re actually changing what you’re communicating through your voice. And just having a little bit more movement, a little up and down, think of it like when you’re writing. If you write a bunch of similar sentences all the way through, it’s really hard for readers to actually read through that content. You kind of get stuck because all the sentences are about the same length, they’re structured about the same, but when you have a few sentences that are really long and then you have a few sentences that are really short, just by changing the musicality of your voice, you’re actually able to communicate much more effectively and keep people engaged longer.

I’m gonna share a quick story to illustrate this. There is a video that I posted on my YouTube channel, nothing related to B2B content. I build things and burn things and do kind of crazy stuff on my property. But this one video that I made has gotten almost 100 million views. Now it’s kind of broken over different social channels, and I’ve had a few iterations of it, but the hook is the same on every single one. And for whatever reason, that hook way outperforms. But the funny thing is, the first time I posted it, it tanked. And then I watched a video of Jenny Hoyos (I think that’s how you say her name) on Jay Clouse’s show, where she talked about YouTube Shorts and how to go viral. And I imitated some of the things that she does, and my video literally went from getting a couple thousand views to getting, uh, I believe the first version of it got 15 million views on TikTok. And the only change wasn’t the actual content itself, it was my voiceover that I did. And I added way more musicality. I was way over the top, and people love it or they hate it. It’s one of the two. Um, but it got way more attention, way more views, and on YouTube, that video alone has gained me about 35,000 followers.

And so it’s easy to write this off of being over the top and unauthentic, but I would rather be a little bit over the top and actually engage people, grow an audience, get attention, than to stick to my boring, monotone voice. So if you have a choice of being authentic, but you’re authentically boring, it might be time to work on how you’re communicating and level up your entertainment value so that people actually listen. And with that, thanks so much for joining me, and I’ll see you on the next episode.
