When we are physically in the same space, we use a range of subtle signals to manage speaking turns: eye contact, facial expressions, and body language. These cues lead to a more even distribution of turns among everyone present. In audio-only conversations, those visual cues are absent, so we focus entirely on auditory signals such as pitch, speaking speed, pauses, and the small sounds that indicate someone wants to respond. In video meetings, which in theory offer more information than audio alone, we pay less attention to these audio cues, while the visual signals are harder to pick up on a screen. Interestingly, the very limitation of audio conversations makes us listen more closely to auditory signals and respond to them more effectively than we do in video calls. The result is that speaking time is shared more equally and no one can easily dominate the conversation.
This more equal distribution of turns has a positive effect on what researchers call ‘prosodic synchrony’: speakers align their rhythm, intonation, and speech patterns more closely. When turns are shared more evenly in an audio conversation, we naturally become more attuned to one another’s way of speaking. We unconsciously mirror each other’s tempo and pitch, which makes the conversation flow more smoothly. This alignment contributes to what is known as ‘collective intelligence’: a group’s ability to solve problems together. Teams that communicate by audio alone are more successful at synchronizing their vocal cues and turn-taking, and when they achieve this, they perform better as a group.
Although video theoretically offers more information, in practice its visual signals are harder to perceive than in face-to-face meetings. This challenges the assumption that richer media are always better for remote collaboration: for certain forms of teamwork, audio-only meetings can be more effective than video conferencing.