The Uncanny Valley of video conferencing

Disgusted at the Uncanny Valley of Zoom

There is talk of video conferencing replacing meetings and university lectures. I think this makes too many assumptions, especially as things stand. Being obliged to use Zoom while also finishing a PhD based in virtual worlds, in which I considered how different online spaces interrelate, has made me think through why. It comes back to a decades-old theory, first proposed in robotics and now familiar from games: the Uncanny Valley.

Space

Going back a few years, I used to be in politics, and both chaired and sat in many meetings. The councillors were at least notionally elected equals, and although the choice for some was influenced by their responsibilities, everyone chose where to sit. Carefully. Some always at the front, some at the back, and some along the side.

This was not merely habit, but about feeling comfortable and secure in space. It enabled them to participate in the way they wanted. Some chose to be opposite other people, where they could try to dominate or contest directly, while some chose to be at the side, often contributing less but more thoughtfully and responsively. In all cases though, being able to see the constant response of others was essential to participation and good debate. Bad rooms led to bad meetings.

Video conferencing does not allow you to choose your space. You are lined up in rows on screen, made to face everyone else head on. Consequently responses are less vital, flattened, dispersed. But space really is the critical aspect. You can’t choose your natural, psychologically comfortable position. Everyone is sitting in front of their own territory on screen. It is never a shared space; there is a void between people’s territories. And as space is how power is exercised, whether deliberately, out of habit or incidentally, power is imposed. It contrasts starkly with my use of audio, where in my mind I leave the space I am in to converse with the other person. I don’t picture the space, but I’m with them. If I have been talking for any length of time, I have to readjust to returning to my own space.

As Merleau-Ponty argued, life is an embodied experience. We might imagine life without a body, but we are born with a body, and we die with one. If circumstances make us feel disconnected from our body, that sense of lacking a body is itself an experience tied to having once been embodied. We deal with other human beings as similarly embodied, and it’s how we understand them, and how we live in the world.

I’ve recently been in a conference in Second Life where virtual ability was a concern: how people use virtual space to participate according to different abilities and needs. People very clearly choose where to position their avatars. This is not merely copying what humans do in the outside world; it is human psychology at work. It is the same as I saw in council meetings, and something video conferencing so lacks.

Uncanny Valley

One of the concepts in video games is the Uncanny Valley. Games sometimes fail because they drop into this ‘valley’. Humans can cope with representation or with realism, but the almost realistic is a problem. It’s the freak-out zone, where characters look like humans but don’t quite behave as humans are expected to. It’s deeply unnerving to those who experience it, but not everyone does, just as a lot of people really like old-fashioned automata.

I am arguing that this is what happens in video conferencing too, and it doesn’t necessarily matter whether it includes friends or strangers. It is where the brain flits between unreal and real, between the screen person and the person behind the screen, a kind of psychological dissonance. It is enough to make a game fail, but it goes unquestioned in other areas such as video conferencing.

Time

But there is another factor beyond space, and that is time. The screen person is not as immediately responsive as the person behind it. In particular, the lips do not match what is being said. Many people unconsciously employ lip reading, besides those who depend on it, and this lag and loss of sync can feel very disconcerting. It is not like the natural hesitations of speech. There has been research in this area suggesting that delay is not perceived as a technical hitch, but is attributed to the personality and behaviour of the other person in the conversation.

The myth of face to face

While facial expression is important in communication, inappropriate facial expressions, whether deliberate or caused by the time factor, are counterproductive. The question is, how much does seeing someone’s face really help in communication? We’ve used phones and letters, texts and messages. And when we are having a conversation our eyes move between the other person and other objects as a way of regulating our feelings, which in a Zoom conference can be seen as disinterest or disengagement. The alternative, looking straight at the screen, is potentially intimidating.

Other research has found that when the visual is available, people tend to depend on visual feedback to indicate that the other person has understood, ‘offloading’ the effort and work of interpreting the message onto them. With audio only, the speaker puts more effort into being understood by the other person, by explaining more clearly.

It is certainly not the case that everyone has that adverse feeling. But for those who do, it does not mean there is something wrong with them, whether an inadequacy, mere unfamiliarity or lack of practice.

We have got to think about this, and not just assume that the software will produce an inclusive set-up. It won’t. To quote Hito Steyerl:

“to expect any kind of progressive transformation to happen by itself—just because the infrastructure or technology exists—would be like expecting the internet to create socialism or automation to evenly benefit all humankind. The internet spawned Uber and Amazon, not the Paris Commune. The results may be called ‘the sharing economy’, but this mostly means that the poor share with the rich, not vice versa. Should any less unilateral sharing be suggested, the bulk of capital will decamp immediately.”

Steyerl H (2016) If you don’t have bread, eat art!: contemporary art and derivative fascisms. e-flux (76).

The companies are there to make a profit, too frequently in devious ways and by exploiting the data and privacy of users. It reflects how obsessed we are with the idea that new is best on the internet, even if something has not been worked on long enough or costs us in the long run. In terms of how they deal with people, the largest group comes first, and maybe the rest will find a place. As it stands, video conferencing includes many, but excludes many. And that exclusion should not be attributed to personal inadequacy. It should be about being inclusive and recognising that not everyone is the same.
