A robotic bartender has to do something challenging for a machine - ignore data and recognize social signals. Researchers at the Cluster of Excellence Cognitive Interaction Technology (CITEC) of Bielefeld University investigated how a robotic bartender can better understand human communication and serve drinks the way a human would. They invited participants into the lab and asked them to play the part of a robotic bartender the researchers call James. The participants saw through the robot's eyes and ears and selected actions from its repertoire.

"We teach James how to recognize if a customer wishes to place an order," says Jan de Ruiter. A robot can't recognize which behavior indicates that a customer wishes to be served because there are too many factors, like proximity to the bar and angle of the customer or if the customer speaks. To try and make it workable, the group gave it simple brute-force options, e.g. "near to bar; no", "speaks: no", that updates ("near to bar: yes", "speaks: yes").

Without weighting, the robot processes every piece of information independently and as equally important, but that is not how human bartenders earn their tips.
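
To make the contrast concrete, here is a hedged sketch of combining those cues with and without weights; the cues, weights, and thresholds are invented for illustration and are not values from the project:

```python
# Hypothetical contrast: every cue counted as equally important versus cues
# carrying different weights. Weights and thresholds are made up.

WEIGHTS = {"near_bar": 0.5, "looks_at_robot": 0.3, "speaks": 0.2}

def wants_to_order_unweighted(state):
    # Each cue counts the same; any two of the three are enough.
    return sum(bool(state.get(k)) for k in WEIGHTS) >= 2

def wants_to_order_weighted(state):
    # Cues contribute according to their weight; a threshold decides.
    score = sum(w for k, w in WEIGHTS.items() if state.get(k))
    return score >= 0.7
```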


Adopting the role of bartender robot James at the computer. Credit: CITEC/Bielefeld University

The human participants of the experiment were asked to put themselves into the mind of a robotic bartender. They sat in front of a computer screen with an overview of the robot's data: visibility of the customer, position at the bar, position of the face, angle of the body, and angle of the face relative to the robot. They saw no video. This data had been recorded during a trial session with the bartending robot James at its own mock bar in Munich.

For the trial, customers were asked to order a drink from James and to rate their experience afterwards. In the Bielefeld lab, the participants observed on the screen what the robot had recognized at the time. For example, they were shown whether a customer had said something ('I would like a glass of water, please') and how confident the robot's speech recognition had been. The participants observed two customers at a time.
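
Based on the fields described above, the per-turn record shown to participants might be modelled roughly as follows; the field names and types are assumptions made for illustration:

```python
# Rough model of one customer's data in one turn, as described in the article;
# the exact schema used in the study is an assumption here.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class CustomerObservation:
    customer_id: int                    # two customers were tracked at a time
    visible: bool                       # do the cameras currently see the customer?
    at_bar: bool                        # position relative to the bar
    face_position: Tuple[float, float]  # where the face was detected
    body_angle_deg: float               # body orientation relative to the robot
    face_angle_deg: float               # face orientation relative to the robot
    utterance: Optional[str]            # e.g. "I would like a glass of water, please"
    asr_confidence: Optional[float]     # how sure the speech recognizer was (0 to 1)
```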

The customers' behavior was presented in turns, so in each step the participants had to decide what they would do as the robotic bartender. They selected an action from the robot's repertoire. De Ruiter explains: "This is similar to selecting an action from a character's special abilities in a computer game. For example, they could ask which drink the customer would like ('What would you like to drink?'), turn the robot's head towards the customer, serve a drink - or just do nothing."

In the next turn, the participants observed the customer's reaction and selected another action, and so on. The interaction continued until either a drink was served or the exchange broke off.
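
A rough sketch of that turn-based loop, with an invented action repertoire standing in for the robot's real one:

```python
# Hypothetical action repertoire and turn loop; the names are illustrative.

ACTIONS = [
    "ask_for_order",          # "What would you like to drink?"
    "turn_head_to_customer",  # orient the robot towards the customer
    "serve_drink",
    "do_nothing",
]

def run_interaction(turns, choose_action):
    """Step through recorded turns until a drink is served or the data runs out."""
    for observation in turns:                # one turn of recorded customer behavior
        action = choose_action(observation)  # the participant picks one item from ACTIONS
        if action == "serve_drink":
            return "drink served"
    return "interaction broke off"
```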

The researchers found what you might expect, but what a robot could not know without help: customers wish to place an order if they stand near the bar and look at the bartender, while whether they speak is irrelevant at that point. Accordingly, the participants did not speak to their customers right away; they turned the robot towards the customers and looked at them. That eye contact is a visual handshake. It opens a channel so that both parties can speak, much as we check whether a person is on a phone call before we start chatting.

Once it was established that the customer wished to place an order, body language became less important and participants focused on what the customer said. Interestingly, if the camera lost the customer and the robot believed the customer was "not visible", the participants ignored this visual information. They kept speaking, served the drink, or asked for the order to be repeated.
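
Taken together, the participants' behaviour suggests a two-phase rule of thumb, sketched below. The thresholds and field names are invented, and this is a paraphrase of the findings rather than the study's actual decision logic:

```python
def choose_action(obs, engaged):
    """obs: a dict with fields like those above (at_bar, face_angle_deg,
    utterance, asr_confidence); engaged: has the ordering phase started?"""
    if not engaged:
        # Phase 1: the visual handshake. Standing near the bar and facing the
        # robot matter; whether the customer speaks does not.
        if obs.get("at_bar") and abs(obs.get("face_angle_deg", 180)) < 20:  # threshold invented
            return "turn_head_to_customer"
        return "do_nothing"
    # Phase 2: speech dominates, and a dropped "visible" flag is deliberately
    # ignored instead of being treated as the customer leaving.
    if obs.get("utterance"):
        if obs.get("asr_confidence", 0.0) > 0.5:                            # threshold invented
            return "serve_drink"
        return "ask_for_order"  # low confidence: ask again rather than restart
    return "do_nothing"
```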

So a robotic bartender should sometimes ignore data, but the real robot did the opposite during the trials. If a customer was not visible, the robot assumed it could not serve a drink, waited for the cameras to find the customer again, and then restarted the entire process as if the customer had just arrived at the bar. The human participants were not confused by such technical glitches and stayed focused on the important information.

Robots, and the large language models that power them, will need a lot more work to replicate the social behavior of customer service, or else people will end up as annoyed as callers who have to keep yelling "Representative!" to escape a voice-menu black hole.