How do you create virtual touch? Without haptic feedback rigs or direct stimulation of the brain, how can we get closer to that special, sometimes intimate, sometimes intricate, sometimes magical feeling that is touch? We’re trying a lot of different approaches, but this video illustrates one combination: a front-facing PrimeSense depth camera, the Faceshift facial tracking SDK, the Leap Motion controller, and the Hifi virtual world software. There’s no physical feeling for either party, but as you’ll see, Ryan is virtually touching Emily’s hair, and that’s one step in the right direction.
Emily and Ryan both sat at MacBook Pros with PrimeSense depth cameras clipped to the top of their screens (we 3D printed the clip), the Faceshift SDK extracting head position and facial features, and our Interface software processing and streaming the data to control each avatar’s face and body. Ryan’s hand is detected by the Leap Motion controller. The end-to-end latency is about 100 milliseconds. For headphones and a microphone, we usually use this Sennheiser headset.
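For the hand, the Leap Motion SDK delivers per-frame hand data that an application can simply poll. Here is a rough sketch using the SDK’s Java bindings, just to show what the raw data looks like; this is not how Interface itself integrates the device, and it assumes the Leap service and its native libraries are installed:

```java
import com.leapmotion.leap.Controller;
import com.leapmotion.leap.Frame;
import com.leapmotion.leap.Hand;
import com.leapmotion.leap.Vector;

// Polls the Leap Motion controller and prints the first hand's palm position.
// Standalone sketch only: it says nothing about how Interface maps the palm onto the avatar.
public class LeapHandPoll {
    public static void main(String[] args) throws InterruptedException {
        Controller controller = new Controller();
        while (true) {
            Frame frame = controller.frame();            // most recent tracking frame
            if (!frame.hands().isEmpty()) {
                Hand hand = frame.hands().get(0);
                Vector palm = hand.palmPosition();        // millimeters, relative to the device
                System.out.printf("palm: %.1f %.1f %.1f%n",
                        palm.getX(), palm.getY(), palm.getZ());
            }
            Thread.sleep(16);                             // poll at roughly 60 Hz
        }
    }
}
```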
You might notice that the audio is noisy. This is because we applied some gain to bring the levels up. “But you claim high fidelity audio,” you might be thinking. Well, one of the brilliant things about our audio architecture is that it works much like the real world: the further away you are, the harder someone is to hear. But this doesn’t work well for recording and capturing, something we’ve yet to optimize for.
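To see why a recording avatar picks up so little signal, here is a toy model of distance-based attenuation. The inverse falloff and the one-meter reference distance are illustrative assumptions, not our mixer’s actual curve:

```java
// Toy distance attenuation: full gain inside a reference distance, inverse falloff beyond it.
// Illustrative only; not High Fidelity's actual audio-mixer curve.
public class DistanceGain {
    static final float REFERENCE_DISTANCE_M = 1.0f;  // assumed: full volume within one meter

    static float gainFor(float distanceMeters) {
        if (distanceMeters <= REFERENCE_DISTANCE_M) return 1.0f;
        return REFERENCE_DISTANCE_M / distanceMeters;
    }

    public static void main(String[] args) {
        // An avatar recording from a few meters away receives well under full level,
        // which is why we boosted the gain (and the noise) afterwards.
        System.out.println(gainFor(0.5f));  // 1.0
        System.out.println(gainFor(3.0f));  // ~0.33
    }
}
```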
We captured the video with the screen recording functionality in QuickTime Player, piping the sound out from Interface and into QuickTime using Soundflower. To capture, I logged in as my avatar and stood next to Ryan and Emily, recording what I observed.
When you see Emily’s right hand rise, it’s because she’s moving her mouse. In our current version, moving your mouse cursor also moves your hand and arm.
Still curious and question-full? Leave a comment; ask away.
Using an avatar as a proxy for communication has many benefits. Your avatar can always look good, be well lit, and be in an interesting location. However, even the most immersive virtual worlds fall flat when trying to deliver the emotional data carried by real-world facial expressions and body language.
From video game controllers to sleep trackers, there is a good deal of experimentation being done with wearable sensor hardware right now. In addition to soldering our own creations together, we have been checking out work done by others as fast as we can, all with the goal of enabling rich emotional avatar communication.
As you can imagine, when we received our beautiful new Google Glass as part of the Explorer Program, we were eager to see if we could access its sensors and drive our avatar’s head movement (caveat: Google Ventures is one of our investors).
Being the only white guy with a beard here at High Fidelity, working with Glass fell to me. This was a great exercise because it gave us an opportunity to abstract the input layer for multiple device support (we also got Oculus working! Stay tuned for that blog post).
We had previously created an Android app that grabbed all of the phone’s sensor data and sent it over UDP to a configurable port. Imagine holding your phone and being able to twist and move your avatar’s hand. Kinda like turning any phone (with sensors) into a Wii controller. Lo and behold, when we plugged our Glass in and ran the Android app from our IDE, Glass showed up as a device and it “just worked”. We could not edit the fields in the GUI on Glass, but we could see from the log that it was transmitting the data.
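The app itself is conceptually simple: register for a sensor, then forward each reading as a UDP datagram. A condensed sketch of that idea follows; the class name, packet layout, and hard-coded address are ours for illustration (the real app made the destination configurable in its GUI):

```java
import android.app.Activity;
import android.content.Context;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.os.Bundle;

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Streams rotation-vector readings over UDP. Address, port, and packet layout are placeholders.
public class SensorStreamActivity extends Activity implements SensorEventListener {
    private static final String HOST = "192.168.1.100";  // machine running Interface (placeholder)
    private static final int PORT = 7399;                // placeholder port

    private final ExecutorService network = Executors.newSingleThreadExecutor();
    private SensorManager sensorManager;
    private DatagramSocket socket;
    private InetAddress target;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        try {
            socket = new DatagramSocket();
            target = InetAddress.getByName(HOST);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        sensorManager = (SensorManager) getSystemService(Context.SENSOR_SERVICE);
        Sensor rotation = sensorManager.getDefaultSensor(Sensor.TYPE_ROTATION_VECTOR);
        sensorManager.registerListener(this, rotation, SensorManager.SENSOR_DELAY_GAME);
    }

    @Override
    public void onSensorChanged(SensorEvent event) {
        final float[] values = event.values.clone();
        network.execute(new Runnable() {
            @Override
            public void run() {                            // keep UDP off the main thread
                try {
                    ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
                    for (float v : values) buf.putFloat(v);
                    byte[] bytes = buf.array();
                    socket.send(new DatagramPacket(bytes, bytes.length, target, PORT));
                } catch (Exception e) {
                    // best-effort streaming: drop the sample and move on
                }
            }
        });
    }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }

    @Override
    protected void onDestroy() {
        sensorManager.unregisterListener(this);
        socket.close();
        network.shutdown();
        super.onDestroy();
    }
}
```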
For obvious reasons, Glass has some pretty aggressive energy-saving behavior, which made it tricky to keep the transmission alive. We ended up moving the sensor data transmission to a service layer. To stop transmission, we just turn Glass off.
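Moving the work into a started Service looks roughly like the skeleton below. SensorUdpForwarder is a hypothetical stand-in for a SensorEventListener that does the UDP forwarding shown above, and the partial wake lock is our guess at one way to keep the stream alive through Glass’s power saving:

```java
import android.app.Service;
import android.content.Context;
import android.content.Intent;
import android.hardware.Sensor;
import android.hardware.SensorManager;
import android.os.IBinder;
import android.os.PowerManager;

// Skeleton of a started Service that owns the sensor listener so transmission survives
// the Activity going away. SensorUdpForwarder is a hypothetical SensorEventListener.
public class SensorStreamService extends Service {
    private SensorManager sensorManager;
    private PowerManager.WakeLock wakeLock;
    private final SensorUdpForwarder forwarder = new SensorUdpForwarder();

    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        sensorManager = (SensorManager) getSystemService(Context.SENSOR_SERVICE);
        Sensor rotation = sensorManager.getDefaultSensor(Sensor.TYPE_ROTATION_VECTOR);
        sensorManager.registerListener(forwarder, rotation, SensorManager.SENSOR_DELAY_GAME);

        // Assumption: hold a partial wake lock so the CPU keeps running while the screen is off.
        PowerManager pm = (PowerManager) getSystemService(Context.POWER_SERVICE);
        wakeLock = pm.newWakeLock(PowerManager.PARTIAL_WAKE_LOCK, "hifi:sensorStream");
        wakeLock.acquire();

        return START_STICKY;  // ask the system to restart the service if it gets killed
    }

    @Override
    public void onDestroy() {
        sensorManager.unregisterListener(forwarder);
        if (wakeLock != null && wakeLock.isHeld()) wakeLock.release();
        super.onDestroy();
    }

    @Override
    public IBinder onBind(Intent intent) {
        return null;  // started, not bound
    }
}
```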
You can see in the video that we have a very low latency connection between human and avatar head movement using Glass!
For the last few days, we’ve been giving our trusty MakerBot Replicator a bit of a workout as we mock up parts for our ongoing office experiments. Ryan figured out the platform wasn’t staying hot enough to keep the printout together, so he hacked it by placing a small space heater nearby to keep the air around the printer a bit warmer. Worked like a charm!
We’re doing some fun experiments with the atmospheric rendering in our still-very-early virtual world prototypes and will have a longer blog post about it coming soon. In the meantime, this image of the sky and stars was too good not to share.
Given that we are lucky enough to have two co-founders – Ryan (shown above) and Fred – who have also trained as professional musicians, we wanted to test for ourselves the amount of delay that could be tolerated by two performers trying to ‘groove’ together when separated by Internet-induced delays. This is an important number, because it is easy to imagine how amazing it would be if a virtual world enabled band members to perform live together without being physically on the same stage or in the same studio.
Attempts at multi-person live performances have already been made in Second Life and other virtual worlds, but the large and variable delays have generally forced the performers to adopt a serialized strategy where, for example, a bass player might lay down a basic groove, which is then streamed to the next musician, who adds her own line and streams that to the next one, and so on.
Although this does result in a live performance, the earlier players in the stream are ‘blind’ to the later ones, so they are not truly playing together and responding with subtlety to each other. But the demanding example of musicians playing together and continuously adjusting millisecond-to-millisecond as they watch/hear each other is what we wanted to explore… how much latency could we add before we ‘broke’ their ability to play together, or at least made it discernibly less fun?
To test this, we set up our two experts as performers, separated visually and by enough distance that they could not ‘feel’ the floor moving with the music, and wearing high quality headphones connected by a device that allowed us to impose an exact amount of latency (single-millisecond resolution) between them. An experimenter would then dial up various amounts of delay, let them play for a while, and record their verbal assessments of how good or bad the experience was. By changing roles and trying various instruments, we learned that at delays much below about 10 milliseconds one-way, they typically could not discern the impairment, but that by about 20 milliseconds they would uniformly report the experience as highly undesirable.
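For intuition about what the latency box is doing, a fixed one-way delay at audio rates is just a ring buffer holding delayMs * sampleRate / 1000 samples. A toy mono version follows; the experiment itself used dedicated hardware, not this code:

```java
// Toy fixed-delay line: each sample written in comes back out delayMs later.
public class DelayLine {
    private final float[] buffer;
    private int writeIndex = 0;

    public DelayLine(int delayMs, int sampleRate) {
        int delaySamples = Math.max(1, delayMs * sampleRate / 1000);  // e.g. 10 ms @ 48 kHz = 480
        buffer = new float[delaySamples];
    }

    // Push one input sample, get back the sample from delayMs ago.
    public float process(float input) {
        float delayed = buffer[writeIndex];
        buffer[writeIndex] = input;
        writeIndex = (writeIndex + 1) % buffer.length;
        return delayed;
    }
}
```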
This outcome poses a challenge to the dream of multi-performer live Internet jam sessions: getting one-way latency down to 10 milliseconds would require very high packet rates (say, 250 packets per second or more), along with the performers being separated by only modest real-world distances. At the speed of light, 5 milliseconds of delay corresponds to a physical distance of about 1,500 kilometers. Internet routers today also impose delays that can easily exceed this; for example, the one-way delay from San Francisco to our EC2 machines in San Jose is about 7 milliseconds. Ultimately we will get there, but for now we can expect that our virtual live performers may still need to be in the same room together. Of course, this still makes for some pretty amazing potential: live performers interacting with their audience while performing in a virtual world – more on this to come.
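For anyone who wants to check the arithmetic, the figures above fall out of a few constants. Note that this uses the vacuum speed of light; signals in fiber travel noticeably slower, which only makes the budget tighter:

```java
// Back-of-the-envelope figures from the paragraph above.
public class LatencyBudget {
    public static void main(String[] args) {
        double packetIntervalMs = 1000.0 / 250.0;      // 250 packets/second -> 4 ms between packets
        double lightSpeedKmPerMs = 300000.0 / 1000.0;  // ~300 km per millisecond in vacuum
        double kmPerFiveMs = 5.0 * lightSpeedKmPerMs;  // ~1,500 km of physical separation
        double observedRouterDelayMs = 7.0;            // one-way, San Francisco to San Jose EC2

        System.out.printf("packet interval at 250 pps: %.1f ms%n", packetIntervalMs);
        System.out.printf("distance light covers in 5 ms: %.0f km%n", kmPerFiveMs);
        System.out.printf("observed routing delay alone: %.1f ms of the 10 ms budget%n",
                observedRouterDelayMs);
    }
}
```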