Sensory substitution is a technique whereby sensory information in one modality, such as vision, can be assimilated by an individual through another modality, such as hearing. This paper uses a depth sensor to provide a spatial-auditory sensory substitution system that converts an array of range data to spatial auditory information in real time. In experiments, participants were trained with the system and then blindfolded and seated behind a table, wearing the sensory substitution system with the sensor held in front of their eyes. Participants had to localise a target on the table by reporting its direction and its distance. Results showed that, using the proposed system, participants achieved a high accuracy rate (90%) in detecting the direction of the object and an accuracy of 56% in determining the object's distance.
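The abstract does not specify how range data are encoded as spatial audio; the sketch below illustrates one plausible mapping under assumed conventions, where each column of a depth row is panned left-to-right across the stereo field and distance is mapped inversely to pitch (nearer objects sound higher). The function name `depth_row_to_stereo`, the frequency range, and the tone duration are illustrative choices, not the paper's specification.

```python
import numpy as np

SAMPLE_RATE = 44100      # audio sample rate in Hz
TONE_SECONDS = 0.05      # duration of the tone emitted per frame

def depth_row_to_stereo(depth_row, min_d=0.3, max_d=3.0):
    """Map one row of range data to a stereo buffer: column position
    controls left/right panning, distance controls pitch.
    `depth_row` is a 1-D array of distances in metres."""
    n_cols = len(depth_row)
    t = np.linspace(0.0, TONE_SECONDS, int(SAMPLE_RATE * TONE_SECONDS), endpoint=False)
    left = np.zeros_like(t)
    right = np.zeros_like(t)
    for col, d in enumerate(depth_row):
        if not (min_d <= d <= max_d):
            continue                               # ignore out-of-range readings
        # nearer objects get a higher pitch: map distance onto 200-1200 Hz
        freq = 1200.0 - (d - min_d) / (max_d - min_d) * 1000.0
        tone = np.sin(2.0 * np.pi * freq * t)
        pan = col / max(n_cols - 1, 1)             # 0 = far left, 1 = far right
        left += (1.0 - pan) * tone
        right += pan * tone
    stereo = np.stack([left, right], axis=1)
    peak = np.abs(stereo).max()
    return stereo / peak if peak > 0 else stereo   # normalise to avoid clipping
```

A real-time system along these lines would run this mapping on each incoming sensor frame and stream the resulting buffers to headphones.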
An empirical investigation is presented of different approaches to the sonification of 3D objects as part of a sensory substitution system. The system takes 3D point clouds of objects obtained from a depth camera and presents them to a user as spatial audio. Two approaches to shape sonification are presented and their characteristics investigated. The first approach directly encodes the contours belonging to the object in the image as sound waveforms. The second approach categorizes the object according to its 3D surface properties as encapsulated in the rotation-invariant Fast Point Feature Histogram (FPFH), and each category is represented by a different synthesized musical instrument. Object identification experiments are conducted with human users to evaluate the ability of each encoding to transmit object identity. Each approach has its disadvantages. Although the FPFH approach is more invariant to object pose and captures more information about the object, it lacks generality because of the intermediate recognition step. On the other hand, since the contour-based approach carries no information about the depth or curvature of objects, it fails to distinguish objects with similar silhouettes. On the task of distinguishing between 10 different 3D shapes, the FPFH approach produced more accurate responses. However, because it is a direct encoding, the contour-based approach is more likely to scale up to a wider variety of shapes.
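To make the FPFH route concrete, here is a minimal sketch of its intermediate recognition step using Open3D's FPFH implementation (the abstract does not name a library). The category prototypes are random stand-ins for signatures that a real system would learn from labelled scans, and the category names and instrument mapping are hypothetical placeholders.

```python
import numpy as np
import open3d as o3d

# Hypothetical prototype table: one mean 33-bin FPFH signature per shape
# category. Random stand-ins here; in practice, learned from labelled scans.
rng = np.random.default_rng(0)
PROTOTYPES = {name: rng.random(33)
              for name in ("sphere-like", "box-like", "cylinder-like")}
INSTRUMENT = {"sphere-like": "flute",
              "box-like": "piano",
              "cylinder-like": "violin"}

def classify_by_fpfh(pcd: o3d.geometry.PointCloud) -> str:
    """Summarise a point cloud by its mean FPFH signature and return the
    instrument assigned to the nearest category prototype."""
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=100))
    signature = np.asarray(fpfh.data).mean(axis=1)   # 33-bin global descriptor
    best = min(PROTOTYPES,
               key=lambda k: np.linalg.norm(signature - PROTOTYPES[k]))
    return INSTRUMENT[best]
```

The prototype matching makes the abstract's point tangible: the encoding is only as general as the fixed set of categories it recognizes, which is the lack of generality noted above.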
In this paper, we present a new approach to real-time tracking and sonification of 3D object shapes and test the ability of blindfolded participants to learn to locate and recognize objects using our system in a controlled physical environment. In our sonification and sensory substitution system, a depth camera captures the 3D structure of objects in the form of point clouds, and objects are presented to users as spatial audio in real time. We introduce a novel object tracking scheme, which allows the system to be used in the wild, and a method for sonification of objects that encodes their internal 3D contours. We compare the new sonification method with our previous object-outline-based approach. We use an ABA/BAB experimental protocol variant to test the effect of learning during training and testing and to control for order effects with a small group of participants. Results show that our system allows object recognition and localization with a short learning time and similar performance between the two sonification methods.
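The abstract names an ABA/BAB protocol variant but not its assignment rule; the sketch below shows one standard counterbalanced assignment consistent with that design, with the participant IDs and method labels as placeholders.

```python
from itertools import cycle

def assign_conditions(participant_ids, methods=("outline", "internal-3d")):
    """Alternate participants between ABA and BAB orderings so that each
    sonification method appears equally often in each test phase."""
    a, b = methods
    orders = cycle([(a, b, a), (b, a, b)])   # ABA, then BAB, alternating
    return {pid: order for pid, order in zip(participant_ids, orders)}

schedule = assign_conditions(["P1", "P2", "P3", "P4"])
for pid, order in schedule.items():
    print(pid, "->", " / ".join(order))
```

Alternating ABA with BAB across participants balances which method is encountered first, which is what lets a small group control for order effects.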