Lauri Savioja, Jyri Huopaniemi, Tapio Lokki, and Riitta Väänänen
Helsinki University of Technology
Laboratory of Acoustics and Audio Signal Processing
P.O. Box 3000, FIN-02015 HUT, Finland
Helsinki University of Technology
Laboratory of Telecommunications Software
P.O. Box 1100, FIN-02015 HUT, Finland
Lauri.Savioja@hut.fi, Jyri.Huopaniemi@hut.fi, Tapio.Lokki@hut.fi,
Riitta.Vaananen@hut.fi
http://www.tcm.hut.fi/Research/DIVA/
At ICAD'96, a real-time virtual audio reality model was presented, which included model-based sound synthesizers, geometric room acoustics modeling, binaural auralization for headphone and loudspeaker listening, and high-quality animation. The DIVA environment is an integrated implementation of a virtual reality system currently aiming at a virtual symphony orchestra performance. In the current version of the software, multiple sound sources (physical models of musical instruments) are conducted by a virtual conductor (controlled by a position tracker with 3 transmitters). The real-time calculation of auralization has been enhanced by accurate HRTF approximations, a new late reverberation model, and by an efficient image source method.
The basis of the image source method we use is presented thoroughly in many articles [1] [3]. The real-time communication and the updating rules of the image sources are described in our earlier article [4]. In this section we concentrate on some performance issues and on the handling of multiple sound sources in the image source method.
The major problems of the image source method are its computational and memory requirements. Figure 1 represents a geometric model of Marienkirche, a 13th-century Gothic cathedral in Neubrandenburg, northern Germany, which is being rebuilt as a 1200-seat concert hall. The geometry has 480 surfaces that may produce valid image sources. In a typical situation with this model, only 5-10 first-order image sources are visible.
To avoid unnecessary calculation, only those image sources that might become visible during the first reflections need to be searched. To achieve this, we make a preprocessing run with ray tracing to check the visibility of all surface pairs. During the image source calculation, an image source is then reflected only against surfaces that are visible from the original reflecting surface, instead of against all surfaces whose normal points toward the source. This technique reduces the number of second-order image sources by about 65 % and the number of third-order image sources by about 85 %. For example, in the case of Fig. 1 the number of possible second-order image sources drops from 110,000 to about 40,000. Of these possible image sources, approximately 20-30 are visible to the listener simultaneously. Another computationally demanding task in the image source method is the visibility checking itself, which in geometrical terms requires a large number of intersection calculations between surfaces and lines. To reduce this cost we use an advanced geometrical directory, EXCELL [13].
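The mirroring step at the core of the method is itself simple; the following Python sketch (the function name and the point-plus-unit-normal surface representation are illustrative choices of ours, not the DIVA implementation) constructs a first-order image source by reflecting the source point across the plane of a reflecting surface:

```python
# Sketch of first-order image-source construction: the source point is
# mirrored across the plane of the reflecting surface. The surface is
# given by any point on its plane and its unit normal.

def mirror_source(src, plane_point, plane_normal):
    """Reflect the point `src` across the plane defined by `plane_point`
    and the unit normal `plane_normal`; the result is the image source."""
    # signed distance from the source to the plane, along the normal
    d = sum((s - p) * n for s, p, n in zip(src, plane_point, plane_normal))
    # move the source twice that distance against the normal
    return tuple(s - 2.0 * d * n for s, n in zip(src, plane_normal))

# Example: a source 1 m above the floor plane z = 0 mirrors to z = -1
image = mirror_source((0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
# image == (0.0, 0.0, -1.0)
```

Higher-order image sources are obtained by applying the same mirroring to already-mirrored sources, which is why pruning invisible surface pairs early cuts the combinatorial growth so effectively.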
Figure 1: The example geometry of Marienkirche in Neubrandenburg, Germany, which has 480 surfaces.
To avoid audible transients in the dynamic auralization process, the auralization parameters have to be interpolated. A typical update rate of the direct sound and image-source parameters is 20-30 Hz. Interpolation is carried out for the 1/r distance gain, the propagation delay, the ITD, and the minimum-phase HRTFs. To obtain the exact distance of each image source, fractional delays are used [10]. In an updated version of the DIVA system it is possible to use multiple sound sources. The direct sound emanating from each source is calculated individually, but if the sources are close to one another, the image sources (reflections) are shared by all of them. These common image sources are calculated from the average position of the individual sources. Naturally, every new sound source increases the computational load.
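The fractional-delay idea can be illustrated with a minimal sketch. DIVA uses the fractional delay filters described in [10]; linear interpolation between the two neighbouring delay-line samples, shown below, is the simplest first-order case (the function name and buffer convention are ours):

```python
# Sketch of a fractional-delay read from a delay line, as needed to
# render the exact (non-integer-sample) propagation distance of each
# image source. Linear interpolation is the simplest first-order
# fractional delay filter.

def read_fractional(buf, delay):
    """Read `buf` delayed by a non-integer number of samples.
    buf[0] is the most recent sample, buf[1] one sample older, etc."""
    i = int(delay)          # integer part of the delay
    frac = delay - i        # fractional part in [0, 1)
    return (1.0 - frac) * buf[i] + frac * buf[i + 1]

# A delay of 1.5 samples reads halfway between buf[1] and buf[2]:
buf = [0.0, 1.0, 0.0, 0.0]
print(read_fractional(buf, 1.5))   # -> 0.5
```

For example, a propagation distance of 3.4 m at c = 340 m/s and a 44.1 kHz sample rate corresponds to a delay of 441.0 samples; as the image source moves, the delay value changes continuously and the interpolation avoids clicks.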
In the DIVA setup, both headphone (binaural) and two-loudspeaker (transaural) output formats are supported. Efficient methods for HRTF filter design and implementation have been considered in [5]. Based on these results, the use of 30-tap FIR or even 14th-order warped IIR realizations of HRTFs is justified when minimum-phase reconstruction is applied. Depending on the available calculation capacity, HRTFs can be applied to the image sources as well as to the direct sound. A more detailed discussion can be found in [12] and [5].
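The structure of the binaural processing — an interaural time delay followed by a short minimum-phase filter per ear — can be sketched as follows. The 3-tap coefficients and function names below are arbitrary placeholders for illustration, not measured HRTF data; the actual DIVA filters are 30-tap FIRs or 14th-order warped IIRs [5]:

```python
# Sketch of binaural rendering: delay the contralateral ear by the ITD,
# then apply each ear's minimum-phase HRTF filter.

def fir(signal, coeffs):
    """Direct-form FIR convolution, output truncated to input length."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

def binauralize(mono, itd_samples, hrtf_l, hrtf_r):
    """Render a mono signal to a (left, right) pair. The ITD is an
    integer sample count here; in practice a fractional delay is used."""
    delayed = [0.0] * itd_samples + mono[:len(mono) - itd_samples]
    left = fir(mono, hrtf_l)        # ipsilateral ear: no extra delay
    right = fir(delayed, hrtf_r)    # contralateral ear: ITD + HRTF
    return left, right
```

Because the HRTFs are reduced to minimum phase, all interaural delay is concentrated in the single explicit ITD element, which is what makes the short filter lengths feasible.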
The revised auralization unit of the DIVA system is shown in Fig. 2. The figure shows the process for auralizing one sound source, but the system is expandable to multiple sources as described in the previous section. In the following, the blocks concerning real-time auralization are discussed in greater detail.

In Figure 2, one set of filters implements the air absorption (including distance attenuation), the material absorption of the walls [6], and the directivity of the sound source [9]. A second set of filters implements the binaural processing by adding a direction-dependent interaural time delay to the direct sound and to each reflection, and by using minimum-phase HRTFs for directional filtering [5]. These binaural early reflections are fed to the output of the system together with the late reverberation. In addition, a cross-talk cancellation module (not shown in the figure) can be attached to the output to obtain transaural output.

In the late reverberation module, the gain b(r) depends on the distance between the sound source and the listener. It adjusts the level of the early reflections fed to the late reverberation unit so that the level of the late reverberation remains approximately constant, a property of a diffuse sound field. The late reverberation unit consists of several parallel feedback loops, each containing a delay line DLk, a comb-allpass filter, and a lowpass filter [16]. The lowpass filters, typically implemented as one-pole lowpass filters, realize the frequency-dependent reverberation time (similarly as in [7]). The comb-allpass filters diffuse the recirculating signal in the feedback loops, increasing the reflection density of the reverberator output. The feedback connection resembles that of a feedback delay network (FDN), in which the output of each delay line is connected to the inputs of the delay lines through a feedback matrix [7] [8] [11].
The feedback connection in our system corresponds to a feedback matrix that is unitary and circulant, with one value on the diagonal and another elsewhere in the matrix. Compared to a general FDN, our reverberator requires less computation, and the reflection density grows faster as a function of time. An advantage of the FDN, shared by the proposed reverberator, is that the output of each delay line contains energy from the modes of all the other delay lines, and the outputs are mutually incoherent. This makes it possible to feed different delay-line outputs to different output channels, which yields a pseudostereophonic effect because the late reverberation is incoherent between channels [7].
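The role of one comb-allpass section inside such a feedback loop can be illustrated with a short sketch (the delay length and coefficient below are arbitrary example values, not DIVA's parameters):

```python
# Sketch of one Schroeder comb-allpass section, the diffusing element in
# each feedback loop of the late reverberator. The difference equation
#   y[n] = -g*x[n] + x[n-M] + g*y[n-M]
# passes all frequencies at unit magnitude while smearing an impulse
# into a decaying pulse train, which raises the reflection density.

def comb_allpass(x, M, g):
    """Run an allpass with delay M samples and coefficient g over x."""
    y = []
    for n, xn in enumerate(x):
        x_del = x[n - M] if n >= M else 0.0
        y_del = y[n - M] if n >= M else 0.0
        y.append(-g * xn + x_del + g * y_del)
    return y

# A unit impulse comes out as -g at n=0, then (1 - g*g) * g**k at
# successive multiples of M:
out = comb_allpass([1.0] + [0.0] * 9, M=3, g=0.5)
# out[0] == -0.5, out[3] == 0.75, out[6] == 0.375, ...
```

Placing one such section after each delay line, rather than using a dense feedback matrix, is what lets the proposed structure reach a high echo density with fewer multiplications than a general FDN.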
Figure 2: Illustration of the revised auralization scheme in the DIVA system.
A number of applications for virtual acoustic environment simulation have been considered in the course of this research. First, the system has proven useful to both acousticians and architects in the design, auralization, and animation of planned concert halls. It is also a valuable research tool for acousticians and DSP experts studying 3-D sound, room acoustics, and auralization. In the following, an example that involves both conductor and listener interaction with the virtual environment is presented.
Other features of the DIVA system are real-time animation of the players, conductor following, and real-time sound synthesis with physical models of instruments. When the entire system is running, the maestro (conductor) conducts the virtual orchestra using a baton connected to a motion tracker. The virtual players play their instruments with realistic fingerings, animated according to the strokes of the conductor. Simultaneously, another user (the listener) can fly around the concert hall, see the animated virtual orchestra, and hear the auralized sound output. A prominent application of this case study is the teaching of conductors, and the feedback from experienced conductors who have used the system has indeed been very positive and encouraging. In the current version, four instruments are implemented: flute, guitar, and double bass with physical models, and a MIDI drum set, each as an individual sound source. All this computation is carried out on three Silicon Graphics workstations (two Octanes and one O2) that communicate over an Ethernet network [4]. In Fig. 3, the setup of the system used in the case study is illustrated.
Recent developments in the DIVA virtual environment simulation system have been presented. The case study presented in this paper (see http://www.tcm.hut.fi/Research/DIVA/ and http://www.medios.fi/pes/neu/index.htm for details) was featured as an interactive real-time performance in the Electric Garden at SIGGRAPH'97 (http://www.siggraph.org/s97). A demonstration illustrating the current system and the results of the case study will be given at ICAD'97.
Figure 3: Overview of the DIVA system configuration.