PhD session [2022-10]
Sharing session with SoundLab graduate students :: current projects and interests. Thursday 27 October 2022, 14:00–17:00, CMC.
Program
- Manni – summary (automix)
- Giuseppe – OAT
- Mirjana – summary (VR)
- Gui – Destination
- YueRan – summary (sound/smell)
- PerMagnus – long/short
- Manni – Generative
- Mirjana – field research
- YueRan – own project
Manni CHEN
Generative Adversarial Networks for How Reverberation Affects Frequency Contents in Music Production
Reverberation is an audio effect used in music production that alters the spectrum of the audio and hence the timbre of the music. Understanding how reverb acts on the spectrum matters in music production, because sound engineers need to perceive the subtle timbral changes it introduces in order to reach the texture they are aiming for. In this paper we aim to design an equalization filter by implementing a GAN that indicates the optimal filter coefficients yielding a flat frequency response for reverberated audio. We feed the GAN with sine sweep signals processed with six different reverb presets (“chamber”, “hall”, “random hall”, “concert hall”, “plate” and “vintage plate”) from the Lexicon PCM Native Reverb plugin [1], and the network outputs the filter coefficients. We apply a vanilla FC LSTM autoencoder [2] to map the data from a high-dimensional space to a lower-dimensional latent space. The optimization goal is to obtain filter coefficients whose output matches the frequency response of the original (dry) sine sweep signal, i.e. the target frequency response. In conclusion, we propose a GAN-based method for designing a filter that inverts the spectral changes introduced by reverb, in order to facilitate technical ear training for music production students. (A simplified illustrative sketch of such a spectrum-flattening pipeline follows the references below.)
- [1] Lexicon PCM Native Reverb plugin. https://lexiconpro.com/en/products/pcm-native-reverb-plug-in-bundle
- [2] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A Search Space Odyssey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2017, doi: 10.1109/TNNLS.2016.2582924.
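The following is a minimal illustrative sketch, in Python with PyTorch, of the kind of spectrum-flattening pipeline described above. It is not the paper's method: the adversarial (GAN) training, the LSTM autoencoder and the Lexicon presets are replaced here by a plain feed-forward network, log-spaced band gains and a synthetic decaying-noise impulse response; all parameter choices are assumptions.

```python
# Sketch only: flatten a reverberated sine sweep's spectrum with learned band gains.
import numpy as np
import torch
import torch.nn as nn

fs, T = 16000, 2.0
t = np.linspace(0, T, int(T * fs), endpoint=False)
# Exponential sine sweep from 20 Hz to 8 kHz (Farina-style).
f0, f1 = 20.0, 8000.0
sweep = np.sin(2 * np.pi * f0 * T / np.log(f1 / f0) * (np.exp(t / T * np.log(f1 / f0)) - 1))

# Toy "reverb": exponentially decaying noise impulse response (not a Lexicon preset).
rng = np.random.default_rng(0)
n_ir = int(0.3 * fs)
ir = rng.standard_normal(n_ir) * np.exp(-6 * np.linspace(0, 1, n_ir))
wet = np.convolve(sweep, ir)[: len(sweep)]

def band_mags(x, n_bands=32):
    """Average magnitude spectrum in log-spaced bands (stand-in for the paper's features)."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    edges = np.geomspace(20, fs / 2, n_bands + 1)
    return np.array([spec[(freqs >= lo) & (freqs < hi)].mean() + 1e-9
                     for lo, hi in zip(edges[:-1], edges[1:])])

dry_db = 20 * np.log10(band_mags(sweep))
wet_db = 20 * np.log10(band_mags(wet))

# Small network predicting per-band EQ gains (dB) from the wet spectrum.
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
wet_t = torch.tensor(wet_db, dtype=torch.float32)
target = torch.tensor(dry_db - wet_db, dtype=torch.float32)  # gains that flatten the response

for step in range(500):
    opt.zero_grad()
    gains = net(wet_t)
    loss = torch.mean((gains - target) ** 2)  # spectral-matching objective
    loss.backward()
    opt.step()

print("residual error (dB RMS):", float(loss) ** 0.5)
```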
Manni CHEN
Automatic Music Mixing with Deep Learning and Out-of-Domain Data, by Marco A. Martínez-Ramírez et al. [1]
Research on artificial intelligence for automatic multitrack music mixing faces the problem of acquiring datasets of clean, individual tracks. This paper tackles the problem by using out-of-domain data [2], i.e. wet or processed multitrack music recordings, to train supervised deep learning models. Methodologically, the project proposes a novel data preprocessing method that allows the models to perform automatic music mixing: average features related to audio effects are computed on an out-of-domain dataset and used to normalize the effects of the wet stems. The normalization and augmentation cover loudness, equalization, compression and reverberation (a minimal sketch of the loudness step follows the references below). The deep learning model has three parts: an adaptive front-end with two one-dimensional convolutional networks, a latent-space mixer using temporal dilated convolutions [3], and a synthesis back-end consisting of a convolutional network and a Squeeze-and-Excitation (SE) block [4]. The results indicate that this approach achieves automatic mixing with respect to loudness, EQ, dynamic range compression (DRC), panning and reverberation. This work inspires me to apply out-of-domain data to a specific audio effect in my own future research, in order to understand it more deeply.
- [1] M. A. Martínez-Ramírez, W.-H. Liao, G. Fabbro, S. Uhlich, C. Nagashima, and Y. Mitsufuji, “Automatic music mixing with deep learning and out-of-domain data,” 2022.
- [2] R. Iyer, M. Ostendorf, and H. Gish, “Using out-of-domain data to improve in-domain language models,” IEEE Signal Processing Letters, vol. 4, no. 8, pp. 221–223, 1997, doi: 10.1109/97.611282.
- [3] N. Cheema et al., “Dilated Temporal Fully-Convolutional Network for Semantic Segmentation of Motion Capture Data,” 2018, doi: 10.2312/sca.20181185.
- [4] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141.
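As a small illustration of the effect-normalization idea summarized above, here is a minimal Python sketch for one effect (loudness), using plain RMS level in dB as a stand-in for the features used in the paper; the actual method also covers EQ, compression and reverberation, and all numeric choices here are assumptions.

```python
# Sketch only: scale each wet stem so its level matches the dataset's average level.
import numpy as np

def rms_db(x: np.ndarray) -> float:
    """RMS level in dB (a simple stand-in for a loudness feature)."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

def normalize_loudness(stems):
    """Normalize stems toward the 'average feature' computed over the (out-of-domain) data."""
    levels = np.array([rms_db(s) for s in stems])
    target = levels.mean()
    return [s * 10 ** ((target - lvl) / 20) for s, lvl in zip(stems, levels)]

# Usage: three synthetic stems at very different levels end up at a common level.
rng = np.random.default_rng(0)
stems = [g * rng.standard_normal(48000) for g in (0.05, 0.2, 0.8)]
print([round(rms_db(s), 1) for s in normalize_loudness(stems)])
```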
GUI Ren
Destination: Sound, memories, and LGBTQ identities in a Beijing nightclub
As the first nightclub in China to attract gay customers, Destination, in the Chaoyang district of Beijing, has a history of over ten years. The environment, music, activities, discourse, performance and other features of this space constitute a unique cultural landscape. This representative venue also reflects the changes in Chinese LGBTQ culture across different periods, and today’s Destination not only has a stable gay clientele but also attracts customers of various identities. However, owing to the pandemic and other factors, the nightclub has been closed for a long time. Hoskins (2001) described the “new memories” that emerge in today’s media-saturated environment, in which media technologies are increasingly intertwined with personal memories; auditory perceptions and technologies have likewise become part of individual and collective memory (Bijsterveld & van Dijck, 2009). Starting from such premises in previous research, this paper attempts to trigger gay customers’ memories of their multisensory experiences in this nightclub, and to explore what is special about the sounds and music in this space, how these sounds and music help gay communities construct their different identities inside the space, and what their expectations are of sound and music in an idealized space. It uses mixed methods. First, the author will analyze sound and video clips collected as an observer at Destination from 2019 to 2020 as first-hand materials, including people’s behavior and activities. Second, the author will collect sounds and video clips from formerly active customers and conduct in-depth interviews with them. Finally, the author will use focus group interviews with volunteer participants, selected according to the given topic, to discuss identity construction through music and sounds in the Destination nightclub.
- Keywords: sound, identities, gay, music, space
Mirjana DOKIC
Creating Audio Object-Focused Acoustic Environments for Room-Scale Virtual Reality
Room-scale Virtual Reality (VR) allows the creation of realistic and compelling experiences by tracking the position and rotation of the player’s limbs in six degrees of freedom (6-DoF) in real time and translating them from the real to the virtual world. However, advances in VR require new audio design methodologies that support player interactivity and engaging experiences. Current VR sound design, including Ambisonic beds and non-diegetic music, is problematic because it restricts the player’s interaction and immersion. The authors therefore propose a novel methodology for acoustic environments in room-scale VR that combines an object-focused approach, multimodal rule-based systems, and adaptive, 3D binaural audio in 6-DoF. The proposed governing system defines an object’s behavior across all of its sensory, interactive and spatial representations. To evaluate the feasibility of this approach, the authors developed a prototype named Planet Xerilia for Meta Quest 2. Multimodal applications in Planet Xerilia include collision-based, list-based and motion-pattern-based systems built on key design rules for creating audio object-focused acoustic environments (a minimal sketch of such a rule-based trigger system follows below). While this methodology allows the player to move freely through eight VR scenes and interact with virtual ‘sound installations’, it also has some disadvantages, including implementation complexities and processing limitations. Finally, the authors recommend that designers weigh the possibilities for player-audio interaction against the game structure, as well as time and budget limits.
- Constantin Popp and Damian T. Murphy [AudioLab, Department of Electronic Engineering, University of York, UK]
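The sketch below (written in Python for brevity; the prototype itself runs in a game engine) illustrates, under stated assumptions, the kind of rule-based governing system the abstract describes: audio objects carry rules that map interaction events such as collisions or detected motion patterns to sound actions. All names are illustrative and not taken from Planet Xerilia.

```python
# Sketch only: a rule-based "governing system" for audio objects.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class AudioObject:
    name: str
    # Each rule maps an event type to a sound action for this object.
    rules: Dict[str, Callable[["AudioObject", dict], None]] = field(default_factory=dict)

    def handle(self, event_type: str, event: dict) -> None:
        rule = self.rules.get(event_type)
        if rule:
            rule(self, event)

def play(sound: str, position) -> None:
    # Stand-in for a 6-DoF binaural playback call in the game engine.
    print(f"play {sound} at {position}")

chime = AudioObject("chime", rules={
    # Collision-based: trigger a sound when the player's hand touches the object.
    "collision": lambda obj, e: play(f"{obj.name}_hit", e["position"]),
    # Motion-pattern-based: change the sound when a circular gesture is detected.
    "circle_gesture": lambda obj, e: play(f"{obj.name}_loop", e["position"]),
})

chime.handle("collision", {"position": (0.2, 1.1, -0.5)})
chime.handle("circle_gesture", {"position": (0.2, 1.2, -0.4)})
```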
Mirjana DOKIC
Field Research in Hong Kong: Music Composition, Performance and Sound Design for Virtual Reality
Because my work includes several interdependent parts (spatial music composition and sound design for Virtual Reality (VR), sonification, field recording, immersive live performance, and VR recording and post-production), my research is organized chronologically. Firstly, I am investigating the literature on spatial music composition and sound design for VR written by composers such as Damian T. Murphy, Rob Hamilton, Ricardo Climent, Christine Webster and Phivos-Angelos Kollias. In line with this research, I am currently completing three new spatial music compositions: Immersive Salsa and Reggaeton for VR (for studying Latin dances in VR), When I Look at Your Eyes (future bass music for VR video games) and Immersive Valse, a soundscape composition for VR film with melodic and rhythmic characteristics of the French waltz. These compositions will further be used for developing Concrete-Musical sonifications. Secondly, I am investigating the literature and tutorials on musical instruments for VR, such as Coretet (string quartet) by Rob Hamilton, Virtuoso VR, PATCH XR, AliveInVR (a Virtual Reality controller for Ableton Live), Transient and Jam Studio VR. In parallel, I will explore and compare all these VR instruments with the goal of finding the most suitable ones for my project and producing research articles for conferences such as Innovation in Music. Thirdly, in order to gain insight into the current state of music for VR, I visited the VR exhibition Phygital D: DOKU, The Binary World by New Media artist Lu Yang, co-presented by Freespace, West Kowloon Cultural District and the Sydney Opera House. This was an opportunity to experience an immersive live musical/dance audiovisual performance in VR and to see how real-time motion capture links a live performance to virtual avatars. Further, in order to explore electroacoustic music for my instrument, the viola, and performance on VR instruments, I visited the Cosmopolis Festival – Synesthesia: An Electroacoustic Music and Multimedia Concert. This gave me an insight into the multimodal possibilities of the HKUST Shaw Auditorium’s state-of-the-art audio-visual system, through JUNK!’s live performance on VR instruments, Two Airs (2022) for solo viola with white noise machine, piano and live electronics by Samson Young, and Dusk’s Gate (2018) for multichannel tape by Natasha Barrett. Finally, with the aim of creating a VR film about the pink dolphins in Hong Kong, I pursued field research in the South China Sea close to Tai O, Lantau Island. This Dolphins Encounter was led by Dr. Brian Kot, Assistant Professor and leader of the Aquatic Animal Virtopsy Lab, City University of Hong Kong, who presented the lab’s research on diagnostic imaging and conservation medicine. This was an opportunity for me to observe dolphins, learn more about cetacean conservation and present my PhD research project to Dr. Brian Kot and his team. They expressed interest in my project and we discussed possibilities for future collaboration. We agreed that I will take part in their research trips, organized twice a month. My objective will be to incorporate their and Leah Barclay’s acoustic marine ecology methodologies into my research. My future plans are to visit as many VR exhibitions and conferences as possible, such as the HKACT! Act 13 Symposium, as well as the natural heritage sites of Hong Kong, where I plan to do field recordings, for example at Tung Ping Chau, part of the Hong Kong UNESCO Global Geopark.
Giuseppe PISANO
OAT – Open Ambisonics Toolkit
A collection of learning resources and third-party tools to make Ambisonics democratic. Immersive audio has experienced a surge of interest in the past few years. Thanks to faster computers, the accessibility of digital audio technologies, and the diffusion of VR, more and more artists, content creators, musicians, sound designers and engineers are turning their attention to these technologies. In particular, Dolby Atmos and Ambisonics seem to share the lead in terms of popularity; but while the former is a patented technology primarily used in the film industry and the mainstream music sector, Ambisonics is a free, open alternative that offers a wider range of applications and is a common choice in the video game industry and in the field of contemporary arts. Despite this potential, a pedagogical methodology for transmitting these techniques, which require a rather wide and specific set of skills spanning theoretical, software and hardware knowledge, has not yet been developed, and this is a hindrance for students who want to progress autonomously. Moreover, access to the practical resources needed to work with this technology remains financially demanding. The aim of this project is to propose an affordable hardware solution for the playback of Higher Order Ambisonics sound files, using open-source technologies and DIY hacks, and to link this to an educational platform that approaches the topic in a beginner-friendly, cross-platform way.
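As a taste of what the toolkit aims to make approachable, here is a minimal, textbook-style Python sketch of decoding first-order B-format (W, X, Y, Z) to a square loudspeaker layout. It assumes FuMa-style channel ordering and a basic horizontal-only decode; the project itself targets Higher Order Ambisonics playback built from existing open-source tools, so this is only an illustration.

```python
# Sketch only: basic first-order Ambisonics decode to four speakers at +/-45 and +/-135 degrees.
import numpy as np

def decode_foa_square(b_format: np.ndarray, azimuths_deg=(45, 135, 225, 315)) -> np.ndarray:
    """b_format: array of shape (4, n_samples) holding W, X, Y, Z.
    Returns one signal per loudspeaker, shape (n_speakers, n_samples)."""
    w, x, y, z = b_format
    outs = []
    for az in np.radians(azimuths_deg):
        # Horizontal-only basic decode: each speaker re-weights W/X/Y by its direction.
        outs.append(0.5 * (np.sqrt(2) * w + np.cos(az) * x + np.sin(az) * y))
    return np.stack(outs)

# Usage: decode one second of a plane wave arriving from 45 degrees (FuMa encoding).
fs, az = 48000, np.radians(45)
sig = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
b = np.stack([sig / np.sqrt(2), sig * np.cos(az), sig * np.sin(az), np.zeros(fs)])
print(decode_foa_square(b).shape)  # (4, 48000)
```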
YUE Ran
Courses
In these two presentations, I will present the article I have recently read, “Real and imagined smellscapes”, summarizing its methods, approaches, and results. Future prospects and further reflections on the smellscape project will be discussed. The emphasis is on the comparison between onsite and online sensory walks, as well as on the exploration of olfactory stimuli and psychological sensations. After my presentation in the “Research skill and methodology” class this Tuesday, some questions about the sensory-walk procedure and olfactory perception were raised, which I will organize and address together. In addition, regarding my research schedule, I will present my thoughts on the psychological (emotion, memory) aspects of the upcoming smellscape project, including relevant research methods and the current state of research. For next year’s EAA conference, I conducted a brief review of the literature on “indoor soundscape” and found that there may be opportunities to take the experiments further at SCM. Finally, I will summarize my recent research progress and content, and specifically point out my own shortcomings and suggestions for improvement.
PerMagnus LINDBORG
Re-scaling Beethoven: very long, very short
The innovation of sound recording and reproduction technologies some 145 years ago spurred composers to imagine and indeed create works of extreme duration: very long – lasting days, years, or more – and, conversely, very short – miniatures of a few seconds that nevertheless encapsulate ‘large’ expressions or denote a corpus of pre-existing music. Comparing Leif Inge’s 9 Beet Stretch from 2002 and a section from Johannes Kreidler’s Compression Sound Art from 2009, this paper reflects upon idea-based sonic art that explores duration. Listening to such works tests Karlheinz Stockhausen’s notion of ‘unified time structure’, as well as Pierre Schaeffer’s definition of a musical object as necessarily having an “overall temporal form” that allows “optimal memorisation”; in particular, he claimed it could be neither “too short” nor “too long”. In this perspective, we ask: what makes us understand a ‘work of music’ as a unitary whole?