Ambisonics Recording Kit


Ambisonics Recording Kit Demo

thrown together by David Tamés, v.1, December 14, 2023

This presentation is a first draft, comments and suggestions are requested, please contact me at: d (dot) tames (at) northeastern (dot) edu.

For copyright and acknowledgments please refer to the last slide. 


Ambisonics Recording Kit Demo Agenda

In this 90-minute demo, you will be introduced to the Ambisonics Recording Kit that will soon be available from the CAMD Immersive Media Lab. We will begin with an introduction to the fundamentals of Ambisonics and the components in the kit. You'll then have an opportunity to record sound with the kit, listen to the results, and learn about postproduction options for working with Ambisonics sources in immersive audio compositions, 360 videos, and VR projects. I would also like to discuss what additional resources we need to support creative research that incorporates immersive audio. The MSO has not yet determined the actual name of the kit, and the deployment, the actual inventory of the kit, and the availability of post-production software resources are still being finalized. If you are interested in using Ambisonics recording or postproduction for a course or creative research, please get in touch with Jonny Ouk.


Ambisonics is object-based audio that captures a 3D sound field

Ambisonics is a sound encoding/decoding standard for full-sphere surround sound. Ambisonics is flexible, future-proof, and captures a high degree of realism. Despite being around since the 1970s, it is still new to a lot of people and, like every technique, it has a bit of a learning curve. Audio materials encoded in the Ambisonics format represent sound sources along the horizontal plane and above and below the listener. Unlike traditional multi-channel surround formats (e.g., 5.1 and double MS), Ambisonics implements a loudspeaker-independent representation of a sound field (or soundfield) that can be decoded to a wide range of loudspeaker setups (e.g., mono, binaural, stereo, 5.1, 7.1, cube, octahedral, etc.). 

Ambisonics allows creators to work with sound sources based on directions instead of speaker positions and offers flexibility regarding the playback environment. Many sound artists and designers, including Mark Mangini (Dune, Blade Runner 2049), advocate for recording sound effects and ambiances in Ambisonics due to the postproduction flexibility it provides. In fact, 360 video and VR are only two of the many use cases for Ambisonics in the contemporary media production landscape. You don't need to record Ambisonics sources for an Ambisonics mix; many creators build their immersive soundscapes from stereo and mono sources that have been encoded to Ambisonics for mixing. The future of audio is object-based, which releases us from the limits of specific speaker configurations and makes it easier to future-proof our projects and release them in a variety of delivery formats.

Image: Illustration by Yoshi Sodeoka from Spatial Audio, New York Times, October 22, 2021,


Ambisonics has been around since the 1970s and is a close cousin of MS and double MS stereo

Ambisonics was developed in the mid-1970s by a group of British academics, notably Michael Gerzon (Mathematical Institute, Oxford) and Peter Fellgett (University of Reading). The system was designed to reproduce recordings made with a “Soundfield” microphone, mixed using their own technology, and reproduced with a minimum of four speakers. The significant innovation of this project was the creation of an object-based audio format that encoded the direction, distance, and height of recorded sound without specific reference to a loudspeaker channel. The technology is a cousin of the M-S stereo recording technique developed in the early 1930s by Alan Dower Blumlein, an English scientist who was also responsible for the 45/45 stereo record cutting technique. The M-S configuration, with a cardioid or line-cardioid microphone for the middle channel and a figure-of-eight microphone for the side channel, has been demonstrated to achieve the highest accuracy of stereo imaging with two loudspeakers and is easily decoded to stereo (and stereo is easily encoded into M-S). Double M-S adds an additional line-cardioid or cardioid middle channel pointing to the rear for a full 360 surround recording. But the moment you move or turn your head, the spatial realism vanishes. Ambisonics to the rescue.

Curiously, if you are working with AmbiX files, which use SN3D normalization and the standard ACN (Ambisonic Channel Number) ordering WYZX, the first two channels of the file are the W (omnidirectional) and Y (left/right) signals, identical to the mid and side signals of M-S stereo, which you can simply decode with a mid/side decoder available in most DAWs. The M-S heritage is right there in channels 1 and 2!
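To make this concrete, here is a minimal sketch (hypothetical helper name, samples as plain Python lists) of decoding the W and Y channels of an AmbiX recording as mid/side stereo:

```python
# Hypothetical sketch, assuming an AmbiX (ACN/SN3D) recording: channel 1 (W)
# behaves as the mid signal and channel 2 (Y) as the side signal, so a plain
# mid/side decode yields a usable stereo fold-down.
def wy_to_stereo(w, y):
    """w, y: lists of samples; returns (left, right) sample lists."""
    left = [m + s for m, s in zip(w, y)]   # mid + side
    right = [m - s for m, s in zip(w, y)]  # mid - side
    return left, right

# A source panned hard left puts equal energy into W and +Y,
# so the decode favors the left channel:
left, right = wy_to_stereo([1.0, 1.0], [1.0, 1.0])
print(left, right)  # [2.0, 2.0] [0.0, 0.0]
```

This is exactly the mid/side decode that a stock DAW plug-in performs; only the channel labels differ.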


Images: Double MS (B9 Audio); Triple Calec Microphones (Into The Soundfield Michael Gerzon and Ambisonics at Oxford)


While developed in the 1970s, Ambisonics did not gain traction until YouTube, Facebook (now Meta), and Unity adopted it as a standard for 360-degree videos. Ambisonics has become widely used not only as a spatial audio standard for VR and 360 video but also for immersive installations and field recording due to the flexibility of the format. For example, Reeps One: Does Not Exist (VR Beatbox with 3D sound) by Mill+ and Aurelia Soundworks was composed to take advantage of the 360 audiovisual space, creating a new style of music video; watch on YouTube, and read more at


Ambisonics enables an immersive sound experience

The listener can locate sound coming from any direction in space. Ambisonics works on the principle that if you can record the sound field at the boundary of a region, you can reproduce it inside using differential field equations (the Kirchhoff-Helmholtz integral theorem). An Ambisonics encoder takes the azimuth and elevation of the sound source to be encoded as input, and the listener will perceive the sound source at that particular position when the audio material is decoded.
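As a sketch of what an encoder does with azimuth and elevation, the first-order (AmbiX, ACN/SN3D) panning gains can be computed like this (hypothetical function names; angles in radians):

```python
import math

# Minimal sketch of a first-order AmbiX (ACN/SN3D) encoder: given azimuth and
# elevation, compute the gains that spread a mono sample across the four
# B-format channels, in ACN order W, Y, Z, X.
def foa_gains(azimuth, elevation):
    w = 1.0                                      # omnidirectional (SN3D)
    y = math.sin(azimuth) * math.cos(elevation)  # left/right figure-of-eight
    z = math.sin(elevation)                      # up/down figure-of-eight
    x = math.cos(azimuth) * math.cos(elevation)  # front/back figure-of-eight
    return [w, y, z, x]

def encode_foa(signal, azimuth, elevation):
    """signal: list of mono samples -> list of four channel lists."""
    return [[g * s for s in signal] for g in foa_gains(azimuth, elevation)]

# A source straight ahead (azimuth 0, elevation 0) lands only in W and X:
print(foa_gains(0.0, 0.0))  # [1.0, 0.0, 0.0, 1.0]
```

Note that no gain here refers to a loudspeaker; the direction is baked into the channel weights and resolved later by the decoder.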

In an Ambisonics rendering system you would see loudspeakers not just at the height of the listener's ear level but also above and below (periphony). If the listener is using headphones, binaural processing based on HRTF filtering can be applied to provide the spatial effect, and if head tracking is implemented, the sound field will remain stable in space rather than rotating along with the listener's head.

For an example, see “The Rise Of Immersive Audio: Progressum Unifies Vivid House” by Katinka Allender, LIVE DESIGN, August 18, 2022,


Ambisonics is loudspeaker array independent, therefore it has many applications

The 5th order Ambisonic Dome at the School of Arts, Media and Engineering (AME) at Arizona State University is an example of a state-of-the-art space for working with spatial audio and is the home of the Ambisonic Dome Concerts. The dome contains 45 speakers, allowing for the precise placement of sounds anywhere in a three-dimensional space, giving artists the opportunity to create immersive audio experiences and explore new frontiers of sonic art, virtual reality, and multimedia art. The Dome was designed by Garth Paine with technical director Peter Weisman.


Ambisonics microphones and post-production options

There’s a wide range of microphones available for field recording and several plug-in libraries for major DAWs supporting Ambisonics mixing and rendering to different exhibition formats, and we’ll cover some of these today.


Ambisonics is a full sphere surround sound format

Unlike traditional surround formats like quadraphonic, 5.1, 7.1, etc., Ambisonics covers sources above and below the listener in addition to the horizontal plane. Transmission channels do not carry speaker signals; instead, they contain a speaker-independent representation of a sound field (called B-format). For playback, material encoded in Ambisonics B-format is decoded to a specific speaker configuration. The same source material can be decoded for playback through stereo speakers, binaural or stereo headphones, a four-speaker setup, a multi-speaker dome, etc. Media makers can think in terms of sound source directions rather than loudspeaker positions, and the format provides a high degree of flexibility in terms of speaker layout. It is also the most effective method of presenting spatial audio in VR and 360 video applications: the scene can be rotated to match the participant's head orientation and then decoded as binaural stereo.


Ambisonics requires multiple speaker arrays

For the most immersive effect, we need at least as many loudspeakers as we have B-format channels, preferably a few more! The number of loudspeakers for playback should exceed the number of channels. The number of Ambisonics-encoded channels is equal to (order + 1)². Higher-order Ambisonics provides better localization of sound sources but requires additional channels: 1st order Ambisonics requires four encoded channels, 2nd order requires nine, and 3rd order requires 16. You would also need additional loudspeakers to take advantage of higher orders.
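The channel-count rule above is simple enough to express directly (hypothetical helper name):

```python
# The channel count grows quadratically with the Ambisonic order:
# (order + 1) squared.
def ambisonic_channels(order):
    return (order + 1) ** 2

for order in (1, 2, 3, 7):
    print(order, ambisonic_channels(order))  # 1→4, 2→9, 3→16, 7→64
```

The quadratic growth is why 7OA needs 64 channels while 1OA needs only four.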


Ambisonics reconstructs a plane wave by decomposing the sound field into spherical harmonics

Higher order Ambisonics (HOA) are used to reconstruct a plane wave by decomposing the sound field into spherical harmonics. This process is known as encoding. Encoding creates a set of signals that depend on the position of the sound source, with the channels weighted depending on the source direction. The functions become more and more complex as the HOA order increases. The spherical harmonics are shown here up to third-order. These third-order signals include, as a subset, the omnidirectional zeroth-order and the first-order figure-of-eights. Depending on the source direction and the channel, the signal can also have its polarity inverted (the darker lobes).

The various channels in an Ambisonics B-format file may be visualized as virtual microphones with increasingly complex pick-up patterns. As you add channels, you increase the spatial resolution and the size of the sweet spot. Remember, channels do not correspond to loudspeakers; they correspond to spherical harmonics. Here we see a visualization of Ambisonics spherical harmonics for orders up to three. 1st order Ambisonics (abbreviated 1OA or FOA) includes the top two lines of basis functions (four channels), and 3OA includes all four lines (16 channels). Note that ACN channels 2, 6, and 12 contain only vertical components.

The AmbiX file format is the standard in most widespread use at this time. The format defines the channel ordering and the normalization, both of which are related to the spherical harmonics, the mathematics behind Ambisonics. The channel ordering used in AmbiX files is called ACN, which stands for Ambisonic Channel Number. In contrast to FuMa, another B-format standard, the channels are no longer ordered alphabetically: the first-order components in FuMa are WXYZ, while in ACN they are WYZX, where W is the omnidirectional signal and X, Y, and Z are the figure-of-eights in the x, y, and z directions, respectively. SN3D normalization is used in the AmbiX format because it ensures that when you encode a source, the levels of the other channels will not exceed the first (omni, W) channel; in general, this is quite handy for avoiding clipping in your DAW. It is important to know the file format when working with Ambisonics sources recorded in the past, as there were many different configurations in use before the AmbiX standard. A neat property of AmbiX files: with SN3D normalization and the standard ACN ordering (WYZX), the first two channels of your Ambisonic signal (W and Y) hold mid/side signals, which you can simply decode with a mid/side decoder, like the one that comes with Reaper. So a very simple stereo decoder would be exactly that. Neat, right? The M-S heritage is right there in channels 1 and 2!
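As an illustration of how the two standards differ, here is a sketch (hypothetical function name, channels as lists of samples) of converting a first-order FuMa file to AmbiX: rescale W and reorder the channels. This covers first order only; higher orders need additional per-channel scale factors.

```python
import math

# Sketch of converting first-order FuMa (channel order W X Y Z, with W
# attenuated by 1/sqrt(2)) to AmbiX (ACN order W Y Z X, SN3D normalization):
# undo the -3 dB on W, then reorder.
def fuma_to_ambix_foa(w, x, y, z):
    """Each argument is a list of samples; returns [W, Y, Z, X] channels."""
    w_sn3d = [s * math.sqrt(2.0) for s in w]  # restore full-scale W
    return [w_sn3d, y, z, x]

channels = fuma_to_ambix_foa([1.0], [2.0], [3.0], [4.0])
print([c[0] for c in channels])  # W scaled by sqrt(2), then Y, Z, X
```

Running old FuMa material through a chain that expects AmbiX without this step produces a quieter omni component and swapped directional channels, which is why identifying the format of legacy recordings matters.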



Spatial resolution is improved and the sweet spot enlarged with higher orders

Since the number of loudspeakers has to at least match the number of HOA channels, expense and physical limitations are significant factors. How many venues can provide the 64 speakers needed for 7OA playback? So why encode things to a high order in postproduction if we are limited to lower-order playback? There are two reasons why postproduction is often done at orders much higher than the source material: (1) future-proofing and (2) better binaural rendering (and given that headphones are ubiquitous, mixing in higher orders makes sense even if participants are only going to listen with headphones).

One of the best features of Ambisonics as a postproduction format is that you can work with a subset of channels for a lower order rendering. The first four channels in a 3OA mix are exactly the same as the four channels of a 1OA mix! We can ignore the higher order channels without having to do any approximative down-mixing! By encoding at a higher order than might be feasible for the current deployment configuration, you remain ready for another configuration of loudspeakers in the future. For example, you might mix for a binaural preview, then mix for a 24-speaker dome for a museum installation, and use the binaural mix for documentation of the experience and a 1OA mix for deployment to a VR headset. If the limiting factors for HOA are cost and loudspeaker placement issues, then what if we use headphones instead? A binaural rendering uses headphones to place a set of virtual loudspeakers around the listener. Now our rendering is only limited by the number of channels our PC/laptop/smartphone can handle at any one time (and the quality of the HRTF).
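That subset property means "down-converting" an AmbiX mix is just truncation; a minimal sketch (hypothetical helper name):

```python
# Because the first (order + 1)**2 channels of a higher-order AmbiX signal
# are exactly the lower-order signal, truncating to a lower order is a
# slice, with no approximative down-mixing.
def truncate_order(channels, order):
    """channels: list of channel lists (ACN order) -> lower-order subset."""
    return channels[: (order + 1) ** 2]

toa = [[float(i)] for i in range(16)]  # a 3OA signal: 16 channels
foa = truncate_order(toa, 1)           # the embedded 1OA signal
print(len(foa))  # 4
```

Going the other way (rendering a 1OA source in a 3OA session) is equally painless, since the missing higher-order channels can simply stay silent.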


1st order Ambisonics (1OA) microphones

1st order Ambisonics (1OA) microphones require only four-channel recorders, at the expense of reduced spatial cues and a smaller sweet spot compared to 2OA and 3OA microphones. The Sennheiser AMBEO VR microphone in the CAMD kit is a 1OA microphone.


CAMD Ambisonics Recording Kit Inventory

Configuration currently a work-in-progress.


Channel formats

A-format is the term used for the unprocessed signals from the four capsules of a tetrahedral sound field microphone consisting of four sub-cardioid capsules (a polar response that is slightly more omni than cardioid) mounted on the surface of a tetrahedron. The capsule outputs can be electronically equalized to some degree so that they appear to be coincident up to a certain frequency. Basic sum-and-difference processing of the A-format outputs generates the B-format components: W = 0.5(LF+LB+RF+RB), X = 0.5((LF–LB) + (RF–RB)), Y = 0.5((LF–RB) – (RF–LB)), and Z = 0.5((LF–LB) + (RB–RF)). The resulting B-format outputs are carefully equalized to compensate for level differences; for example, the W output may lack low frequencies as it is derived from velocity capsules that can lack bass. Because the characteristics of the capsules vary between different microphone designs, the exact specification of the A-format signals is not fixed, and each microphone has a procedure (implemented in hardware or software) for converting from A-format to B-format for further processing. Usually we are working with four channels (1OA) from field recordings, but higher orders are also used.
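The sum-and-difference step above can be sketched for a single sample from each capsule (this omits the per-capsule equalization a real converter applies):

```python
# Basic sum-and-difference A-to-B conversion for one sample from each
# capsule of a tetrahedral array (LF, LB, RF, RB), following the formulas
# above. Real converters follow this with microphone-specific equalization.
def a_to_b(lf, lb, rf, rb):
    w = 0.5 * (lf + lb + rf + rb)        # omnidirectional pressure
    x = 0.5 * ((lf - lb) + (rf - rb))    # front/back
    y = 0.5 * ((lf - rb) - (rf - lb))    # left/right
    z = 0.5 * ((lf - lb) + (rb - rf))    # up/down
    return w, x, y, z

# Equal pressure on all four capsules produces a purely omni signal:
print(a_to_b(1.0, 1.0, 1.0, 1.0))  # (2.0, 0.0, 0.0, 0.0)
```

Each microphone's bundled A-to-B software is essentially this matrix plus the equalization tuned to its specific capsules.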

The basic format used for the storage and manipulation of Ambisonics is B-format. AmbiX is the contemporary B-format standard that has been widely adopted by distribution platforms such as YouTube; it orders the channels W-Y-Z-X. B-format consists of the spherical harmonics of the sound field up to the order being considered. For first-order Ambisonics there is one signal of 0th order, known as W, and three of 1st order, known as X, Y, and Z. These signals correspond conceptually to the outputs of one omnidirectional microphone and three orthogonal figure-of-eight microphones placed at the same point. This four-channel signal allows the manipulation required to generate speaker signals, to rotate the sound field, and various other transformations to be performed with simple mathematics, and therefore this format is used for storage, manipulation, and transmission of Ambisonic material.

For a more comprehensive description of Ambisonics channels, see 


Ambisonics can be decoded to any speaker configuration

An Ambisonics decoder can decode the B-format signals to any loudspeaker setup, or to binaural in the case of headphones. One of the advantages of Ambisonics is that the format is independent of loudspeaker configuration. Traditional surround sound formats have separate channels for each loudspeaker placed at the front, center, back, etc. Ambisonics co-exists well with existing mono, stereo, and 5.1 setups, but also opens up the possibility of more sophisticated surround sound arrangements. No wonder sound designers like Mark Mangini now advocate for recording ambiences in Ambisonics for maximum flexibility in postproduction. While many sound effects are better recorded using a coincident stereo format like M-S stereo and dialogue is traditionally recorded in mono, these can be placed anywhere in the sound field in postproduction. The Ambisonic B-format WXYZ signals define what the listener should hear. How these signals are presented to the listener depends on the number of speakers and their location. Ambisonics treats directions where no speakers are placed with as much importance as speaker positions. UHJ is the matrixing scheme associated with Ambisonics; the two-channel version was a compromise between the BBC's Matrix H and the NRDC's 45J. The BBC's matrix had been chosen from a range of possibilities by listening tests, and the NRDC's was based on theoretical principles.


Ambisonics offers maximum postproduction flexibility

B-format Ambisonics can be decoded into almost any playback format, including: mono (without "sum to mono" phase cancellation issues); stereo; binaural, fixed-head or headtracked, using individualized or generic HRTF information; four speakers arranged as a square or rectangle; six speakers arranged as a regular or irregular hexagon; 5.1 (ITU); 7.1; 10.1; Dolby Atmos; or any of these formats plus height information (e.g., two hexagonal arrays of speakers, one above the listener and one below); and many more.
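As a toy illustration of loudspeaker-independent decoding, here is a minimal "sampling" decoder sketch for one of the layouts above, a horizontal square of four speakers (hypothetical function; real decoders add normalization and psychoacoustic weighting on top of this idea):

```python
import math

# Minimal "sampling" decoder for a horizontal square of loudspeakers: each
# speaker feed is a virtual cardioid pointed at that speaker, sampled from
# the first-order components W, X, Y (Z is ignored for a flat ring).
def decode_square(w, x, y):
    angles = [45, 135, 225, 315]  # speaker azimuths: FL, BL, BR, FR
    feeds = []
    for deg in angles:
        a = math.radians(deg)
        feeds.append(0.5 * (w + math.cos(a) * x + math.sin(a) * y))
    return feeds

# A frontal source (energy in W and +X) drives the front pair hardest:
fl, bl, br, fr = decode_square(1.0, 1.0, 0.0)
```

Decoding to a different layout means changing only the list of speaker angles; the B-format signals themselves never change, which is the whole point of a speaker-independent representation.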


Ambisonics — hands-on recording and listening

Time for the hands-on portion of the demo! We'll make a recording, listen to it, and talk about some postproduction considerations.

I would be happy to offer a follow-up workshop covering editing, mixing, and playback options for Ambisonics projects, let me know if you are interested.


A brief glimpse of Ambisonics post-production options

Reaper is a DAW widely used for Ambisonics mixing due to its ability to accommodate up to 64 channels per track for working with HOA up to 7OA.

The most extensive suite of free plug-ins for working with Ambisonics materials is the IEM Plug-in Suite, which includes Ambisonic plug-ins up to 7OA. It was created by staff and students of the Institute of Electronic Music and Acoustics (IEM). Below is a summary of the plug-ins in the suite and what each is used for:

  • AllRADecoder is used to design an Ambisonic decoder for an arbitrary loudspeaker layout.
  • BinauralDecoder converts Ambisonic signals directly to binaural headphone signals using HRTFs from the Neumann KU 100 dummy head and it allows you to add equalization.
  • CoordinateConverter converts VST parameters from a spherical representation to a cartesian representation, and vice versa. Typically used to convert xyz-position data (e.g., in track automation) to a spherical representation (azimuth, elevation, radius).
  • DirectionalCompressor is an Ambisonic compressor/limiter which lets you control the dynamics for different spatial regions.
  • DirectivityShaper filters an input signal into four independent bands, to which different directivity patterns and filters can be applied.
  • DistanceCompensator takes the distances from the listening position to each loudspeaker and calculates the needed delays and gains in order to compensate for distance differences.
  • DualDelay has two delay-lines which can be configured independently and the timbre can be shaped with the high- and low-pass filters and it can be fed back into itself.
  • EnergyVisualizer displays the energy distribution on the sphere of the Ambisonic input signal using an area-preserving spherical projection. Energy levels are color-coded with a perceptually motivated colormap.
  • FdnReverb provides spatial and temporal diffuse reverberation.
  • GranularEncoder takes a mono or stereo input and encodes short ‘grains’ of audio into the Ambisonic domain that are distributed around a controllable center direction with selectable size and shape of the distribution.
  • MatrixMultiplier applies a TransformationMatrix object to the input signal.
  • MultiBandCompressor splits an Ambisonic signal into four bands and compresses them individually. Crossover frequencies can be adjusted and the individual compressors are fully configurable.
  • MultiEncoder encodes multiple sources (up to 64) and each source can be panned, muted and soloed individually along with gain adjustment.
  • MultiEQ is a simple multi-channel equalizer filtering up to 64 audio channels.
  • OmniCompressor is an Ambisonic compressor that may also be used as a limiter. It works like a regular audio compressor but uses the W-channel as the driving signal and applies the calculated gains to the whole Ambisonic signal.
  • ProbeDecoder samples/decodes an Ambisonic input for one specific direction so you can listen to the output; it is used mostly for testing purposes.
  • RoomEncoder can place a source and a listener into a virtual shoebox-shaped room and render over two hundred wall reflections. You can move the source and the listener freely within the room, which will automatically generate the Doppler-shift effect.
  • SceneRotator rotates an Ambisonic scene. It can be used with Yaw-Pitch-Roll rotation data or Quaternions. In combination with binaural playback head-movements can be compensated for with tracking data.
  • SimpleDecoder reads JSON configuration files created with the AllRADecoder and decodes the Ambisonic input signal to loudspeaker signals, with subwoofer support.
  • StereoEncoder encodes mono or stereo audio signals into Ambisonics. Provides Azimuth, Elevation and Roll sliders to pan the source and a Width slider to separate the input channels.
  • ToolBox provides a variety of features including flipping the Ambisonic input signal along the x, y, or z axis and mixing Ambisonic signals with different orders.

I’ve also used the a1/a3 Bundle of Ambisonics plug-ins from SSA Plugins, along with Harpex-X, for extracting virtual microphones and binaural decoding.

SSA offers the following plug-ins:

  • aXPanner controls the azimuth and elevation of the tracks to place them anywhere in the soundfield. Width increases the size of mono tracks or spreads stereo materials.
  • aXRotate changes the orientation of an Ambisonic material by changing yaw, pitch, and roll. When connected to a head-tracker sound sources stay in place as the listener moves their head.
  • aXMonitor converts Ambisonics signals to binaural for 3D headphone listening. Supports loading HRTFs in .sofa format. Also converts to “Super Stereo” UHJ format for playback over speakers.
  • Dynamics tames the dynamics of an Ambisonic material while retaining spatial balance.
  • aXCompressor catches peaks and evens out dynamics of Ambisonic material with control of temporal and level parameters.
  • aXGate is a companion to aXCompressor that provides control of how much the gate opens to reduce the background noise in Ambisonic materials.
  • aXDeesser reduces sibilance in Ambisonic microphone recordings using a virtual microphone to create a side-chain for the compression.
  • aXEqualizer provides 8 bands of EQ with low-pass, high-pass, peak, shelves, and other filters while preserving the spatial balance.
  • aXDelay has five modules each with individual delay, gain, and feedback controls. It can also sync to the tempo of your track. The delayed signals can be sent anywhere in the sound field.
  • aXMeter displays the levels in each channel grouped by Ambisonic order, providing a clear overview of signal levels.

I’ve been using the free dearVR AMBI MICRO for optimized A-to-B conversion for the Sennheiser AMBEO VR microphone, and dearVR MICRO provides another free tool for binaural rendering, HRTF selection, panning, etc. However, I prefer to use Harpex-X for binaural rendering and A-format to B-format conversion, though it’s not free.

If your goal is to produce a 360 video or an immersive installation without computer-based interactivity, or an audio-only binaural experience, Reaper or ProTools are good options with support for Ambisonics editing and mixing.

On the other hand, if you intend to create an experience for an interactive installation or a VR experience, you'll want to consider Wwise from Audiokinetic, a suite of design and development tools tailor-made for prototyping and deploying interactive audio experiences. My own experience is limited to Reaper and Unity, but Wwise looks intriguing for interactive works.

It is preferable to mix and deliver the soundtrack as 3OA because this will provide better spatial resolution (a more immersive audio experience for the participant). Even if your Ambisonics field recordings are 1OA, any mono or stereo sound sources encoded to Ambisonics will benefit from better spatial resolution if you encode them to 3OA instead of 1OA. Another benefit of mixing in 3OA is better decoding to binaural.

Wwise, Reaper, and ProTools all support working with Ambisonics up to 3OA. Reaper and ProTools Ultimate can support up to 7OA (64 channels), which will be required when working with large speaker-array installations. However, for many VR and installation use cases, 3OA (16 channels) provides a good balance between computational resource requirements and the spatial resolution of the end results.

Most playback platforms only support 1OA (for example, YouTube, the most popular platform for 360 content at this time); however, working in 3OA offers more deployment flexibility in the future and better binauralization right now for sources that originated as mono or stereo.

For my current project I'm using Reaper with a 3OA mix, which I've determined is good enough for my use case. I have been using the Harpex-X plug-in for up-conversion from 1OA recordings made with my Sennheiser AMBEO VR microphone to 3OA with good results. You can mix 1OA straight into a 3OA mix, since the first four channels of any B-format HOA make up the first order. However, based on my experience so far, I get slightly better results up-converting 1OA material to 3OA first, but this could very well be a placebo effect.
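A sketch of the "straight into a 3OA mix" case: the 1OA channels occupy the first four slots of the 16-channel bus and the remaining channels simply stay silent (hypothetical helper name, channels as lists of samples):

```python
# Sketch of dropping a 1OA recording into a 3OA session: the four
# first-order channels line up with channels 0-3 of the 16-channel bus,
# and the twelve higher-order channels are left silent (zeros).
def pad_foa_to_toa(foa):
    """foa: list of 4 channel lists; returns a list of 16 channel lists."""
    samples = len(foa[0])
    return list(foa) + [[0.0] * samples for _ in range(12)]

toa = pad_foa_to_toa([[0.5, 0.5], [0.1, 0.1], [0.0, 0.0], [0.2, 0.2]])
print(len(toa))  # 16
```

This is effectively what a DAW does when you route a four-channel item onto a 16-channel Ambisonic bus; an up-converter like Harpex-X instead estimates plausible higher-order content rather than leaving it at zero.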

BTW, 7OA mixing might be overkill. For many art installations and VR experiences, I believe that 3OA (16 channels) is good enough when working with a variety of sources, including stereo sound effects, mono dialogue, and 1OA field recordings (that's the format I'm using for the sound design of my current project). However, the advantage of 7OA is very precise placement of mono and stereo sources in the sound field, which may be required for a large dome installation and also leads to better binaural decoding.

I disagree with the New York Times research report on spatial audio, which advocates that Dolby Atmos and Pro Tools are the way to go for mixing immersive audio. There is a lot of hype surrounding Dolby Atmos at the moment, and the dust has yet to settle; for installations and binaural deployments it's a waste of energy to even think about Dolby Atmos. I believe the most prudent thing to do is to avoid it; see “Gaslighting Your Fans w/ Dolby ATMOS™” and “ATMOS doesn't make sense” below.



Ambisonics — additional materials 

Additional materials we did not have time for.


2nd order Ambisonics (2OA) microphones

2OA microphones offer significant improvements over 1st-order microphones. These improvements are particularly attractive for three reasons: (1) 2OA microphones are much better at preserving the perceptual cues necessary for a listener to precisely locate sound sources; (2) 2OA microphones provide a larger sweet spot for listeners, which is particularly valuable in dome or room installations (while 1OA microphones have a sweet spot around the size of a human head, a 2OA microphone can accommodate multiple listeners without degrading a recording's sound-location perceptual cues); and (3) 2OA microphones can be used 50% farther from the sound source while maintaining the same directivity index. An example of a 2OA microphone is the Core Sound OctoMic, see for more information.


3rd order Ambisonics (3OA) microphones

3OA microphones offer improvements over 2OA microphones, providing excellent spatial cues and a large sweet spot for more convincing being-there experiences suitable for domes and small venues. The Zylia ZM-1 is an example of a 3OA microphone. It sports 19 digital MEMS sensors distributed in a sphere and is part of a system providing direct recording to a laptop or tablet via USB, or to the ZR-1 portable recorder. The ZYLIA ZR-1 records 22 channels in total: 19 from the microphone capsules, two stereo (pseudo-binaural) channels, and one timecode channel. See for more information.


Copyright 2023 by David Tamés, some rights reserved. This presentation (not the copyrighted images and figures) is released under a Creative Commons Attribution-ShareAlike license.

Figures have been reproduced from various sources under several licenses; please check their caption for source information. Copyrighted images and figures are used for educational purposes under fair use guidelines. Uncredited images of products are from their respective vendors. Images credited with “d.t.” are by the author and are released under the same license as the presentation.