Sound Recording / Editing / Mixing / Design Glossary

This is a glossary of terms you’ll come across working with sound recording, editing, mixing, and design for moving image production.

2-pop. Following a leader, a single frame showing the number ‘2’ appears, accompanied by a short audio tone that lasts for the duration of that frame. This tone allows sound editors to correctly synchronize the audio to the picture. It is much less used now that digital files can easily be lined up with timecode, but the technique is sometimes used as a backup in case there are problems with timecode.

AC-3. The industry-standard designation for Dolby Digital.

Acoustics. The science of sound wave transmission. In general, the term is used to refer to the characteristics of rooms, theaters, auditoriums, and studios in terms of their design and audio characteristics.

ADR. Automated Dialog Replacement. This process involves re-recording actors’ dialog in a studio and syncing it up to their moving lips on screen as if it was recorded on-set. This is usually performed when dialog is recorded poorly or to change certain lines.

AAC (Advanced Audio Coding). A high-quality compressed audio file format suitable for distribution but not production. Original recordings should always be made with uncompressed formats (e.g. WAV). See MP3.

AIF (a.k.a. AIFF). An uncompressed audio file format. See WAV.

Automatic Gain Control (AGC). A method available on some audio recorders and the audio section of video cameras in which audio levels are automatically controlled. On quiet passages, the camera raises the gain (raising the noise floor), and on loud passages, it reduces the gain. You can hear this “pumping” of the gain in the soundtrack. In a pinch, it’s acceptable to use AGC, but set levels manually whenever possible. If your camera offers a limiter option, that is a better way to deal with unexpected peaks.

Ambient noise. The total sound in an environment which is unique to that environment. Also known as room tone. Plays an important role in making seamless audio edits, which requires that the “silence” between words and sentences contains ambient noise that matches the environment in which the dialogue takes place.

Amplitude. The strength of an electronic signal, measured by the height of its waveform.

Analog. A signal that varies continuously in relation to some reference. In contrast, a digital signal varies in discrete steps. Due to its continuous nature, analog can be better at conveying small signal fluctuations that a digital signal with a low sample rate would miss. With high sampling rates and higher bit depths, the differences between an analog signal and its digital representation effectively disappear.

Analog-to-Digital Converter (ADC). A device used to convert analog electrical signals (e.g. from a microphone or analog mixer) to digital data that represent the level and frequency information contained in the original analog signal.

Analog recording. A means of recording audio whereby the recorded signal is a physical representation of the waveform of the original signal. 1/4-in. reel to reel magnetic tape is an example of an analog audio format. Whenever a copy is made of a recording in an analog format, the copy exhibits additional artifacts, not in the original.

Artifact. An audible effect caused by an error or limitation in the system. See also Noise.

Attack. The time taken for a sound or musical note to rise to peak amplitude.

Attenuate. To reduce the strength of a signal.

Attenuator. A device that reduces signal strength. For example, line levels need to be attenuated before they can be fed into a device that only accepts microphone level signals, so you would use an attenuator in this situation.

Audible spectrum. Sound waves in the frequency range from about 20 Hz up to somewhere between 15,000 and 20,000 Hz (the upper limit varies with the listener and falls with age) that move through the atmosphere and produce an audible sensation in the average human.

Audio Sweetening. See Sweetening.

Balanced line. An audio circuit with three wires: two carry the signal and the third provides the ground. Compared to an unbalanced circuit using a single signal wire and a ground, balanced signals are much less susceptible to picking up interference. Therefore, professional sound recording equipment is usually designed to work with balanced wiring. While XLRs are the most widely used connectors with balanced wiring, a particular connector does not guarantee the existence of balanced wiring. Some cameras and most professional sound recorders provide balanced XLR connectors for audio input. See unbalanced line.

Background. Term used to describe either the ambience in a scene or a low relative volume, for example, “put the cracking sound in the background.”

Bandwidth. The amount of information that can be passed through a system at a given time. Typically, the greater the bandwidth, the better the audio quality. However, the compression techniques used (if any) also influence this, since some compression formats allow for a reduction of bandwidth while maintaining very similar audio quality.

Beat. A periodic variation of amplitude resulting from the addition of two frequencies that are slightly different.
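
As a rough illustration of the entry above: the rate of the beat equals the difference between the two frequencies. The function name and the example frequencies below are hypothetical, chosen only to show the arithmetic.

```python
def beat_frequency(f1_hz: float, f2_hz: float) -> float:
    """Rate (in Hz) of the amplitude variation heard when two tones are added."""
    return abs(f1_hz - f2_hz)


# Two slightly detuned tones, e.g. two instruments tuning to the same note.
print(beat_frequency(440.0, 442.0))  # 2.0 (the level swells twice per second)
```

The closer the two frequencies, the slower the beat, which is why musicians tune by listening for the beat to slow down and vanish.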

Bed. Background music or ambient sound used underneath the dialogue track in a sound mix.

Beep. 1. A tone placed in a particular position on a sound track in post-production in order to establish a sync point. The tone is used to align the audio track with the picture for precise synchronization. It is a foolproof method that is often used as a backup even when timecode is being used. For example, your composer might give you audio tracks and place a beep two seconds prior to the start of picture so you can line up the music with your project. 2. Sound made by the Roadrunner.

Bin. Originally a storage bin for editorial film reels but now commonly used to refer to hierarchical folders for storing clips in an NLE.

Bit. 1. A single element (1 or 0) of digital representation of information. 2. A minor role in which an actor may speak only a few lines of dialog. Also known as a bit part.

Bit rate. The amount of data transported in a given amount of time, usually defined in Mega (Million) bits per second (Mbps).

Black box. A term used to describe a piece of equipment dedicated to one specific function.

Blip tone. See Beep.

Boom pole. A pole used to extend a microphone above the subject or actor you want to record, permitting sync sound recording without interference with the subject or actor’s movement. Boom poles are available in a range of lengths, materials (aluminum or super-light carbon fiber), and with or without internal wiring.

Box rental. A fee paid to a crew member for providing their own equipment or other specialized gear for use in a production.

Breakaway cable. See ENG Snake.

Broadcast quality. A nebulous term used by marketing people to describe their products.

Broadcast Wave Format (BWF). An extension of the popular WAV audio format and the recording format of most file-based non-linear digital recorders used for motion picture, radio and television production. See PolyWAV.

Bus. A network that combines the output of two or more channels on a sound mixer.

Byte. 8 bits. A common unit of digital information. The combination of 8 bits into 1 byte allows each byte to represent 256 possible values. (see Megabyte, Gigabyte, Terabyte).

C-Stand. A versatile stand used to support equipment on the set. Usually outfitted with a grip head and a gobo arm. Can be used for hanging sound blankets or holding a Boom Baby (accessory for holding a boom pole that connects to a grip head mounted on a C-Stand or light stand). See Grip head, Gobo arm.

Capacitance. The ability of an electrical component to store electrical charges. Condenser microphones work on the principle of capacitance.

CD (Compact Disc). A digitally encoded audio storage format containing over an hour of music digitized with a sampling frequency of 44.1 kHz and a bit depth of 16 bits. The data is read from tiny pits on the surface by a laser beam.
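
The sampling frequency and bit depth above determine how much data uncompressed audio needs. A minimal sketch of that arithmetic, assuming two channels (stereo), which the entry does not state; the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second for uncompressed (linear PCM) audio."""
    return sample_rate_hz * bit_depth * channels


def pcm_size_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Audio payload size in bytes, ignoring any file header or disc overhead."""
    return pcm_bit_rate(sample_rate_hz, bit_depth, channels) * seconds // 8


# CD audio: 44.1 kHz, 16 bits, stereo (the stereo part is an assumption here).
print(pcm_bit_rate(44_100, 16, 2))           # 1411200 bits per second
print(pcm_size_bytes(44_100, 16, 2, 3_600))  # 635040000 bytes for one hour
```

The same two functions work for production formats, for example 48 kHz at 24 bits, by changing the arguments.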

CD quality. A nebulous term used by marketing people to describe audio products.

Channel. One of possibly several audio tracks in an audio file. Stereo files have two channels, 1 (Left) and 2 (Right). Some file formats support additional channels to contain extra audio data. For example, the MixPre-3 can record the L/R Stereo mix to Channels 1, 2 in the WAV file and ISO (isolated) channels 1, 2, 3 recorded on channels (or tracks) 3, 4, 5 respectively.

Cinéma vérité. In French, literally, “cinema truth.” A style of documentary filmmaking in which the filmmaker captures real people in real situations with spontaneous use of hand-held camera, naturalistic sound recording, and with participation on the part of the filmmaker, for example, Chronicle of a Summer (1961, Jean Rouch & Edgar Morin, French title: Chronique d’un été). Also called direct cinema, however, direct cinema sometimes refers to a different style that was dominant in the United States in the 1960s and differed in terms of much less filmmaker involvement, for example, Salesman (1968, Albert & David Maysles).

Clapper board. See Slate.

Click track. A prerecorded track of metronomic clicks used to ensure proper timing of music to be recorded. Used in music scoring sessions.

Compander. An audio device or software filter that compresses an input signal and then expands the output signal in order to reduce noise.

Constant Bit Rate (CBR). An audio compression technique in which the bit rate, and therefore the amount of data used per second of audio, does not change over the course of the file. For example, MP3 files can be either Constant Bit Rate or Variable Bit Rate.

Clip. A short piece of video or audio that is usually part of a larger recording.

Clipping. When an input signal exceeds the capability of the equipment to reproduce the signal, clipping occurs. In an analog recording system the result is audible distortion; in a digital system you end up with incomprehensible noise.

Codec. A piece of software designed to encode and/or decode video or audio data into a form readable by a computer.

Compression. 1. Audio: The reduction of the greater amplitudes in an audio signal, which reduces the difference between peak amplitudes and average amplitudes and makes the overall signal sound louder when some gain is added (since peaks will no longer overmodulate). 2. Data: A method for reducing the bit rate of a digital representation of an audio signal in order to reduce the storage requirements of the representation. Methods like AAC and MP3 use psychoacoustic models to discard portions of the audio signal that people are unlikely to notice, but they always introduce artifacts. For professional audio recording, always work with uncompressed audio file formats (e.g. WAV or AIF).
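
The audio sense of compression can be sketched as a simple level calculation. This is only an illustration of the idea, not how any particular compressor is implemented: the threshold, ratio, and make-up gain values are assumptions, and real compressors also have attack and release times that are ignored here.

```python
def compress_db(level_db: float, threshold_db: float = -20.0,
                ratio: float = 4.0, makeup_db: float = 0.0) -> float:
    """Static compression curve: above the threshold, the output rises
    only 1 dB for every `ratio` dB of input. Levels are in dB."""
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


# A peak at -4 dB and an average passage at -24 dB are 20 dB apart.
print(compress_db(-4.0))    # -16.0 (peak pulled down)
print(compress_db(-24.0))   # -24.0 (below threshold, untouched)

# With 12 dB of make-up gain the peak is back at -4 dB, but the average
# passage is now 12 dB louder than it was.
print(compress_db(-4.0, makeup_db=12.0))   # -4.0
print(compress_db(-24.0, makeup_db=12.0))  # -12.0
```

After compression the peak and the average are only 8 dB apart instead of 20 dB, which is the reduced peak-to-average difference the entry describes.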

Compression ratio. The ratio of the amount of data in the original video compared to the amount of data in the compressed video. The higher the ratio the greater the compression.

Condenser microphone. A microphone design in which sound causes the movement of a plate (diaphragm) in relation to a fixed backplate. This movement causes a change in capacitance (electrical charge) which is translated to voltage by an amplifier. Therefore, condenser microphones require electrical power to operate. Microphones designed for video production can usually be powered using phantom power from a camera, mixer, or recorder. Some condenser microphones have an onboard power supply and thus require the use of a battery.

Crossfade. The gradual mix of an incoming and outgoing audio signal with the aim of easing abrupt transitions between the two. Typically a software effect that simulates the simultaneous manipulation of two or more mix console faders or a simple transition effect in an editing system.
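
A minimal sketch of the idea, assuming two equal-length lists of samples and a straight-line (linear) fade. The function name is hypothetical, and editing systems commonly offer other fade curves as well, such as equal-power fades.

```python
def linear_crossfade(outgoing: list[float], incoming: list[float]) -> list[float]:
    """Mix two equal-length clips so the outgoing one fades from full level
    to silence while the incoming one fades from silence to full level."""
    if len(outgoing) != len(incoming):
        raise ValueError("clips must be the same length")
    n = len(outgoing)
    mixed = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 1.0  # fade position, 0.0 to 1.0
        mixed.append((1.0 - t) * outgoing[i] + t * incoming[i])
    return mixed


# A constant signal fading out against silence shows the fade shape.
print(linear_crossfade([1.0] * 5, [0.0] * 5))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```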

Crossover. The frequency at which an audio signal is split in order to feed separate drivers of a loudspeaker system.

Crosstalk. This is the amount of audio signal bleed between channels measured as separation (in dB) between the desired sounds of one channel and the unwanted sounds from the other channel.

Cueing. A term with a broad range of meanings depending on the context. For Voice-Over Narration or Dialogue Replacement, the marking of the cue point in a way which will permit a signal to be given to the talent to begin each element of dialog at the appropriate time. In general, any system used by a person to signal another person that recording should begin.

Cue sheet. A list of music or library sound effects used in a production for the purposes of obtaining usage rights.

DAW (Digital Audio Workstation). A computer-based system used for recording, editing, processing, and mixing sounds. The term originally referred to expensive workstation-based systems; today many software-based DAWs, including Pro Tools, Logic, and Reaper, run on standard hardware.

Dead cat. See Windshield.

Dead spot. An area within a location in which sound waves are canceled by reflections arriving out of phase with the desired signal thus creating an area of reduced audibility.

Decay. The time taken for a sound or musical note to go from peak amplitude (attack) to the sustain level.

Decibel (dB). A unit for expressing the ratio of two levels of electric or acoustic signal power, equal to 10 times the common logarithm of that ratio, and so a unit for describing sound levels relative to some 0 dB reference. The reference level is typically set in one of several ways: 1. when referring to sound pressure levels (SPL), the reference is the threshold of perception of an average human; 2. in digital recording, levels are set relative to 0 dBFS, where FS refers to “full scale,” the strongest signal that can be recorded without distortion, so digital level meters read in negative numbers up to zero, like -20 dB, -12 dB, -6 dB, -3 dB, 0 dB; 3. when adjusting audio levels of clips in a non-linear editing system, 0 dB for each clip is typically the normal level and you go minus or plus in dB to make the clip softer or louder. Decibels are actually ratios. The ratio of the sound pressure at the limit that ears can hear without harm to the sound pressure at the threshold of hearing is above a million. Because the power in a sound wave is proportional to the square of the pressure, the ratio of the maximum power to the minimum power is above one trillion. To deal with such a range of numbers, logarithmic units are useful: the log of a trillion is 12, so this ratio represents a difference of 120 dB. It’s easier to talk about the dynamic range of sound with numbers between 0 dB and 120 dB than with a trillion. We typically work with sound adjustments in 3 dB (for a small change) and 6 dB (more noticeable change) increments. Even though an increase of 3 dB represents a doubling of the intensity of the sound, we don’t perceive it that way. Perception studies have shown that a 3 dB change in sound level is barely noticeable. Most listeners don’t report a significant change unless it’s 6 dB, and it requires a big change of 10 dB before the average listener hears a “doubling” of the sound.
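
The arithmetic in the Decibel entry can be checked with a few lines of Python. The function names are hypothetical; the formulas are the standard ones, 10 times the log of a power ratio and 20 times the log of a pressure (amplitude) ratio, the factor of 20 following from power being proportional to pressure squared.

```python
import math


def power_ratio_to_db(ratio: float) -> float:
    """Decibels for a ratio of two powers (or intensities)."""
    return 10.0 * math.log10(ratio)


def pressure_ratio_to_db(ratio: float) -> float:
    """Decibels for a ratio of two sound pressures (or signal amplitudes)."""
    return 20.0 * math.log10(ratio)


def db_to_power_ratio(db: float) -> float:
    """Power ratio represented by a given number of decibels."""
    return 10.0 ** (db / 10.0)


print(round(power_ratio_to_db(1e12)))         # 120: a trillion-to-one power ratio
print(round(pressure_ratio_to_db(1e6)))       # 120: the same range as a pressure ratio
print(round(power_ratio_to_db(2.0), 2))       # 3.01: doubling the power adds about 3 dB
print(round(db_to_power_ratio(10.0)))         # 10: a 10 dB boost is ten times the power
```

The last line is a reminder of how much extra power it takes before the average listener reports that a sound seems twice as loud.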

Decode. The process of reading data in one format and outputting it in another. For example, MS stereo may be decoded into L/R stereo. See also Decoder, Encode.

Decoder. A device or software component that reads a signal and turns it into some form of usable information. For example, an MP3 decoder takes audio that was compressed with an MP3 encoder and converts it to sound data that can be played back on a computer or iPod. The same goes for H.264 video.

Dialogue. Synchronous speech in a film with the speaker usually, but not always, visible.

Dialogue Editor. A sound editor who focuses purely on dialogue. Their job is to assemble, synchronize, and edit the dialogue in a production, with the aim of producing the clearest dialogue possible for the sound editor to work with.

Dialogue track. A sound track which contains sync dialog. Typically, while editing, dialog tracks are kept separate so they can be processed differently from ambience, music, and sound effects tracks.

Diegetic. Typically refers to the internal world of the story (the diegesis) that the characters themselves experience and encounter, including those parts not actually shown on the screen but referred to in some way within the story. Thus, film elements can be “diegetic” or “non-diegetic.” The term is most often used in reference to sound, but can apply to other elements in a film. For example, titles, subtitles, background music, and voice-over narration (with exceptions) are non-diegetic elements.

Diegetic music. Music from a source within the film scene, such as a “live” orchestra or a radio playing. See Non-diegetic music.

Diegesis. See Diegetic.

Diegetic sound. Music or sound effects originating from a source apparent within a film scene. This is in contrast to the music score, for example, which accompanies the movie but generally does not appear to come from within it. See Diegetic.

Digital. A representation format in which data is translated into a series of ones and zeros. Numerical data (base 10) is translated into binary numbers (base 2). Symbolic data is translated according to codes (for example, the ASCII code system assigns binary numbers to characters so they can be encoded digitally). Audio and images are sampled. See also sample, sampling rate.

Digital recording. A method of recording video (or audio) in which samples of the original analog signal are encoded as binary information for storage and retrieval. Unlike analog recordings, digital video (or audio) can be copied repeatedly without degradation. Digital recording has pretty much replaced analog recording techniques for practically all image and sound applications.

Digitize. The process of taking analog audio and converting it to digital form. The term is often used synonymously with ingest or capture, which is the process of transferring a digital audio format into a non-linear editing system (it’s already digital, so you are simply capturing or ingesting, you’re not actually digitizing). See capture.

Directional characteristic. The variation in response at different angles of sound incidence.

Distortion. The addition of artifacts to the original audio signal, appearing in the output, which were not present in the input.

DME. Dialogue, Music, and Effects. A file with dialogue, music, and effects split into separate stems for foreign language dubbing or trailer editing.

Dolby 5.1. A six-channel (five speakers (Left, Center, Right, Left Surround, Right Surround) and one subwoofer for bass) digital surround sound system developed by Dolby Labs. See Dolby Digital.

Dolby Digital. Originally synonymous with the multi-channel audio format Dolby AC-3, this is now the name for a family of audio compression technologies developed by Dolby Laboratories. Widely used on industry movie releases. There are several variations of Dolby Digital. See Dolby 5.1.

Dolby Stereo. The analog predecessor to Dolby Digital.

Double-system sound. The technique of recording sound and image using separate recording devices. In film production this was the normal methodology since film cameras can’t record sound. It is also often used in digital video production when the sound recordist needs mobility and wants to avoid running wires to feed the video camera with the audio signal.

Down-conversion. Converting from a higher quality format to a lower one.

Drop-frame timecode. Timecode that is modified so that it stays in step with clock time for video running at 29.97 fps, which would otherwise drift because the timecode counts 30 frames per second. In order to retain accuracy, the first two timecode frame numbers of every minute are skipped, with the exception of every tenth minute. Note that only the timecode numbers are skipped, not the actual frames themselves. Drop-frame timecode is indicated with a semicolon before the frame component (01:00:00;00) or between every component (01;00;00;00). See also non-drop frame timecode.
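
The skipping rule can be made concrete with a short sketch that converts a running frame count into a drop-frame label. This is an illustrative implementation of the rule described above for 29.97 fps material, with a hypothetical function name; it does not wrap around at 24 hours.

```python
def frames_to_dropframe(frame_count: int) -> str:
    """Label the Nth frame (counting from 0) with 29.97 fps drop-frame timecode."""
    fps = 30    # timecode counts 30 frame numbers per second
    drop = 2    # frame numbers skipped at the start of most minutes
    frames_per_minute = fps * 60 - drop            # 1798 real frames in a minute that skips
    frames_per_ten_minutes = fps * 600 - 9 * drop  # 17982: nine minutes skip, the tenth does not

    tens, remainder = divmod(frame_count, frames_per_ten_minutes)
    skipped = 9 * drop * tens
    if remainder > drop:
        skipped += drop * ((remainder - drop) // frames_per_minute)
    label = frame_count + skipped  # position in an unbroken 30 fps count

    ff = label % fps
    ss = (label // fps) % 60
    mm = (label // (fps * 60)) % 60
    hh = label // (fps * 3600)
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"


print(frames_to_dropframe(1799))    # 00:00:59;29
print(frames_to_dropframe(1800))    # 00:01:00;02 (;00 and ;01 were skipped)
print(frames_to_dropframe(17982))   # 00:10:00;00 (tenth minute, nothing skipped)
print(frames_to_dropframe(107892))  # 01:00:00;00
```

Every frame still receives a label; only the numbers ;00 and ;01 are never used at the start of the skipping minutes.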

Dropout. A brief loss of signal that results in a “blank” area of video or audio, or adds excess noise to an image.

Dubbing. Adding extra voice tracks to a soundtrack in order to change lines or prepare the film for foreign markets. Can also mean the looping process or the process of copying an audio tape (when exporting digital files from a DAW, the term bounce is used more often).

Dynamic Range. The difference in decibels between the loudest and quietest portions of audio that a system is capable of processing.
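
For digital systems, a common rule of thumb (not stated in the entry above, so treat it as added context) is that each bit of depth contributes about 6 dB of theoretical dynamic range. A minimal sketch with a hypothetical function name; real recorders fall short of these figures because of their analog electronics.

```python
import math


def pcm_dynamic_range_db(bit_depth: int) -> float:
    """Theoretical ratio, in dB, between the largest and smallest
    step sizes a linear PCM system of this bit depth can represent."""
    return 20.0 * math.log10(2 ** bit_depth)


print(round(pcm_dynamic_range_db(16), 1))  # 96.3
print(round(pcm_dynamic_range_db(24), 1))  # 144.5
```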

Echo. A sound wave that has been reflected and returned with sufficient magnitude and delay as to be perceived as a wave distinct from the wave that was initially transmitted.

Effective output level. The sensitivity rating of a microphone defined as the ratio in dB of the power available relative to sound pressure.

Encode. The process of writing data or converting a signal to a different format. For example, L/R stereo may be encoded into MS stereo. See also decode.
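
The L/R and MS example mentioned under Encode and Decode comes down to sums and differences. The sketch below uses one common scaling convention (halving on encode), which is an assumption; tools differ on where the scaling is applied, and the function names are hypothetical.

```python
def encode_ms(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """L/R stereo to Mid/Side: mid is what the channels share, side is how they differ."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def decode_ms(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Mid/Side back to L/R stereo."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


mid, side = encode_ms([0.5, -0.25], [0.25, 0.75])
print(mid, side)             # [0.375, 0.25] [0.125, -0.5]
print(decode_ms(mid, side))  # ([0.5, -0.25], [0.25, 0.75])
```

Discarding the side signal leaves a clean mono mix, which is the mono compatibility that makes the M-S technique attractive.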

ENG snake. A cable designed to connect the output of a field mixer to a video camera. It usually includes two channels of balanced audio, a headphone return, and a quick release connector on the camera end (thus it’s also known as a breakaway cable) in order to allow the camera to move independent of the cable when needed.

Envelope. The shape of the graph of an audio signal as amplitude is plotted against time. The envelope of a sound includes the attack, decay, sustain and release (ADSR).
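
As an illustration of the four stages, here is a sketch of a simple straight-line envelope of the kind used in synthesizers. All timing values and the function name are hypothetical, and the sketch assumes the note is held longer than the attack and decay combined.

```python
def adsr_level(t: float, attack: float = 0.01, decay: float = 0.1,
               sustain_level: float = 0.7, note_length: float = 0.5,
               release: float = 0.2) -> float:
    """Envelope level (0.0 to 1.0) at time t seconds after the note starts."""
    if t < 0.0:
        return 0.0
    if t < attack:                      # attack: rise to peak
        return t / attack
    if t < attack + decay:              # decay: fall from peak to sustain level
        fraction = (t - attack) / decay
        return 1.0 - fraction * (1.0 - sustain_level)
    if t < note_length:                 # sustain: hold while the note lasts
        return sustain_level
    if t < note_length + release:       # release: fade to silence
        fraction = (t - note_length) / release
        return sustain_level * (1.0 - fraction)
    return 0.0


for t in (0.0, 0.01, 0.3, 1.0):
    print(t, adsr_level(t))  # levels 0.0, 1.0, 0.7, 0.0
```

Multiplying each audio sample by the envelope level at that moment shapes a steady tone into a note with a recognizable start and end.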

Environmental sound. General sounds at a low volume level coming from the action of a film which can be either synchronous or non-synchronous.

Equalization. The modification of specific ranges of sound frequencies for a specific purpose, e.g. to improve the clarity of speech or remove a frequency range with unwanted noise.

Foley. Creating sound effects by watching the picture and mimicking the action, often with props that do not exactly match the action but sound good. For example, walking on a bed of crushed stones in order to simulate walking on the ground.

Foley artist. A person who records sound effects using the foley process. Also known as foley walker.

Foreground music. See diegetic music.

Frame line. The line that designates the top of the frame. When using a boom microphone, the boom operator communicates with the camera operator to understand where the frame line is in order to avoid getting the boom in the shot.

Frequency. The number of times a signal vibrates per second, expressed in Hertz (Hz), which is the number of cycles per second.

Frequency response. The sensitivity of a given microphone or sound recording and playback system across a range of frequencies, stated as a frequency range and a permitted variation, e.g. 20 to 15,000 Hz +/- 3 dB.

Gain. 1. In audio recording, how much the input signal level is increased, expressed in decibels (dB); 2. In audio post-production, how much the audio signal of a clip or audio track is adjusted, expressed in decibels (dB).

Gigabyte. 1 Billion bytes.

Gobo arm. A grip head mounted on the end of a ⅝” diameter, 30” long arm used as a device for holding sound blankets and other equipment. See Grip head, C-Stand.

Gobo head. See Grip head.

Grip arm. See Gobo arm.

Grip head. A fully rotatable, adjustable clamp usually mounted on the top of a C-Stand and used to support a Gobo arm, equipment, or a sound blanket. Its core component is a gobo head, which accepts the pin on a flag or a ⅝-in. gobo arm. See Gobo arm, C-Stand.

Handle. Extra material beyond the in and out points to allow an audio or video clip to be extended and provide additional material for transitions.

Harmonic distortion. Audio distortion characterized by undesirable changes between input and output at a given frequency.

Headroom. The “room” in the signal between the peaks and 0 dBFS (maximum signal level).
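
A minimal sketch of measuring headroom, assuming samples are stored as floating-point values in which 1.0 represents full scale (0 dBFS). That sample format and the function names are assumptions for illustration.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Level of the loudest sample in dBFS (0.0 means full scale)."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return float("-inf")  # digital silence has no measurable peak
    return 20.0 * math.log10(peak)


def headroom_db(samples: list[float]) -> float:
    """Distance in dB between the loudest peak and 0 dBFS."""
    return -peak_dbfs(samples)


# The loudest sample here reaches half of full scale.
print(round(headroom_db([0.1, -0.5, 0.25]), 2))  # 6.02
```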

Hertz (Hz). A unit for specifying the frequency of a signal, formerly called cycles per second (cps).

High-pass filter. An electronic or software audio filter used to attenuate all frequencies below a chosen frequency, thus the name, “high pass.”

High-shelf filter. An audio filter that boosts or cuts all frequencies above a certain threshold by a set amount while leaving lower frequencies largely unchanged.

Hiss. Noise that is caused by normal imperfections in the surface of analog recording tape. Also known as asperity noise (literally, roughness noise). The term is often used to describe any constant broad-spectrum noise resembling hiss, but when splitting hairs, hiss actually refers to an analog tape artifact. See Noise.

Impedance. As long as you stick with microphones and mixers designed for location sound recording and video production, you will not have to worry about impedance matching. The nominal load impedance for a microphone indicates the optimum matching load which utilizes the microphone’s characteristics to the fullest extent. Impedance is a combination of DC resistance, inductance and capacitance, which act as resistances in AC circuits. An inductive impedance increases with frequency; a capacitive impedance decreases with frequency. Either type introduces a change in phase.

Import. The process of transferring digital audio files from the storage media used by a recording device into a non-linear editing system. See also Capture.

Inductance. The resistance of a coil of wire to rapidly fluctuating currents which increases with frequency.

Intermodulation distortion. An amplitude change in which the harmonics (sum and difference tones) are present in the recorded signal.

Inverse square law. Sound from a point source falls off inversely to the square of the distance. Or, put another way, if you double the sound-source-to-microphone distance, you end up with only one quarter of the original sound energy.
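
The inverse square law is easy to check numerically. The sketch below assumes an ideal point source with no reflections; real rooms reflect sound, so levels indoors usually fall off less steeply. The function names are hypothetical.

```python
import math


def relative_intensity(reference_distance: float, new_distance: float) -> float:
    """Sound energy at new_distance as a fraction of that at reference_distance."""
    return (reference_distance / new_distance) ** 2


def level_change_db(reference_distance: float, new_distance: float) -> float:
    """The same change expressed in decibels."""
    return 10.0 * math.log10(relative_intensity(reference_distance, new_distance))


# Moving the microphone from 1 m to 2 m from the source.
print(relative_intensity(1.0, 2.0))         # 0.25 (one quarter of the energy)
print(round(level_change_db(1.0, 2.0), 2))  # -6.02 (about 6 dB quieter)
```

This is why halving the distance between microphone and subject is often worth more than any amount of gain added later.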

J-Cut. An edit in which the in (or out) points of the video and audio are different. In the case of a “J” cut, the sound cut occurs before the picture cut. So named for the “J” pattern of the cut in the NLE timeline. This is often done to have audio lead the video, in other words, you hear someone start to talk before you see them. See L-Cut.

Jet. 1. To leave the set quickly after the shoot. 2. A type of aircraft that sometimes flies over the set in order to provide interesting sound problems.

Kilobyte. One thousand bytes. Actually 1,024 bytes because computer storage is measured using base 2 (binary) number system with each digit’s value based on a power of 2 (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1,024) rather than base 10 based on powers of 10 (1, 10, 100, 1,000) which is our everyday number system.

L-Cut. An edit in which the in (or out) points of the video and audio are different. In the case of an “L” cut, the sound cut occurs after the picture cut. So named for the “L” pattern of the cut in the NLE timeline. This is often done to have picture lead the sound, in other words, you see someone before you hear them. See J-Cut.

Lavalier (Lav). A small microphone designed to work attached to the actor or subject’s chest or placed near the neck. They can be placed over or under clothing. Because of their small size, when combined with a wireless system, they are excellent for shooting “walking and talking” actors or subjects. Don’t forget to pair them with a lavalier windscreen on a windy day.

Layback. Transfer of the finished audio mix back onto the edit master.

Leader. Extra information added to the beginning or the end of a program for technical purposes. A leader may include a countdown, information about the program, and a 2-pop to aid audio synchronization.

Level. The ratio of an acoustic quantity to a reference quantity, usually a measurement of audio signal amplitude in decibels (dB).

Lip sync. Dialogue or narration that is precisely synchronized with the lip movements of a character or narrator on the screen. See Synchronization.

Location shooting. Filming in an actual setting with all sorts of noise problems, either outdoors or indoors, rather than in a quiet, controlled motion picture studio.

Log. 1. A record of start and end timecode, scene, shot, and take numbers, scene descriptions and other information for a specified clip. 2. Logarithmic (e.g. the Decibel (dB) scale is logarithmic).

Looping. The process of having actors dub lip-sync sound to scenes which have already been photographed. Also called ADR for automated dialog replacement or additional dialog recording. Called looping because in the old days a film loop of the scene would be put on the projector with cue marks on the film so the director and actor could see the scene while they were looping and multiple takes would be recorded.

Lossless. A compression scheme that results in no loss of data from the file when it is decompressed. Lossless files are generally quite large (but still smaller than uncompressed versions) and sometimes require considerable processing power in order to decode the data. The opposite of lossy compression.

Lossy. A compression scheme that discards data in order to lower file sizes. The opposite of lossless compression.

Low-pass filter. A filter that attenuates frequencies above a specified frequency and allows those below that point to pass.

Low-shelf filter. An audio filter that boosts or cuts all frequencies below a certain threshold by a set amount while leaving higher frequencies largely unchanged.

Longitudinal timecode (LTC). Timecode recorded on one of the audio channels of video. It can only be read if the video file is playing at sound speed (e.g. 24, 25, or 30 fps).

M & E (Music and Effects). A file with music and effects split into separate stems for foreign language dubbing.

M-S (Mid-Side). A stereo microphone technique in which two microphone elements (a middle element with a cardioid or hyper-cardioid pattern and a side element with a bidirectional pattern) are incorporated into a special configuration for recording. Its advantage over other techniques is excellent mono compatibility without phase cancellation issues.

Masking. A phenomenon whereby one or more sounds “trick” the ear into not hearing another, weaker sound that is also present.

MB. Abbreviation for megabyte, roughly one million bytes (1,048,576 bytes, or 1,024 kilobytes, when counted in binary). See Megabyte.

ME track (Music and Effects track). Refers to the music and effects tracks split apart from the dialogue tracks for use in dubbing (foreign language re-recording of a film or video).

Megabyte. 1 million bytes.

MIDI. Musical Instrument Digital Interface. A protocol for communicating between audio devices and musical instruments.

Mickey mousing. Creating music that mimics or reproduces a film’s visual action, as, for example, in many Walt Disney cartoons.

Mini connector. 1. A 1/8-in. TRS (Tip, Ring, Sleeve) connector that is typically used for connecting headphones to cameras and mixers. Some mixers have a 1/4-in. TRS headphone connector instead, so it’s always good to have an adapter in your kit. 2. Some consumer cameras use a 1/8-in. TRS connector for microphone input. Sometimes these inputs provide 5V plug-in power, the consumer equivalent of phantom power.

Mix. To combine sound from two or more sources onto a single sound track. Also called sound mixing.

Mixer. See Sound mixer.

Monologue. A character speaking alone on screen or, without appearing to speak, articulating her or his thoughts in voice-over as an interior monologue.

MOS. Shooting image without recording sound. Lots of colorful stories have evolved in an attempt to explain the origin of this curious term: one story suggests that a famous Hollywood director from Germany used to say “mitt-out-sound” while other explanations are technically oriented, suggesting it means “minus optical stripe” (since some old sound recording systems recorded the audio signal as visual variations on light sensitive film), or it could simply mean “motion omit sound,” but no one really knows the origin of this term, or do they?

MP3. A compressed audio format suitable for distribution but not production. Original recordings should always be made with uncompressed formats (e.g. WAV). See AAC.

Musical. A film genre that incorporates song and dance routines into the film story. Also called musical film.

MXF. Material eXchange Format. A professional cross-platform container format for video, audio and metadata.

Narration. Information or commentary spoken directly to the audience rather than indirectly through dialogue, often by an anonymous “voice of god” off-screen voice. See voice-over.

NG (No Good). Commonly seen on camera, location sound, and editor reports to indicate a particular take is unusable.

Noise. Electrical interference or other unwanted sound introduced into an audio system (e.g. hiss, hum, rumble, crosstalk, etc.). See also Hiss, Artifact.

Non-diegetic music. Music in a film which does not have an apparent source within the story world. Often called background music. See Diegesis.

Non-diegetic sound. Sound in a film which does not have an apparent source within the story world. See Diegesis.

Non-drop frame timecode (NDF). Timecode that counts every frame at a nominal 30 frames per second and does not compensate for the fact that NTSC video actually runs at 29.97 fps, so the timecode gradually falls behind clock time. See also drop frame timecode.
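
To put a number on the mismatch between 29.97 fps and a 30 fps count, here is a small sketch. It assumes the NTSC rate is exactly 30000/1001 fps (the usual precise value behind “29.97”); the function name is made up for the example.

```python
from fractions import Fraction

NTSC_RATE = Fraction(30000, 1001)  # assumed exact NTSC frame rate (about 29.97 fps)
NOMINAL_RATE = 30                  # frames counted per timecode "second"


def ndf_timecode(frame_count):
    """Format a frame count as non-drop frame hh:mm:ss:ff at a nominal 30 fps."""
    ff = frame_count % NOMINAL_RATE
    total_seconds = frame_count // NOMINAL_RATE
    hh, remainder = divmod(total_seconds, 3600)
    mm, ss = divmod(remainder, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


# Whole frames that elapse in one hour of real (clock) time
frames_in_one_hour = int(NTSC_RATE * 3600)
print(frames_in_one_hour, ndf_timecode(frames_in_one_hour))
# prints: 107892 00:59:56:12
```

After one hour of clock time the non-drop count reads about 3.6 seconds short of 01:00:00:00.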

Non-linear editor (NLE). A video editing system characterized by digital storage and random access. Avid Media Composer, Adobe Premiere Pro, DaVinci Resolve, and Final Cut Pro X are examples of non-linear editors.

Non-synchronous sound. Sound whose source is not apparent in a film scene or which is detached from its source in the scene; commonly called off-screen sound. See synchronous sound.

Octave. The interval between two sounds having a basic frequency ratio of 2 to 1.
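
A quick sketch of the 2-to-1 ratio in practice. The function names and the 440 Hz starting pitch are illustrative choices, not part of the definition.

```python
import math


def shift_octaves(frequency_hz, octaves):
    """Return the frequency a given number of octaves above (or below, if negative)."""
    return frequency_hz * 2 ** octaves


def octaves_between(low_hz, high_hz):
    """Return how many octaves separate two frequencies."""
    return math.log2(high_hz / low_hz)


print(shift_octaves(440, 1))       # one octave above 440 Hz
print(shift_octaves(440, -1))      # one octave below 440 Hz
print(octaves_between(440, 1760))  # number of octaves from 440 Hz to 1760 Hz
```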

Off-screen sound. See non-synchronous sound.

OMF. Open Media Framework. A file format intended for transferring media between different software applications on different platforms. It is commonly used for transferring audio from a video editing system to a DAW.

Over-modulation. A sound signal with an intensity greater than the levels a system is designed to accept. Digital systems can’t tolerate over-modulation; when your audio is too loud, it will sound like raspy, unintelligible noise. Avoid over-modulating audio just like you avoid over-exposing video.

Petabyte. 1,000 Terabytes, or 1 million Gigabytes.

Phase. The timing relationship between two signals.

Phase shift. The displacement of a waveform in time. When various frequencies are displaced differently, distortion occurs. Cancellation of the signal may occur when two equal signals are out of phase. Severe shifting can cause dead spots.
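
The cancellation described above can be sketched with two equal sine waves. This is only an idealized illustration (one cycle, unit amplitude, made-up function names); real recordings rarely cancel this perfectly.

```python
import math


def sine_cycle(num_samples, phase_degrees=0.0):
    """Return one cycle of a unit-amplitude sine wave with the given phase offset."""
    offset = math.radians(phase_degrees)
    return [math.sin(2 * math.pi * i / num_samples + offset)
            for i in range(num_samples)]


def peak_of_sum(a, b):
    """Mix two signals sample by sample and return the peak absolute level."""
    return max(abs(x + y) for x, y in zip(a, b))


reference = sine_cycle(100)
print(peak_of_sum(reference, sine_cycle(100, 0)))    # in phase: the signals reinforce
print(peak_of_sum(reference, sine_cycle(100, 180)))  # 180 degrees out: they cancel
```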

Phantom power. A method of powering the amplifier in condenser microphones by sending the voltage through the audio cable in a manner that does not interfere with the audio signal. Most professional cameras, mixers, and recorders provide the option of supplying +48V phantom power to microphones. Phantom power is often found in two flavors, +12V and +48V. See also Plug-In Power.

Phono plug. See RCA connector.

Pick-up pattern. A polar diagram showing how a microphone responds to sounds from various directions. Usually these diagrams also show how directionality varies based on the frequency of the sound. Common patterns include: omnidirectional, cardioid, hyper-cardioid, super-cardioid, and shotgun (lobar).

Pink noise. An audio test signal that has an equal amount of energy per octave or fraction of an octave.

Pitch. The frequency of audible sound.

Playback. A technique of filming music action that involves playing the music through loudspeakers while performers sing, dance, play instruments, etc.

Plug-in Power. A consumer version of Phantom Power, delivering +5V to the microphone over unbalanced lines. See Phantom Power.

Poly WAV. Multi-channel BWF files that contain extra metadata identifying the channels, etc. Many multi-channel portable recorders, such as the Sound Devices MixPre series and others, generate poly WAV files. NLEs and DAWs used in professional production are able to import Poly WAV files directly or to split them up into their component mono WAV files.

Post Production. The final stage of the filmmaking process: the phase in a project that takes place after the production phase, or “after the production.” It normally includes picture editing, dialogue editing, sound editing, scoring, sound effects editing, foley work, sound design, visual effects, motion graphics, titles, color correction, sound mix, mastering, and outputting the moving image work to a format suitable for release.

Post Production Coordinator. An assistant to the Post Production Supervisor who focuses on logistical aspects such as scheduling, budgeting and ensuring the smooth operation of the Post Production department.

Post Production Supervisor. The person in charge of the entire Post Production department. They are in charge of seeing that the director’s requirements are met on time and on budget, and liaise with vendors such as optical houses and sound facilities.

Post-synchronized sound. Sound added to images after they have been photographed and assembled, sometimes called dubbing.

Preamplifier. A device for boosting the strength of a weak signal. Mixers and recorders usually take in microphone level signals and route the signal to a preamplifier to boost the signal prior to analogue to digital conversion.

Production sound. Audio recorded on location during a shoot. Typically recorded using a dedicated digital recorder (double system) or directly to the video camera (single system). This is in contrast to ADR, foley and audio created by the Sound Designer during post production. See Single system, Double system.

Production values. A nebulous term used to describe the visual quality or professional look of a movie. A significant component of production value is the quality of the sound.

QuickTime. Cross-platform video compression software developed by Apple and used extensively for the exchange of time-based media files. Supports multiple tracks and formats, including audio, video, and timecode.

RCA connector. A connector widely used for consumer line-level audio interconnects. Typically color coded as white (or black or grey) for audio channel 1 (left), and red for audio channel 2 (right). In most cases, cables with RCA connectors are interchangeable. Also known as a phono plug, as this connector is used for analog turntable interconnects as well.

Reference tone. An audio tone of fixed frequency and amplitude that occurs at the beginning of a tape or reel, allowing the operator to set the correct audio level for playback.

Re-recording. The process of mixing all audio in a production for mono, stereo or multichannel output.

Resolution. The amount of data used to make up a digital video or audio file, specified as the number of pixels (for video) or the number of sample bits or bit depth (for audio).

Reverberation. The presence of additional sound in a recording due to repeated reflections from walls, ceilings, floors, objects, etc. Reverberation is very challenging to eliminate in post-production. This is in contrast to an echo, where there is generally only one surface reflecting the sound and the echoed sound is much clearer. See Sound blankets.

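RMS. Root-Mean-Square. A way of measuring the effective (average) level of a signal, such as sound pressure or voltage, by taking the square root of the mean of the squared sample values.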
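
A minimal sketch of the calculation the name describes: square the samples, take the mean, then take the square root. The test signal (one cycle of a sine wave with a peak of 1.0) is an illustrative assumption.

```python
import math


def rms(samples):
    """Return the root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# One full cycle of a sine wave with a peak level of 1.0
num_samples = 1000
sine = [math.sin(2 * math.pi * i / num_samples) for i in range(num_samples)]

# For a sine wave the RMS level is the peak divided by the square root of 2
print(round(rms(sine), 3))  # prints: 0.707
```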

Room tone. Background sound recorded on set for the purpose of enabling the seamless modification and removal of audio in post production. See also Ambient noise.

Run and gun. A style of video and audio production that is fast, unpredictable, and often involves covering action in multiple locations in a short amount of time. A great deal of documentary and broadcast journalism is done in this manner.

Sampling frequency. The number of sample measurements taken from an analog signal in a given period of time. These samples are then converted into numerical values stored in bytes to create the digital signal.
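
As a back-of-the-envelope example of what sampling frequency means for storage, the sketch below estimates the data rate of uncompressed PCM audio. The 48 kHz, 24-bit, stereo figures are illustrative assumptions, not recommendations from this glossary, and the function name is made up.

```python
def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels):
    """Bytes per second of uncompressed PCM audio (ignores file headers and metadata)."""
    return sample_rate_hz * bit_depth * channels // 8


rate = pcm_bytes_per_second(48_000, 24, 2)
print(rate)             # bytes per second
print(rate * 60 / 1e6)  # megabytes per minute, taking 1 MB as 1 million bytes
```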

Score. Original music composed specifically for a film and usually recorded after the film has been edited.

Selective sound. A sound track that selectively includes or deletes specific sounds.

Shotgun. The term used to describe an interference tube (thus the name) microphone with a lobar-super-cardioid pickup pattern. Typically used for recording dialog outdoors and in environments with high ambient noise levels because it rejects off-axis sounds. For recording dialog in quiet settings, hyper-cardioid microphones provide better sound, since interference tubes not only reject off-axis sounds but also color them.

Sibilance. Exaggerated hissing in voice patterns.

Signal. The variation over time of a wave whereby information is conveyed in some form which could be acoustic information (vibrations in air) or electronic voltages (representing sound).

Signal-to-Noise Ratio (S/N). The ratio of the desired signal to unwanted noise in an audio or video recording system.
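
Signal-to-noise ratio is usually quoted in decibels. The sketch below assumes both levels are amplitude (for example RMS voltage) measurements, which is why it uses 20 times the base-10 logarithm; the function name and example levels are hypothetical.

```python
import math


def snr_db(signal_level, noise_level):
    """Signal-to-noise ratio in decibels for two amplitude (not power) levels."""
    return 20 * math.log10(signal_level / noise_level)


# A signal 1000 times the amplitude of the noise floor is about 60 dB above it
print(round(snr_db(1.0, 0.001), 1))
```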

Single System Sound. A method of recording sound and picture on the same device. This is often done on lower budget video productions. See Double System Sound.

Slate. 1. A device used to place an identifier in front of the camera at the beginning of a take. When shooting double system sound in the days of film, the clapping motion and the clapping sound were used to synchronize the audio to the picture in post production. Timecode slates (that display the time of day or the timecode of the audio recorder) are often used as a backup when automated methods of syncing picture and audio fail (which requires decent audio recording on the camera to match the sound recording when timecode is not being used). 2. A good roofing material that can last well over a hundred years and will never become part of the landfill problem.

Snake. 1. A multi-channel audio cable intended for use with microphone and/or line level signals. See ENG snake. 2. A producer who does not treat their crew honestly and with respect.

Sound bridge. Sound which continues across two shots that depict action in different times or places, thus providing an audio transition between the two scenes.

Sound Designer. A sound specialist responsible for the development of all sound materials in a film or video production and ultimately in charge of the entire sound production. Can also refer to a person responsible for creating unique sounds from scratch.

Sound effects (SFX). Any sound in a film that’s not dialogue, narration, or music. A recorded or electronically processed sound that matches the visual action taking place onscreen in some interesting, creative manner.

Sound mixer. 1. A device for taking multiple sound inputs and routing them to (typically) a stereo output bus. May include signal processing features like a limiter. 2. Another term for sound recordist, see Sound Recordist.

Sound recordist. The person responsible for recording sound on location. They determine the right microphones to use and how to place them. They sometimes work in conjunction with a boom operator; on smaller productions the sound recordist and boom operator are one and the same.

Soundtrack. 1. The music contained in a film. 2. The entire audio portion of a film, including dialog, effects, and ambience.

Source music. See background music.

Speed of sound. Sound travels through air at about 770 miles per hour (roughly 343 meters per second) at room temperature. The exact figure varies mainly with ambient temperature.
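
To show how the figure moves with temperature, the sketch below uses a common linear approximation for dry air (331.3 m/s at 0 °C plus about 0.606 m/s per degree Celsius). That formula is an assumption brought in for illustration and is only reasonable near everyday temperatures.

```python
def speed_of_sound_mps(temp_celsius):
    """Approximate speed of sound in dry air, in meters per second."""
    return 331.3 + 0.606 * temp_celsius


def mps_to_mph(meters_per_second):
    """Convert meters per second to miles per hour (1 mile = 1609.344 m)."""
    return meters_per_second * 3600 / 1609.344


for temp in (0, 20, 35):
    speed = speed_of_sound_mps(temp)
    print(temp, round(speed, 1), round(mps_to_mph(speed)))
```

At 20 °C this works out to roughly 343 m/s, in line with the figure of about 770 miles per hour.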

Spotting. The process of analyzing a movie to map out the sound, music and visual effects work to be performed. In scoring and sound effects editing the process of spotting is used to identify the specific scenes or points where music cues or effects cues take place.

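Standing waves. Resonances that build up when sound reflects back and forth between the surfaces of a room and reinforces itself at particular frequencies. In a small room they are most noticeable at low frequencies (long waves) with short reflection patterns, producing a deep, boomy sound.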

Stem. A separate audio output for a group of tracks. In a DME, separate outputs are created for dialogue, music and sound effects. See DME.

Stereo. Sound recorded on separate tracks with two or more microphones and played back on two or more loudspeakers to reproduce and separate sounds more realistically.

Sustain. The amplitude of a sound or musical note while it is being held. This occurs after the attack and decay phases.

Surround. Multi-channel audio corresponding to multiple speaker positions.

Sweetening. Enhancing the sound of a recording or particular sound effect with equalization or other signal processing techniques.

Synchronization. A precise match between image and sound. Also called sync.

Synchronous sound (sync sound). 1. Recording sound in synchronization with recording image. Can be single or double system. In single system sound recording, the camera records both sound and image. In double system sound recording, the camera is used to record images and a separate sound recorder is used to record sound. 2. Sound whose source is apparent and matches the action in a scene. See non-synchronous sound.

Terabyte. One trillion bytes. Equivalent to a heaping amount of video or an insane amount of audio.

Timecode. An indexing system that provides a unique index for each frame of audio or video, in the form hh:mm:ss:ff. This makes it easy to locate and reference a particular frame in video or a point in the audio track or tracks that match the respective frame.
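
Because every timecode value maps to exactly one frame, converting hh:mm:ss:ff to a frame count is simple arithmetic. The sketch below assumes a whole-number frame rate with no drop-frame adjustment; the function name and the 24 fps default are illustrative.

```python
def timecode_to_frames(timecode, fps=24):
    """Convert an hh:mm:ss:ff string to a frame count (non-drop, integer frame rate)."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


print(timecode_to_frames("00:00:01:00"))  # prints: 24
print(timecode_to_frames("01:00:10:12"))  # prints: 86652
```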

Timeline. A visual representation of a movie over time, consisting of video clips laid horizontally across the screen. This is a common interface in a digital audio workstation (DAW) or a non-linear video editing system (NLE).

Track. A separate audio or video layer on a timeline.

Tracking. The initial recording of individual tracks of music to be mixed together later. It’s really just another word for recording.

Unbalanced line. A transmission line with a signal conductor plus a ground, which is prone to noise and interference. See balanced line.

Underscore. Music that provides atmospheric or emotional background to the primary narration or dialog.

Variable Bit Rate (VBR). A compression method in which the amount of compression is varied to allow for minimum degradation of sound quality in scenes that are more difficult to compress. For example, when encoding MP3 audio, you can choose to encode it as VBR or CBR. See Constant Bit Rate.

Voice-over (VO). A term used to describe off-camera narration that is not part of a scene (non-diegetic).

VU meter. A meter designed to measure analog audio level in volume units, which generally correspond to perceived loudness. VU meters do not show peaks; peaks are typically indicated with a separate peak indicator. Still found on professional analog recorders and some consumer gear evoking the retro look. Digital meters behave in a totally different manner. The meters in Premiere Pro show both VU (solid bars) and peaks (thin bars that stay on momentarily and then fade).

WAV. A lossless digital file format suitable for production sound. See also Broadcast WAV, Poly WAV.

Wave. A regular variation in signal level or sound pressure level.

Walla. Background ambience or noises added to create the illusion of sound taking place outside of the main action in a picture.

White noise. A signal having an equal amount of energy per Hertz.

Wild line. A non-sync line of dialogue recorded by an actor without picture.

Wild track. A non-sync sound effect recorded without picture.

Windshield. A device placed over a microphone that reduces the effect of wind noise on the microphone. There are two main types of windshields: modular systems and integral slip-on systems. A modular system (often called a blimp or zeppelin) consists of a flexible grey plastic netting tube (thus the name) with a screening material and a suspension system for the microphone (e.g. Rycote Modular Windshield). A synthetic fur cover, often called a windjammer, can be placed over the zeppelin for additional wind noise attenuation. In documentary and ENG applications, one-piece slip-on windshields consisting of a cellular foam base surrounded by synthetic fur are quite popular (e.g. Rycote Softie Windshield). The foam wind screen that comes with most microphones is only good for preventing wind noise due to movement of the microphone; outdoor shooting requires a windshield. Furry slip-on systems or windjammers are sometimes called dead cats. Some folks refer to a blimp’s windjammer attachment as a Wookie since they are typically larger than dead cats.

Windjammer. See Windshield.

Wild sound. Audio elements that are not recorded synchronously with the picture. It’s a good idea to record wild sound wherever you go. These wild tracks of the environment can be used to build ambient sound beds or fix audio problems in dialog when you need to fill gaps of empty track.

X-Y Pattern. A pair of cardioid microphones or elements aimed in crossed directions which feed two channels for stereo pickup.

XLR. A widely used connector for professional sound applications, typically having three conductors for mono connections (or five conductors for stereo connections) plus an outer shell which shields the contacts and locks the connector in place. The connectors consist of either pins (a.k.a. “M”) or sockets (a.k.a. “F”). Microphones and mixer outputs usually have pin connectors while mixer inputs and camera inputs have sockets. XLR connectors are designated with the pin count and whether they are plugs (“M”) or sockets (“F”), e.g. XLR-3M.

Zeppelin. See Windshield.

This glossary was last updated on September 23, 2021.