Lindos Articles

Digital Sampling

Digital sampling, 'PCM sampling', or just 'sampling', is the process of representing a signal waveform as a series of numbers, each a measurement of the signal's amplitude taken at regular intervals. This process, also commonly referred to as PCM, is widely used in modern audio and video systems, including television and telephone networks.
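As a minimal sketch of the idea, the following Python takes amplitude measurements of a sine wave at regular intervals and then represents each one as a 16-bit number. The function names, the 1kHz tone and the 48k samples/s rate are illustrative choices, not part of any standard.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Measure a sine wave's amplitude at regular intervals of 1/sample_rate_hz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(num_samples)]

def to_pcm16(samples):
    """Represent each sample (range -1.0 to 1.0) as a signed 16-bit integer code."""
    return [max(-32768, min(32767, math.floor(s * 32767 + 0.5))) for s in samples]

# A 1 kHz tone sampled at 48k samples/s gives 48 samples per cycle
samples = sample_sine(1000.0, 48000.0, 48)
codes = to_pcm16(samples)
```

The first function is the sampling step and the second is the digitising step, which are kept apart here because they are separate processes.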

Strictly speaking, the process of sampling must be regarded as separate from the process of digitising. Sampling produces a series of values which may be represented in various ways: the output from the process can be a series of analog pulses of varying height (pulse-height modulation) or a series of fixed-amplitude pulses (pulse-position modulation or pulse-width modulation). Most commonly though, the samples are represented by binary numbers, in a process known as PCM, an acronym for pulse-code modulation, because they are then amenable to storage and processing in digital systems such as computers.

The basic theory of digital sampling in relation to audio and video is widely misrepresented, usually because of the mistaken idea that the samples are the signal. This misconception is understandable, given that it is indeed possible to listen to digital samples directly (as is done in some cheap players) or to view video samples directly (as is done in most standard-definition televisions), but this must be regarded as a 'cheap and cheerful' approach, and it misses out a vital component of basic sampling theory: the reconstruction filter.

While it is obvious to anyone that sampling an original waveform and then presenting the sample values as joined-up segments will produce an approximation to the original, especially if the original only changes slowly between samples, this is not what sampling is really about.

What Nyquist realised was that if an original signal is filtered (band-limited) to remove all frequencies above what we call the Nyquist frequency (half the sampling rate), then it is possible to reproduce the exact band-limited waveform by processing the samples in a 'reconstruction filter', which is simply another low-pass filter with a cut-off frequency equal to the Nyquist frequency. There is no approximation and no distortion: what goes in comes out, apart from any components above the Nyquist frequency.
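The difference between joining up the samples and reconstructing properly can be sketched numerically. The Python below models an ideal reconstruction filter as a sum of sinc pulses, one per sample, and compares the result with straight-line interpolation at a point halfway between two samples. The tone frequency, the number of samples and the function names are assumptions made for illustration. Because only a finite run of samples is used, a small truncation error remains, which would not exist with an unlimited run.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t):
    """Estimate the waveform at time t (in sample periods) from its samples.

    Each sample contributes one sinc pulse, which is the impulse response
    of an ideal low-pass filter cutting off at the Nyquist frequency.
    """
    return sum(s * sinc(t - n) for n, s in enumerate(samples))

# A tone at 0.1 cycles per sample, well below the Nyquist limit of 0.5
freq = 0.1
samples = [math.sin(2 * math.pi * freq * n) for n in range(1001)]

t = 500.5                                    # halfway between two sample instants
exact = math.sin(2 * math.pi * freq * t)     # the true band-limited waveform
estimate = reconstruct(samples, t)           # sinc reconstruction
joined_up = (samples[500] + samples[501]) / 2   # straight line between samples

sinc_error = abs(estimate - exact)
joined_up_error = abs(joined_up - exact)
```

In this sketch the straight-line value falls short of the true waveform by several times more than the sinc estimate does, and the sinc error shrinks further as more samples either side are included.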

Errors resulting from the Nyquist limitation

This is only literally true if the two filters employed are 'brick-wall filters'; in other words, they cut off totally above the Nyquist frequency. Even if such filters were realisable in practice, basic theory says that they would have infinite delay: they would take forever to produce any output. This must not be seen as an obstacle to perfect reproduction though. By designing with a 'guard band' it is possible to use imperfect filters to obtain output that is as accurate as we care to make it (within the bandwidth limitation).

Quantising errors resulting from the process of digitisation

In digital sampling, the accuracy of the resulting waveform is also affected by the stepwise nature of the digitising process, resulting in what is referred to as 'quantisation error'. This error, which occurs from sample to sample, is not necessarily random, but may be correlated with the signal, producing serious audible distortion in audio systems that do not take steps to eliminate it. Some early CDs suffered from quantising distortion which was especially audible on quiet piano notes, adding a granular noise that sounded like 'sand in the speakers'. It could also be heard as spurious tones accompanying higher frequencies. Quantising distortion soon became a thing of the past though, with a better understanding of the process of 'dither', which involves adding a low level of noise to the signal before quantising in order to randomise the individual sample errors and hence 'de-correlate' the resultant errors from the signal, so that all that is heard is noise (hiss).
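A small numerical sketch shows why dither works. Here a steady input of 0.3 of a quantising step is rounded many times, first without dither and then with uniformly distributed noise spanning one step added before rounding. The input level, the noise type, the seed and the function names are illustrative assumptions.

```python
import math
import random

def quantise(x):
    """Round to the nearest whole quantising step (values are in units of one step)."""
    return math.floor(x + 0.5)

def quantise_dithered(x, rng):
    """Add uniformly distributed noise spanning one step before rounding."""
    return quantise(x + rng.uniform(-0.5, 0.5))

rng = random.Random(0)          # fixed seed so the run is repeatable
level = 0.3                     # a steady input of 0.3 of a step
trials = 20000

plain = [quantise(level) for _ in range(trials)]
dithered = [quantise_dithered(level, rng) for _ in range(trials)]

plain_mean = sum(plain) / trials        # the input has vanished entirely
dithered_mean = sum(dithered) / trials  # the input survives as an average
```

Without dither the error is identical on every sample and so depends entirely on the signal. With dither the individual outputs are noisy, but their average follows the input.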

Digital sampling in Audio

Audio waveforms are commonly sampled at 44.1k samples/s (CD) or 48k samples/s (professional audio). CDs use 16-bit digital representation, and would sound 'granular' because of the quantising noise, were it not for the addition of a small amount of noise to the signal before digitisation, known as 'dither'. Adding dither eliminates this granularity, and gives very low distortion, but at the expense of a small increase in noise level. Measured using ITU-R 468 noise weighting, this is about 66dB below alignment level, or 84dB below FS (full scale) digital, which is somewhat lower than the microphone noise level on most recordings, and hence of no consequence (see ‘Programme Levels’ for more on this).

Optimising dither waveforms

In a seminal paper published in the AES Journal, Lipshitz and Vanderkooy pointed out that different noise types, with different probability density functions (PDFs), behave differently when used as dither signals, and suggested optimal levels of dither signal for audio. Gaussian noise requires a higher level for full elimination of distortion than rectangular PDF or triangular PDF noise. Triangular PDF noise has the advantage of requiring a lower level of added noise to eliminate distortion and also minimising 'noise modulation'. The latter refers to audible changes in the residual noise on low level music that are found to draw attention to the noise.
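Noise modulation can be illustrated with a short simulation. The sketch below measures the mean squared error (output minus input) for a steady input at three different positions within a quantising step, using rectangular PDF dither spanning one step and triangular PDF dither spanning two steps. These dither widths, the input levels and the function names are assumptions chosen for the illustration.

```python
import math
import random

def quantise(x):
    """Round to the nearest whole quantising step."""
    return math.floor(x + 0.5)

def error_variance(level, dither, rng, trials=40000):
    """Mean squared total error (output minus input) for a steady input level."""
    total = 0.0
    for _ in range(trials):
        err = quantise(level + dither(rng)) - level
        total += err * err
    return total / trials

def rpdf(rng):
    """Rectangular PDF: one uniform value spanning one step."""
    return rng.uniform(-0.5, 0.5)

def tpdf(rng):
    """Triangular PDF: sum of two independent uniform values, spanning two steps."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

rng = random.Random(1)          # fixed seed so the run is repeatable
levels = [0.0, 0.25, 0.5]       # input positions within one quantising step
rpdf_var = [error_variance(x, rpdf, rng) for x in levels]
tpdf_var = [error_variance(x, tpdf, rng) for x in levels]
```

With rectangular dither the noise power rises and falls with the input level, which is the modulation effect. With triangular dither it stays at about a quarter of a step squared wherever the input sits.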

Noise shaping for lower audibility

An alternative to dither is noise shaping, which involves a feedback process in which the final digitised signal is compared with the original, and the instantaneous errors on successive past samples integrated and used to determine whether the next sample is rounded up or down. This smooths out the errors in a way that alters the spectral noise content. By the neat device of inserting a weighting filter in the feedback path the spectral content of the noise can be shifted to areas of the 'equal-loudness contours' where the human ear is least sensitive, producing a lower subjective noise level (-68/-70dB typically ITU-R 468 weighted).
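The feedback idea can be sketched in its simplest form as a first-order error feedback loop, in which the rounding error made on each sample is carried forward and subtracted from the next. This sketch omits the weighting filter described above, so it only pushes the noise towards higher frequencies and does not follow the equal-loudness contours. The function names and the test signal are illustrative assumptions.

```python
import math

def quantise(x):
    """Round to the nearest whole quantising step."""
    return math.floor(x + 0.5)

def noise_shape(signal):
    """First-order error feedback: carry each sample's rounding error into the next."""
    out = []
    carried_error = 0.0
    for x in signal:
        target = x - carried_error        # correct for the error made so far
        y = quantise(target)
        carried_error = y - target        # error to hand on to the next sample
        out.append(y)
    return out

signal = [0.3] * 1000                     # a steady level of 0.3 of a step
shaped = noise_shape(signal)
plain = [quantise(x) for x in signal]

shaped_mean = sum(shaped) / len(shaped)   # very close to 0.3
plain_mean = sum(plain) / len(plain)      # exactly 0
```

Because the accumulated error never exceeds half a step, the low-frequency content of the output tracks the input closely, and the error appears instead as rapid sample-to-sample changes.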

24-bit and 96kHz pro-audio formats

24-bit audio does not require dithering, because the noise level of the digital converter is far higher in practice than the required level of any dither that might be applied.

The recent trend towards higher sampling rates, at two or four times the basic requirement, has not been justified theoretically, or shown to make any audible difference, even under the most critical listening conditions. Nevertheless a lot of 96kHz equipment is now used in studio recording, and 'super-audio' formats are being promised to consumers, mostly as a DVD option. Most articles purporting to justify a need for more than 16 bits and 48kHz state that the 'dynamic range' of 16-bit audio is 96dB, a figure commonly derived from the simple ratio of full-scale level to quantising step size, which is 2 to the power 16 (65536). This calculation fails to take into account the fact that peak level is not maximum permitted sine-wave signal level, and quantising step size is not rms noise level; even if it were, it would not represent loudness without the application of the ITU-R 468 weighting function. A proper analysis of typical programme levels throughout the audio chain reveals that the capabilities of well engineered 16-bit recording far exceed those of the very best hi-fi systems, with the microphone noise and loudspeaker headroom being the real limiting factors.
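The arithmetic behind these figures can be set out as a short sketch. The Python below computes the commonly quoted ratio, then the ratio of the largest unclipped sine wave (rms) to the rms quantising error, with and without triangular PDF dither. All of these are unweighted textbook figures; the ITU-R 468 weighting is not applied here, so they are not directly comparable with the weighted levels quoted elsewhere in this article. The variable names are illustrative.

```python
import math

def db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)

bits = 16
steps = 2 ** bits                          # 65536 quantising steps peak-to-peak

# The commonly quoted figure: full-scale range divided by one step
naive_db = db(steps)

# Largest unclipped sine: peak amplitude is half the range, rms is peak / sqrt(2)
sine_rms = (steps / 2) / math.sqrt(2)

# The rms of an error spread uniformly over one step is step / sqrt(12)
quantising_noise_rms = 1 / math.sqrt(12)
sine_to_noise_db = db(sine_rms / quantising_noise_rms)

# With triangular PDF dither spanning two steps, total error rms is step / 2
dithered_noise_rms = 1 / 2
sine_to_dithered_noise_db = db(sine_rms / dithered_noise_rms)
```

The three results come to roughly 96.3dB, 98.1dB and 93.3dB respectively, which shows how much the answer depends on what is taken as the signal level and what is taken as the noise level, before any question of weighting arises.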

There is, however, a good case for 24-bit recording in the live studio, because it enables greater headroom (often 24dB or more rather than 18dB) to be left on the recording without compromising noise. This means that brief peaks are not harshly clipped, but can be compressed or soft-limited later to suit the final medium.

Digital sampling in Video

8-bit digitising is widely used for video, giving 256 shades of brightness. The eye is far less sensitive to noise than the ear, and the use of three 8-bit colour signals, for red, green and blue, results in an enormous number of possible colour variations (256 x 256 x 256, or about 16.7 million), helping to further increase resolvable level differences. Dither is not usually used, as the slight 'contour' effect produced by quantising is less objectionable than added noise would be, and there is no equivalent of distortion to which the eye is at all sensitive. A non-linear 'gamma law' is usually inserted, in effect compressing the range of brightness levels, since no projector or screen is capable of anything approaching realistic levels for a bright scene. 10 or 12 bits are currently used in the best professional video equipment, giving a signal that is less likely to be degraded when passed through multiple stages of processing, as happens in production and TV broadcasting.

Standard definition (SD) video uses 720 pixels wide (actually 702?) by 576 pixels high (UK PAL 625-line) for the visible picture area, the figure of 625 referring to the number of broadcast lines, some of which are allocated to teletext and other data.

High definition (HD) video is currently moving towards two standards referred to as 720p (progressive scan) and 1080i (interlaced), which all 'HD-Ready' sets will be able to display.

Reconstruction filtering is also needed for video

Most TVs do not achieve basic SD quality, because they do not reconstruct the vertically sampled image properly. Digital video produces a 2-dimensional set of samples of each frame, which requires a 2-dimensional 'brick-wall' reconstruction filter for proper reproduction of the image. CRT displays produce a raster scan of horizontal lines, and the digital signal is low-pass filtered along the horizontal lines, giving good resolution of vertical lines without aliasing, but reconstruction is not usually attempted vertically, so that the resulting picture contains very visible artifacts (loss of resolution, staircasing effects, fringing patterns, sampling harmonics etc).

Proper 2-dimensional reconstruction requires a final display with many more pixels than the signal format, and modern HD sets can provide this, producing much better resolution pictures than even a top studio monitor can from SD signals (though they are not so good regarding grey-level accuracy especially near black level).

As with audio, this theoretical need for reconstruction is not commonly realised, though it was recognised by the BBC who then backed off from broadcasting HD but started to record programmes in HD.

To get a true HD image you really need a 'super HD' display, with at least twice as many pixels again in each direction (3840 x 2160). This is worth bearing in mind, though not currently practical. Nevertheless, HD does give a very significant increase in resolution over SD when both are compared on an HD television, the higher Nyquist frequency bringing improvements despite the fact that the image is not properly reconstructed on currently available displays.

By Pete Skirrow

