October 5, 2009

 

Lip Sync Revisited - Solutions Are in Sight

Previous editions of TV TechCheck have reported on the problem of audio-video synchronization, commonly known as "lip sync," where noticeable errors are seen on television channels distributed by terrestrial broadcast, cable and satellite. There are various reasons for this, including complex digital audio and video processing that can introduce differential delays between audio and video in production, post-production and distribution systems.

Standards organizations, including ITU, ATSC, SMPTE, EBU, IEC, AES, and SCTE, have been studying the problem for many years, and there are several standards and recommended practices for tolerances that should be adopted for different parts of the broadcast chain. Recommendation ITU-R BT.1359 states that the threshold of detectability of lip sync errors is about +45 ms to -125 ms (audio early to audio late) and that the threshold of acceptability is about +90 ms to -185 ms, on average. However, because of the uncertainty of synchronization that may exist in the source material, the ITU recommends that the timing difference in the path from the output of the final program source selection element to the input to the transmitter for emission should be kept between +22.5 ms and -30 ms. These distribution tolerance figures are comparable to the recommendations of the ATSC IS/191 finding, which gives a range of +15 ms to -45 ms at the inputs to the DTV encoding devices, while the EBU recommendation R37-2007 gives a range of +40 ms to -60 ms at the output intended to feed broadcasting transmitters.
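As a rough illustration of how the ITU detectability and acceptability thresholds quoted above might be applied to a measured offset, the following Python sketch classifies a value in milliseconds. The function name, the labels and the choice to treat the "about" thresholds as hard, inclusive limits are assumptions made for this example and are not taken from the recommendation.

```python
# Thresholds quoted above from the ITU recommendation, in milliseconds.
# Sign convention follows the article: positive = audio early, negative = audio late.
# Treating these approximate figures as exact, inclusive limits is an assumption.
DETECTABILITY_MS = (-125.0, 45.0)
ACCEPTABILITY_MS = (-185.0, 90.0)


def classify_offset(offset_ms: float) -> str:
    """Return a coarse label for a measured audio-video offset (hypothetical helper)."""
    low, high = DETECTABILITY_MS
    if low <= offset_ms <= high:
        return "undetectable"
    low, high = ACCEPTABILITY_MS
    if low <= offset_ms <= high:
        return "detectable but acceptable"
    return "unacceptable"


if __name__ == "__main__":
    for offset in (0.0, 60.0, -100.0, -200.0):
        print(offset, classify_offset(offset))
```

Note the asymmetry: audio that is 60 ms early is already past the detectability threshold, while audio that is 100 ms late is not.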

In the above-referenced documents, the ITU states that if correction of errors is not possible, then each downstream segment that is not under the control of the broadcaster shall not introduce any timing error in excess of +/-2 ms, while the ATSC says that designers should strive for zero differential offset throughout the system, and the EBU recommends that the accuracy of synchronization at each stage should lie within the range of audio 5 ms early to 15 ms late. Some broadcasters, and others, have questioned the particular tolerance recommendations mentioned, but there is general agreement on the need to reduce errors to a small value at each step in the broadcast chain.
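To see why small per-stage tolerances matter, consider a simple worst-case sum, sketched below in Python. It assumes that every stage errs in the same direction by the full EBU per-stage amount, and it compares the total against the EBU end-of-chain range quoted earlier. Treating that range as a budget to be divided among stages is an assumption made for illustration; in a real chain, errors may partly cancel.

```python
from typing import Tuple

# EBU per-stage range quoted above: audio 5 ms early to 15 ms late.
# Sign convention follows the article: positive = audio early.
STAGE_EARLY_MS = 5.0
STAGE_LATE_MS = -15.0

# EBU range at the output feeding the transmitter, quoted earlier.
CHAIN_EARLY_MS = 40.0
CHAIN_LATE_MS = -60.0


def worst_case(stages: int) -> Tuple[float, float]:
    """Worst-case (early, late) totals if every stage errs fully in one direction."""
    return stages * STAGE_EARLY_MS, stages * STAGE_LATE_MS


def max_stages_within_chain_limit() -> int:
    """Largest number of stages whose worst case still fits the chain range."""
    stages = 0
    while True:
        early, late = worst_case(stages + 1)
        if early > CHAIN_EARLY_MS or late < CHAIN_LATE_MS:
            return stages
        stages += 1


if __name__ == "__main__":
    print(worst_case(4))                    # (20.0, -60.0)
    print(max_stages_within_chain_limit())  # 4
```

Under these assumptions, the "audio late" limit is reached after only four stages, each at the edge of its own tolerance.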

It has been known for some time that some consumer DTV receiver products can introduce lip sync errors due to incorrect decoding and handling of MPEG-2 signals. As reported in TV TechCheck of August 3, 2009, the CEA has addressed the problem (although only for new designs) with a recently published Recommended Practice on A/V Synchronization Implementation in Receivers. Apart from that publication, however, there are as yet few recommendations from the standards organizations on how the specified levels of synchronization should be achieved through the broadcast chain.

Progress with Solutions for Lip Sync Errors
A prerequisite for A-V sync correction is first to measure the error. It is comparatively straightforward to assess or measure lip sync errors in out-of-service paths using test signals, either of the "beep-flash" type, such as the VALID8™ system from Snell/Pro-Bel, or test signals such as the Visualizer™ Test Pattern from Sarnoff (first mentioned in TV TechCheck of October 29, 2007). It is, however, much more difficult to make such measurements in-service, with regular programming, but this is where monitoring and correction are most important. One current product that allows absolute measurements without affecting the program is the LipTracker™ system from Pixel Instruments, but this only works when lip movements are visible and is perhaps impractical to use for all programming. The Tektronix AVDC-100 product was briefly available to detect and correct changes over parts of the program chain using a watermark signal, but it was for SD video only and has been discontinued. Other products that permit changes in A-V sync to be detected and corrected, including those from Sigma Electronics and K-WILL Corporation, have been used successfully in facilities where both input and output signals are available at one point.
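The general idea behind a "beep-flash" measurement can be sketched in a few lines of Python: find when the flash appears in the video, find when the beep starts in the audio, and take the difference. This is a generic illustration under simple assumptions (one normalized luminance value per frame, normalized audio samples, fixed thresholds) and does not describe how any of the products named above work. All function and parameter names are hypothetical.

```python
from typing import Sequence


def first_index_above(values: Sequence[float], threshold: float) -> int:
    """Index of the first value that exceeds the threshold."""
    for index, value in enumerate(values):
        if value > threshold:
            return index
    raise ValueError("no value exceeds the threshold")


def beep_flash_offset_ms(
    frame_luma: Sequence[float],     # assumed: mean luminance per frame, 0.0 to 1.0
    audio_samples: Sequence[float],  # assumed: audio samples, -1.0 to 1.0
    frame_rate_hz: float,
    sample_rate_hz: float,
    luma_threshold: float = 0.5,
    audio_threshold: float = 0.5,
) -> float:
    """Offset in ms; positive = audio early, matching the article's convention."""
    flash_frame = first_index_above(frame_luma, luma_threshold)
    beep_sample = first_index_above([abs(s) for s in audio_samples], audio_threshold)
    flash_ms = 1000.0 * flash_frame / frame_rate_hz
    beep_ms = 1000.0 * beep_sample / sample_rate_hz
    # If the beep arrives before the flash, the audio is early and the result is positive.
    return flash_ms - beep_ms


if __name__ == "__main__":
    luma = [0.0, 0.0, 0.0, 1.0, 0.0]   # flash on frame 3 (120 ms at 25 frames/s)
    audio = [0.0] * 100 + [1.0] * 10   # beep starts at sample 100 (100 ms at 1 kHz)
    print(beep_flash_offset_ms(luma, audio, 25.0, 1000.0))  # 20.0
```

In this simplified form, the video timing is only resolved to one frame period, which is one reason practical test signals are more elaborate.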

At the NAB Show 2009 earlier this year, Evertz introduced the IntelliTrak™ system, for in-service measurement and correction of changes in A-V synchronization that may occur within a facility or over multiple links in a broadcast chain. At IBC in September, Miranda introduced their Densite HLP-1801 product with similar capabilities. Both these systems are based on the concept of analyzing particular characteristics of the incoming audio and video program signals and generating an audio-video sync signature or fingerprint that can be transmitted as metadata to a downstream point in the program chain. At the downstream point, the audio and video signals are again analyzed and a new signature is generated that can be compared with the signature from upstream. This comparison results in a measurement of the change in A-V delay, or lip sync error, between the two points, which can be used to delay the audio or video by the appropriate amount to correct the error. Dolby Laboratories has also been working on a similar technology, and Kent Terry will present a paper, "Detection and Correction of Lip-Sync Errors Using Audio and Video Fingerprints," at the SMPTE Technical Conference in Hollywood later in October. The figure below, taken from the Dolby paper, shows the basic principles involved and is used with permission. It should be noted that the Dolby system is still at the laboratory stage and is not an available product.

Real-Time A/V Signature System Block Diagram
(with acknowledgements to Dolby Laboratories)

Although there are some differences, it is believed that all these fingerprint-based systems have similar capabilities and the signature or fingerprint comparison will work even when the program material has been passed through multiple links in the chain, including addition of bugs and overlays, distribution as compressed bitstreams and other video and audio processing. The method by which the signature or fingerprint is carried from one point to another is not critical, and can be implemented as metadata embedded in the video signal (e.g., in baseband VANC) or through a separate network connection or via the Internet. The other common feature is that they all measure the change in A-V sync from the input reference point, and do not measure the absolute sync accuracy of video and audio. Coordination with upstream program sources is therefore still required to ensure a high degree of confidence that incoming signals have minimum A-V sync errors.
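The upstream/downstream signature comparison described above can be sketched in Python as follows. This is a minimal illustration only: the per-frame feature sequences, the mean-absolute-difference alignment and every name in the code are assumptions made for this example, and they do not represent the Evertz, Miranda or Dolby algorithms, which are not described here.

```python
from typing import Sequence, Tuple


def best_lag(upstream: Sequence[float], downstream: Sequence[float], max_lag: int) -> int:
    """Lag (in feature samples) by which downstream trails upstream.

    Chooses the lag with the smallest mean absolute difference over the
    overlapping samples, requiring at least half the sequence to overlap.
    """
    best, best_cost = 0, float("inf")
    min_overlap = max(1, len(upstream) // 2)
    for lag in range(-max_lag, max_lag + 1):
        pairs = [
            (upstream[i], downstream[i + lag])
            for i in range(len(upstream))
            if 0 <= i + lag < len(downstream)
        ]
        if len(pairs) < min_overlap:
            continue
        cost = sum(abs(u - d) for u, d in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = lag, cost
    return best


def av_sync_change_ms(up_video, up_audio, down_video, down_audio,
                      feature_rate_hz: float, max_lag: int) -> float:
    """Change in A-V sync between two points; positive = audio early."""
    video_delay = best_lag(up_video, down_video, max_lag)
    audio_delay = best_lag(up_audio, down_audio, max_lag)
    # If audio was delayed more than video, audio is now late (negative result).
    return 1000.0 * (video_delay - audio_delay) / feature_rate_hz


def correction(change_ms: float) -> Tuple[str, float]:
    """Which signal to delay, and by how many ms, to cancel the measured change."""
    if change_ms < 0:  # audio late, so delay the video to match
        return ("video", -change_ms)
    return ("audio", change_ms)


if __name__ == "__main__":
    # Hypothetical signatures: one feature value per video frame at 25 frames/s.
    up_v = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
    up_a = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5]
    down_v = [0, 0] + up_v[:10]    # video delayed by 2 frames in the chain
    down_a = [0, 0, 0] + up_a[:9]  # audio delayed by 3 frames in the chain
    change = av_sync_change_ms(up_v, up_a, down_v, down_a, 25.0, 4)
    print(change, correction(change))  # -40.0 ('video', 40.0)
```

As in the systems described above, the result is only the change in A-V sync between the two measurement points. Any error already present at the upstream point passes through unmeasured.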

The Need for a Standard
As they have been developed independently, the algorithms used to produce the A-V sync signature or fingerprint, and the exact form of the metadata, are almost certainly different in the three systems mentioned above. As a result, the equipment at different points in a broadcast chain will need to come from the same manufacturer.

In the world of broadcasting, however, the entire chain from production to station output is not usually under the control of the broadcaster, and it is highly desirable that equipment from different manufacturers should interoperate when used at different points in the chain. Ideally, the form of the fingerprint signal and its method of carriage should be standardized. A fingerprint generated by equipment from one manufacturer, say at a production studio, post house or network release center, could then be read by equipment from another manufacturer downstream, say at a broadcast station, and used as input to the delay measurement needed to correct any lip sync error detected before transmission. The same techniques can be used by broadcasters to detect A-V errors introduced in their signals by downstream systems such as cable or satellite.

The SMPTE 22TV Lip Sync Ad Hoc Group, chaired by Graham Jones of NAB, has just started to investigate the possibility of producing a SMPTE standard for the fingerprint signal and/or the method(s) of metadata carriage. Potentially, this could require the manufacturers of existing products for these systems to cooperate to harmonize the fingerprint signals used in their products so they can interoperate. This is much more likely to happen if broadcasters, who are perhaps the main customers for such systems, make known their desire for a single standardized fingerprint system. For those with long memories, this situation has happened before, when pressure from customers resulted in Ampex and Sony modifying their similar, but slightly different, 1-inch VTR products to conform with what became the SMPTE Type C Standard, which for many years was the most widely used professional videotape format.

Readers of TV TechCheck who would like to support an initiative for a single A-V Sync Fingerprint standard are encouraged to contact Graham Jones at gjones@nab.org with their comments.

2010 NAB Show Call for Speakers

Call for Technical Papers – NAB Broadcast Engineering Conference

The 2010 NAB Show will host the 64th Broadcast Engineering Conference. This world-class conference addresses the most recent developments in broadcast technology and focuses on the opportunities and challenges that face broadcast engineering professionals. Each year hundreds of broadcast professionals from around the world attend the conference. They include practicing broadcast engineers and technicians, engineering consultants, contract engineers, broadcast equipment manufacturers, distributors, R&D engineers plus anyone specifically interested in the latest broadcast technologies.

Do you have something to share?
If you feel qualified to speak at the NAB Broadcast Engineering Conference, we invite you to submit a technical paper proposal. Not all acceptable submissions can be included in the conference, due to the large number of submissions that are received and the limited number of available time slots.

PLAN TO ATTEND!
The IEEE Broadcast Technology Society
59th ANNUAL BROADCAST SYMPOSIUM

October 14-16, 2009
The Westin Alexandria
Alexandria, VA, USA
www.ieee.org/bts/symposium

2009 ATSC Seminar on Audio Loudness
Wednesday, November 4, 2009
Wiley Rein Conference Center
1776 K St, NW
Washington, DC 20006
http://www.atsc.org/seminars/loudness09.php
Cost for ATSC members is $50.00 for pre-registrants, $75.00 on-site.


 
