International Organisation for Standardisation
Organisation Internationale de Normalisation




ISO/IEC JTC1/SC29/WG11 N13364

January 2013, Geneva, Switzerland


Communication Group




White Paper on State of the Art in compression and transmission of 3D Video

3D Video Displays and Applications

The primary usage scenario for 3D video is to support 3D video applications, where 3D depth perception of a visual scene is provided by a 3D display system. There are many types of 3D display systems, ranging from classic stereo systems that require special-purpose glasses to more sophisticated multiview auto-stereoscopic displays that do not require glasses [8]. This section provides a summary of display technology; a more comprehensive review of 3D display technologies can be found in [13].

Stereoscopic displays are the most commonly used type of 3D video display. Such systems require two views (stereo video), where a left-eye view is presented to the viewer's left eye and a right-eye view is presented to the viewer's right eye. The 3D display technology and glasses ensure that the appropriate signal is viewed by the correct eye. This is accomplished with either passive polarization or active shutter techniques.

Since depth perception is known to depend on factors such as display size and viewing distance, the same stereoscopic content viewed in different viewing environments may provide different levels of depth perception [15]. In some cases it can be desirable, or even required, to adjust the depth perception automatically or through interaction with the end user.

Multiview displays have much greater data throughput requirements than conventional stereo displays for a given picture resolution, since 3D is achieved by essentially emitting multiple complete video sample arrays in order to form view-dependent pictures. Such displays can be implemented, for example, using conventional high-resolution displays and parallax barriers; other technologies include lenticular overlay sheets and holographic screens. Each view-dependent video sample can be thought of as emitting a small number of light rays in a set of discrete viewing directions, typically between eight and a few dozen for an autostereoscopic display. Often these directions are distributed in a horizontal plane, such that parallax effects are limited to the horizontal motion of the observer.

3D Video Data Formats

This section describes the various representation formats for 3D video and discusses the merits and limitations of each in the context of stereo and multiview systems.

Stereoscopic and Multi-View Video

A 3D scene can be represented in a traditional video-only data format. In such a format, video information from each viewpoint is captured at full spatial resolution with a temporally synchronized camera array. The resulting representation consists of a separate full-resolution video sequence for each viewpoint, and the required data rate for the captured raw data is practically multiplied by the number of captured views. An example of such a representation for the 2-view (stereo) case is shown in Fig. 1.

Fig. 1: An example of representation of 3D scene with stereo video at full spatial resolution [10].
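The multiplication of the raw data rate by the view count can be illustrated with a short sketch. The function name and the example parameters (1080p, YUV 4:2:0 at 8 bits per component, i.e. 12 bits per pixel on average, 30 fps) are illustrative assumptions, not values taken from this white paper:

```python
def raw_data_rate_mbps(width, height, fps, bits_per_pixel, num_views):
    """Raw (uncompressed) data rate in Mb/s for a temporally
    synchronized multi-camera capture: the single-view rate is
    simply multiplied by the number of captured views."""
    single_view = width * height * bits_per_pixel * fps
    return single_view * num_views / 1e6

# Hypothetical example: 1080p, YUV 4:2:0 (12 bits/pixel), 30 fps.
mono = raw_data_rate_mbps(1920, 1080, 30, 12, 1)    # ~746.5 Mb/s
stereo = raw_data_rate_mbps(1920, 1080, 30, 12, 2)  # twice the mono rate
```

This makes clear why compression (and the reduced-resolution formats described next) matter: every additional full-resolution view adds the entire single-view raw rate again.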

Frame Compatible 3D Video Formats

Frame-compatible formats refer to a class of stereo video formats in which the two stereo views are essentially multiplexed into a single coded frame or sequence of frames, i.e., the left and right views are packed together in the samples of a single video frame. In such a format, half of the coded samples represent the left view and the other half represent the right view. Thus, each coded view has half the resolution of the full coded frame.

There are a variety of options available for how the packing can be performed. For example, each view may have half horizontal resolution or half vertical resolution. The two such half-resolution views can be interleaved in alternating samples of each column or row, respectively, or can be placed next to each other in arrangements known as the side-by-side and top-bottom packings (see Fig. 2). The top-bottom packing is also sometimes referred to as over-under packing. Alternatively, a "checkerboard" (quincunx) sampling may be applied to each view, with the two views interleaved in alternating samples in both the horizontal and vertical dimensions (as also shown in Fig. 2).
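The packing arrangements above can be sketched on toy frames, reusing the 'x'/'o' sample notation of Fig. 2. This is a minimal illustration assuming the half-resolution subsampling has already been applied to the side-by-side and top-bottom inputs; the function names are illustrative:

```python
def side_by_side(left, right):
    """Pack two half-horizontal-resolution views into one frame:
    the left view fills the left half, the right view the right half."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]

def top_bottom(left, right):
    """Pack two half-vertical-resolution views: left view on top."""
    return left + right

def checkerboard(left, right):
    """Quincunx interleaving: the two views alternate in both the
    horizontal and vertical dimensions of the packed frame."""
    return [[left[y][x] if (x + y) % 2 == 0 else right[y][x]
             for x in range(len(left[0]))]
            for y in range(len(left))]

left  = [['x', 'x'], ['x', 'x']]   # samples from one view
right = [['o', 'o'], ['o', 'o']]   # samples from the other view
print(checkerboard(left, right))   # [['x', 'o'], ['o', 'x']]
```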

Temporal multiplexing is also possible. In this approach, the left and right views would be interleaved as alternating frames or fields of a coded video sequence. These formats are referred to as frame sequential and field sequential. The frame rate of each view may be reduced so that the amount of data is equivalent to that of a single view.
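The frame-sequential variant amounts to alternating frames from the two views in a single sequence. A minimal sketch (frame labels and function name are illustrative):

```python
def frame_sequential(left_frames, right_frames):
    """Interleave the left and right views as alternating frames of
    one coded sequence. To keep the total data equal to a single
    view, each view may first be temporally decimated, halving its
    frame rate."""
    out = []
    for l, r in zip(left_frames, right_frames):
        out.append(l)
        out.append(r)
    return out

# Toy frames labelled by view and time index.
print(frame_sequential(['L0', 'L1', 'L2'], ['R0', 'R1', 'R2']))
# ['L0', 'R0', 'L1', 'R1', 'L2', 'R2']
```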

Fig. 2: Common frame-compatible formats where ‘x’ represents the samples from one view and ‘o’ represents the samples from the other view.

The primary benefit of frame-compatible formats is that they facilitate the introduction of stereoscopic services through existing infrastructure and equipment: the stereo video is represented in a way that is maximally compatible with existing encoding, decoding and delivery systems. The video can be compressed with existing encoders, transmitted through existing channels, and decoded by existing receivers [12]. However, legacy devices designed for monoscopic content may not recognize the format and may therefore display the frame-packed video as-is (e.g. both views side by side).

Service-compatible 3D Video Formats

Service-compatible 3D video formats refer to a class of stereoscopic video formats in which one of the two stereoscopic views has full resolution and can be used by legacy 2D devices, whereas the other view may have the same resolution as the base view, or 3/4, 2/3 or 1/2 of the base-view resolution (see Fig. 3). In this format, the left and right views are transmitted as independent video elementary streams. The benefit of service-compatible 3D video formats is that the content can be consumed simultaneously by legacy 2D devices and by 3D devices with high 3D video quality.

Fig. 3: Example of Service-compatible 3D video format

Depth-Enhancement Video Formats

A 3D scene video representation can be enhanced with supplementary data, e.g. depth information. Such information, when available at the display side, can enable the generation of virtual views through depth-based image rendering techniques [14], and would facilitate the deployment of auto-stereoscopic multiview displays [9][10] and/or stereoscopic displays with adjustable depth perception.
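The core idea of depth-based image rendering can be sketched in one dimension: each pixel is shifted horizontally by a disparity derived from its depth. This is a hypothetical simplification; the function name, the pinhole disparity model d = f·B/Z (focal length f, camera baseline B, depth Z) and the parameter values are illustrative assumptions, not part of this white paper:

```python
def render_virtual_view(texture, depth, focal_length, baseline):
    """Simplified 1-D depth-image-based rendering of one scanline:
    each source pixel is shifted by its disparity d = f * B / Z.
    Target positions that receive no source pixel stay None (holes),
    illustrating the occlusion problem discussed for the
    2D-plus-depth format. A real renderer would z-buffer, filter
    and inpaint such holes."""
    width = len(texture)
    out = [None] * width
    for x in range(width):
        d = int(round(focal_length * baseline / depth[x]))
        xv = x - d  # position of this pixel in the virtual view
        if 0 <= xv < width:
            out[xv] = texture[x]
    return out

# Toy scanline: a far region (depth 4) next to a near region (depth 2).
# The near region shifts further, leaving holes at the right edge.
line = render_virtual_view(list('abcdefgh'), [4, 4, 4, 4, 2, 2, 2, 2], 4, 1)
```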

ISO/IEC 23002-3 (also referred to as MPEG-C Part 3) specifies the representation of auxiliary 2D video and supplemental information, e.g. in the form of a 2D-plus-depth format (see Fig. 4). In particular, it enables signaling of depth map streams to support 3D video applications.

Fig. 4: Visualization of 2D plus depth format concept.

However, the 2D-plus-depth format was found to enable virtual view rendering only within a limited viewing angle, and it cannot handle the occlusions and holes that result from rendering other views. In addition, stereo or multiview signals are not supported by this format, i.e., receivers would be required to generate the second view for a stereo display from the 2D video plus the depth data, which is not the convention in existing displays.

To overcome the drawbacks of the 2D-plus-depth format, a multiview video plus depth format with a limited number of original input views and associated per-pixel depth has been introduced, as shown in Fig. 5.

Fig. 5: Multiview video plus depth format for 2 views (MVD2).

Video and range data available from multiple viewing angles allow more sophisticated virtual view rendering algorithms and also provide more information for filling occlusions and/or holes when rendering novel views.
