Reading notes on Rufail's dissertation (Part II)

Part II


 

Chapter 6   Time Varying 3D Point Cloud Compression

 

The previous chapters developed compression and transmission techniques for 3D tele-immersive communications based on 3D meshes, with codecs ranging from near-lossless to lossy. The next step taken in this chapter is to develop compression for an alternative representation: 3D point clouds. Point clouds are similar to meshes but contain no connectivity information, which makes them somewhat easier to acquire and process than 3D meshes. This chapter focuses on lossy coding using inter-predictive coding, exploiting redundancy between frames. In addition, it develops a technique for lossy coding of point cloud color attributes based on their spatial correlation.

 

The architecture of the proposed 3D video codec based on point clouds combines features from common octree-based 3D point cloud codecs and common hybrid video codecs (Figure 7).

                                                  Figure 7   Schematic of time-varying point cloud compression codec

 

The core parts of this scheme are the intra frame coder and the inter frame coder.

 

Intra frame coder


The intra frame coder consists of three stages (1, 2 and 3 in Figure 7). It first filters outliers to remove spurious background/foreground points present in the original cloud, and computes a bounding box that is used as the root level of the octree. The bounding box can change from frame to frame; as a consequence, the correspondence of octree voxel coordinates between subsequent frames is lost, which makes inter-prediction much harder. An approach is therefore introduced to reduce bounding box changes between adjacent frames (Figure 8).

            

                                                                 Figure 8   Bounding box alignment scheme

The scheme enlarges the bounding box by a certain percentage δ; then, if the bounding box of the subsequent frame fits inside this enlarged bounding box, the enlarged box can be reused instead of computing a new one. Otherwise, a new bounding box is calculated.
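The alignment check above can be sketched in a few lines. The function names (`expand_bbox`, `fits_inside`, `aligned_bbox`) and the default δ = 0.1 are my own illustrative choices, not from the thesis; boxes are plain `(min, max)` corner tuples.

```python
# Hypothetical sketch of the bounding-box alignment scheme of Figure 8.

def expand_bbox(bbox, delta):
    """Enlarge an axis-aligned bounding box (min, max) by a fraction delta per axis."""
    (xmin, ymin, zmin), (xmax, ymax, zmax) = bbox
    dx = (xmax - xmin) * delta
    dy = (ymax - ymin) * delta
    dz = (zmax - zmin) * delta
    return ((xmin - dx, ymin - dy, zmin - dz),
            (xmax + dx, ymax + dy, zmax + dz))

def fits_inside(inner, outer):
    """True if bounding box `inner` lies entirely within `outer`."""
    (imin, imax), (omin, omax) = inner, outer
    return all(o <= i for i, o in zip(imin, omin)) and \
           all(i <= o for i, o in zip(imax, omax))

def aligned_bbox(prev_bbox, next_bbox, delta=0.1):
    """Reuse the enlarged previous bounding box when the next frame fits it,
    so voxel coordinates stay comparable; otherwise fall back to the new box."""
    enlarged = expand_bbox(prev_bbox, delta)
    return enlarged if fits_inside(next_bbox, enlarged) else next_bbox
```

Reusing the enlarged box keeps the octree voxel grid stable across frames, which is what makes the inter-prediction of the following sections possible.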

 

Second, the scheme performs an octree decomposition of space similar to that used in Chapter 4. Only non-empty voxels are subdivided further, and the decoder needs only the occupancy codes to reconstruct each level of detail (LoD) of the point cloud (Figure 9).

                    

                                                                    Figure 9   Octree Decomposition of Space
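One subdivision step of this decomposition can be sketched as follows: each non-empty voxel emits a one-byte occupancy code, one bit per child octant, and only non-empty children are subdivided at the next LoD. This is an illustrative sketch, not the thesis implementation; the octant bit ordering is an assumption.

```python
# Sketch of one octree subdivision step producing an 8-bit occupancy code.

def occupancy_code(points, center):
    """Partition `points` into the 8 child octants of a voxel centred at
    `center`; return the occupancy byte and the non-empty child point sets."""
    children = [[] for _ in range(8)]
    for p in points:
        # One bit per axis decides the octant index (assumed ordering: x, y, z).
        idx = (p[0] >= center[0]) | ((p[1] >= center[1]) << 1) | ((p[2] >= center[2]) << 2)
        children[idx].append(p)
    code = 0
    for i, child in enumerate(children):
        if child:                  # set bit i when octant i contains points
            code |= 1 << i
    return code, [c for c in children if c]
```

The decoder can replay these occupancy bytes level by level to rebuild the voxel centres, which is why the bitstream needs no explicit point coordinates per LoD.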

 

Third, the color of each point is coded with a method based on the legacy JPEG codec that exploits the correlation present in point clouds reconstructed from natural inputs. Instead of mapping the octree to a graph and treating the color attributes as a graph signal, this method maps the color attributes directly onto a structured JPEG image grid, following a depth-first tree traversal in a zig-zag pattern (Figure 10).

                                              

                                             Figure 10   Scan line pattern for writing octree values to an image grid
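A minimal sketch of the grid mapping, under the assumption that the zig-zag of Figure 10 is a serpentine scan (alternate rows written right-to-left) so that colors that are adjacent in the depth-first traversal stay adjacent in the image handed to JPEG. The function name and signature are illustrative.

```python
# Sketch: write per-point colours, in depth-first traversal order, onto an
# image grid with a serpentine (zig-zag) scan so JPEG can exploit locality.

def colors_to_grid(colors, width):
    """Lay out a flat list of (r, g, b) colours row by row, reversing the
    direction of every other row; unfilled cells stay black."""
    height = (len(colors) + width - 1) // width
    grid = [[(0, 0, 0)] * width for _ in range(height)]
    for i, c in enumerate(colors):
        row, col = divmod(i, width)
        if row % 2 == 1:                 # odd rows run right-to-left
            col = width - 1 - col
        grid[row][col] = c
    return grid
```

The decoder applies the inverse scan after JPEG decoding, then assigns the colors back to the points in the same depth-first order.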

 

Inter frame coder

To perform inter-frame encoding between subsequent point clouds, this section presents an inter-frame prediction algorithm that re-uses the intra frame coder in combination with a novel lossy prediction scheme based on the iterative closest point (ICP) algorithm. All abbreviations used to describe the algorithms are given in Table 3.

                                                       

                                                                Table 3   Symbols used for predictive encoding

Figure 11 outlines the inter-predictive coding algorithm. The algorithm codes the data in the P frame in two parts: C_i, which contains the vertices that could not be predictively coded, and C_p, which contains data that could be predicted well from the previous frame. The algorithm starts with the normalized and aligned I and P clouds. The macroblocks M_i are generated at level K above the final LoD of the octree. In the next step, each of the macroblocks M_p is traversed to find whether a corresponding macroblock exists in M_i; if one exists, it becomes a candidate for prediction. Two further conditions are checked before predictive coding of the block starts: first, the point counts of the two corresponding macroblocks in M_i and M_p must lie within a given range of each other, and second, the color variance of the points in the macroblock must be low, since the algorithm only performs inter-prediction in areas of the point cloud that have low color/texture variance. Points that do not satisfy these conditions are written to a point cloud data structure and coded with the intra coding algorithm. For the blocks that are suitable for inter-frame coding, prediction is performed by computing a rigid transform that maps the points of M_i to M_p. This computation is based on the iterative closest point algorithm, which takes only the geometric positions into account, not the colors.
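The two eligibility checks can be sketched as below. The function names and the thresholds (`count_ratio`, `max_var`) are my assumptions; the thesis only states that point counts must be in range and color variance must be low.

```python
# Illustrative sketch of the macroblock eligibility test before ICP prediction.

def color_variance(colors):
    """Mean per-channel variance of a list of (r, g, b) colours."""
    n = len(colors)
    means = [sum(c[k] for c in colors) / n for k in range(3)]
    return sum(sum((c[k] - means[k]) ** 2 for c in colors) / n
               for k in range(3)) / 3.0

def can_inter_predict(block_i, block_p, count_ratio=0.5, max_var=100.0):
    """True if the P-frame block may be predicted from its I-frame
    counterpart; otherwise the block falls through to the intra coder."""
    ni, np_ = len(block_i), len(block_p)
    if ni == 0 or np_ == 0:
        return False
    if min(ni, np_) / max(ni, np_) < count_ratio:   # point counts out of range
        return False
    return color_variance(block_p) <= max_var       # only low-variance areas
```

Restricting prediction to low-variance blocks matters because ICP aligns geometry only; in textured areas even a good rigid fit can misplace colors visibly.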

 

The encoding of the rigid transformation T is as follows. It is first decomposed into a rotation matrix R and a translation vector t. The rotation matrix R is converted to a quaternion q_or(s,t,u,v) and quantized using a quaternion quantization scheme with only 3 numbers Quat1, Quat2, Quat3 (16 bits per number). The translation vector t is quantized using 16 bits per component (T1, T2, T3).
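A unit quaternion has only three degrees of freedom, which is why three 16-bit numbers suffice. The thesis does not spell out the scheme, so the sketch below uses the common "smallest three" approach as an assumption: drop the largest-magnitude component, quantize the other three, and rebuild the dropped one from the unit-norm constraint.

```python
# Sketch (an assumption, not thesis code) of 3x16-bit quaternion quantization.
import math

def quantize_quat(q):
    """q = (s, t, u, v), assumed unit length. Drop the largest-magnitude
    component; the remaining three lie in [-1/sqrt(2), 1/sqrt(2)]."""
    largest = max(range(4), key=lambda i: abs(q[i]))
    sign = 1.0 if q[largest] >= 0 else -1.0        # force dropped component >= 0
    rest = [sign * q[i] for i in range(4) if i != largest]
    packed = tuple(int(round((x / math.sqrt(2.0) + 0.5) * 65535)) for x in rest)
    return largest, packed

def dequantize_quat(largest, packed):
    """Invert the mapping and recover the dropped component from unit norm."""
    rest = [((p / 65535.0) - 0.5) * math.sqrt(2.0) for p in packed]
    w = math.sqrt(max(0.0, 1.0 - sum(x * x for x in rest)))
    q = rest[:largest] + [w] + rest[largest:]
    return tuple(q)
```

With 16 bits per number the round-trip error is far below visual significance for macroblock-sized geometry.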

 

Last, a color offset is optionally coded to compensate for color offsets between corresponding macroblocks caused by lighting changes that result in a brightness difference.
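A minimal sketch of such an offset, assuming (the thesis does not specify the formula) that it is the mean per-channel difference between the matched blocks, added back to the predicted colors at the decoder:

```python
# Sketch: mean per-channel colour offset between two matched macroblocks.

def color_offset(colors_i, colors_p):
    """Return the (r, g, b) offset to add to the predicted (I-frame) colours
    so their mean matches the P-frame block."""
    return tuple(
        round(sum(p[k] for p in colors_p) / len(colors_p)
              - sum(i[k] for i in colors_i) / len(colors_i))
        for k in range(3))
```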

 

The position in M_p is stored as a key k = (x,y,z) using 16-bit integer values for each component. This key can be used to decode the predicted blocks directly from the previously decoded octree frame M_i in any order. This random-access indexing method also enables parallel encoding/decoding of the data, which is important for achieving real-time performance. The memory layout of the data field is presented in Table 4.
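The key layout can be sketched as three 16-bit fields packed into one integer; the exact field order is an assumption consistent with, but not confirmed by, Table 4.

```python
# Sketch: pack/unpack the 3x16-bit macroblock position key for random access.

def pack_key(x, y, z):
    """Pack three 16-bit unsigned components into a single 48-bit key."""
    assert 0 <= x < 2**16 and 0 <= y < 2**16 and 0 <= z < 2**16
    return (x << 32) | (y << 16) | z

def unpack_key(key):
    """Recover the (x, y, z) components of a packed key."""
    return (key >> 32) & 0xFFFF, (key >> 16) & 0xFFFF, key & 0xFFFF
```

Because each predicted block carries its own key, decoder threads can look up their source blocks in M_i independently, which is what makes the parallel decoding mentioned above possible.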

                                             

                                            

                                                      Figure 11   Inter Predictive Point Cloud Coding Algorithm

                                                          

                                                       Table 4   Data Structure of an inter coded block of data

 

 

This chapter presents a hybrid architecture for point cloud compression that combines intra frame coding and inter frame coding. The intra frame coder consists of typical octree-based point cloud compression plus a lossy real-time color encoding method based on legacy JPEG methods. The inter frame coder mainly uses the inter-prediction scheme to reduce the data size. This enables easy implementation and real-time performance. A subjective evaluation shows that the degradation introduced by the codecs is negligible.

 

 

 

Chapter 7   3D Tele-Immersive Streaming Engine

 

This chapter uses the codecs and transmission schemes developed earlier to build the overall streaming engine, which must satisfy the need for stream setup between multiple sites and multiple users. Figure 12 shows a simple outline of a modular 3D tele-immersive system architecture such as that in the REVERIE platform.

                                     Figure 12   Modular 3D immersive Architecture with Terminal Scalability,

                                                        end terminals load different types of modules

 

Application A loads natural user data (module A), audio capture (Capture B) and a module to render incoming natural user streams. User B, which can only render natural users (it is a passive user), can request the natural user stream from user A, but user A will not request any such streams from user B. In this way, the modular architecture allows terminal scalability, from passive, very light users to users with powerful capture and render capabilities.

 

To address the limitations of traditional streaming transports, the framework illustrated in Figure 13 is developed; it can handle different types of incoming and outgoing media streams (UDP/TCP) representing 3D data and real-time messaging.

                                                        Figure 13   Data Streaming Framework Implementation

 

 

The architecture introduced in this chapter is useful for 3D immersive communications in tele-immersive frameworks. For example, the API of the framework includes support for synchronization via a distributed virtual clock, and buffering of 3D audio and mesh data enables approximate synchronization of audio with the corresponding 3D frames.

 

 

Conclusion

 

This thesis presents a real-time tele-immersive system that provides people with shared experiences, in particular mixed or tele-immersive virtual reality. It initiates a convergence between the virtual and real worlds, which is fascinating for people like me. The thesis introduces specific approaches to the problems that stand in the way of real-time mixed reality. To be honest, Rufail's work impressed me a lot.

 

 


Reposted from blog.csdn.net/shanwenkang/article/details/81272856