Tag: VLSI

 
+

A Multiplication-Free Algorithm and A Parallel Architecture for Affine Transformation

Affine transformation is widely used in image processing. Recently, it is recommended by MPEG-4 for video motion compensation. This paper presents a novel low power parallel architecture for texture warping using affine transformation (AT). The architecture uses a novel multiplication-free algorithm that employs the algebraic properties of the AT. Low power has been achieved at different levels of the design. At the algorithmic level, replacing multiplication operations with bit shifting saves the power and delay of using a multiplier. At the architecture level, low power is achieved by using parallel computational units, where the latency constraints and/or the operating latency can be reduced. At the circuit level, using low power building blocks (such as low power adders) contributes to the power savings. The proposed architecture is used as a computational kernel in video object coders. It is compatible with MPEG-4 and VRML standards. The architecture has been prototyped in 0.6 μm CMOS technology with three layers of metal. The performance of the proposed architecture shows that it can be used in mobile and handheld applications.

 

Wael Badawy and Magdy Bayoumi, “A Multiplication-Free Algorithm and A Parallel Architecture for Affine Transformation,” The Journal of VLSI Signal Processing-Systems, Kluwer Academic Publishers, Vol. 31, No 2, May 2002, pp. 173-184.

+

Architectures for Finite Radon Transform

Two VLSI architectures for the finite Radon transform are presented. The first is a reference architecture using memory blocks and the second is a memoryless architecture. The proposed architectures use 7×7 size image blocks and are prototyped for processing the CIF image sequence. The simulation and synthesis results show that the core speeds of the two proposed architectures are around 100 and 82 MHz, respectively.

Published in:

Electronics Letters  (Volume:40 ,  Issue: 15 )

 

C. A. Rahman and W. Badawy, “Architectures for Finite Radon Transform“, The IEE Electronics Letters, Vol. 40, Issue 15, July 2004, pp. 931-932.

 

+

Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking

The new VLSI architecture for video object (VO) motion tracking uses a novel hierarchical adaptive structured mesh topology. The structured mesh offers a significant reduction in the number of bits that describe the mesh topology. The motion of the mesh nodes represents the deformation of the VO. Motion compensation is performed using a multiplication-free algorithm for affine transformation, significantly reducing the decoder architecture complexity. Pipelining the affine unit contributes a considerable power saving. The VO motion-tracking architecture is based on a new algorithm. It consists of two main parts: a video object motion-estimation unit (VOME) and a video object motion-compensation unit (VOMC). The VOME processes two consequent frames to generate a hierarchical adaptive structured mesh and the motion vectors of the mesh nodes. It implements parallel block matching motion-estimation units to optimize the latency. The VOMC processes a reference frame, mesh nodes and motion vectors to predict a video frame. It implements parallel threads in which each thread implements a pipelined chain of scalable affine units. This motion-compensation algorithm allows the use of one simple warping unit to map a hierarchical structure. The affine unit warps the texture of a patch at any level of hierarchical mesh independently. The processor uses a memory serialization unit, which interfaces the memory to the parallel units. The architecture has been prototyped using top-down low-power design methodology. Performance analysis shows that this processor can be used in online object-based video applications such as MPEG-4 and VRML

Wael Badawy and Magdy Bayoumi, “Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking,” The IEEE Transaction on Circuits and Systems for Video Technology, Vol. 12, No. 4, April 2002, pp. 227-237

+

A Proposed Hardware Reference Model for Spatial Transformation and Quantization in H.264,

 

This paper presents three Very Large Scale Integration prototypes to exploit spatial redundancy in the H.264 standard. The proposed architectures are: (1) forward 4 × 4 integer approximation of DCT transform and quantization, which is applied to all blocks of a frame, (2) the 4 × 4 Hadamard transform and quantization that is applied to the DC coefficients of the luma component when the macroblock is encoded in 16 × 16 intra prediction mode, and (3) the 2 × 2 Hadamard transform and quantization that is applied to the DC coefficients of the chroma component as a second level in the transformation hierarchy. The developed algorithms are adopted by the H.264 standard. A performance analysis shows that the architectures satisfy the real-time constraints required by different digital video applications.

 

I. Amer, W. Badawy, G. Jullien, “A Proposed Hardware Reference Model for Spatial Transformation and Quantization in H.264,” Elsevier Journal of Visual Communication and Image Representation, Volume 17, Issue 2, April 2006, Pages 533-552.

+

CAVLC Encoder Design for Real-Time Mobile Video Applications

Abstract

This brief presents a new context-based adaptive variable length coding (CAVLC) architecture. The prototype is designed for the H.264/AVC baseline profile entropy coder. The proposed design offers area savings by reducing the size of the statistic buffer. The arithmetic table elimination technique further reduces the area. The split VLC tables simplify the process of bit-stream generation and also help in reducing some area. The proposed architecture is implemented on Xilinx Virtex II field-programmable gate array (2v3000fg676-4). Simulation result shows that the architecture is capable of processing common/quarter-common intermediate format frame sequences in real-time at a core speed of 50 MHz with 6.85-K logic gates.

Published in:

Circuits and Systems II: Express Briefs, IEEE Transactions on  (Volume:54 ,  Issue: 10 )

C. A. Rahman and W. Badawy, “CAVLC Encoder Design for Real-time Mobile Video Applications”, The IEEE Trans. on Circuits and Systems II, Oct. 2007 Vol 54, Issue: 10, pp. 873-877.
Link to the list of other Peer Journal Publications

+

A Simplified 8×8 Transformation And Quantization Real-Time Ip-Block For Mpeg-4 H.264/Avc Applications: A New Design Flow Approach

Abstract

Current multimedia design processes suffer from the excessively large time spent on testing new IP-blocks with references based on large video encoders specifications (usually several thousands lines of code). The appropriate testing of a single IP-block may require the conversion of the overall encoder from software to hardware, which is difficult to complete in the short time required by the competition-driven reduced time-to-market demanded for the adoption of a new video coding standard. This paper presents a new design flow to accelerate the conformance testing of an IP-block using the H.264/AVC software reference model. An example block of the simplified 8 × 8 transformation and quantization, which is adopted in FRExt, is provided as a case study demonstrating the effectiveness of the approach.

To Download A SIMPLIFIED 8 × 8 TRANSFORMATION AND QUANTIZATION REAL-TIME IP-BLOCK FOR MPEG-4 H.264/AVC APPLICATIONS: A NEW DESIGN FLOW APPROACH

 

Ihab Amer, Wael Badawy, Graham Jullien, Marco Mattavelli, And Robert Turney, “A Simplified 8×8 Transformation And Quantization Real-Time Ip-Block For Mpeg-4 H.264/Avc Applications: A New Design Flow Approach,” Journal of Circuits, Systems, and Computers Vol. 16, No. 6 (2007) 1011–1026

Link to the list of other Peer Journal Publications

+

A Multiplication-Free Algorithm and A Parallel Architecture for Affine Transformation

Affine transformation is widely used in image processing. Recently, it is recommended by MPEG-4 for video motion compensation. This paper presents a novel low power parallel architecture for texture warping using affine transformation (AT). The architecture uses a novel multiplication-free algorithm that employs the algebraic properties of the AT. Low power has been achieved at different levels of the design. At the algorithmic level, replacing multiplication operations with bit shifting saves the power and delay of using a multiplier. At the architecture level, low power is achieved by using parallel computational units, where the latency constraints and/or the operating latency can be reduced. At the circuit level, using low power building blocks (such as low power adders) contributes to the power savings. The proposed architecture is used as a computational kernel in video object coders. It is compatible with MPEG-4 and VRML standards. The architecture has been prototyped in 0.6 μm CMOS technology with three layers of metal. The performance of the proposed architecture shows that it can be used in mobile and handheld applications.

 

Wael Badawy and Magdy Bayoumi, “A Multiplication-Free Algorithm and A Parallel Architecture for Affine Transformation,” The Journal of VLSI Signal Processing-Systems, Kluwer Academic Publishers, Vol. 31, No 2, May 2002, pp. 173-184.

+

Architectures for Finite Radon Transform

Two VLSI architectures for the finite Radon transform are presented. The first is a reference architecture using memory blocks and the second is a memoryless architecture. The proposed architectures use 7×7 size image blocks and are prototyped for processing the CIF image sequence. The simulation and synthesis results show that the core speeds of the two proposed architectures are around 100 and 82 MHz, respectively.

Published in:

Electronics Letters  (Volume:40 ,  Issue: 15 )

 

C. A. Rahman and W. Badawy, “Architectures for Finite Radon Transform“, The IEE Electronics Letters, Vol. 40, Issue 15, July 2004, pp. 931-932.

 

+

Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking

The new VLSI architecture for video object (VO) motion tracking uses a novel hierarchical adaptive structured mesh topology. The structured mesh offers a significant reduction in the number of bits that describe the mesh topology. The motion of the mesh nodes represents the deformation of the VO. Motion compensation is performed using a multiplication-free algorithm for affine transformation, significantly reducing the decoder architecture complexity. Pipelining the affine unit contributes a considerable power saving. The VO motion-tracking architecture is based on a new algorithm. It consists of two main parts: a video object motion-estimation unit (VOME) and a video object motion-compensation unit (VOMC). The VOME processes two consequent frames to generate a hierarchical adaptive structured mesh and the motion vectors of the mesh nodes. It implements parallel block matching motion-estimation units to optimize the latency. The VOMC processes a reference frame, mesh nodes and motion vectors to predict a video frame. It implements parallel threads in which each thread implements a pipelined chain of scalable affine units. This motion-compensation algorithm allows the use of one simple warping unit to map a hierarchical structure. The affine unit warps the texture of a patch at any level of hierarchical mesh independently. The processor uses a memory serialization unit, which interfaces the memory to the parallel units. The architecture has been prototyped using top-down low-power design methodology. Performance analysis shows that this processor can be used in online object-based video applications such as MPEG-4 and VRML

Wael Badawy and Magdy Bayoumi, “Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking,” The IEEE Transaction on Circuits and Systems for Video Technology, Vol. 12, No. 4, April 2002, pp. 227-237

+

A Proposed Hardware Reference Model for Spatial Transformation and Quantization in H.264,

 

This paper presents three Very Large Scale Integration prototypes to exploit spatial redundancy in the H.264 standard. The proposed architectures are: (1) forward 4 × 4 integer approximation of DCT transform and quantization, which is applied to all blocks of a frame, (2) the 4 × 4 Hadamard transform and quantization that is applied to the DC coefficients of the luma component when the macroblock is encoded in 16 × 16 intra prediction mode, and (3) the 2 × 2 Hadamard transform and quantization that is applied to the DC coefficients of the chroma component as a second level in the transformation hierarchy. The developed algorithms are adopted by the H.264 standard. A performance analysis shows that the architectures satisfy the real-time constraints required by different digital video applications.

 

I. Amer, W. Badawy, G. Jullien, “A Proposed Hardware Reference Model for Spatial Transformation and Quantization in H.264,” Elsevier Journal of Visual Communication and Image Representation, Volume 17, Issue 2, April 2006, Pages 533-552.