Tag: Motion estimation

Towards an H.264/AVC HW/SW Integrated Solution: An Efficient VBSME Architecture

Abstract:

This paper presents an efficient real-time variable block size motion estimation architecture. The proposed architecture provides motion vectors for each 16 × 16 block and its 40 sub-blocks. It is a single-instruction multiple-data architecture integrated with embedded SRAMs on one chip. The architecture has been prototyped using a Xilinx Virtex-4 XC4VSX35-10 field-programmable gate array. It processes 30 CIF fps using a 71-MHz clock frequency. Its maximum clock frequency is 187.7 MHz and the maximum throughput is 20 4CIF fps. The prototyped architecture has 175 k gates and 18 kbits of embedded SRAM.
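To illustrate where the count of 40 sub-blocks comes from, the sketch below enumerates the H.264 partition sizes of one 16 × 16 macroblock and derives every partition's sum of absolute differences (SAD) from sixteen 4 × 4 SADs. This is a minimal Python sketch of the general SAD-reuse idea, not the architecture described in the paper; the function names and the `SHAPES` table are assumptions made for this example.

```python
# Illustrative sketch only: names and structure are assumptions,
# not taken from the paper's hardware design.

# Partition name (width x height in pixels) -> (rows, cols) in 4x4-block units.
SHAPES = {
    "4x4": (1, 1), "8x4": (1, 2), "4x8": (2, 1), "8x8": (2, 2),
    "16x8": (2, 4), "8x16": (4, 2), "16x16": (4, 4),
}

def sad_4x4_grid(cur, ref):
    """Return a 4x4 grid holding the SAD of each 4x4 block of a 16x16 macroblock."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid

def all_partition_sads(cur, ref):
    """Build the SADs of all 41 partitions by summing the 4x4 SADs."""
    grid = sad_4x4_grid(cur, ref)
    sads = {}
    for name, (rows, cols) in SHAPES.items():
        sads[name] = [
            sum(grid[r][c]
                for r in range(top, top + rows)
                for c in range(left, left + cols))
            for top in range(0, 4, rows)
            for left in range(0, 4, cols)
        ]
    return sads
```

Counting the entries gives 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 partitions, that is, the 16 × 16 block plus its 40 sub-blocks.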

Published in:

IEEE Transactions on Circuits and Systems II: Express Briefs (Volume: 55, Issue: 9)

Page(s): 912–916
ISSN: 1549-7747
INSPEC Accession Number: 10185530
DOI: 10.1109/TCSII.2008.923398

Date of Publication: 23 May 2008
Date of Current Version: 29 August 2008
Issue Date: Sept. 2008
Sponsored by: IEEE Circuits and Systems Society
Publisher: IEEE

Mohammed Sayed, Wael Badawy, and Graham Jullien, “Towards an H.264/AVC HW/SW Integrated Solution: An Efficient VBSME Architecture”, IEEE Transactions on Circuits and Systems II, Volume: 55, Issue: 9, pp. 912-916, Sept. 2008.

Originally posted 2018-03-21 05:37:00.

Posted by WaelBadawy on November 8, 2018. Tags: block size motion estimation, data compression, frequency 187.7 MHz, frequency 71 MHz, H.264, H.264/AVC HW/SW, Motion estimation, single-instruction multiple-data (SIMD) architecture, SRAM, telecommunication standards, variable block size motion estimation (VBSME), VBSME, video codecs, video coding, Xilinx Virtex-4 XC4VSX35-10 field-programmable gate array. Category: Journal Papers.

An Architecture for Programmable Multi-core IP Accelerated Platform with an Advanced Application of H.264 Codec Implementation

Abstract

A new integrated programmable platform architecture is presented, with support for multiple accelerators and extensible processing cores. An advanced application of this architecture is to facilitate the implementation of an H.264 baseline profile video codec. The platform architecture employs the novel concept of a virtual socket and optimized memory access to increase the efficiency of video encoding. The proposed architecture is mapped onto an integrated FPGA device, Annapolis WildCard-II™ or WildCard-4™, for verification. According to the evaluation under different configurations, the overall performance of the architecture, with the integrated accelerators, can sufficiently meet the real-time encoding requirement for H.264 BP at basic levels, and achieves about 2–5.5 and 1–3 dB improvement, in terms of PSNR, compared with MPEG-2 MP and MPEG-4 SP, respectively. The architecture is highly extensible, and thus can be used to benefit the development of multi-standard video codecs beyond the description in this paper.
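Since the comparison above is stated in PSNR, a short sketch of how that metric is usually computed for 8-bit video may help. This is the standard definition, PSNR = 10·log10(peak² / MSE), written as a minimal Python function; the function name and the list-of-rows frame layout are assumptions for this example and are not taken from the paper.

```python
import math

def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are lists of rows of integer samples (an assumed layout).
    """
    total = 0
    count = 0
    for row_ref, row_dec in zip(reference, decoded):
        for a, b in zip(row_ref, row_dec):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak * peak / mse)
```

Because the scale is logarithmic, a gain of about 3 dB corresponds to roughly halving the mean squared error.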

Yifeng Qiu, Wael Badawy and Robert Turney, “An Architecture for Programmable Multi-core IP Accelerated Platform with an Advanced Application of H.264 Codec Implementation,” Journal of Signal Processing Systems, Vol. 57, No. 2, November 2009, pp. 123-137.

Originally posted 2018-03-20 05:13:00.

Posted by WaelBadawy on November 8, 2018. Tags: Accelerator, Architecture, CAVLC, DCT/Q, Deblocking, H.264/AVC, IDCT/Q-1, Motion estimation, Multi-core, Video codec, Virtual socket. Category: Journal Papers.

A Low Power VLSI Architecture for Mesh-based Video Motion Tracking

This paper proposes a low-power very large-scale integration (VLSI) architecture for motion tracking. It uses a hierarchical adaptive structured mesh that generates a content-based video representation. The proposed mesh is a coarse-to-fine hierarchical two-dimensional mesh that is formed by recursive triangulation of the initial coarse mesh geometry. The structured mesh offers a significant reduction in the number of bits that describe the mesh topology. The motion of the mesh nodes represents the deformation of the video object. The architecture consists of motion estimation and motion compensation units. The motion estimation architecture generates a progressive mesh code and the motion vectors of the mesh nodes. It reduces the power consumption, uses a simpler approach for mesh construction, approximates the mesh nodes' motion vectors using the three-step search algorithm, and uses a parallel motion estimation core to evaluate them. Moreover, it maximizes the lifetime of the internal buffers. The motion compensation architecture uses a multiplication-free algorithm for affine transformation, which significantly reduces its complexity. Moreover, using pipelined affine units contributes to the power savings. The video motion compensation architecture processes a reference frame, mesh nodes and motion vectors to predict a video frame. It implements parallel threads in which each thread implements a pipelined chain of scalable affine units. This motion compensation algorithm allows the use of one simple warping unit to map a hierarchical structure. The affine unit warps the texture of a patch at any level of the hierarchical mesh independently. The processor uses a memory serialization unit, which interfaces the memory to the parallel units. The architecture has been prototyped using a top-down low-power design methodology. The performance analysis shows that this processor can be used in online object-based video applications such as MPEG and VRML.
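The three-step search named in the abstract is a well-known fast block-matching heuristic: it tests nine candidates around the current best offset, then halves the step and repeats. The Python sketch below shows the textbook form under assumed parameters (an 8 × 8 block around the node and an initial step of 4, giving a ±7 search range); it is not the paper's hardware implementation, and all names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Return ((dx, dy), cost) for the block whose top-left corner is (bx, by)."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best  # centre of this step's 3x3 candidate pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would fall outside the frame.
                if bx + mx < 0 or by + my < 0 or bx + mx + n > w or by + my + n > h:
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        step //= 2
    return best, best_cost
```

With three steps this evaluates at most 27 candidates instead of the 225 in a full ±7 search, at the price of occasionally settling on a local minimum rather than the best match.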

Published in:

IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing (Volume: 49, Issue: 7)

Page(s): 488–504
ISSN: 1057-7130
INSPEC Accession Number: 7460367
DOI: 10.1109/TCSII.2002.805248

Date of Publication: Jul 2002
Date of Current Version: 10 December 2002
Issue Date: Jul 2002
Sponsored by: IEEE Circuits and Systems Society
Publisher: IEEE

Wael Badawy and Magdy Bayoumi, “A Low Power VLSI Architecture for Mesh-based Video Motion Tracking,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 49, No. 7, July 2002, pp. 488-504.

Posted by WaelBadawy on September 25, 2018. Tags: Affine transformation, CMOS digital integrated circuits, CMOS VLSI circuits, coarse-to-fine hierarchical two-dimensional mesh, content-based video representation, digital signal processing chips, Energy consumption, Geometry, hierarchical adaptive structured mesh, image representation, image sequences, internal buffer lifetime, Large scale integration, low power VLSI architecture, low-power electronics, memory serialization unit, mesh construction, mesh generation, mesh node motion, mesh topology, mesh-based video motion tracking, Motion compensation, motion compensation architecture, Motion estimation, motion estimation architecture, motion vectors, MPEG, multiplication-free algorithm, online object-based video applications, parallel motion estimation core, parallel threads, patch texture warping, Performance analysis, pipeline processing, pipelined affine units, power consumption, progressive mesh code, recursive triangulation, reference frame processing, three step search algorithm, top-down low-power design methodology, Topology, Tracking, Very large scale integration, video coding, video frame, video object deformation, video signal processing, VRML, warping unit, Yarn. Category: Journal Papers.

A VLSI Architecture for Video Object Motion Estimation Using a 2D Hierarchical Mesh Model

This paper proposes a novel hierarchical mesh-based video object model and a motion estimation architecture that generates a content-based video object representation. The 2D mesh-based video object is represented using two layers: an alpha plane and a texture. The alpha plane consists of two layers: (1) a mesh layer and (2) a binary layer that defines the object boundary. The texture defines the object’s colors. A new hierarchical adaptive structured mesh represents the mesh layer. The proposed mesh is a coarse-to-fine hierarchical 2D mesh that is formed by recursive triangulation of the initial coarse mesh geometry. The proposed technique reduces the mesh code size and captures the mesh dynamics. The proposed motion estimation architecture generates a progressive mesh code and the motion vectors of the mesh nodes. The performance analysis of the proposed video object representation and motion estimation architecture shows that they are suitable for very low bit rate online mobile applications and that the motion estimation architecture can be used as a building block for an MPEG-4 codec.
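To make the idea of a structured, recursively triangulated mesh more concrete, the sketch below splits a right-angled isosceles triangle at the midpoint of its hypotenuse and records one bit per triangle (1 for split, 0 for leaf). It is only an illustration of why a structured mesh can be described with very few bits. The split rule, the bit layout, and the `needs_split` callback are assumptions for this example and do not reproduce the paper's mesh code.

```python
def midpoint(p, q):
    """Integer midpoint of two (x, y) points."""
    return ((p[0] + q[0]) // 2, (p[1] + q[1]) // 2)

def encode_mesh(tri, needs_split, depth, bits):
    """Recursively triangulate tri and append one bit per visited triangle.

    tri = (apex, b, c): right angle at apex, hypotenuse from b to c.
    needs_split is a hypothetical predicate (e.g. a content-activity test).
    Returns the list of leaf triangles.
    """
    apex, b, c = tri
    if depth == 0 or not needs_split(tri):
        bits.append(0)
        return [tri]
    bits.append(1)
    m = midpoint(b, c)
    # Both children are again right-angled isosceles triangles with the
    # right angle at m, so the same rule applies at every level.
    return (encode_mesh((m, apex, b), needs_split, depth - 1, bits)
            + encode_mesh((m, c, apex), needs_split, depth - 1, bits))
```

Because node positions follow from the split rule, only the split decisions need to be stored, and truncating the bit list at a given depth yields a coarser version of the same mesh.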

Wael Badawy, “A VLSI Architecture for Video Object Motion Estimation Using a 2D Hierarchical Mesh Model,” Microprocessors and Microsystems, Vol. 27, No. 3, April 2003, pp. 131-140 (invited).

Posted by WaelBadawy on September 24, 2018. Tags: 2D hierarchical mesh-based video object model, DTV, Motion estimation, MPEG, multimedia, Video object plane, VLSI architecture. Category: Journal Papers.

A Computational RAM (C-RAM) Architecture for Real-Time Mesh-Based Video Motion Tracking: Part I Motion Estimation

This paper presents a new Computational-RAM (C-RAM) architecture for real-time mesh-based video motion tracking. The motion tracking consists of two operations: mesh-based motion estimation and compensation. The proposed motion estimation architecture is presented in Part I and the proposed motion compensation architecture is presented in Part II. The motion estimation architecture stores two frames and computes motion vectors for a regular triangular mesh structure as defined by MPEG-4 Part 2. The motion estimation architecture uses the block-matching algorithm (BMA) to estimate the vertical and horizontal motion vectors for each mesh node. Parallel and pipelined implementations have been used to overcome the huge computational requirements of the motion estimation process. The two frames are stored in embedded SRAMs generated with the Virage™ Memory Compiler. The proposed motion estimation architecture has been prototyped, simulated and synthesized using the TSMC 0.18 μm CMOS technology. At a 100 MHz clock frequency, the proposed architecture processes one CIF video frame (i.e., 352×288 pixels) in 1.48 ms, which means it can process up to 675 frames per second. The core area of the proposed motion estimation architecture is 24.58 mm² and it consumes 46.26 mW.
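The throughput figure follows directly from the per-frame processing time. The small Python sketch below reproduces that arithmetic and also converts the frame time into a clock-cycle budget; the function names are made up for this example.

```python
def frames_per_second(frame_time_ms):
    """Frames processed per second given the time for one frame in milliseconds."""
    return 1000.0 / frame_time_ms

def cycles_per_frame(clock_hz, frame_time_ms):
    """Clock cycles available for one frame at the given clock frequency."""
    return clock_hz * frame_time_ms / 1000.0

fps = frames_per_second(1.48)            # about 675.7, i.e. up to 675 whole frames
budget = cycles_per_frame(100e6, 1.48)   # about 148,000 cycles per CIF frame
```

At 100 MHz, 1.48 ms corresponds to roughly 148,000 clock cycles for a complete CIF frame.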

Mohammed Sayed and Wael Badawy, “A Computational RAM (C-RAM) Architecture for Real-Time Mesh-Based Video Motion Tracking: Part I Motion Estimation,” Journal of Circuits, Systems and Computers, Vol. 13, Issue 6, December 2004, pp. 1203-1216.

Posted by WaelBadawy on September 16, 2018. Tags: Computational-RAM, mesh-based architectures, Motion estimation, video coding. Category: Journal Papers.

Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking

The new VLSI architecture for video object (VO) motion tracking uses a novel hierarchical adaptive structured mesh topology. The structured mesh offers a significant reduction in the number of bits that describe the mesh topology. The motion of the mesh nodes represents the deformation of the VO. Motion compensation is performed using a multiplication-free algorithm for affine transformation, significantly reducing the decoder architecture complexity. Pipelining the affine unit contributes a considerable power saving. The VO motion-tracking architecture is based on a new algorithm. It consists of two main parts: a video object motion-estimation unit (VOME) and a video object motion-compensation unit (VOMC). The VOME processes two consecutive frames to generate a hierarchical adaptive structured mesh and the motion vectors of the mesh nodes. It implements parallel block-matching motion-estimation units to optimize the latency. The VOMC processes a reference frame, mesh nodes and motion vectors to predict a video frame. It implements parallel threads in which each thread implements a pipelined chain of scalable affine units. This motion-compensation algorithm allows the use of one simple warping unit to map a hierarchical structure. The affine unit warps the texture of a patch at any level of the hierarchical mesh independently. The processor uses a memory serialization unit, which interfaces the memory to the parallel units. The architecture has been prototyped using a top-down low-power design methodology. Performance analysis shows that this processor can be used in online object-based video applications such as MPEG-4 and VRML.
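One way to see how an affine warp can avoid multiplications is to interpolate the node motion vectors across a right-angled patch incrementally. The Python sketch below assumes a patch with its right angle at node A, legs of length 2^k pixels along x (to node B) and y (to node C), and integer motion vectors; under those assumptions the per-pixel displacement can be accumulated with additions and shifts only. This is an illustration of the general idea, not the multiplication-free algorithm of the paper, and all names are hypothetical.

```python
def warp_displacements(mv_a, mv_b, mv_c, k):
    """Per-pixel displacement inside a right-angled patch with legs of 2**k pixels.

    mv_a, mv_b, mv_c are (dx, dy) motion vectors at the right-angle node A,
    the node B along x and the node C along y. Returns {(i, j): (dx, dy)}
    for pixel offsets i, j >= 0 with i + j <= 2**k.
    """
    s = 1 << k
    # Per-pixel increments, kept scaled by 2**k so no division is needed.
    step_x = (mv_b[0] - mv_a[0], mv_b[1] - mv_a[1])
    step_y = (mv_c[0] - mv_a[0], mv_c[1] - mv_a[1])
    row = (mv_a[0] << k, mv_a[1] << k)  # fixed-point accumulator at row start
    out = {}
    for j in range(s + 1):
        acc = row
        for i in range(s + 1 - j):
            out[(i, j)] = (acc[0] >> k, acc[1] >> k)  # drop the fraction bits
            acc = (acc[0] + step_x[0], acc[1] + step_x[1])
        row = (row[0] + step_y[0], row[1] + step_y[1])
    return out
```

The inner loop performs two additions and two shifts per pixel, which is the kind of operation that maps naturally onto a pipelined hardware unit.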

Wael Badawy and Magdy Bayoumi, “Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 4, April 2002, pp. 227-237.

Posted by WaelBadawy on September 5, 2018. Tags: 2D mesh video-object motion tracking, Affine transformation, block matching, computational complexity, decoder architecture, decoding, image matching, image texture, low power VLSI architecture, memory serialization, mesh generation, Motion compensation, Motion estimation, MPEG-4, multiplication-free algorithm, online object-based video applications, optical tracking, parallel processing, parallel threads, pipeline processing, pipelined chain, power consumption, structured mesh topology, video coding, VLSI, VRML. Category: Journal Papers.

A Computational Memory Architecture for MPEG-4 Applications with Mobile Devices

This paper presents a Computational Memory architecture for MPEG-4 applications with mobile devices. The proposed architecture is used for real-time block-based motion estimation, which is the most computationally intensive task in the video encoder. It uses the exhaustive block-matching algorithm (EBMA) for motion estimation. The proposed architecture consists of embedded SRAMs and a number of block-matching units working in parallel to process video data while it is stored in the memory. The block-matching units access the embedded SRAMs simultaneously, which increases the speed of the architecture.

The architecture processes CIF format video sequences (i.e., the frame size is 352 × 288 pixels) with a block size of 16 × 16 pixels and a ±15 pixel search range. The proposed architecture has been designed, prototyped, and simulated for 0.18 μm TSMC CMOS technology. The simulation shows that the proposed architecture processes up to 126 CIF frames per second with a clock frequency of 100 MHz. The synthesized prototype of the proposed architecture includes 200 KB of memory, has an area of 33.75 mm², and consumes 986.96 mW at 100 MHz.
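To show what exhaustive block matching involves for these parameters, the sketch below performs a full search for one block and estimates the number of absolute-difference operations per CIF frame. It is a plain sequential Python sketch with hypothetical function names, not the parallel in-memory design of the paper, and the operation count ignores the fact that border blocks have fewer valid candidates.

```python
def full_search(cur, ref, bx, by, n=16, search=15):
    """Exhaustive block matching: return ((dx, dy), sad) for the block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best, best_cost = None, None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            x0, y0 = bx + mx, by + my
            # Skip candidates that would read outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > w or y0 + n > h:
                continue
            cost = sum(abs(cur[by + y][bx + x] - ref[y0 + y][x0 + x])
                       for y in range(n) for x in range(n))
            if best_cost is None or cost < best_cost:
                best, best_cost = (mx, my), cost
    return best, best_cost

def ebma_abs_diffs_per_frame(width=352, height=288, n=16, search=15):
    """Upper bound on absolute-difference operations for one frame."""
    blocks = (width // n) * (height // n)   # 22 * 18 = 396 blocks for CIF
    candidates = (2 * search + 1) ** 2      # 31 * 31 = 961 positions per block
    return blocks * candidates * n * n
```

For CIF with 16 × 16 blocks and a ±15 range this bound is about 97 million absolute differences per frame, which indicates why many block-matching units working in parallel next to the memory are attractive.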

Mohammed Sayed and Wael Badawy, “A Computational Memory Architecture for MPEG-4 Applications with Mobile Devices,” Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, Special Issue on Digital and Computational Video, Vol. 42, No. 1, pp. 35-42, January 2006.

Posted by WaelBadawy on August 13, 2018. Tags: computational memory, Motion estimation, MPEG-4. Category: Journal Papers.

An Affine Based Algorithm and SIMD Architecture for Video Compression with Low Bit-rate Applications

This paper presents a new affine-based algorithm and SIMD architecture for video compression with low bit rate applications. The proposed algorithm is used for mesh-based motion estimation and is named the mesh-based square-matching algorithm (MB-SMA). The MB-SMA is a simplified version of the hexagonal matching algorithm [1]. In this algorithm, a right-angled triangular mesh is used to benefit from a multiplication-free algorithm presented in [2] for computing the affine parameters. The proposed algorithm has a lower computational cost than the hexagonal matching algorithm while producing almost the same peak signal-to-noise ratio (PSNR) values. The MB-SMA outperforms the commonly used motion estimation algorithms in terms of computational cost, efficiency and video quality (i.e., PSNR). The MB-SMA is implemented using an SIMD architecture in which a large number of processing elements have been embedded with SRAM blocks to utilize the large internal memory bandwidth. The proposed architecture needs 26.9 ms to process one CIF video frame. Therefore, it can process 37 CIF frames/s. The proposed architecture has been prototyped using Taiwan Semiconductor Manufacturing Company (TSMC) 0.18-μm CMOS technology and the embedded SRAMs have been generated using the Virage Logic memory compiler.

Published in:

IEEE Transactions on Circuits and Systems for Video Technology (Volume: 16, Issue: 4)

Page(s): 457–471
ISSN: 1051-8215
INSPEC Accession Number: 8891917
DOI: 10.1109/TCSVT.2006.872780

Date of Publication: April 2006
Date of Current Version: 01 May 2006
Issue Date: April 2006
Sponsored by: IEEE Circuits and Systems Society
Publisher: IEEE

Mohammed Sayed and Wael Badawy, “An Affine Based Algorithm and SIMD Architecture for Video Compression with Low Bit-rate Applications,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, Issue 4, pp. 457-471, April 2006.

Posted by WaelBadawy on August 12, 2018. Tags: Affine transformation, affine transforms, affine-based algorithm, Bandwidth, Bit rate, CMOS integrated circuits, CMOS technology, Computational efficiency, Computer architecture, data compression, hexagonal matching algorithm, image matching, low bit rate, low bit-rate applications, mesh-based (MB) video coding, mesh-based motion estimation, mesh-based square-matching algorithm, Motion estimation, parallel architectures, peak signal-to-noise ratio, Prototypes, PSNR, Random access memory, SIMD architecture, triangular mesh, video coding, Video compression. Category: Journal Papers.
