Category: Journal Papers
A Computational RAM (C-RAM) Architecture for Real-Time Mesh-Based Video Motion Tracking: Part II Motion Compensation
This paper presents a new Computational-RAM (C-RAM) architecture for real-time mesh-based video motion tracking. In Part 1, the motion estimation part of the proposed architecture is presented. Here in Part 2, a new C-RAM mesh-based motion compensation architecture is presented. The inputs to the architecture are the mesh-node motion vectors and the reference frame, and the output is the compensated (i.e., predicted) frame. The architecture uses the affine transformation to warp the deformed patches in the reference frame into the undeformed patches in the current frame. The architecture computes the affine parameters using a multiplication-free algorithm. The reference and current frames are stored in embedded S-RAMs generated with Virage™ Memory Compiler. The proposed motion compensation architecture has been prototyped, simulated and synthesized using the TSMC 0.18 μm CMOS technology. At a 100 MHz clock frequency, the proposed architecture processes one CIF video frame (i.e., 352×288 pixels) in 0.59 ms, which means it can process up to 1694 frames per second. The core area of the proposed motion compensation architecture is 28.04 mm² and it consumes 31.15 mW.
Mohammed Sayed and Wael Badawy, “A Computational RAM (C-RAM) Architecture for Real-Time Mesh-Based Video Motion Tracking: Part II Motion Compensation,” Journal of Circuits, Systems and Computers, Vol. 13, Issue 6, December 2004, pp. 1217-1232.
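The affine warping described in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `solve_affine` and `warp_row` are hypothetical, and the six affine parameters are solved here with ordinary exact arithmetic rather than the multiplication-free method the paper reports. What the sketch does show is why an affine map suits simple hardware: once the parameters are known, stepping one pixel along a scan line only requires adding two constants.

```python
from fractions import Fraction


def solve_affine(cur_tri, ref_tri):
    """Return (a1..a6) with u = a1*x + a2*y + a3 and v = a4*x + a5*y + a6.

    cur_tri: three integer (x, y) vertices of the undeformed patch (current frame).
    ref_tri: the matching integer (u, v) vertices of the deformed patch (reference frame).
    Assumption: plain Cramer's rule, NOT the paper's multiplication-free solver.
    """
    (x1, y1), (x2, y2), (x3, y3) = cur_tri
    det = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")

    def coeffs(t1, t2, t3):
        # Solve t = a*x + b*y + c through the three vertex correspondences.
        a = Fraction((t1 - t3) * (y2 - y3) - (t2 - t3) * (y1 - y3), det)
        b = Fraction((x1 - x3) * (t2 - t3) - (x2 - x3) * (t1 - t3), det)
        c = t3 - a * x3 - b * y3
        return a, b, c

    us = [p[0] for p in ref_tri]
    vs = [p[1] for p in ref_tri]
    return coeffs(*us) + coeffs(*vs)


def warp_row(params, x0, y, width):
    """Map pixels (x0..x0+width-1, y) into the reference frame.

    The start of the row is computed directly; every further pixel is
    obtained by addition only (u += a1, v += a4).
    """
    a1, a2, a3, a4, a5, a6 = params
    u = a1 * x0 + a2 * y + a3
    v = a4 * x0 + a5 * y + a6
    out = []
    for _ in range(width):
        out.append((u, v))
        u += a1  # one step in x adds a1 to u
        v += a4  # and a4 to v
    return out
```

For example, `warp_row(solve_affine(cur, ref), 0, 0, 9)` with `cur = [(0, 0), (8, 0), (0, 8)]` and `ref = [(1, 2), (10, 1), (2, 11)]` walks the top edge of the patch from the first reference vertex to the second. A real implementation would also need to interpolate the reference frame at the fractional positions.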
A Computational RAM (C-RAM) Architecture for Real-Time Mesh-Based Video Motion Tracking: Part I Motion Estimation
This paper presents a new Computational-RAM (C-RAM) architecture for real-time mesh-based video motion tracking. Motion tracking consists of two operations: mesh-based motion estimation and compensation. The proposed motion estimation architecture is presented in Part 1 and the proposed motion compensation architecture is presented in Part 2. The motion estimation architecture stores two frames and computes motion vectors for a regular triangular mesh structure as defined by MPEG-4 Part 2. The motion estimation architecture uses the block-matching algorithm (BMA) to estimate the vertical and horizontal motion vectors for each mesh node. Parallel and pipelined implementations have been used to overcome the huge computational requirements of the motion estimation process. The two frames are stored in embedded S-RAMs generated with Virage™ Memory Compiler. The proposed motion estimation architecture has been prototyped, simulated and synthesized using the TSMC 0.18 μm CMOS technology. At a 100 MHz clock frequency, the proposed architecture processes one CIF video frame (i.e., 352×288 pixels) in 1.48 ms, which means it can process up to 675 frames per second. The core area of the proposed motion estimation architecture is 24.58 mm² and it consumes 46.26 mW.
Mohammed Sayed and Wael Badawy, “A Computational RAM (C-RAM) Architecture for Real-Time Mesh-Based Video Motion Tracking: Part I Motion Estimation,” Journal of Circuits, Systems and Computers, Vol. 13, Issue 6, December 2004, pp. 1203-1216.
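As a rough illustration of the block-matching step named in the abstract above, the following Python sketch runs a full search around one block position. Several details are assumptions, not taken from the paper: the matching cost (sum of absolute differences), the block size, the search range, the top-left block anchoring, and the function names `sad` and `block_match`.

```python
def sad(cur, ref, cx, cy, rx, ry, block):
    """Sum of absolute differences between a block in cur and one in ref."""
    total = 0
    for dy in range(block):
        for dx in range(block):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def block_match(cur, ref, x, y, block=4, search=2):
    """Full-search block matching for the block whose top-left corner is (x, y).

    Returns the (horizontal, vertical) displacement, within +/- search pixels,
    that minimises the SAD between the current block and the reference frame.
    Candidates that would fall outside the reference frame are skipped.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = x + mx, y + my
            if rx < 0 or ry < 0 or rx + block > w or ry + block > h:
                continue
            cost = sad(cur, ref, x, y, rx, ry, block)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    return best[1], best[2]
```

Every candidate displacement is independent of the others, which is the property that makes parallel and pipelined hardware implementations of this search attractive.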
A new time distributed DCT architecture for MPEG-4 hardware reference model
This paper presents the design of a new time distributed architecture (TDA) which outlines the architecture (ISO/IEC JTC1/SC29/WG11 MPEG2002/M8565) submitted to the MPEG-4 Part 9 committee and included in the ISO/IEC JTC1/SC29/WG11 MPEG2002/9115N document. The proposed TDA optimizes the performance of the two-dimensional discrete cosine transform (2-D DCT) architecture. It uses a time distribution mechanism to exploit the computational redundancy within the inner product computation module. The application-specific requirements of input, output and coefficient word length are met by scheduling the input data. The coefficient matrix uses linear mappings to assign the necessary computation to processor elements in both the space and time domains. The performance analysis shows savings in excess of 96% compared to the direct implementation and more than 71% compared to other optimized application-specific architectures for DCT.
Alam, M.; Badawy, W.; Jullien, G.; “A new time distributed DCT architecture for MPEG-4 hardware reference model,” IEEE Transactions on Circuits and Systems for Video Technology, Volume 15, Issue 5, May 2005, pp. 726-730.
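For context, the direct 2-D DCT that such architectures are compared against can be written as two passes of inner products, one over rows and one over columns. The Python sketch below shows only that baseline row-column form with the orthonormal DCT-II scaling; it does not attempt the time-distribution scheme of the paper, and the helper names are hypothetical.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II coefficient matrix C, with C[k][i] for frequency k, sample i."""
    c = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        c.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for i in range(n)])
    return c


def matmul(a, b):
    """Plain matrix product; each output entry is one inner product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct2(block):
    """2-D DCT of a square block as C * X * C^T (row-column decomposition)."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))
```

Each output coefficient is a sum of products of input samples with fixed coefficients, so the same coefficient values recur many times across the computation. That kind of repetition is the redundancy an optimized inner-product module can exploit.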
Review of Principles of verifiable RTL design
By Lionel Bening and Harry Foster, Kluwer Academic Publishers, 2000.
Using verifiable RTL design, an engineer can add or improve the use of cycle-based simulation, two-state simulation, formal equivalence checking, and model checking in the traditional verification flow. Furthermore, a verifiable RTL coding methodology permits the engineer to achieve greater verification coverage in minimal time, enhances cooperation and support for multiple EDA tools within the flow, clarifies RTL design intent, and facilitates emerging verification processes.
This book addresses verification of synchronous designs. It provides a comprehensive understanding of various verification processes from conceptual and practical perspectives. The concepts presented in this book are drawn from the authors' experience with large-scale system design projects. It lays out a methodology for verifiable RTL coding. The book is divided into nine chapters as follows. Chapter 1 provides a short introduction to the book. Chapter 2 introduces four principles of RTL design (fundamental verification principle, retain useful information principle, orthogonal verification principle, and functional observation principle) and issues related to verifiable RTL (design specification, test strategies, coverage analysis, event monitoring, and assertion checking). Chapter 3 introduces the basics of the RTL methodology and addresses the problem of complexity due to competing tool coding requirements. It introduces a simplified and tool-efficient Verilog RTL verifiable subset using an object-oriented hardware design (OOHD) methodology. Moreover, it details a linting methodology, which is used to enforce project-specific coding rules and tool performance checks. Chapter 4 presents the history of logic simulation, followed by a discussion on applying RTL simulation at various stages within the design phase. Chapter 5 discusses RTL and the formal verification process. It presents the concept of the finite state machine (FSM) and its analysis and applicability to proving machine equivalence and FSM properties. Chapter 6 discusses ideas on verifiable RTL style. Chapter 7 provides examples of common mistakes involving projects, designers, and EDA verification tool developers. Chapter 8 presents a tutorial on Verilog language elements that can be used to build a verifiable RTL model. Chapter 9 summarizes the 21 fundamental principles of verifiable RTL design, which are discussed throughout the book.
This book is considered one of the milestones for verifiable RTL design. It shows an efficient methodology for writing verifiable RTL, and it defines guidelines for large-scale systems. I believe that every engineer working in the area of RTL design should read this book.
Wael Badawy, “Principles of verifiable RTL design,” IEEE Circuits and Devices Magazine, Vol. 18, Issue 1, January 2002, pp. 26-27.
A Co-design Methodology for High-Performance Real-time Systems
Wael M. Badawy, Ashok Kumar and Magdy A. Bayoumi, “A Co-design Methodology for High-Performance Real-time Systems,” The Canadian Journal on Electrical and Computer Engineering, Vol. 26, July/October 2001, pp. 141-146.
System On Chip: Trends and Challenges
The increase in the number of transistors that can be integrated on a single chip allows the integration of more functions. On the other hand, time-to-market pressures require novel techniques for developing integrated circuits. System on chip is a methodology that allows the integration of several third-party cores with an embedded processor. This paper presents a tutorial for the system-on-chip methodology and presents the design tasks that are involved in developing a system on chip.
L’accroissement du nombre de transistors qu’il est possible d’intégrer sur une puce permet d’offrir plus de fonctionnalités. D’autre part, les pressions de la mise en marché rapide de celles-ci exige l’élaboration de techniques nouvelles de développement de circuits intégrés. Les systèmes sur une puce représentent une méthodologie de développement qui permet l’intégration de composantes provenant de plusieurs développeurs et de les combiner à un processeur embarqué. Cet article présente un tutoriel sur la méthodologie de conception de circuits sur une puce et présente les tâches de design impliquées dans le développement de tels systèmes.
Wael Badawy, “System On Chip: Trends and Challenges,” The Canadian Journal on Electrical and Computer Engineering, Vol. 26, July/October 2001, pp. 85-90.
Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking
The new VLSI architecture for video object (VO) motion tracking uses a novel hierarchical adaptive structured mesh topology. The structured mesh offers a significant reduction in the number of bits that describe the mesh topology. The motion of the mesh nodes represents the deformation of the VO. Motion compensation is performed using a multiplication-free algorithm for affine transformation, significantly reducing the decoder architecture complexity. Pipelining the affine unit contributes a considerable power saving. The VO motion-tracking architecture is based on a new algorithm. It consists of two main parts: a video object motion-estimation unit (VOME) and a video object motion-compensation unit (VOMC). The VOME processes two consecutive frames to generate a hierarchical adaptive structured mesh and the motion vectors of the mesh nodes. It implements parallel block-matching motion-estimation units to optimize the latency. The VOMC processes a reference frame, mesh nodes and motion vectors to predict a video frame. It implements parallel threads in which each thread implements a pipelined chain of scalable affine units. This motion-compensation algorithm allows the use of one simple warping unit to map a hierarchical structure. The affine unit warps the texture of a patch at any level of the hierarchical mesh independently. The processor uses a memory serialization unit, which interfaces the memory to the parallel units. The architecture has been prototyped using a top-down low-power design methodology. Performance analysis shows that this processor can be used in online object-based video applications such as MPEG-4 and VRML.
Wael Badawy and Magdy Bayoumi, “Algorithm-Based Low Power VLSI Architecture For 2d-Mesh Video Object Motion Tracking,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 4, April 2002, pp. 227-237.
MRI Data Compression Using a 3-D Discrete Wavelet transform
A low-power system that can be used to compress MRI data and for other medical applications is described. The system uses a low-power 3-D DWT processor based on a centralized control unit architecture. The simulation results show the efficiency of the wavelet processor. The prototype processor consumes 0.5 W with a total delay of 91.65 ns. The processor operates at a maximum frequency of 272 MHz. The prototype processor uses a 16-bit adder, a 16-bit Booth multiplier, and a 1 kB cache with a maximum of 64-bit data bandwidth. Lower power has been achieved by using low-power building blocks and the minimal number of computational units with high throughput.
IEEE Engineering in Medicine and Biology Magazine, Volume 21, Issue 4
Wael Badawy, Guoqing Zhang, Mike Talley, Michael Weeks and Magdy Bayoumi, “MRI Data Compression Using a 3-D Discrete Wavelet transform,” IEEE Engineering in Medicine and Biology Magazine, Vol. 21, Issue 4, July/August 2002, pp. 95-103.
A VLSI Architecture for Video Object Motion Estimation using a Novel 2-D Hierarchical Mesh
This paper proposes a novel hierarchical mesh-based video object model and a motion estimation architecture that generates a content-based video object representation. The 2-D mesh-based video object is represented using two layers: an alpha plane and a texture. The alpha plane consists of two layers: (1) a mesh layer and (2) a binary layer that defines the object boundary. The texture defines the object’s colors. A new hierarchical adaptive structured mesh represents the mesh layer. The proposed mesh is a coarse-to-fine hierarchical 2-D mesh that is formed by recursive triangulation of the initial coarse mesh geometry. The proposed technique reduces the mesh code size and captures the mesh dynamics.
The proposed motion estimation architecture generates a progressive mesh code and the motion vectors of the mesh nodes. The performance analysis of the proposed video object representation and motion estimation architecture shows that they are suitable for very low bit rate online mobile applications and that the motion estimation architecture can be used as a building block for an MPEG-4 codec.
Wael Badawy, “A VLSI Architecture for Video Object Motion Estimation using a Novel 2-D Hierarchical Mesh,” Journal of Systems Architecture, ISSN 1383-7621, invited.
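The idea of a coarse-to-fine mesh built by recursive triangulation, mentioned in the abstract above, can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration and is not taken from the paper: the 1-to-4 midpoint split, the one-bit-per-decision code, the depth limit, and the names `refine` and `needs_split`.

```python
def midpoint(p, q):
    """Midpoint of two (x, y) points."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def refine(tri, depth, needs_split, code):
    """Recursively subdivide a triangle and record the split decisions.

    tri:         tuple of three (x, y) vertices.
    depth:       remaining levels of refinement allowed.
    needs_split: hypothetical predicate, e.g. "motion inside this patch is
                 not uniform"; the paper's actual criterion is not given here.
    code:        list that receives one bit per decision (1 = split, 0 = keep).
                 No bit is written at the finest level, where no choice remains.
    Returns the list of leaf triangles.
    """
    if depth == 0:
        return [tri]
    if not needs_split(tri):
        code.append(0)
        return [tri]
    code.append(1)
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    leaves = []
    for child in ((a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)):
        leaves.extend(refine(child, depth - 1, needs_split, code))
    return leaves
```

Because the child geometry follows from the parent, only the split decisions need to be stored, which is one way a structured mesh can keep its topology description short.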
A Novel Current-Mode Instrumentation Amplifier Based on Operational Floating Current Conveyor
This paper presents a novel current-mode instrumentation amplifier (CMIA) that utilizes an operational floating current conveyor (OFCC) as a basic building block. The OFCC, as a current-mode device, shows flexible properties with respect to other current- or voltage-mode circuits. The advantages of the proposed CMIA are threefold. First, it offers a higher differential gain and a bandwidth that is independent of gain, unlike a traditional voltage-mode instrumentation amplifier. Second, it maintains a high common-mode rejection ratio (CMRR) without requiring matched resistors, and finally, the proposed CMIA circuit offers a significant improvement in accuracy compared to other current-mode instrumentation amplifiers based on the current conveyor. The proposed CMIA has been analyzed, simulated, and experimentally tested. The experimental results verify that the proposed CMIA outperforms existing CMIAs in terms of the number of basic building blocks used, differential gain, and CMRR.
IEEE Transactions on Instrumentation and Measurement, Volume 54, Issue 5
Yehya H. Ghallab, Wael Badawy, Karan V.I.S. Kaler and Brent J. Maundy, “A Novel Current-Mode Instrumentation Amplifier Based on Operational Floating Current Conveyor,” IEEE Transactions on Instrumentation and Measurement, Vol. 54, Issue 5, October 2005, pp. 1941-1949.