Category: Journal Papers

 
+

An Architecture for Programmable Multi-core IP Accelerated Platform with an Advanced Application of H.264 Codec Implementation

Abstract

A new integrated programmable platform architecture is presented, with the support of multiple accelerators and extensible processing cores. An advanced application for this architecture is to facilitate the implementation of H.264 baseline profile video codec. The platform architecture employs the novel concept of virtual socket and optimized memory access to increase the efficiency for video encoding. The proposed architecture is mapped on an integrated FPGA device, Annapolis WildCard-II™ or WildCard-4™, for verification. According to the evaluation under different configurations, the results show that the overall performance of the architecture, with the integrated accelerators, can sufficiently meet the real-time encoding requirement for H.264 BP at basic levels, and achieve about 2–5.5 and 1–3 dB improvement, in terms of PSNR, as compared with MPEG-2 MP and MPEG-4 SP, respectively. The architecture is highly extensible, and thus can be utilized to benefit the development of multi-standard video codec beyond the description in this paper.

Yifeng Qiu, Wael Badawy, and Robert Turney, “An Architecture for Programmable Multi-core IP Accelerated Platform with an Advanced Application of H.264 Codec Implementation,” Journal of Signal Processing Systems, Volume 57, Number 2, November 2009, pp. 123-137.

 Link to the list of other Peer Journal Publications

+

Interpolation-Free Fractional-Pixel Motion Estimation Algorithms with Efficient Hardware Implementation

Abstract

This paper presents interpolation-free fractional-pixel motion estimation (FME) algorithms and an efficient hardware prototype of one of the proposed FME algorithms. The proposed algorithms use a mathematical model to approximate the matching error at fractional-pixel locations instead of using the block matching algorithm to evaluate the actual matching error. Hence, no interpolation is required at fractional-pixel locations. The matching error values at integer-pixel locations are used to evaluate the mathematical model coefficients. The performance of the proposed algorithms has been compared with several FME algorithms, including the full quarter-pixel search (FQPS) algorithm, which is used as part of the H.264 reference software. The computational cost and the performance analysis show that the proposed algorithms have about 90% less computational complexity than the FQPS algorithm with comparable reconstructed video quality (i.e., approximately 0.2 dB lower reconstruction PSNR values). In addition, a hardware prototype of one of the proposed algorithms is presented. The proposed architecture has been prototyped using the TSMC 0.18 μm CMOS technology. It has a maximum clock frequency of 312.5 MHz, at which it can process more than 70 HDTV 1080p frames per second. The architecture has only 13,650 gates. The proposed architecture shows superior performance when compared with several FME architectures.
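The abstract does not state which mathematical model the paper uses, so the following is only a minimal sketch of the general idea, assuming a separable parabola fitted to the matching errors at the best integer position and its four neighbours. The function names and the 3x3 error layout are hypothetical and chosen for illustration.

```python
def parabolic_offset(e_minus, e_zero, e_plus):
    """Offset of the minimum of the parabola through the matching errors
    at integer positions -1, 0 and +1 along one axis.
    Assumed model for illustration, not necessarily the paper's."""
    denom = e_minus - 2 * e_zero + e_plus
    if denom <= 0:
        # Flat or non-convex error surface: stay at the integer position.
        return 0.0
    return 0.5 * (e_minus - e_plus) / denom


def quarter_pel_refine(err):
    """err is a 3x3 grid of matching errors centred on the best integer
    motion vector, indexed as err[dy + 1][dx + 1].
    Returns (dx, dy) rounded to the nearest quarter pixel.
    No interpolated reference samples are needed."""
    dx = parabolic_offset(err[1][0], err[1][1], err[1][2])
    dy = parabolic_offset(err[0][1], err[1][1], err[2][1])
    return round(dx * 4) / 4, round(dy * 4) / 4


if __name__ == "__main__":
    # Hypothetical SAD values around the best integer-pixel match.
    err = [[150, 120, 160],
           [110, 100, 130],
           [155, 140, 170]]
    print(quarter_pel_refine(err))
```

Only the integer-pixel errors already produced by the integer search are reused here, which is where the saving over a full quarter-pixel search comes from in this kind of scheme.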

M. Sayed, W. Badawy, and G. Jullien, “Interpolation-Free Fractional-Pixel Motion Estimation Algorithms with Efficient Hardware Implementation,” Journal of Signal Processing Systems, Volume 67, Issue 2, pp. 139-155, May 2012.

+

A Prototyping Virtual Socket System-On-Platform Architecture with a Novel ACQPPS Motion Estimator for H.264 Video Encoding Applications

Abstract

H.264 delivers high-quality streaming video for various applications. The coding tools involved in H.264, however, make its video codec implementation very complicated, raising the need for algorithm optimization and hardware acceleration. In this paper, a novel adaptive crossed quarter polar pattern search (ACQPPS) algorithm is proposed to realize an enhanced inter prediction for H.264. Moreover, an efficient prototyping system-on-platform architecture is presented, which can be used to realize an H.264 baseline profile encoder with the support of an integrated ACQPPS motion estimator and related video IP accelerators. The implementation results show that the ACQPPS motion estimator can achieve very high estimated image quality, comparable to that of the full search method in terms of peak signal-to-noise ratio (PSNR), while keeping the complexity at an extremely low level. With the integrated IP accelerators and optimized techniques, the proposed system-on-platform architecture sufficiently supports H.264 real-time encoding at low cost.
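For readers unfamiliar with the quality metric quoted in these abstracts, PSNR can be computed as in the sketch below. It assumes 8-bit samples (peak value 255) passed as flat sequences; the function name and sample values are illustrative only and do not come from the paper.

```python
import math


def psnr(reference, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of sample values (8-bit samples assumed by default)."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the reconstruction.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [52, 55, 61, 66, 70, 61, 64, 73]
    decoded = [54, 55, 60, 67, 69, 62, 64, 71]
    print(f"{psnr(original, decoded):.2f} dB")
```

A higher PSNR means the reconstruction is closer to the original, so a gain of a few dB at the same bit rate is a substantial quality improvement.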


 

Yifeng Qiu and Wael Badawy, “A Prototyping Virtual Socket System-On-Platform Architecture with a Novel ACQPPS Motion Estimator for H.264 Video Encoding Applications,” EURASIP Journal on Embedded Systems, Volume 2009.


+

Efficient Variable Block Size Selection for AVC Low Bitrate Applications

 

ABSTRACT

The Advanced Video Coding (AVC) standard proposes the usage of Variable Block Size (VBS) motion-compensated prediction and mode decision aiming for an optimized Rate-Distortion (R-D) performance. Unlike Fixed Block Size (FBS) motion-compensated prediction, where all regions of the pictures are treated similarly in terms of temporal prediction, VBS increases the efficiency of encoding by allowing more active regions to be represented with more bits than less active ones. The main concern regarding the usage of VBS motion-compensated prediction is the dramatic increase it adds to the encoder computational requirements, which not only prevents the encoder from satisfying real-time constraints, but also makes it impractical for hardware implementation. This paper presents an efficient VBS selection scheme, which can be applied to any VBS Motion Estimation (ME) module, leading to significant reduction in its computational requirements with minor loss in the quality of the reconstructed picture. The computational requirements reduction is achieved by minimizing the number of required ME searches and simplifying the Mode Decision (MD) operation. In order to meet different applications’ demands, the proposed algorithm can be adjusted to function at any of three operating points, trading off computational requirements with R-D performance. In the paper, the algorithm is described in detail, focusing on the theoretical computational requirements savings. This theoretical analysis is then supported with simulation results performed on three benchmark video sequences with various types of motion.

Keywords: H.264/AVC, motion estimation, variable block size.
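The abstract does not describe the selection scheme itself. As background only, the sketch below shows the generic Lagrangian mode decision that R-D optimized encoders commonly apply to each candidate block size, choosing the mode that minimizes J = D + λR. The mode names, distortion and rate figures, and the function name are hypothetical, and this is not the paper's algorithm.

```python
def select_mode(candidates, lam):
    """Pick the block-size mode with the lowest Lagrangian cost
    J = D + lam * R.

    candidates maps a mode name to (distortion, rate_in_bits).
    Generic R-D mode decision for illustration, not the paper's scheme."""
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


if __name__ == "__main__":
    # Hypothetical (distortion, bits) pairs for one macroblock.
    modes = {"16x16": (1000, 20), "8x8": (700, 60), "4x4": (600, 150)}
    for lam in (0.5, 5, 20):
        print(lam, select_mode(modes, lam))
```

Evaluating every mode this way requires a motion search per partition, which is the cost a fast VBS selection scheme tries to avoid by skipping searches that are unlikely to win.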

 


Reference: Ihab Amer, Wael Badawy, Graham Jullien, Adrian Chirila-Rus, Robert Turney, and Rana Hamed, “Efficient Variable Block Size Selection for AVC Low Bitrate Applications,” IARIA on-line journals, 2010 Vol. 1&2, July 2010.


+

Automatic License Plate Recognition (ALPR): A State-of-the-Art Review

Abstract:

Automatic license plate recognition (ALPR) is the extraction of vehicle license plate information from an image or a sequence of images. The extracted information can be used with or without a database in many applications, such as electronic payment systems (toll payment, parking fee payment), and freeway and arterial monitoring systems for traffic surveillance. The ALPR uses either a color, black and white, or infrared camera to take images. The quality of the acquired images is a major factor in the success of the ALPR. ALPR as a real-life application has to quickly and successfully process license plates under different environmental conditions, such as indoors, outdoors, day or night time. It should also be generalized to process license plates from different nations, provinces, or states. These plates usually contain different colors, are written in different languages, and use different fonts; some plates may have a single color background and others have background images. The license plates can be partially occluded by dirt, lighting, and towing accessories on the car. In this paper, we present a comprehensive review of the state-of-the-art techniques for ALPR. We categorize different ALPR techniques according to the features they used for each stage, and compare them in terms of pros, cons, recognition accuracy, and processing speed. Future forecasts of ALPR are given at the end.

Published in:

IEEE Transactions on Circuits and Systems for Video Technology, Volume 23, Issue 2

+

On-Chip Electrical Field Sensing For Lab-On-A-Chip Applications

 

Authors: Yehya H. Ghallab, and Wael Badawy

Abstract: This paper presents a novel CMOS electric field sensor, termed a Differential Electric Field Sensitive Field Effect Transistor (DeFET). It is based on a standard 0.18 μm CMOS technology. The DeFET shows a sensitivity of 76 μA/V/μm. The DeFET’s theory of operation is also presented and discussed. Both the experimental and simulation results confirm the DeFET’s theory of operation.



Reference: Yehya H. Ghallab and Wael Badawy, “On-chip Electrical Field Sensing for Lab-on-a-chip applications,” ElectroChemical Transaction, 1, (28) 1 (2006), pp. 1-15.

+

Hierarchical Adaptive Structure Mesh for Efficient Video Coding

 

Wael Badawy, “Hierarchical Adaptive Structure Mesh for Efficient Video Coding,” The International Journal on Image and Video Processing, Vol. 17, November 2001

+

A Multiplication-Free Algorithm and A Parallel Architecture for Affine Transformation

Affine transformation is widely used in image processing. Recently, it has been recommended by MPEG-4 for video motion compensation. This paper presents a novel low-power parallel architecture for texture warping using affine transformation (AT). The architecture uses a novel multiplication-free algorithm that employs the algebraic properties of the AT. Low power has been achieved at different levels of the design. At the algorithmic level, replacing multiplication operations with bit shifting saves the power and delay of using a multiplier. At the architecture level, low power is achieved by using parallel computational units, where the latency constraints and/or the operating latency can be reduced. At the circuit level, using low-power building blocks (such as low-power adders) contributes to the power savings. The proposed architecture is used as a computational kernel in video object coders. It is compatible with MPEG-4 and VRML standards. The architecture has been prototyped in 0.6 μm CMOS technology with three layers of metal. The performance of the proposed architecture shows that it can be used in mobile and handheld applications.
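The abstract does not spell out the algebraic properties the algorithm exploits. One well-known way to evaluate an affine map without per-pixel multiplications is incremental evaluation (forward differencing), sketched below with fixed-point coefficients and a final bit shift. This is a swapped-in illustration rather than the paper's algorithm; the function name, the coefficient layout, and the choice of 8 fractional bits are assumptions.

```python
FRAC_BITS = 8  # assumed fixed-point precision (Q8) for this sketch


def affine_warp_coords(a, b, c, d, e, f, width, height):
    """Map every pixel (x, y) of a width x height block through
    x' = a*x + b*y + c, y' = d*x + e*y + f using only additions
    and one shift per output coordinate.

    Coefficients are integers in Q8 fixed point (value * 2**FRAC_BITS).
    Illustrative forward differencing, not the paper's algorithm."""
    rows = []
    row_x, row_y = c, f  # image of pixel (0, 0)
    for _ in range(height):
        x_acc, y_acc = row_x, row_y
        row = []
        for _ in range(width):
            # Shift drops the fractional bits to give integer pixel coordinates.
            row.append((x_acc >> FRAC_BITS, y_acc >> FRAC_BITS))
            x_acc += a  # stepping x by 1 adds a to x'
            y_acc += d  # ... and d to y'
        rows.append(row)
        row_x += b  # stepping y by 1 adds b to x'
        row_y += e  # ... and e to y'
    return rows


if __name__ == "__main__":
    # 1.5x horizontal scale with a small shear, expressed in Q8.
    coords = affine_warp_coords(384, 32, 0, 0, 256, 512, width=4, height=2)
    for row in coords:
        print(row)
```

Because each row and each column step is independent of the others once its starting value is known, rows can be processed by parallel units, which fits the parallel architecture the abstract describes.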

 

Wael Badawy and Magdy Bayoumi, “A Multiplication-Free Algorithm and A Parallel Architecture for Affine Transformation,” The Journal of VLSI Signal Processing-Systems, Kluwer Academic Publishers, Vol. 31, No 2, May 2002, pp. 173-184.

+

System on Chip: the Future of System Integration

System on chip:
The future of the integration paradigm

Système sur une puce:
le futur du paradigme de l’intégration

Wael Badawy

The increase in the number of transistors that can be integrated on a single chip allows the integration of more functions. On the other hand, time-to-market pressures require novel techniques for developing integrated circuits. System on chip is a methodology that allows the integration of several third-party cores with an embedded processor. This paper presents a tutorial on the system-on-chip methodology and describes the design tasks that are involved in developing a system on chip.

L’accroissement du nombre de transistors qu’il est possible d’intégrer sur une puce permet d’offrir plus de fonctionnalités. D’autre part, les pressions de la mise en marché rapide de celles-ci exige l’élaboration de techniques nouvelles de développement de circuits intégrés. Les systèmes sur une puce représentent une méthodologie de développement qui permet l’intégration de composantes provenant de plusieurs développeurs et de les combiner à un processeur embarqué. Cet article présente un tutoriel sur la méthodologie de conception de circuits sur une puce et présente les tâches de design impliquées dans le développement de tels systèmes.

 

Wael Badawy, “System on Chip: the Future of System Integration,” The Canadian Journal on Electrical and Computer Engineering, Vol. 27, No. 4, October 2002, pp. 149 – 154

+

A Parallel Multiplication-Free Algorithm and Architecture for Affine-based Motion Compensation

 

Affine transformation is widely used in image processing. Recently, it has been recommended by MPEG-4 for video motion compensation. We present a novel low-power parallel architecture for texture warping using affine transformation (AT). The architecture uses a novel multiplication-free algorithm that employs the algebraic properties of the affine transformation. Low power has been achieved at different levels of the design. At the algorithmic level, replacing multiplication operations with bit shifting saves the power and delay of using a multiplier. At the architecture level, low power is achieved by using parallel computational units. At the circuit level, using low-power cells contributes to the power savings. The proposed architecture is used as a computational kernel in video object coders. It is compatible with MPEG-4 and virtual reality modeling language (VRML) standards. The architecture has been prototyped in 0.6-µm CMOS technology with three layers of metal. The performance of the proposed architecture shows that it can be used in mobile and handheld applications.

Wael Badawy and Magdy Bayoumi, “A Parallel Multiplication-Free Algorithm and Architecture for Affine-based Motion Compensation,” The SPIE Journal on Optical Engineering, Vol. 42, No. 1, January 2003, pp. 255 – 264