Postprint of: Kłosowski M., A Power-Efficient Digital Technique for Gain and Offset Correction in Slope ADCs, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, Vol. 67, iss. 6 (2020), pp. 979-983, DOI: 10.1109/TCSII.2019.2928183

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

# A Power-Efficient Digital Technique for Gain and Offset Correction in Slope ADCs

# M. Kłosowski

Abstract — This paper proposes a power-efficient digital technique for gain and offset correction in slope ADCs. The technique is especially useful for imaging arrays with massively parallel image acquisition, where simultaneous compensation of dark signal non-uniformity (DSNU) and photo-response non-uniformity (PRNU) is critical. The presented approach is based on stopping the ADC clock with a specially prepared clock-enable pulse sequence. The paper describes the properties of ADCs utilizing this clock stopping technique, including power dissipation and integral and differential nonlinearity. Experimental validation has been performed for the ASIC implementation of a 128-pixel imager containing photo-sensors integrated with analog-to-digital converters. Finally, a modification is proposed that increases the accuracy of the gain correction. Measurements confirm the functionality of the proposed approach. Reduction of the PRNU (to ~0.4 LSB) has been achieved as well.

Index Terms— digital image sensor, digital pixel, fixed pattern noise (FPN), gain correction, offset correction, photo-response non-uniformity (PRNU).

#### I. INTRODUCTION

Gain and offset correction of analog-to-digital converters (ADCs) is important in many microelectronic systems. For example, in measurement systems, it can be used to compensate for gain and offset errors of analog preamplifiers, to calibrate the sensor, etc. In CMOS image sensors (CISs), the offset correction is often realized using correlated double sampling (CDS) [1], [2] or double sampling (DS) [3]. CDS can also be implemented in the digital domain by subtracting subsequent ADC processing results (e.g., by changing the counting direction of the counter in the ADC) [4].

In addition to the offset correction, it is usually necessary to perform the gain correction (GC) operation. It equalizes the gain of all analog-to-digital converters and thus reduces the gain-induced fixed-pattern-noise (FPN) which, in the image sensor applications, is referred to as photo-response non-uniformity (PRNU) [5]. The gain and offset corrections taken together are often called "flat-field correction". In typical applications, GC should allow for reducing PRNU below 1 LSB. Obtaining such accuracy using digital techniques is not a problem, but the application of a particular technique is also determined by its other features such as: non-linearity introduced by analog-to-digital conversion, increase of power consumption and increase of the silicon area. Nowadays, image converter solutions tend to use massively parallel conversion, not only to easily achieve the global shutter functionality, but also to enable the processing of data directly in the pixel matrix (pixel-level processing) [6], [7] or using a multi-layer 3-D structure [8]-[10].

Manuscript received September X, 2018; revised January X, 2019; accepted XXXXXXXXXX X, 2019. This work was supported in part by the National Science Centre of Poland under Grants: 2011/03/B/ST7/03547 and 2016/23/B/ST7/03733.

M. Kłosowski is with the Faculty of Electronics Telecommunications and Informatics, Gdańsk University of Technology, Poland (e-mail: klosowsk@pg.edu.pl).

For CIS with massively parallel ADC, a single conversion path (including a photo-sensor, an analog amplifier, ADC, a read-out circuit, etc.) must be implemented within a small pixel area [11]. Known methods of GC using a digital multiplier [12] or a look-up table (LUT) [13] are effective, but their implementation requires a relatively large area of the integrated circuit and leads to a significant increase in power consumption. Consequently, these techniques are not attractive to massively parallel processing.

Recently, a new technique of GC for slope ADC was proposed, which involves stopping the converter's counter during the conversion [14]. For a typical digital counter, the correction can be easily accomplished by blocking the clock pulses counted by the counter. The blocked pulses are uniformly distributed over time so that non-linearity introduced into the analog-to-digital conversion is minimized. The method works correctly regardless of the code used in the counter and the counting direction. The blocking circuit exhibits much lower complexity, surface area and power consumption than the multiplier or LUT. The gain can be changed in the range from 0 to 1 with a very small $2^{-N}$ step (where N is the number of output bits of the ADC). In [14], the new GC technique was used to equalize the sensitivity of 128 photo-sensors implemented in the last column of CIS. Some measurements were performed; however, not all properties and advantages were presented. The considerations in [14] include only the case of a 9-bit ADC with a GC. In order to draw more detailed conclusions, simulations of ADCs with different numbers of bits would be required. The benefits of blocking the clock and its effect on reducing power consumption were not presented in that work either.

In this paper, the GC technique for the slope ADC using clock blocking is presented in more detail. Owing to the implemented specialized measurement system [15], additional measurements were performed. Additional simulations were carried out as well. These include nonlinearity simulations for the general case of the *n*-bit ADC with a GC. The power consumed by the clock blocking system and related to clock distribution was also measured and analyzed. The measurements revealed an interesting property that using the clock blocking for GC reduces the power consumption of the ADC. The functionality of the circuit [14] has been extended by a new mode of operation with the ability to reduce the ADC offset (residual offset after the digital CDS). Finally, a modification to the circuit [14] is proposed that allows for greater accuracy of the gain correction.

Fig. 1. Simplified schematic diagram of the single slope ADC bank with the proposed clock stopping circuit for the gain and offset correction.

#### II. OPERATION OF GAIN AND OFFSET CORRECTION CIRCUIT

Figure 1 shows a block diagram of a bank of single-slope ADCs equipped with the described gain and offset correction circuit. Each converter contains a G1 gate to block the counter's clock. The counter is eventually stopped by the signal from the comparator, but during the counting it can also be stopped by the CE signal generated by the G2 gate. The multi-input G2 gate (NOR) is driven from the outputs of the G3 gates (AND), which control whether individual bus lines can stop the clock. The state of the memory bit MEM\_n controlling the second input of the G3\_n gate determines whether the given bus line (b<sub>n</sub>) can stop the counter. In the tested CIS [14], the gates G2 and G3 were implemented using dynamic logic (22 transistors). The 9-bit memory MEM was implemented by means of a shift register (63 transistors).

Also shown in Fig. 1 is the pulse generator module which drives the bus. This block generates a special pulse sequence that allows execution of the gain and offset correction. As all ADCs work synchronously, one bus pulse generator can serve the entire converter set. This generator starts simultaneously with the ADCs and works synchronously with them (common clock and reset). Each of the ADCs has an independent memory (MEM\_0 - MEM\_n) which holds its own correction data for the gain and/or the offset. Once the conversion is finished, the bus can be used to transfer data from the ADCs (as in the implemented CIS).
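The gating described for Fig. 1 can be summarized in a few lines of behavioral code. This is only a minimal sketch of the logic as stated in the text (G3 as AND, G2 as NOR, G1 as the clock gate); the function names and the active-high polarity of all signals are assumptions made for illustration, and the dynamic-logic implementation in the actual chip may use different polarities.

```python
from typing import Sequence


def clock_enable(bus: Sequence[int], mem: Sequence[int]) -> bool:
    """CE as described for Fig. 1: G3_n = bus[n] AND mem[n], G2 = NOR of all G3_n."""
    return not any(b and m for b, m in zip(bus, mem))


def counter_clock(clk: bool, comparator_stop: bool,
                  bus: Sequence[int], mem: Sequence[int]) -> bool:
    """G1: the counter sees a clock pulse only if neither the comparator
    nor the CE signal is blocking it (assumed active-high signals)."""
    return bool(clk) and not comparator_stop and clock_enable(bus, mem)
```

For example, a pulse on a bus line whose MEM bit is cleared leaves CE high, so the counter keeps counting, whereas the same pulse with the MEM bit set removes one clock pulse from the count.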

Figure 2(a) shows the results of the gain and offset correction for the implemented bank of 128 ADCs. The figure shows the FPN measured in [14] with and without a gain correction, and FPN measured in this work, with the gain and offset corrections enabled simultaneously. A clear decrease in the FPN measured with the GC can be observed. The improvement of FPN after the additional offset correction is minor because the tested ADCs have already used the CDS for the initial offset reduction. Figure 2(b) shows the layout of the pixel with the GC circuit.

Fig. 2. (a) Measured FPN of the tested imager as a function of the irradiance without corrections [14], with the GC only [14], with the offset and gain correction (this work) and (b) layout of the pixel from the last column of CIS [14] with described GC circuit (21 × 36 $\mu$m, tech. ams 180 nm 1P6M).

For the correct operation of the gain and offset correction circuit, a proper pulse sequence on the bus is necessary. Figure 3 shows the beginning of the pulse sequence. The first part (to the left of the thin vertical line) realizes the offset correction, which was not available in [14]. In the described example, the offset correction can be performed in the range from 0 to +7, because it was implemented using 3 bus lines (and 3 MEM bits of each pixel). The remaining lines and bits were used for the GC. The division of bus and memory bits into the part controlling the offset correction and the part controlling the GC is a configurable parameter of the pulse generator and can be selected depending on the needs. In the case described, six bits were sufficient for the GC, and the remaining three were used for the offset correction. The pulse generator can be common to the entire ADC matrix, or it can be common only to, for example, a column (row), which would allow a different division of the gain/offset bits to be applied to subsequent columns (rows) of the ADC matrix. In the described CIS, the pulse generator was implemented in an external FPGA chip.

The offset correction is performed before the ADC processing. After its completion, the ADC counters contain the initial state correcting the offset of the analog path. Negative values in the offset correction can be obtained by presetting all the ADC counters to a common negative value.
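The combination of a common negative preset and a per-pixel positive correction can be illustrated with a short sketch. It is not taken from the described circuit: the function name, the sign convention (residual offsets expressed in LSB of the output code) and the choice of the largest offset as the common target are assumptions made only for this example.

```python
def offset_settings(residual_offsets, n_offset_bits=3):
    """Return (common_preset, per_pixel_increments).

    Each pixel is raised to the largest residual offset by a per-pixel
    increment limited to 0 .. 2**n_offset_bits - 1 (0 .. +7 for 3 bus lines),
    and a common negative preset then shifts all pixels back to zero.
    """
    max_step = (1 << n_offset_bits) - 1
    target = max(residual_offsets)
    increments = [min(target - o, max_step) for o in residual_offsets]
    return -target, increments
```

With residual offsets of 1, 3, 0 and 2 LSB, this sketch yields a common preset of -3 and per-pixel increments of 2, 0, 3 and 1, so every pixel ends at a net offset of zero. Offsets that differ by more than 7 LSB saturate the increment and cannot be fully equalized with three bus lines.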

The GC begins simultaneously with the start of the ADC processing (i.e., when the ramp signal appears). This process is shown in Fig. 3 (to the right of the thin vertical line). The bus pulse sequence has been carefully selected to ensure that the clock operation is evenly interrupted. A certain unevenness of the clock, however, remains. It is the cause of the nonlinear distortions discussed in Section IV.
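One well-known way of spreading blocked pulses evenly is the binary-rate-multiplier pattern, in which a bus line of weight w pulses w times, evenly spaced, during the 2^N - 1 clock periods of a conversion. The sketch below uses that pattern purely as an assumption; the actual sequence is the one shown in Fig. 3 and may differ. The names `blocking_line`, `convert` and `blocked_mask` are hypothetical, and `blocked_mask` encodes which line weights are allowed to stop the counter.

```python
def blocking_line(t: int, n_bits: int = 9) -> int:
    """Weight (1, 2, 4, ...) of the single bus line pulsed at clock t.

    Assumed binary-rate-multiplier pattern, valid for 1 <= t <= 2**n_bits - 1:
    the line pulsed at clock t is selected by the number of trailing zeros of t.
    """
    trailing_zeros = (t & -t).bit_length() - 1
    return 1 << (n_bits - 1 - trailing_zeros)


def convert(ramp_steps: int, blocked_mask: int, n_bits: int = 9) -> int:
    """Count clock pulses until the comparator trips after ramp_steps clocks,
    skipping every clock whose bus line is enabled in blocked_mask."""
    count = 0
    for t in range(1, ramp_steps + 1):
        if not (blocking_line(t, n_bits) & blocked_mask):
            count += 1
    return count
```

Under these assumptions, blocking 52 of the 511 pulses (`blocked_mask = 0b110100`) gives a full-scale count of 459, i.e., the coefficient 459/511 used in Fig. 3, and a mid-scale input (256 ramp steps) yields 230, close to the ideal 256 · 459/511.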

Fig. 3. Simplified timing diagram for the A/D conversion with the offset (+3, 011b) and gain (111001011b/511, 459/511, 0.898) correction. Memory contents (offset/gain): 110/001011. Thin vertical line marks the end of the offset correction process and the beginning of the A/D conversion with the gain correction.

The GC coefficients must be calculated and uploaded to the GC memory. In order to acquire the coefficients, the imager's response to a uniform irradiance is measured [15]. The pixel with the weakest response is selected and receives a GC coefficient of 1; the other pixels are attenuated by the GC to make the imager's response uniform (Fig. 5(a)).
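The coefficient calculation described above can be sketched as follows. This is an illustrative assumption about the arithmetic (ratio to the weakest pixel, rounded to the nearest available step of 1/511), not the procedure of the measurement system [15]; the function name and the rounding rule are hypothetical.

```python
def gc_coefficients(responses, full_scale=511):
    """Return integer numerators k such that k / full_scale approximates
    min(responses) / response for each pixel (responses must be positive)."""
    weakest = min(responses)
    return [round(full_scale * weakest / r) for r in responses]
```

For measured responses of 400, 420 and 450 codes, the sketch returns 511, 487 and 454, so the weakest pixel keeps a coefficient of 1 and the two stronger pixels are attenuated to approximately 400 codes.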

The presented solution is designed for a global shutter CIS with pixel-parallel conversion (although the GC has only been implemented in one column); therefore, the GC coefficients are constant for each ADC. For a column-parallel CIS, the GC coefficients must be changed on a row-by-row basis.

#### III. ENERGY CONSUMPTION

Figure 4 shows the measured energy consumption of the digital circuits of the ADCs used in the tested CIS as a function of the GC coefficient. The clock stopping circuit was designed using dynamic logic. In the case of multiplication by 1, the circuit is not discharged and draws no dynamic power. The circuit draws the highest dynamic power during multiplication by 0. For typical applications of the GC, the factor range of 0.8 to 1.0 is sufficient. The graph also indicates that multiplication by values less than 1 reduces the energy consumed by the entire ADC clock generation circuit (the logic that interrupts the clock, the clock buffer and the load created by the clock inputs of the synchronous counter).

Even higher energy savings can be observed when considering the energy consumption of the counter. This synchronous linear feedback shift register (LFSR) counter works at a lower voltage (1.2 V) than the rest of the system (1.8 V). The maximum energy consumed by the counter (maximum illuminance condition) as a function of the GC coefficient is also shown in Fig. 4.

Finally, the total energy consumption of the converter (the sum of the energy consumed by the counter and by the clock generation circuit) as a function of the GC coefficient, also presented in Fig. 4, confirms that the active GC reduces the total energy consumption of the ADCs.

The total CIS energy balance should also take into account the energy absorbed by the bus, which distributes the clock blocking signals to the pixels. It is the energy associated with the capacitance of the bus lines, the input capacitance of the blocking circuits and the capacitance of disabled readout buffers (after the conversion, the bus is used to read the data from the ADCs). Table I presents the energy consumption of this bus per one converter (pixel) depending on the number of active bus lines. The multiplication range possible for a given number of active bus lines is also presented.

Comparison of Table I and Fig. 4 shows that for typical applications for which the GC range of 0.751 to 1.0 is sufficient, the energy consumed by the ADC with the GC is less than the energy consumed by the ADC without a GC when the GC coefficients are lower than about 0.95 (for this coefficient value, the energy savings are equal to the energy used to distribute the clock blocking signals on the bus).

Fig. 4. Measured energy used by the clock stopping circuit, clock generation circuit (clock stopping and clock distribution), synchronous LFSR counter, and total energy used by the digital part of the converter for a single conversion (in a single pixel) as a function of the GC coefficient.

The number of bus lines to be activated depends on the required range of GC. In Fig. 5(a), a histogram of the GC coefficients calculated for the measured CIS is shown. It can be inferred from Fig. 5(a) and Table I that six bits of the bus (the range of 0.877 to 1.0) are sufficient for a GC of the imager. This means that the energy loss in the bus for a single conversion is 1.287 pJ. The energy savings resulting from GC using coefficients lower than about 0.975 are already greater than this loss (Fig. 4). Since the vast majority of coefficients for the measured CIS are lower than 0.975 (Fig. 5(a)), the use of a GC leads to a reduction of the total consumed energy.
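The selection of the number of active bus lines can be written as a simple lookup. In the sketch below, the energy values are copied from Table I, while the range formula (511 - (2^m - 1))/511 is an inference that happens to reproduce the ranges listed in Table I for the 9-bit counter; the function names are hypothetical.

```python
# Measured bus energy per conversion and pixel, in pJ (values from Table I).
BUS_ENERGY_PJ = {1: 0.052, 2: 0.068, 3: 0.183, 4: 0.393, 5: 0.733,
                 6: 1.287, 7: 2.569, 8: 4.996, 9: 9.093}


def min_coefficient(lines: int, counter_bits: int = 9) -> float:
    """Lowest GC coefficient reachable with the given number of active lines
    (assumed: line weights 1, 2, 4, ... out of 2**counter_bits - 1 pulses)."""
    full = (1 << counter_bits) - 1
    return (full - ((1 << lines) - 1)) / full


def lines_needed(lowest_coeff: float, counter_bits: int = 9) -> int:
    """Smallest number of active bus lines whose range covers lowest_coeff."""
    for lines in range(1, counter_bits + 1):
        if min_coefficient(lines, counter_bits) <= lowest_coeff:
            return lines
    raise ValueError("coefficient must be between 0 and 1")
```

For a lowest required coefficient of about 0.88, `lines_needed` returns 6, and `BUS_ENERGY_PJ[6]` gives the 1.287 pJ bus energy quoted above.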

Unused bus lines can be utilized for the offset correction. It takes place before the processing of the ADC. The signals on the bus lines selected for the offset correction only change during this correction (Fig. 3). Consequently, the offset correction does not significantly affect the power consumption of the ADCs. For the considered imager, the remaining three bus lines are sufficient to compensate for the post-CDS offset. Its histogram is shown in Fig. 5(b).

#### IV. LINEARITY

Fig. 5. Distribution of GC coefficients (a) and offset correction values (b) calculated for the measured imager containing 128 pixels (converters).

TABLE I
ENERGY USED BY THE MULTIPLIER BUS (MEASURED)

| Number of active bus lines | Programmable range of the GC coefficient (for 9-bit counter) | Measured bus energy used for a single conversion of one converter (pixel) |
|----------------------------|--------------------------------------------------------------|---------------------------------------------------------------------------|
| 1                          | 0.998 - 1                                                    | 0.052 pJ                                                                  |
| 2                          | 0.994 - 1                                                    | 0.068 pJ                                                                  |
| 3                          | 0.986 - 1                                                    | 0.183 pJ                                                                  |
| 4                          | 0.971 - 1                                                    | 0.393 pJ                                                                  |
| 5                          | 0.939 - 1                                                    | 0.733 pJ                                                                  |
| 6                          | 0.877 - 1                                                    | 1.287 pJ                                                                  |
| 7                          | 0.751 - 1                                                    | 2.569 pJ                                                                  |
| 8                          | 0.501 - 1                                                    | 4.996 pJ                                                                  |
| 9                          | 0 - 1                                                        | 9.093 pJ                                                                  |

Due to the interrupted clocking of the converter's counter, one can expect a deterioration of the ADC linearity. Figure 6(a) shows the measured differential nonlinearity (DNL) as a function of the ADC output code. The measurement was performed using the histogram method. The GC coefficient (510/511) was chosen to show the maximum DNL error (a smaller DNL was obtained for other GC values). The relationship between the maximum DNL error and the value of the GC coefficient was simulated. The results are shown in Fig. 6(b). The effect of the GC mechanism on DNL is visible; however, the DNL always stays in the range from -0.5 to 1.0 LSB. There are no lost codes.
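The origin of the DNL can be illustrated with a small behavioral simulation. The sketch assumes a binary-rate-multiplier blocking pattern (helper `blocking_line`, hypothetical and not taken from Fig. 3) and an otherwise ideal ramp and comparator, so it is not expected to reproduce Fig. 6 exactly; it only shows that each blocked pulse widens one output code.

```python
from collections import Counter


def blocking_line(t: int, n_bits: int) -> int:
    """Weight of the bus line pulsed at clock t (assumed rate-multiplier pattern)."""
    return 1 << (n_bits - 1 - ((t & -t).bit_length() - 1))


def dnl(blocked_mask: int, n_bits: int = 9):
    """DNL in LSB for output codes 1..K, where K is the full-scale count.

    The width of a code is the number of ramp steps that produce it; the ideal
    width is (2**n_bits - 1) / K. Code 0 is treated as an end code and skipped.
    """
    total = (1 << n_bits) - 1
    widths = Counter()
    count = 0
    for t in range(1, total + 1):
        if not (blocking_line(t, n_bits) & blocked_mask):
            count += 1
        widths[count] += 1
    if count == 0:          # coefficient 0: no output codes to evaluate
        return []
    widths.pop(0, None)
    ideal = total / count
    return [widths[k] / ideal - 1 for k in range(1, count + 1)]
```

For the coefficient 510/511 (`blocked_mask = 1`), a single code is two ramp steps wide, which gives a DNL of 2 · 510/511 - 1 ≈ 0.996 LSB for that code and about -0.002 LSB for all others.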

Integral nonlinearity (INL) is also degraded due to the interrupted clock in the gain-corrected ADC. The unevenness of this clock depends on the GC coefficient. When the 511/511 coefficient is set, the counter's clock is not stopped at all, which corresponds to the ADC without the GC.

Figure 7 shows the measured INL for the GC factor 426/511 (which is one of the GC coefficient values for which the maximum INL is present) and, for comparison, an INL without the GC. The nonlinearity introduced by other parts of the ADC (optical sensor, ramp digital-to-analog converter, leakage, etc.) is visible. The INL introduced by the GC is comparable to the INL introduced by other parts of the ADC.

The influence of the GC coefficient on INL is illustrated in Fig. 8(a). The INL is zero only for the GC coefficients equal to 0 or 1. A small change in the GC coefficient often causes a large change in the INL, so the INL can be optimized if a small deviation from the selected GC coefficient is acceptable.

The subsequent simulations show that with the increase in the number of bits of the ADC counter (which is equal to the maximum number of GC coefficient bits) the maximum INL also increases (about 1 LSB for every 6 bits). However, this means that INL relative to the full-scale output of ADC decreases with the increase in the number of ADC bits, as shown in Fig. 8(b). From this plot, it can be inferred that if the relative INL is insufficient, it is enough to increase the width of the counter, so as to lower the INL to the acceptable level.
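A simulation of this kind can be sketched as follows. As before, the binary-rate-multiplier blocking pattern and all names are assumptions made for illustration, so the numbers produced are not claimed to match Fig. 8; the sketch only shows how the worst-case INL and its value relative to full scale can be evaluated for different counter widths.

```python
def blocking_line(t: int, n_bits: int) -> int:
    """Weight of the bus line pulsed at clock t (assumed rate-multiplier pattern)."""
    return 1 << (n_bits - 1 - ((t & -t).bit_length() - 1))


def max_inl(blocked_mask: int, n_bits: int) -> float:
    """Largest |actual count - ideal scaled count| over the ramp, in LSB."""
    total = (1 << n_bits) - 1
    blocked = [bool(blocking_line(t, n_bits) & blocked_mask)
               for t in range(1, total + 1)]
    gain = (total - sum(blocked)) / total
    count, worst = 0, 0.0
    for t, is_blocked in enumerate(blocked, start=1):
        if not is_blocked:
            count += 1
        worst = max(worst, abs(count - gain * t))
    return worst


def worst_case_inl(n_bits: int) -> float:
    """Maximum INL over all GC coefficients for an n_bits-wide counter."""
    return max(max_inl(mask, n_bits) for mask in range(1 << n_bits))
```

Dividing `worst_case_inl(n)` by the full-scale value 2^n - 1 gives the relative INL; in this model it is smaller for an 8-bit counter than for a 4-bit one, in line with the trend described above.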



Fig. 6. (a) DNL measured as a function of the ADC output code with the GC coefficient set to 510/511 and (b) range of DNL for all output codes (shaded) simulated as a function of the GC coefficient (9-bit).



Fig. 7. INL measured as a function of the irradiance with GC disabled (coeff=511/511), with max. INL condition for 9-bit counter (coeff=426/511) and with max. INL condition for precise GC (coeff=1706/2047).



Fig. 8. (a) Max. INL for all output codes simulated as a function of the 9-bit GC coefficient and (b) max. INL for all output codes and GC coefficients simulated as a function of the width of the ADC counter.

#### V. PRECISE GAIN CORRECTION

To achieve better precision of GC, the circuit described in [14] has to be modified. The counter (Fig. 1) should be extended with E extra flip-flops (FFs) on the LSB side. For a binary counter, the outputs of these FFs do not have to be connected to the output data bus. In addition, the clock frequency of the digital part of the ADC should be increased $2^E$ times. The analog part of the ADC (ramp DAC, comparator) still works at the original frequency.

The minimum value of the GC coefficient (min\_coeff) for counters of any width can be calculated using the formula:

$$min\_coeff = 1 - 2^{m-c} \tag{1}$$

where m is the width of the multiplier bus and c is the width of the counter  $(c \ge m)$ . For E equal to 2, m is equal to 9 and c is equal to 11, so the coefficient is in the range from 0.75 to 1 which is sufficient for most applications. The precision of the coefficients is now 4 times higher and the available coefficients are in the range from 1536/2047 to 2047/2047.
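As a worked check of (1), the fractions quoted above and in Table I are consistent with a counter that passes through $2^c - 1$ states and a bus that can block at most $2^m - 1$ pulses. This reading is an inference from the quoted values rather than a statement from the original derivation:

$$min\_coeff = \frac{(2^c - 1) - (2^m - 1)}{2^c - 1} = \frac{2^c - 2^m}{2^c - 1} \approx 1 - 2^{m-c}$$

For $c = 11$ and $m = 9$ this gives $1536/2047 \approx 0.7504$, and for $c = 9$ and $m = 6$ it gives $448/511 \approx 0.877$, so (1) can be regarded as a close approximation whose error shrinks as $c$ grows.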





Fig. 9. Measured FPN of the tested 128 pixel imager as a function of the irradiance without corrections, with the gain correction, and with the precise gain corrections (with one and with two extra FFs).

Better precision coefficients were calculated by averaging 900 frames of the CIS response for the selected irradiance. Owing to dithering, the extra bits do not have to be connected to the output. A second calculation of the GC coefficients was performed using the extra LSB counter outputs, but the results after averaging were almost identical. For a CIS with a low level of random noise, the state of the additional LSB outputs of the counter can improve the accuracy of the calculated coefficients. Extra FFs can be read out without additional circuitry by setting the analog part of the ADC to a state that forces the counting, and then performing additional readouts of the matrix, each after one clock pulse. The state of the hidden LSB FFs can be restored by software that finds differences between consecutive frames.

The described technique was tested on the original CIS (fabricated with 9-bit counters) by emulating MSBs in the software. The CIS response was measured by monotonically increasing the irradiance and performing the readouts (r). The state of non-existent MSB FFs was restored by counting the number of pixel counter overflows (o). The 11-bit pixel value (p) can be calculated using the following formula:

$$p = r + 511 \cdot o \tag{2}$$
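Formula (2) can be applied to a series of readouts as sketched below. The overflow detection rule (a readout lower than the previous one marks a counter wrap) is an assumption that holds only if the irradiance step between readouts changes the pixel value by less than 511; the function name is hypothetical.

```python
def extend_readouts(readouts, modulus=511):
    """Apply p = r + modulus * o to readouts taken at increasing irradiance,
    where o counts how many times the readout has wrapped around so far."""
    overflows = 0
    previous = None
    extended = []
    for r in readouts:
        if previous is not None and r < previous:
            overflows += 1          # the 9-bit counter has wrapped around
        extended.append(r + modulus * overflows)
        previous = r
    return extended
```

For the readouts 100, 300, 500, 90, 400 and 20, the sketch returns 100, 300, 500, 601, 911 and 1042.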

The results of the better precision GC (for E=1 and E=2) are presented in Fig. 9. Further reduction of FPN is clearly visible. According to Fig. 8(b), the extension of the width of the counter should also reduce the relative INL. The reduced INL, shown in Fig. 7 (coefficient 1706/2047), is almost the same as for the disabled GC.

#### VI. CONCLUSION

The paper presents the technique of gain and offset correction of slope type ADCs. The technique is particularly effective for many ADCs operating synchronously (e.g., in image sensors). The presented technique also reduces the power consumption of ADCs. The linearity of converters using the GC is slightly degraded, but both INL and DNL are limited to the acceptable level. In addition, with the increase in the number of bits of the ADC, the maximum relative INL decreases. The described GC technique can be implemented in all slope ADCs using counters for which successive clock pulses correspond linearly to the measured quantity. A brief comparison with other common GC techniques is presented in Table II. The precise GC allows further reduction of PRNU, which not only improves the image quality but also protects against the use of PRNU as a fingerprint of realized photographs [16]. In order to make the PRNU correction effective in this case, it must be implemented at the earliest possible stage of image data processing, so the correction in the ADC itself is the best for such applications.

TABLE II

|                    | proposed | LUT (6T SRAM)<br>[13] | array multiplier<br>[12] |
|--------------------|----------|-----------------------|--------------------------|
| transistor count   | 85       | ~ 27648               | ~ 2534                   |
| total system power | reduced  | increased             | increased                |

#### REFERENCES

- [1] M. Perenzoni *et al.*, "A 160×120-pixels range camera with in-pixel correlated double sampling and fixed-pattern noise correction," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1672–1681, Jul. 2011.
- [2] H. M. Wey and W. Guggenbuhl, "An improved correlated double sampling circuit for low noise charge-coupled devices," *IEEE Trans. Circuits Syst.*, vol. 37, no. 12, pp. 1559–1565, Dec. 1990.
- [3] A. Lopich and P. Dudek, "A SIMD cellular processor array vision chip with asynchronous processing capabilities," *IEEE Trans. Circuits Syst. I*, vol. 58, no. 10, pp. 2420–2431, Oct. 2011.
- [4] Y. Nitta *et al.*, "High-speed digital double sampling with analog CDS on column parallel ADC architecture for low-noise active pixel sensor," in *Proc. IEEE Int. Solid-State Circuits Conf.*, San Francisco, CA, USA, pp. 2024–2031, Feb. 2006.
- [5] S. Lim and A. El Gamal, "Gain fixed pattern noise correction via optical flow," *IEEE Trans. Circuits Syst. I*, vol. 51, no. 4, pp. 779–786, Apr. 2004.
- [6] W. Jendernalik, G. Blakiewicz, J. Jakusz, S. Szczepanski, and R. Piotrowski, "An analog sub-miliwatt CMOS image sensor with pixel-level convolution processing," *IEEE Trans. Circuits Syst. I*, vol. 60, no. 2, pp. 279–289, Feb. 2013.
- [7] K. Ito, B. Tongprasit, and T. Shibata, "A computational digital pixel sensor featuring block-readout architecture for on-chip image processing," *IEEE Trans. Circuits Syst. I*, vol. 56, no. 1, pp. 114–123, Jan. 2009.
- [8] A. Lopich and P. Dudek, "Architecture and design of a programmable 3D-integrated cellular processor array for image processing," in *Proc. IEEE/IFIP 19<sup>th</sup> Int. Conference on Very Large Scale Integration (VLSI-Soc)* 2011, Hong Kong, China, pp. 349–353, Oct. 2011.
- [9] G. W. Deptuch *et al.*, "Design and tests of the vertically integrated photon imaging chip," *IEEE Trans. Nuclear Science*, vol. 61, no. 1, pp. 663–674, Feb. 2014.
- [10] M. Goto *et al.*, "Pixel-parallel 3-D integrated CMOS image sensor with pulse frequency modulation A/D converters developed by direct bonding of SOI layers," *IEEE Trans. Electron Devices*, vol. 62, no. 11, pp. 3530–3535, Nov. 2015.
- [11] A. E. Gamal and H. Eltoukhy, "CMOS image sensors," *IEEE Circuits & Devices Magazine*, vol. 21, no. 3, pp. 6-20, May/Jun. 2005.
- [12] G. K. Beaverson, "Apparatus and method for correcting CCD pixel nonuniformities," U.S. Patent 4698685, Oct. 6, 1987.
- [13] H. S. Bloss *et al.*, "High-speed camera based on a CMOS active pixel sensor," *Proc. SPIE Electr. Img. Conf.*, vol. 3968, pp. 31-38, Feb. 2000.
- [14] M. Kłosowski, W. Jendernalik, J. Jakusz, G. Blakiewicz, and S. Szczepański, "A CMOS pixel with embedded ADC, digital CDS and gain correction capability for massively parallel imaging array," *IEEE Trans. Circuits Syst. I*, vol. 64, no. 1, pp. 38–49, Jan. 2017.
- [15] M. Kłosowski, J. Jakusz, W. Jendernalik, G. Blakiewicz, S. Szczepański, and S. Kozieł, "A high-efficient measurement system with optimization feature for prototype CMOS image sensors," *IEEE Trans. Instrum. Meas.*, vol. 67, no. 10, pp. 2363-2372, Oct. 2018.
- [16] M. Chen, J. Fridrich, M. Goljan, and J. Lukas, "Determining image origin and integrity using sensor noise," *IEEE Trans. Inf. Forensics Security*, vol. 3, no. 1, pp. 74-90, Mar. 2008.

