# The el-Badry correction

## Description of the method

The el-Badry correction is a method for correcting errors in data on children ever born caused by the enumerator or respondent failing to record answers of ‘zero’ to questions on lifetime fertility and, instead, leaving the response blank. When this occurs, during data processing the response is coded as ‘missing’ or ‘unknown’, even though it was evident to the enumerator at the time of data collection that the correct answer was ‘zero’. The method apportions the number of women whose parity is recorded as ‘missing’ between those whose parity is regarded as being truly unknown, and those women who should have been recorded as childless but whose responses were left blank. It does this apportionment at an aggregate level and not on an individual basis.

## Data required and assumptions

The method requires the number of children ever born, classified by age group of mother, including the count of women with missing data (i.e., where the field was left blank or contained an out-of-range code or a code for not answered or refused).

The method assumes that a constant proportion of women at each age truly did not state their lifetime fertility (i.e. parity) at the time of data collection. The balance of the women with unreported parities is assumed to be erroneously recorded as not stated when the women are, in fact, childless.

## Caveats and warnings

The method relies on the existence of a linear relationship between the proportion of women whose parity is not stated and the proportion of women reported to be childless. If such a linear relationship is observed, the adjusted denominator used to calculate average parities should exclude those women whose parity (after correction) is still regarded as unknown. This reflects the implicit assumption that these women's parity distribution is no different from that of women of the same age whose parity is known.

Where the data indicate that a correction is needed because of the large proportion of missing parity information but the method cannot be applied (for example, due to unavailability of data by age, or violation of the assumption of linearity), women of unknown parity should be included in the denominator used to determine average parities. This implicitly assumes that the parity of all such women is zero (i.e. that all women of unknown parity are childless). This will, of course, result in under-estimated average parities, as not all women of unknown parity are indeed childless.

## Application of method

We define

$N_i = {}_5N_a$

for $a = 15, 20, \ldots, 45$ and $i = a/5 - 2$, to be the number of women in age group $i$ in the population. Thus, $N_1$ represents the number of women aged 15-19 in the population. Denote by $N_{i,j}$ the number of women in age group $i$ of parity $j$, and by $N_{i,u}$ the number of women in age group $i$ whose parity is unknown.

#### Step 1: Determine the proportion of women in each age group whose parity is a) unknown; and b) reported as zero

Extract a table of reported children ever born ($j$) by women's age group ($i$) from the census data to obtain $N_{i,j}$. Missing data on parity (i.e. blank fields and invalid codes) should be combined with codes for parity not stated for each age group to produce $N_{i,u}$. The proportion of women in age group $i$ with parity unknown is then

$U_i = \frac{N_{i,u}}{N_i}$

The proportion of women in age group $i$ who are reportedly childless (i.e. are of parity zero) is given by

$Z_i = \frac{N_{i,0}}{N_i}$

If the $U_i$ are small (less than 2 per cent in each age group), it is not worth applying the correction. In such a situation, average parities should be determined by assuming that the parity distribution of women with parity not stated is the same as that of women whose parity is known, that is, by omitting the women with unstated parities from the denominator of the calculation. Thus, if $P_i$ is the average parity of women in age group $i$ and $\omega$ is the highest parity recorded,

$P_i = \frac{\sum_{j=0}^{\omega} j\,N_{i,j}}{\sum_{j=0}^{\omega} N_{i,j}}$

If the proportions of women with parity not stated exceed 2 per cent, it is worth assessing whether the correction can be applied.
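Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the spreadsheet's implementation: the function names are hypothetical, and each age group's counts are assumed to be held in a list indexed by parity, so that `n_by_parity[j]` is $N_{i,j}$. The example uses the 15-19 column of Table 1 in the worked example.

```python
def step1_proportions(n_by_parity, n_unknown, n_total):
    """Return (U_i, Z_i) for one age group.

    n_by_parity[j] is N_{i,j}; n_unknown is N_{i,u}; n_total is N_i.
    """
    u = n_unknown / n_total          # proportion with parity unknown
    z = n_by_parity[0] / n_total     # proportion reported childless
    return u, z


def mean_parity_known_only(n_by_parity):
    """Average parity with women of unknown parity omitted from the denominator."""
    births = sum(j * n for j, n in enumerate(n_by_parity))
    return births / sum(n_by_parity)


# Kenya 1989, women aged 15-19: counts for parities 0-5 (Table 1)
n_15_19 = [597_560, 134_700, 38_120, 11_120, 6_820, 1_740]
u, z = step1_proportions(n_15_19, 402_780, 1_192_840)
print(f"U = {u:.3f}, Z = {z:.3f}")
print("worth assessing the correction:", u > 0.02)
```

The 2 per cent threshold is applied per age group, as in the text; the average parity from `mean_parity_known_only` is only the appropriate summary when every $U_i$ falls below it.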

#### Step 2: Plot the points (Zi, Ui) and evaluate the data

For the method to work correctly, the series of points $(Z_i, U_i)$ should lie on, or very close to, a straight line. In some cases, curvature may be observed in the data points corresponding to either the oldest or the youngest ages. If the curvature affects the older ages only, even if it is quite extreme, it is acceptable to exclude the oldest, or two oldest, age groups from the fitting process and fit a straight line to the remaining points, since the method has the greatest absolute impact on the proportions not stated at the youngest ages. If the curvature is most noticeable among the younger women, the method should not be used: excluding the data points for women aged 15-24 would force the regression to extrapolate beyond the range of the fitted data, and the results could imply illogical adjustments in these age groups.

If a strongly linear relationship cannot be identified, even after excluding one or two data points from older women, the method cannot be applied. In this situation, it is preferable to assume that all women of parity not stated are childless, and to include them in the denominator of the average parity calculation:

$P_i = \frac{\sum_{j=0}^{\omega} j\,N_{i,j}}{N_i}$ (Equation 1)

The analytical report should note that this has been done, and that, therefore, the average parity values are liable to be underestimated.
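The method leaves the judgement of linearity to inspection of the plot. As a crude numerical companion only (an assumption, not part of the method), one might report the Pearson correlation of the points alongside the plot, bearing in mind that a high correlation does not rule out curvature at the youngest ages. If the method is rejected, Equation 1 applies. Both function names below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_parity_unknown_as_childless(n_by_parity, n_total):
    """Equation 1: women of unknown parity stay in the denominator N_i."""
    births = sum(j * n for j, n in enumerate(n_by_parity))
    return births / n_total


# Z_i and U_i for Kenya 1989, rounded as in Table 2 of the worked example
z = [0.501, 0.198, 0.071, 0.040, 0.032, 0.030, 0.033]
u = [0.338, 0.147, 0.074, 0.055, 0.048, 0.044, 0.046]
print(f"r = {pearson_r(z, u):.4f}")

# Fallback for women aged 15-19: counts for parities 0-5 and the group total N_i
p1 = mean_parity_unknown_as_childless([597_560, 134_700, 38_120, 11_120, 6_820, 1_740], 1_192_840)
print(f"P = {p1:.3f}")
```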

#### Step 3: Determine the slope and intercept of the best straight line fit to the data

The slope (γ) and intercept (β) of the fitted line are found by linear regression of $U_i$ on $Z_i$, using only the data points selected for inclusion, that is,

$U_i = \beta + \gamma Z_i$

The intercept β, which is independent of age, is the estimated proportion of women in each age group whose parity is truly unknown rather than misreported.
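A minimal sketch of the Step 3 fit, assuming ordinary least squares (the technique that reproduces the intercept reported in the worked example). The function name `fit_el_badry` is hypothetical; only the proportions for the age groups selected in Step 2 should be passed in.

```python
def fit_el_badry(z, u):
    """Least-squares fit of U_i = beta + gamma * Z_i.

    z, u: proportions for the age groups selected for fitting only.
    Returns (beta, gamma): the intercept and the slope.
    """
    n = len(z)
    mean_z, mean_u = sum(z) / n, sum(u) / n
    s_zu = sum((a - mean_z) * (b - mean_u) for a, b in zip(z, u))
    s_zz = sum((a - mean_z) ** 2 for a in z)
    gamma = s_zu / s_zz
    beta = mean_u - gamma * mean_z
    return beta, gamma


# Illustrative points on the line U = 0.03 + 0.5 Z; to drop the two oldest
# age groups from a seven-group fit, pass z[:5] and u[:5] instead.
beta, gamma = fit_el_badry([0.0, 0.1, 0.2], [0.03, 0.08, 0.13])
print(beta, gamma)
```

On Python 3.10 or later, `statistics.linear_regression(z, u)` returns the same slope and intercept as a named tuple.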

#### Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The adjusted proportion of women in age group i that is estimated to be truly childless is given by

$Z_i^* = Z_i + U_i - \beta$

That is, the revised proportion of women of zero parity in any age group is the proportion actually recorded as being of zero parity together with the proportion of women in that age group of not stated parity less the estimated proportion of women whose parity is regarded as being truly unknown. The revised estimate of the number of childless women in age group i is given by

$N_{i,0}^* = N_i \times Z_i^*$

Thus, the estimated number of women in each age group whose parity is truly unknown is given by

$N_{i,u}^* = N_i \times \beta$

The $N_{i,j}^*$ for other parities ($j > 0$) are unchanged, that is, $N_{i,j}^* = N_{i,j}$.
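Step 4 for a single age group can be sketched as below (the function name is hypothetical). The example uses the 20-24 group from Table 2 of the worked example with β = 0.027452, the Step 3 intercept carried to six decimal places, which is the precision the figures in Table 3 imply.

```python
def revise_counts(n_total, n_zero, n_unknown, beta):
    """Step 4 for one age group: return (N*_{i,0}, N*_{i,u}).

    Counts for parities j > 0 are left unchanged and are not handled here.
    """
    z = n_zero / n_total
    u = n_unknown / n_total
    z_star = z + u - beta              # adjusted proportion truly childless
    return n_total * z_star, n_total * beta


# Women aged 20-24, Kenya 1989 (Table 2)
n0_star, nu_star = revise_counts(1_005_500, 198_600, 147_540, 0.027452)
print(round(n0_star), round(nu_star))
```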

#### Step 5: Calculation of average parities

If an el-Badry correction has been applied to the data, the average parities are given by

$P_i = \frac{\sum_{j=0}^{\omega} j\,N_{i,j}^*}{(1-\beta)\,N_i}$ (Equation 2)

embodying the assumption that the remaining women in age group $i$ of unknown parity, $\beta N_i$, who are omitted from the denominator, have the same average parity as the women in age group $i$ whose parity is known.
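Equation 2 as a short sketch (the function name is hypothetical). Because parity 0 contributes nothing to the numerator and the counts for $j > 0$ are unchanged, the original parity counts can be passed in directly; the correction acts only through the denominator $(1-\beta)N_i$, that is, all women except the $\beta N_i$ of truly unknown parity.

```python
def mean_parity_el_badry(n_by_parity, n_total, beta):
    """Equation 2: average parity after the el-Badry correction.

    n_by_parity[j] is the count of women of parity j (j = 0 adds nothing to
    the numerator, so original or revised parity-0 counts give the same result).
    """
    births = sum(j * n for j, n in enumerate(n_by_parity))
    return births / ((1 - beta) * n_total)


# Women aged 15-19, Kenya 1989: counts for parities 0-5, N_i, and beta from Step 3
p = mean_parity_el_badry([597_560, 134_700, 38_120, 11_120, 6_820, 1_740], 1_192_840, 0.027452)
print(f"{p:.3f}")
```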

## Interpretation and checks

The value of β shows the estimated proportion of women whose parity is truly not stated. Larger values of β are therefore associated with poorer quality data.

Occasionally, the method may have a contrary effect and suggest that the number of women with parity not stated is understated, and that the number of women reported as of parity zero should be reduced. This arises when $\beta > U_i$, because the adjustment to the proportion childless, $U_i - \beta$, is then negative. If this is so, the correction should not be applied to that age group.
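The check is easy to automate. A minimal sketch with a hypothetical function name; the $U_i$ values in the example are invented for illustration and are not from the Kenya data.

```python
def contrary_groups(u, beta):
    """Indices of age groups where beta > U_i.

    There the adjustment U_i - beta to the proportion childless would be
    negative, so the correction should not be applied to that group.
    """
    return [i for i, ui in enumerate(u) if beta > ui]


# Hypothetical proportions not stated for four age groups
print(contrary_groups([0.338, 0.147, 0.021, 0.055], 0.02745))
```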

## Worked example

The accompanying spreadsheet implements the method using data from the 1989 Kenya Census obtained from IPUMS. The original data are presented in Table 1.

Table 1 Children ever born, by age group of mother at census date, Kenya, 1989 Census

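| Parity (j) | 15-19 (i = 1) | 20-24 (i = 2) | 25-29 (i = 3) | 30-34 (i = 4) | 35-39 (i = 5) | 40-44 (i = 6) | 45-49 (i = 7) |
|---|---:|---:|---:|---:|---:|---:|---:|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | *140* | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | *160* | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | *240* | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | *40* | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | *20* | *20* | 80 |
| 24 | 0 | 0 | 0 | 0 | *60* | *20* | 80 |
| 25 | 0 | 0 | 0 | 0 | *60* | *40* | 0 |
| 26 | 0 | 0 | 0 | 0 | *60* | *40* | *80* |
| 27 | 0 | 0 | 0 | 0 | *80* | *40* | *60* |
| 28 | 0 | 0 | 0 | 0 | *20* | *40* | *40* |
| 29 | 0 | 0 | 0 | 0 | *20* | *0* | *40* |
| 30 | 0 | 0 | 0 | 0 | *340* | *440* | *360* |
| Not stated | 402,780 | 147,540 | 61,920 | 31,580 | 20,240 | 15,420 | 12,960 |
| Total | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |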

Inspection of the data reveals that they have been edited to disallow the recording of high parities in women aged under 35. The editing rule applied at the preparatory stage would appear to be stricter than the one suggested in the section on evaluation of parity data. Thus reports of 20-24 year old women have been restricted to parity 6 or less (rather than parity 8), reports for those aged 25-29 are truncated at parity 7 (rather than parity 12) and those of 30-34 year olds at parity 8 (rather than 15). However, implausibly high parities have been allowed to remain at ages 35 and over. Therefore, further light editing of the data highlighted in italics in Table 1 could be undertaken by re-assigning to the unknown category reports of parity 19 and over for age group 35-39, parity 23 and over in the age group 40-44, and parity 26 and over in the last age group, 45-49.

An option can be selected on the Introduction tab of the spreadsheet to set implausible parities to ‘not stated’ prior to the application of the method.
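The re-assignment described above amounts to moving every count above a cut-off parity into the not-stated category. A minimal sketch, with a hypothetical function name and the cut-off chosen by the analyst; applied to the 35-39 column of Table 1 with parity 19 and over treated as implausible, it reproduces the not-stated count shown for that age group in Table 2.

```python
def reassign_implausible(n_by_parity, n_unknown, max_plausible):
    """Move women reported above max_plausible into the not-stated category.

    Returns (kept, new_unknown); the input list is not modified.
    """
    kept = list(n_by_parity[: max_plausible + 1])
    moved = sum(n_by_parity[max_plausible + 1:])
    return kept, n_unknown + moved


# Women aged 35-39, Kenya 1989 (Table 1), parities 0-30; reassign parity 19 and over
n_35_39 = [14_580, 13_620, 19_180, 28_020, 37_340, 48_140, 56_520, 56_240,
           56_560, 40_780, 26_840, 14_920, 8_280, 3_740, 2_180, 1_260, 960,
           520, 420, 140, 160, 240, 40, 20, 60, 60, 60, 80, 20, 20, 340]
kept, unknown = reassign_implausible(n_35_39, 20_240, max_plausible=18)
print(unknown)
```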

#### Step 1: Determine the proportion of women in each age group whose parity is a) not stated; and b) equal to zero

Table 2 presents the revised data, together with the calculation of the proportions of women of parity zero, and parity not stated in each age group.

Table 2 Correction of parity data, and calculation of proportion of women of parity zero, and parity not stated, Kenya, 1989 Census

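| Parity (j) | 15-19 (i = 1) | 20-24 (i = 2) | 25-29 (i = 3) | 30-34 (i = 4) | 35-39 (i = 5) | 40-44 (i = 6) | 45-49 (i = 7) |
|---|---:|---:|---:|---:|---:|---:|---:|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | 0 | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | 0 | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | 0 | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | 0 | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Not stated ($N_{i,u}$) | 402,780 | 147,540 | 61,920 | 31,580 | 21,480 | 16,060 | 13,540 |
| Total ($N_i$) | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |
| $U_i$ | 0.338 | 0.147 | 0.074 | 0.055 | 0.048 | 0.044 | 0.046 |
| $Z_i$ | 0.501 | 0.198 | 0.071 | 0.040 | 0.032 | 0.030 | 0.033 |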

The data include high proportions of women with parity not stated at ages 15-19 ($402{,}780 / 1{,}192{,}840 = 0.338$), 20-24 (0.147) and, to a lesser extent, the older age groups. The proportion of women reported as childless ($Z_i$) falls rapidly, from around 50 per cent in the first age group down to around 3 per cent at the end of the childbearing period. On these grounds, it is worth investigating whether an el-Badry correction can be applied to the data.
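The proportions in the last two rows of Table 2 can be reproduced with a few lines of Python (a sketch; the variable names are illustrative):

```python
ages = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
n_total = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]
n_zero = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
# Not stated, after reassigning implausible parities (Table 2)
n_unknown = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]

u = [nu / n for nu, n in zip(n_unknown, n_total)]   # U_i
z = [n0 / n for n0, n in zip(n_zero, n_total)]      # Z_i
for age, ui, zi in zip(ages, u, z):
    print(f"{age}: U = {ui:.3f}  Z = {zi:.3f}")
```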

#### Step 2: Plot the points (Zi, Ui) on a set of axes and evaluate the data

The Zi and Ui are plotted against each other (shown by the blue diamonds) in Figure 1. The straight line fitted to the points is shown by the red line. If a point is excluded from the fitting process, the figure in the spreadsheet represents it with an open diamond.

Figure 1 Fitting of el-Badry correction, Kenya 1989 census

There is a clear linear relationship between the plotted points, and all points can be included in the application of an el-Badry correction.

#### Step 3: Determine the slope and intercept of the best straight line fit

Performing a linear regression of $U_i$ on $Z_i$ for the selected points (here, all seven age groups) gives an intercept, β, of 0.02745. This suggests that around 2.7 per cent of the women in each age group are of truly unknown parity.
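A sketch of the fit, assuming ordinary least squares on unrounded proportions computed from Table 2; the helper `fit` is hypothetical. It also refits with the two oldest age groups excluded, the option Step 2 allows, to show how sensitive the intercept is to that choice.

```python
def fit(z, u):
    """Ordinary least squares of U_i on Z_i; returns (beta, gamma)."""
    n = len(z)
    mz, mu = sum(z) / n, sum(u) / n
    gamma = (sum((a - mz) * (b - mu) for a, b in zip(z, u))
             / sum((a - mz) ** 2 for a in z))
    return mu - gamma * mz, gamma


n_total = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]
n_zero = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
n_unknown = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]
z = [n0 / n for n0, n in zip(n_zero, n_total)]
u = [nu / n for nu, n in zip(n_unknown, n_total)]

beta_all, gamma_all = fit(z, u)               # all seven age groups
beta_young, gamma_young = fit(z[:5], u[:5])   # 40-44 and 45-49 excluded
print(f"all groups: beta = {beta_all:.5f}, gamma = {gamma_all:.3f}")
print(f"excluding 40-49: beta = {beta_young:.5f}, gamma = {gamma_young:.3f}")
```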

#### Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The revised number of women of zero parity is given by

$N_{i,0}^* = N_i (Z_i + U_i - \beta)$

while the revised numbers with parity unknown are calculated by multiplying the total number of women in each age group by β, as shown in Table 3. For example, the number of women aged 20-24 estimated to be truly of unknown parity is given by 0.027452 × 1,005,500 ≈ 27,603, using the unrounded intercept. The corrected estimate of the number of childless women aged 15-19 is derived from 1,192,840 × (0.501 + 0.338 - 0.027) ≈ 967,594, where the result is computed from unrounded proportions (equivalently, 597,560 + 402,780 - 32,746 = 967,594).

Table 3 Revised estimates of numbers of women with parity not stated and childless women by age, Kenya, 1989 Census

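| Age group | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---:|---:|---:|---:|---:|---:|---:|
| Revised parity not stated ($N_{i,u}^*$) | 32,746 | 27,603 | 23,084 | 15,763 | 12,397 | 9,965 | 8,025 |
| Revised zero parity ($N_{i,0}^*$) | 967,594 | 318,537 | 98,236 | 38,937 | 23,663 | 17,135 | 15,075 |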
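Table 3 can be reproduced approximately as follows, a sketch taking β = 0.027452 (the fitted intercept to six decimal places). Differences of a unit or so from the published figures are rounding.

```python
ages = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
n_total = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]
n_zero = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
n_unknown = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]
beta = 0.027452  # Step 3 intercept

# N*_{i,u} = beta * N_i and N*_{i,0} = N_i * (Z_i + U_i - beta)
revised_unknown = [beta * n for n in n_total]
revised_zero = [n * (n0 / n + nu / n - beta)
                for n, n0, nu in zip(n_total, n_zero, n_unknown)]
for age, nu_star, n0_star in zip(ages, revised_unknown, revised_zero):
    print(f"{age}: not stated {nu_star:,.0f}, childless {n0_star:,.0f}")
```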


#### Step 5: Calculation of average parities

Since an el-Badry correction has been applied, corrected average parities, presented in Table 4, are then derived using Equation 2.

Table 4 Corrected average parities by age group, Kenya, 1989 Census

| Age group | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---:|---:|---:|---:|---:|---:|---:|
| Average parity | 0.242 | 1.525 | 3.214 | 4.760 | 6.239 | 7.120 | 7.510 |

Note that, relative to the average parities produced if the correction is not applied (Equation 1, which assumes that all women with parity not stated are of parity zero), the correction multiplies the average parity in each age group by the same constant factor,

$\frac{1}{1-\beta}$

which is about 1.028 for Kenya, where β = 0.02745.
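The constant-factor relationship can be checked directly for the two youngest age groups. A minimal sketch using β = 0.02745 as quoted in Step 3 and the parity counts from Table 2; the ratio of the corrected to the uncorrected average parity should equal $1/(1-\beta)$ in both groups.

```python
beta = 0.02745  # intercept from Step 3, as rounded in the text

# Parity counts (index = parity) and N_i for the two youngest groups (Table 2)
groups = {
    "15-19": ([597_560, 134_700, 38_120, 11_120, 6_820, 1_740], 1_192_840),
    "20-24": ([198_600, 224_660, 202_300, 126_500, 59_700, 33_720, 12_480], 1_005_500),
}
corrected, ratios = {}, {}
for age, (n_by_parity, n_total) in groups.items():
    births = sum(j * n for j, n in enumerate(n_by_parity))
    p_eq1 = births / n_total                  # Equation 1: not stated treated as childless
    p_eq2 = births / ((1 - beta) * n_total)   # Equation 2: el-Badry corrected
    corrected[age], ratios[age] = p_eq2, p_eq2 / p_eq1
    print(f"{age}: {p_eq1:.3f} -> {p_eq2:.3f}, ratio {ratios[age]:.4f}")
```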

## Detailed description of the method

The method is fully described in el-Badry (1961). El-Badry’s fundamental insight was that, if it could be assumed that:

1)   there is a linear relationship between the proportions of childless women of a given age in a population, and the proportion of women whose parity is not stated; and

2)   the true, unknown, proportion of women whose parity is not known is a constant and independent of age, then

$U_i = \alpha Z_i^* + \beta$ (Equation 3)

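where α is the proportion of truly childless women who were recorded as of parity not stated, so that $\alpha Z_i^*$ is the proportion of all women in age group $i$ who are truly childless but recorded as not stated, and β is the true, constant, proportion of women whose parity is not stated.

Hence, since $\alpha Z_i^*$ have been misclassified as not stated when they are truly childless,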

$Z_i = Z_i^* - \alpha Z_i^* = (1-\alpha) Z_i^*$

and therefore:

$Z_i^* = \frac{Z_i}{1-\alpha}$ (Equation 4)

and substituting this into Equation 3,

$U_i = \frac{\alpha}{1-\alpha} Z_i + \beta = \gamma Z_i + \beta$

where $\gamma = \alpha/(1-\alpha)$ can be thought of as the odds of a truly childless woman being recorded as being of unknown parity.

Thus, a regression of Ui on Zi will give estimates of β (as well as γ and α).
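Since the regression delivers γ rather than α, α follows by inverting the definition of γ:

$\gamma = \frac{\alpha}{1-\alpha} \;\Longrightarrow\; \gamma - \gamma\alpha = \alpha \;\Longrightarrow\; \alpha = \frac{\gamma}{1+\gamma}, \qquad \frac{1}{1-\alpha} = 1 + \gamma$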

From Equation 3, we then obtain

$U_i - \beta = \alpha Z_i^* = Z_i^* - Z_i$

and hence that

$Z_i^* = U_i - \beta + Z_i, \qquad N_{i,0}^* = N_i Z_i^*$

and

$U_i^* = \beta, \qquad N_{i,u}^* = \beta N_i$

Note that, even though we have two identities for $Z_i^*$ (the one above and Equation 4), they will only give the same answer when the fit is exact. Convention dictates that we prefer the identity derived from Equation 3 to Equation 4, on the grounds that it relies on the fitted value of β (the estimated proportion of truly not stated parities) rather than on the value of α, which lacks intuitive interpretability.
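The two identities can be compared numerically. Because $1/(1-\alpha) = 1 + \gamma$, the difference between them in each age group equals the regression residual $U_i - \beta - \gamma Z_i$, which is why they agree only when the fit is exact. A minimal sketch with hypothetical helper names, using ordinary least squares on synthetic points lying exactly on a line and then on the rounded Kenya proportions from Table 2:

```python
def fit(z, u):
    """Ordinary least squares of U_i on Z_i; returns (beta, gamma)."""
    n = len(z)
    mz, mu = sum(z) / n, sum(u) / n
    gamma = (sum((a - mz) * (b - mu) for a, b in zip(z, u))
             / sum((a - mz) ** 2 for a in z))
    return mu - gamma * mz, gamma


def z_star_both_ways(z, u):
    """Return Z*_i estimated via beta (Z_i + U_i - beta) and via alpha (Equation 4)."""
    beta, gamma = fit(z, u)
    alpha = gamma / (1 + gamma)
    via_beta = [zi + ui - beta for zi, ui in zip(z, u)]
    via_alpha = [zi / (1 - alpha) for zi in z]
    return via_beta, via_alpha


# Synthetic points lying exactly on a line (alpha = 0.25, beta = 0.03)
z_true = [0.5, 0.2, 0.07, 0.04]
z = [0.75 * t for t in z_true]            # Z_i = (1 - alpha) Z*_i
u = [0.25 * t + 0.03 for t in z_true]     # U_i = alpha Z*_i + beta
via_beta, via_alpha = z_star_both_ways(z, u)
print(max(abs(a - b) for a, b in zip(via_beta, via_alpha)))  # floating-point noise only

# Rounded Kenya proportions: the fit is not exact, so the estimates differ slightly
kenya_z = [0.501, 0.198, 0.071, 0.040, 0.032, 0.030, 0.033]
kenya_u = [0.338, 0.147, 0.074, 0.055, 0.048, 0.044, 0.046]
kb, ka = z_star_both_ways(kenya_z, kenya_u)
for b_est, a_est in zip(kb, ka):
    print(f"{b_est:.4f}  {a_est:.4f}")
```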

After deriving corrected values of $Z_i^*$ and $U_i^*$, average parities can be calculated using Equation 2.

Having applied the correction, care should be taken to ensure that, in every age group, the adjusted number of childless women (that is, of parity zero) does not exceed the number of women reporting no births in the reference period in response to the question on recent fertility, since a woman who has never given birth cannot have had a birth in that period. Hence the revised $Z_i^*$ can be used to determine the minimum number of women who could not have had a birth in the reference period before the census.
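A minimal sketch of this consistency check, with a hypothetical function name. The method description does not supply counts of women reporting no recent birth, so the numbers below are invented purely to show the mechanics.

```python
def check_against_recent_fertility(revised_childless, no_recent_birth):
    """Indices of age groups where the corrected number of childless women
    exceeds the number reporting no birth in the reference period."""
    return [i for i, (c, nb) in enumerate(zip(revised_childless, no_recent_birth))
            if c > nb]


# Hypothetical counts for three age groups (not from the Kenya census)
revised_childless = [900, 300, 100]
no_recent_birth = [950, 280, 600]
print(check_against_recent_fertility(revised_childless, no_recent_birth))
```

Any index returned flags an age group where the correction has produced more childless women than the recent-fertility data allow.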

A version of the correction designed for (the now-rare) situations where questions on children ever born are asked only of married women is described in Annex II of Manual X (UN Population Division 1983).

## References

el-Badry MA. 1961. “Failure of enumerators to make entries of zero: errors in recording childless cases in population censuses”, Journal of the American Statistical Association 56(296):909–924. doi: http://dx.doi.org/10.1080/01621459.1961.10482134

UN Population Division. 1983. Manual X: Indirect Techniques for Demographic Estimation. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/SER.A/81. http://www.un.org/esa/population/techcoop/DemEst/manual10/manual10.html