
The el-Badry correction

Description of the method

The el-Badry correction is a method for correcting errors in data on children ever born caused by the enumerator or respondent failing to record answers of ‘zero’ to questions on lifetime fertility and, instead, leaving the response blank. When this occurs, during data processing the response is coded as ‘missing’ or ‘unknown’, even though it was evident to the enumerator at the time of data collection that the correct answer was ‘zero’. The method apportions the number of women whose parity is recorded as ‘missing’ between those whose parity is regarded as being truly unknown, and those women who should have been recorded as childless but whose responses were left blank. It does this apportionment at an aggregate level and not on an individual basis.

Data required and assumptions

The method requires the distribution of women by number of children ever born (parity), classified by age group of mother, including the count of women with missing data (i.e. where the field was left blank or contained an out-of-range code, or a code for 'not answered' or 'refused').

The method assumes that a constant proportion of women at each age truly did not state their lifetime fertility (i.e. parity) at the time of data collection. The balance of the women with unreported parities is assumed to be erroneously recorded as not stated when the women are, in fact, childless.

Caveats and warnings

The method relies on the existence of a linear relationship between the proportion of women whose parity is not stated and the proportion of women reported to be childless. If such a linear relationship is observed, the adjusted denominator used to calculate average parities should exclude those women whose parity (after correction) is still regarded as unknown. This reflects the implicit assumption that these women's parity distribution is no different from that of women of the same age whose parity is known.

Where the data indicate that a correction is needed because of the large proportion of missing parity information but the method cannot be applied (for example, due to unavailability of data by age, or violation of the assumption of linearity), women of unknown parity should be included in the denominator used to determine average parities. This implicitly assumes that all such women are of parity zero (i.e. childless). This will result in underestimated average parities, since not all women of unknown parity are in fact childless.

Application of method

We define

$$N_i = {}_5N_a$$

for a = 15, 20, …, 45 and i = a/5 - 2, to be the number of women in age group i in the population. Thus, N1 represents the number of women aged 15-19 in the population. Denote by Ni,j the number of women in age group i of parity j, and by Ni,u the number of women in age group i whose parity is unknown.

Step 1: Determine the proportion of women in each age group whose parity is a) unknown; and b) reported as zero

Extract a table of reported children ever born (j) by women’s age group (i) from the census data to obtain Ni,j. Missing data on parity (i.e. blank fields and invalid codes) should be combined with codes for parity not stated for each age group to produce Ni,u. The proportion of women in age group i with parity unknown is then

$$U_i = \frac{N_{i,u}}{N_i}$$

The proportion of women in age group i who are reportedly childless (i.e. are of parity zero) is given by

$$Z_i = \frac{N_{i,0}}{N_i}$$
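The two proportions in Step 1 are simple ratios. The following Python sketch computes them for a list of age groups; the function name `unknown_and_zero_proportions` and its arguments are hypothetical, and it assumes that blank, invalid and not-stated codes have already been summed into Ni,u.

```python
def unknown_and_zero_proportions(n_total, n_unknown, n_zero):
    """Return the lists (U_i, Z_i), one value per age group.

    n_total   -- N_i: all women in each age group, including parity not stated
    n_unknown -- N_{i,u}: blank, invalid and not-stated parity codes combined
    n_zero    -- N_{i,0}: women reported as childless
    """
    u = [nu / n for nu, n in zip(n_unknown, n_total)]
    z = [n0 / n for n0, n in zip(n_zero, n_total)]
    return u, z


# Ages 15-19 and 20-24, Kenya 1989 (Table 1)
u, z = unknown_and_zero_proportions(
    n_total=[1_192_840, 1_005_500],
    n_unknown=[402_780, 147_540],
    n_zero=[597_560, 198_600],
)
print([round(x, 3) for x in u], [round(x, 3) for x in z])
# prints [0.338, 0.147] [0.501, 0.198]
```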

If the Ui are small (less than 2 per cent in each age group), it is not worth applying the correction. In such a situation, average parities should be determined by assuming that the parity distribution of women with parity not stated is the same as that of women whose parity is known, that is, by omitting women with unstated parity from the denominator of the calculation. Thus, if Pi is the average parity of women in age group i,

$$P_i = \frac{\sum_{j=0}^{\omega} j\,N_{i,j}}{\sum_{j=0}^{\omega} N_{i,j}}$$

If the proportions of women with parity not stated exceed 2 per cent, it is worth assessing whether the correction can be applied.
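A minimal sketch of the Step 1 decision, assuming that "small" means below 2 per cent in every age group, so that the correction is worth assessing as soon as any group reaches 2 per cent. Both function names are hypothetical; `average_parity_known_only` implements the formula above, with women of unstated parity left out of both numerator and denominator.

```python
def worth_assessing_correction(u, threshold=0.02):
    """True unless every age group has U_i below the threshold (2 per cent)."""
    return any(ui >= threshold for ui in u)


def average_parity_known_only(counts_by_parity):
    """P_i = sum_j j*N_{i,j} / sum_j N_{i,j}, over women of known parity only.

    counts_by_parity maps parity j to N_{i,j}; women of unstated parity are
    simply not passed in, so they are omitted from the denominator.
    """
    births = sum(j * n for j, n in counts_by_parity.items())
    women = sum(counts_by_parity.values())
    return births / women


print(worth_assessing_correction([0.338, 0.147, 0.074]))   # True
print(average_parity_known_only({0: 50, 1: 30, 2: 20}))     # 0.7
```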

Step 2: Plot the points (Zi, Ui) and evaluate the data

For the method to work correctly, the series of points (Zi, Ui) should lie on, or very close to, a straight line. In some cases, curvature may be observed in the data points corresponding to either the oldest or the youngest ages. If the curvature affects only the older ages, even if it is quite extreme, it is acceptable to exclude the oldest, or two oldest, age groups from the fitting process and fit a straight line to the remaining points, since the method has its greatest absolute impact on the proportions not stated at the youngest ages. If the curvature is most noticeable among younger women, the method should not be used: excluding the data points relating to women aged 15-24 would force the regression to extrapolate beyond the range of the fitted data, which could produce illogical adjustments in these age groups.

If a strongly linear relationship cannot be identified, even after excluding one or two data points from older women, the method cannot be applied. In this situation, it is preferable to assume that all women of not stated parity are childless, and to include them in the denominator of the average parity calculation:

$$P_i = \frac{\sum_{j=0}^{\omega} j\,N_{i,j}}{N_i} \qquad \text{(Equation 1)}$$

The analytical report should note that this has been done, and that, therefore, the average parity values are liable to be underestimated.
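A minimal sketch of Equation 1 in Python; `average_parity_all_women` is a hypothetical name, and `counts_by_parity` maps each parity j to Ni,j for women of known parity.

```python
def average_parity_all_women(counts_by_parity, n_unknown):
    """Equation 1: sum_j j*N_{i,j} / N_i, with N_i including unstated parity.

    Keeping the n_unknown women in the denominator while they contribute no
    births to the numerator treats them all as childless.
    """
    births = sum(j * n for j, n in counts_by_parity.items())
    n_total = sum(counts_by_parity.values()) + n_unknown
    return births / n_total


counts = {0: 50, 1: 30, 2: 20}
print(average_parity_all_women(counts, n_unknown=25))   # 70 / 125 = 0.56
```

With no unstated parities the result equals the known-parity average (0.7 for these counts); every unstated case pulls it down, which is the underestimation the report should mention.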

Step 3: Determine the slope and intercept of the best straight line fit to the data

The slope (γ) and intercept (β) of the fitted line are found by linear regression of Ui on Zi, applied to the data points selected for inclusion, that is,

$$U_i = \beta + \gamma Z_i$$

The intercept (β), which is independent of age, is the estimated proportion of women in each age group whose parity is truly unknown, as opposed to childless women whose parity was misrecorded as not stated.

Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The adjusted proportion of women in age group i that is estimated to be truly childless is given by

$$Z_i^* = Z_i + U_i - \beta$$

That is, the revised proportion of women of zero parity in any age group is the proportion actually recorded as being of zero parity together with the proportion of women in that age group of not stated parity less the estimated proportion of women whose parity is regarded as being truly unknown. The revised estimate of the number of childless women in age group i is given by

$$N_{i,0}^* = N_i \times Z_i^*$$

Similarly, the estimated number of women in each age group whose parity is truly unknown is given by

$$N_{i,u}^* = N_i \times \beta$$

The N*i,j for other parities (j > 0) are unchanged, i.e. $N_{i,j}^* = N_{i,j}$.
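The Step 4 formulas translate directly into code. This hypothetical helper returns N*i,0 and N*i,u for each age group given β; counts of other parities need no adjustment.

```python
def revised_counts(n_total, u, z, beta):
    """Step 4: revised childless and truly-unknown counts per age group.

    N*_{i,0} = N_i * (Z_i + U_i - beta)   (that is, N_i * Z*_i)
    N*_{i,u} = N_i * beta
    """
    zero_star = [n * (zi + ui - beta) for n, zi, ui in zip(n_total, z, u)]
    unknown_star = [n * beta for n in n_total]
    return zero_star, unknown_star


zero_star, unknown_star = revised_counts([1000], u=[0.10], z=[0.40], beta=0.02)
print(zero_star, unknown_star)
```

Because the two revised counts sum to Ni(Zi + Ui), the correction only moves women between "childless" and "not stated"; the age group total is unchanged.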

Step 5: Calculation of average parities

If an el-Badry correction has been applied to the data, the average parities are given by

$$P_i = \frac{\sum_{j=0}^{\omega} j\,N_{i,j}^*}{(1-\beta)\,N_i} \qquad \text{(Equation 2)}$$

embodying the assumption that the remaining women in age group i of unknown parity, βNi, who are omitted from the denominator, have the same average parity as the women in age group i whose parity is known.
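A sketch of Equation 2. Since the j = 0 term adds nothing to the numerator and N*i,j = Ni,j for j > 0, the hypothetical function below only needs the reported counts for parities one and above, plus Ni and β.

```python
def corrected_average_parity(counts_by_parity, n_total, beta):
    """Equation 2: sum_j j*N*_{i,j} / ((1 - beta) * N_i).

    counts_by_parity maps parity j to N_{i,j}; any j = 0 entry is harmless
    because it contributes zero births. The beta * N_i women still of unknown
    parity are dropped from the denominator.
    """
    births = sum(j * n for j, n in counts_by_parity.items())
    return births / ((1 - beta) * n_total)


print(corrected_average_parity({1: 30, 2: 10}, n_total=100, beta=0.2))
```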

Interpretation and checks

The value of β shows the estimated proportion of women whose parity is truly not stated. Larger values of β are therefore associated with poorer quality data.

Occasionally, the method may have a contrary effect and suggest that the number of women with not-stated parity is understated, and that the number of women of reported parity zero should be reduced. Such a situation will arise if β > Ui. If this is so, the correction should not be applied to that age group. 
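The safeguard above can be built into Step 4. In this sketch (the function name is hypothetical), any age group with β > Ui keeps its reported counts of childless and not-stated women, while the other groups receive the el-Badry adjustment.

```python
def apply_correction(n_total, n_zero, n_unknown, beta):
    """Return (N*_{i,0}, N*_{i,u}), leaving groups with beta > U_i uncorrected."""
    zero_star, unknown_star = [], []
    for n, n0, nu in zip(n_total, n_zero, n_unknown):
        if beta > nu / n:
            # Correcting would shift women out of parity zero: keep reported data.
            zero_star.append(n0)
            unknown_star.append(nu)
        else:
            zero_star.append(n0 + nu - beta * n)   # N_i (Z_i + U_i - beta)
            unknown_star.append(beta * n)          # N_i * beta
    return zero_star, unknown_star


# Hypothetical data: the second group has U_i = 0.01, below beta = 0.02
zero_star, unknown_star = apply_correction(
    n_total=[1000, 1000], n_zero=[400, 50], n_unknown=[100, 10], beta=0.02
)
print(zero_star, unknown_star)
```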

Worked example

The accompanying spreadsheet implements the method using data from the 1989 Kenya Census, obtained from IPUMS. The original data are presented in Table 1.

Table 1 Children ever born, by age group of mother at census date, Kenya, 1989 Census

 

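| Parity (j) | 15-19 (i=1) | 20-24 (i=2) | 25-29 (i=3) | 30-34 (i=4) | 35-39 (i=5) | 40-44 (i=6) | 45-49 (i=7) |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | 140 | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | 160 | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | 240 | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | 40 | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | 20 | 20 | 80 |
| 24 | 0 | 0 | 0 | 0 | 60 | 20 | 80 |
| 25 | 0 | 0 | 0 | 0 | 60 | 40 | 0 |
| 26 | 0 | 0 | 0 | 0 | 60 | 40 | 80 |
| 27 | 0 | 0 | 0 | 0 | 80 | 40 | 60 |
| 28 | 0 | 0 | 0 | 0 | 20 | 40 | 40 |
| 29 | 0 | 0 | 0 | 0 | 20 | 0 | 40 |
| 30 | 0 | 0 | 0 | 0 | 340 | 440 | 360 |
| Not stated | 402,780 | 147,540 | 61,920 | 31,580 | 20,240 | 15,420 | 12,960 |
| Total | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |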

Inspection of the data reveals that they have been edited to disallow the recording of high parities in women aged under 35. The editing rule applied at the preparatory stage appears to have been stricter than the one suggested in the section on evaluation of parity data. Thus reports for women aged 20-24 have been restricted to parity 6 or less (rather than parity 8), reports for those aged 25-29 are truncated at parity 7 (rather than parity 12), and those for women aged 30-34 at parity 8 (rather than 15). However, implausibly high parities have been allowed to remain at ages 35 and over. Therefore, further light editing of the data in Table 1 could be undertaken by re-assigning to the unknown category reports of parity 19 and over in the age group 35-39, parity 23 and over in the age group 40-44, and parity 26 and over in the last age group, 45-49.
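The re-assignment described above can be sketched as follows. The cut-offs are those suggested in the text for this data set (19 for ages 35-39, 23 for 40-44, 26 for 45-49); the names `FIRST_IMPLAUSIBLE_PARITY` and `reassign_implausible` are hypothetical.

```python
# First implausible parity per age group, as suggested in the text above
FIRST_IMPLAUSIBLE_PARITY = {"35-39": 19, "40-44": 23, "45-49": 26}


def reassign_implausible(counts_by_parity, n_not_stated, cutoff):
    """Move reports of parity >= cutoff into the 'not stated' category.

    Returns the edited {parity: count} mapping and the new not-stated total.
    """
    kept = {j: n for j, n in counts_by_parity.items() if j < cutoff}
    moved = sum(n for j, n in counts_by_parity.items() if j >= cutoff)
    return kept, n_not_stated + moved


# Women aged 35-39, parities 18-30 and 'not stated', from Table 1
high_parities_35_39 = {18: 420, 19: 140, 20: 160, 21: 240, 22: 40, 23: 20,
                       24: 60, 25: 60, 26: 60, 27: 80, 28: 20, 29: 20, 30: 340}
kept, not_stated = reassign_implausible(
    high_parities_35_39, 20_240, FIRST_IMPLAUSIBLE_PARITY["35-39"]
)
print(kept, not_stated)   # {18: 420} 21480
```

The resulting not-stated count for ages 35-39, 21,480, matches the value used in Table 2.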

An option can be selected on the Introduction tab of the spreadsheet to set implausible parities to ‘not stated’ prior to the application of the method.

Step 1: Determine the proportion of women in each age group whose parity is a) not stated; and b) equal to zero

Table 2 presents the revised data, together with the calculation of the proportions of women of parity zero, and parity not stated in each age group.

Table 2 Correction of parity data, and calculation of proportion of women of parity zero, and parity not stated, Kenya, 1989 Census

 

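| Parity (j) | 15-19 (i=1) | 20-24 (i=2) | 25-29 (i=3) | 30-34 (i=4) | 35-39 (i=5) | 40-44 (i=6) | 45-49 (i=7) |
|---:|---:|---:|---:|---:|---:|---:|---:|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | 0 | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | 0 | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | 0 | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | 0 | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Not stated | 402,780 | 147,540 | 61,920 | 31,580 | 21,480 | 16,060 | 13,540 |
| Total | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |
| Ui | 0.338 | 0.147 | 0.074 | 0.055 | 0.048 | 0.044 | 0.046 |
| Zi | 0.501 | 0.198 | 0.071 | 0.040 | 0.032 | 0.030 | 0.033 |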

The data include high proportions of women with parity not stated at ages 15-19 (402,780/1,192,840 = 0.338) and 20-24 (0.147) and, to a lesser extent, in the older age groups. The proportion of women reported as childless (Zi) falls rapidly, from around 50 per cent in the first age group to around 3 per cent at the end of the childbearing period. On these grounds, it is worth investigating whether an el-Badry correction can be applied to the data.

Step 2: Plot the points (Zi, Ui) on a set of axes and evaluate the data

The Zi and Ui are plotted against each other (shown by the blue diamonds) in Figure 1. The straight line fitted to the points is shown by the red line. If a point is excluded from the fitting process, the figure in the spreadsheet represents it with an open diamond.

Figure 1 Fitting of el-Badry correction, Kenya 1989 census

There is a clear linear relationship between the plotted points, and all points can be included in the application of an el-Badry correction.

Step 3: Determine the slope and intercept of the best straight line fit

Performing a linear regression of the Ui on the Zi for the selected points gives a value for the intercept (β) of 0.02745. This suggests that around 2.7 per cent of women in each age group can be regarded as having truly missing parity data.

Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The revised number of women of zero parity is given by

$$N_{i,0}^* = N_i (Z_i + U_i - \beta)$$

while the revised numbers with parity unknown are calculated by multiplying the total number of women in each age group by β, as shown in Table 3. For example, the number of women aged 20-24 estimated to be truly of unknown parity is given by β × 1,005,500 = 27,603 (using the unrounded value of β, which is 0.02745 to five decimal places). The corrected estimate of the number of childless women aged 15-19 is derived from 1,192,840 × (0.501 + 0.338 - 0.027); computed with unrounded proportions, this is 597,560 + 402,780 - 32,746 = 967,594.
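The Table 3 figures follow from the counts alone, because Ni(Zi + Ui - β) = Ni,0 + Ni,u - βNi. This sketch uses the rounded β = 0.02745, so its results can differ from Table 3 by a few women; the spreadsheet presumably works with the unrounded value.

```python
AGES = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
n_total = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]
n_unknown = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]
n_zero = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
beta = 0.02745   # rounded intercept from Step 3

unknown_star = [beta * n for n in n_total]                   # N*_{i,u}
zero_star = [n0 + nu - us                                    # N*_{i,0}
             for n0, nu, us in zip(n_zero, n_unknown, unknown_star)]

for age, zs, us in zip(AGES, zero_star, unknown_star):
    print(f"{age}: revised zero parity {zs:>9,.0f}, revised not stated {us:>7,.0f}")
```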

Table 3 Revised estimates of numbers of women with parity not stated and childless women by age, Kenya, 1989 Census

 

| | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---:|---:|---:|---:|---:|---:|---:|
| Revised parity not stated | 32,746 | 27,603 | 23,084 | 15,763 | 12,397 | 9,965 | 8,025 |
| Revised zero parity | 967,594 | 318,537 | 98,236 | 38,937 | 23,663 | 17,135 | 15,075 |


Step 5: Calculation of average parities

Since an el-Badry correction has been applied, corrected average parities, presented in Table 4, are then derived using Equation 2.
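As a check on Table 4, the sketch below applies Equation 2 to the two youngest age groups, using the parity counts from Table 2 and the rounded β = 0.02745. Only parities one and above are listed, since the zero-parity term contributes nothing to the numerator.

```python
beta = 0.02745
groups = {
    # age group: (N_i, {parity j: N_{i,j} for j >= 1})
    "15-19": (1_192_840, {1: 134_700, 2: 38_120, 3: 11_120, 4: 6_820, 5: 1_740}),
    "20-24": (1_005_500, {1: 224_660, 2: 202_300, 3: 126_500, 4: 59_700,
                          5: 33_720, 6: 12_480}),
}

corrected = {}
for age, (n_total, counts) in groups.items():
    births = sum(j * n for j, n in counts.items())
    corrected[age] = births / ((1 - beta) * n_total)   # Equation 2
    print(age, round(corrected[age], 3))
```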

Table 4 Corrected average parities by age group, Kenya, 1989 Census

 

| | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---:|---:|---:|---:|---:|---:|---:|
| Average parity | 0.242 | 1.525 | 3.214 | 4.760 | 6.239 | 7.120 | 7.510 |

Note that, relative to the average parities produced if the correction is not applied (Equation 1, which assumes that all women with parity not stated are of parity zero), the correction multiplies the average parity in each age group by the same constant factor,

$$\frac{1}{1-\beta}$$
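The constant factor follows because Equations 1 and 2 share the same numerator. A short numerical check for women aged 15-19, using the Table 2 counts and the rounded β, is sketched below.

```python
beta = 0.02745
n_total = 1_192_840
counts = {1: 134_700, 2: 38_120, 3: 11_120, 4: 6_820, 5: 1_740}   # ages 15-19

births = sum(j * n for j, n in counts.items())
uncorrected = births / n_total                  # Equation 1
corrected = births / ((1 - beta) * n_total)     # Equation 2
print(round(uncorrected, 3), round(corrected, 3), round(corrected / uncorrected, 4))
```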

Detailed description of the method

The method is fully described in el-Badry (1961). El-Badry’s fundamental insight was that, if it could be assumed that:

1) there is a linear relationship between the proportion of childless women of a given age in a population and the proportion of women whose parity is not stated; and

2) the true proportion of women whose parity is unknown is constant and independent of age, then

$$U_i = \alpha Z_i^* + \beta \qquad \text{(Equation 3)}$$

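where α is the proportion of truly childless women who are recorded as of parity not stated (so that αZ*i is the proportion of all women in age group i who are truly childless but recorded as not stated), and β is the true, constant, proportion of women with parity not stated.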

Hence, since a proportion αZ*i of women in age group i have been misclassified as not stated when they are truly childless,

$$Z_i = Z_i^* - \alpha Z_i^* = (1-\alpha) Z_i^*$$

and therefore

$$Z_i^* = \frac{Z_i}{1-\alpha} \qquad \text{(Equation 4)}$$

Substituting this into Equation 3,

$$U_i = \frac{\alpha}{1-\alpha} Z_i + \beta = \gamma Z_i + \beta$$

where γ can be thought of as the odds of a childless woman being classified as being of unknown parity.

Thus, a regression of Ui on Zi gives estimates of β and γ, and hence of α = γ/(1 + γ).

From Equation 3, we then obtain

$$U_i - \beta = \alpha Z_i^* = Z_i^* - Z_i$$

and hence that

$$Z_i^* = \frac{N_{i,0}^*}{N_i} = U_i - \beta + Z_i$$

and

$$U_i^* = \beta, \quad \text{i.e.} \quad N_{i,u}^* = \beta N_i$$

Note that, although there are two expressions for Z*i (Equation 4 and the one derived from Equation 3), they give the same answer only when the fit is exact. By convention, the expression derived from Equation 3 is preferred to Equation 4, on the grounds that it relies on the fitted value of β (the estimated proportion of truly not stated parities) rather than on the value of α, which lacks an intuitive interpretation.

After deriving corrected values of Z*i and U*i, average parities can be calculated using Equation 2.

After applying the correction, check that, in every age group, the adjusted number of childless women (that is, of parity zero) does not exceed the number of women reporting no births in the reference period in response to the question on recent fertility. Since childless women cannot have given birth in the reference period, the revised Z*i can be used to determine the minimum number of women who could not have had a birth in the reference period before the census.

A version of the correction designed for (the now-rare) situations where questions on children ever born are asked only of married women is described in Annex II of Manual X (UN Population Division 1983).

References

el-Badry MA. 1961. “Failure of enumerators to make entries of zero: errors in recording childless cases in population censuses”, Journal of the American Statistical Association 56(296):909–924. doi: http://dx.doi.org/10.1080/01621459.1961.10482134

UN Population Division. 1983. Manual X: Indirect Techniques for Demographic Estimation. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/SER.A/81. http://www.un.org/esa/population/techcoop/DemEst/manual10/manual10.html
