Tools for Demographic Estimation
Published on Tools for Demographic Estimation (http://demographicestimation.iussp.org)


Evaluation and correction of fertility data

Introduction

Author: 
Moultrie TA

The questions on fertility that are usually asked in censuses provide data on both recent births and lifetime fertility. In the 1960s Brass and colleagues observed that these data are typically affected by common errors. In particular, recent births tend to be under-reported by mothers of all ages, the consequence of a combination of reference period errors (respondents interpreting the reference period to be other than the interval actually asked about) and omission of neonatal deaths.

Women’s reports of their number of children ever born also tend to be too low. Here the effect is believed to become worse with increasing age. The reasons advanced for this bias include age exaggeration among teenage mothers, omission of dead children, and omission of older children who have left home. Over-reporting of children ever born may result from the misrepresentation of fostered children as ‘own’ children, or from confusion of still births with live births.

This section describes the investigations that should be pursued in the evaluation and assessment of fertility data collected in a census. The sections describe

  • the assessment of data on lifetime fertility [1], including the el-Badry correction [2] for parities erroneously recorded as "not stated"; and
  • the assessment of data on recent fertility [3]. This section also describes the process of estimating fertility directly from census data.

Assessment of parity data

Author: 
Moultrie TA

Introduction

The first type of question on fertility asked in censuses concerns women’s lifetime fertility. It asks about their total number of live births. In order to reduce underreporting of dead or absent children (who are usually a larger proportion of children born to older women than younger women) and guard against underreporting of girls, the questions are often structured as a series of six questions about the number of sons and daughters:

  • born alive and living with the mother;
  • born alive but living elsewhere; and
  • born alive but now deceased.

Total children born and surviving

The total of the answers to the questions relating to living children, present and absent, provides the total number of children born and surviving. Adding the reported numbers of children dead gives the total number of children ever born to the woman. When summing these individual answers, care must be taken not to treat error or missing value codes as legitimate responses. For example, if a missing value is coded as ‘9’, the procedure for deriving measures of the total children ever born, surviving and dead must make sure to exclude these codes.
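The handling of missing-value codes described above can be sketched as follows. This is a minimal illustration only: the code ‘9’ and the function name are assumptions made for the example, and the actual codes must be taken from the census codebook.

```python
# Hypothetical missing-value code; check the census codebook for the real one.
MISSING_CODE = 9

def total_children(answers, missing_code=MISSING_CODE):
    """Sum the six answers (sons and daughters at home, elsewhere, dead).

    Returns None when any answer is blank (None) or carries the
    missing-value code, so that the code is never added as if it
    were a count of children.
    """
    if any(a is None or a == missing_code for a in answers):
        return None
    return sum(answers)

print(total_children([2, 1, 0, 1, 0, 0]))  # 4
print(total_children([2, 9, 0, 1, 0, 0]))  # None
```

Women for whom the function returns None would be tabulated with parity unknown rather than being given an inflated total.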

Tabulations of the numbers of children reported in response to these questions are often truncated at some relatively high number (e.g. 9+). When this is the case, the usual simplifying assumption is that women in that category have had the number of children defined by the lower bound of the interval. The resulting errors are generally small, even in the case of extremely high fertility, unless the truncation is applied to the total children ever born, rather than to the separate categories of co-resident, absent, and dead sons and daughters.

Implausible parities

In evaluating the quality of data on lifetime fertility, the analyst should be alert to improbable and implausible parities relative to the age of the mother. Especially at young ages, a small number of women reporting excessively high numbers of children ever born can have a material effect on the estimated mean children ever born. Such errors can result from misreporting, or manual or automatic mis-capturing of the data. A useful rule of thumb is to limit the maximum number of live births that a woman may have had to one birth every 18 months from the age of 12, rounding down to the next integer. Using this rubric, by exact age 20 (the end point of the 15-19 age group), a woman might have had a maximum of 5 children; by exact age 25 (closing the 20-24 age group), 8. If the reported number of lifetime births exceeds this maximum, the recorded value should be recoded as ‘missing’.
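The rule of thumb can be expressed as a short function. The sketch below is illustrative (the function and parameter names are assumptions); it works in whole months so that the rounding down is exact, and evaluates the rule at the exact age that closes each five-year age group.

```python
def max_plausible_parity(exact_age, first_birth_age=12, months_between_births=18):
    """Maximum live births under the rule of thumb: one birth every
    18 months from age 12, rounded down to an integer."""
    months_exposed = max(0, (exact_age - first_birth_age) * 12)
    return months_exposed // months_between_births

# Exact ages closing the age groups 15-19, 20-24, ..., 45-49
limits = [max_plausible_parity(age) for age in (20, 25, 30, 35, 40, 45, 50)]
print(limits)  # [5, 8, 12, 15, 18, 22, 25]
```

Any reported parity above the limit for the woman’s age group would be recoded as ‘missing’.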

Assessment of enumerator errors

Another common error in the recording of lifetime fertility is caused by the failure of the enumerator to record responses of ‘zero’ on the census form, leaving the relevant space blank instead. It is impossible to be sure whether a blank means that the enumerator omitted to ask the question or record the response or whether it indicates zero. This error is usually more common in the data on younger women, who are more likely to be childless or answer zero to some of the six questions above. The error in some cases occurs because the enumerator assumes that the question is not relevant for younger women, or feels uncomfortable about asking it. A specific adjustment to the data, the el-Badry correction [2], is often indicated in this case. However, if in every age group the number of women with unstated parity is low (as a guide, less than 2 per cent of the total), then this reporting error is unlikely to have a material impact on the derived average parities and these cases can be ignored in further calculations. This is the same as making the explicit assumption that women with unstated parity have the same average parity as women in the same age group whose parity is known.

Proportions of women childless

The proportions of women who are childless should be calculated by age group of mother. The proportions should decline sharply with age. In most cases there should be around 3-10 per cent of women remaining childless in the oldest age-group, reflecting underlying levels of primary sterility and voluntary childlessness. In low fertility countries the proportion of childless women aged 45-49 may be even higher. Proportions of childless at older ages that exceed 10 per cent should be investigated further, as this may indicate significant errors in the data.

Average parities

A key indication of the consistency of data on women’s lifetime fertility is a credible pattern of average numbers of children alive and dead by age group of mother. In general, one would expect average parities (the average total number of co-resident, absent and dead children born to women) to increase steadily with age. The shape of the distribution by age should be sigmoid, with slightly flatter sections at the beginning and end, reflecting lower fertility at the youngest and oldest ages at which women bear children. Significant parity increments in these age groups – that is, large increases in average parities between successive age groups – are unlikely.

One would also expect average numbers of living children, dead children and the proportion of children dead each to rise with age.

A second check is to compare the observed average parities with results from Demographic and Health Surveys (DHS), or from earlier censuses and other surveys. In this regard, one can compare the average parities for real birth cohorts of women. Thus if two censuses are conducted a decade apart, the average parities of women aged x to x+4 in the earlier census can be compared with those of women aged x+10 to x+14 in the second. Average parities should not only increase monotonically with age within each census, but the cohorts should also show a reasonable parity increment between censuses.

If one has data on women aged 50 and over, one can make direct comparisons of the consistency of the average parities of women who have completed their childbearing – for example, by comparing the average parities of women aged 45-49 in one census with those of women aged 55-59 in a second census conducted a decade later. In making comparisons of this sort, and especially with comparisons involving older women, one should be alert to the possibility that mortality might differ according to the number of children a woman has had, either directly or because high fertility and socio-economic status may be correlated. This may hinder the ability to draw definitive conclusions about the trend in lifetime fertility.

A further refinement suggested by Feeney (1991), which is possible where there is information on the average parities of women who have completed their childbearing, is to locate these parities approximately in time and plot them. The approximate time location is derived by assuming that the average parities refer to a point in time defined by subtracting the mid-point of each age group from the census date and assuming that all births in each cohort occurred at some mean age of childbearing, m. Thus, assuming m = 27.5 for example, if a census was conducted in 1960, the average parities of women aged 50-54 would refer (approximately) to 1960 - 52.5 + 27.5, or 1935.
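The time-location arithmetic can be sketched as below. The function name and arguments are illustrative assumptions; the default m = 27.5 is simply the value used in the example above, not a recommended constant.

```python
def parity_time_location(census_year, age_from, age_to_exact, mean_age_childbearing=27.5):
    """Approximate date to which a cohort's average parity refers.

    age_from and age_to_exact are the exact ages bounding the age group
    (50 and 55 for women aged 50-54), so the mid-point is their mean.
    """
    midpoint = (age_from + age_to_exact) / 2
    return census_year - midpoint + mean_age_childbearing

print(parity_time_location(1960, 50, 55))  # 1935.0
```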

The average parity of women of a given age x, Px, is calculated by dividing the total number of children ever born to women aged x at the census date by the number of women aged x at the census:

$$P_x = \frac{\sum_{j=0}^{\omega} j \cdot N_{x,j}}{\sum_{j=0}^{\omega} N_{x,j}}$$

where Nx,j is the number of women aged x and of parity j in the population, and omega (ω) is the upper limit of the parities recorded in the population after excluding numerical values assigned as error codes in the data. In five-year age groups, the average parity of women in each age group is given by

$${}_5P_x = \frac{\sum_{j=0}^{\omega} j \cdot {}_5N_{x,j}}{\sum_{j=0}^{\omega} {}_5N_{x,j}}$$

for x=15, 20, … , 45.

For ease of exposition of many methods, average parities in five-year age groups, 15-19, 20-24, … are often indexed as P(i), i=1, 2 … , where P(1) refers to the 15-19 age group, P(2) the 20-24 age group etc.

Comparison with other estimates of average parities

Where other fertility data are available for the same country at a roughly similar point in time, the estimates should be compared. Where the estimates diverge to any great degree, efforts should be made to understand why this might be the case, although it will often be impossible to conclude definitively which of the data sets is deficient.

Comparison with total fertility

As a final check, the average parity for the 45-49 age group should be compared with the estimated total fertility (TF) derived from the data on recent fertility. If fertility has been constant for a long time, and the data were accurately reported, the two measures should be very close since period and cohort fertility would be equal under these conditions. If fertility has been falling, the average parity of older women should be greater than TF. As errors of underreporting of recent fertility will artificially depress TF, while omission of older women’s births will artificially depress the average parity in that group, it is important to ensure that both measures are plausible. One method of doing this uses the relational Gompertz model [4] to examine the fertility and parity distributions and their implied relationship.

Example: Assessment of data on lifetime fertility

The example below uses the data from the 2008 Census of Cambodia distributed by IPUMS. The data (weighted, to compensate for the fact that the IPUMS data represent only a microsample of the full data) are presented in Table 1.

Table 1 Total children ever born by age group of mother, Cambodia, 2008 Census

| Parity | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | Total |
|---|---|---|---|---|---|---|---|---|
| 0 | 743,190 | 426,760 | 191,720 | 58,530 | 46,650 | 36,050 | 28,780 | 1,531,680 |
| 1 | 29,560 | 167,810 | 142,720 | 44,310 | 34,530 | 25,790 | 21,740 | 466,460 |
| 2 | 4,240 | 78,410 | 171,450 | 90,990 | 79,080 | 51,980 | 36,680 | 512,830 |
| 3 | 1,200 | 16,940 | 82,960 | 84,220 | 98,640 | 67,690 | 48,190 | 399,840 |
| 4 | 830 | 4,020 | 26,870 | 48,510 | 79,480 | 70,400 | 56,190 | 286,300 |
| 5 | 430 | 1,340 | 6,910 | 21,010 | 49,250 | 56,980 | 51,500 | 187,420 |
| 6 | *270* | 630 | 2,150 | 8,710 | 26,020 | 37,070 | 41,420 | 116,270 |
| 7 | *120* | 380 | 630 | 3,410 | 12,530 | 23,730 | 29,680 | 70,480 |
| 8 | *80* | 200 | 400 | 1,000 | 5,450 | 12,180 | 18,320 | 37,630 |
| 9 | *60* | *100* | 120 | 350 | 2,410 | 6,030 | 10,040 | 19,110 |
| 10 | *40* | *120* | 140 | 190 | 1,090 | 3,120 | 5,660 | 10,360 |
| 11 | *50* | 0 | 70 | 70 | 360 | 1,420 | 2,010 | 3,980 |
| 12 | *20* | *50* | 20 | 30 | 170 | 670 | 1,350 | 2,310 |
| 13 | *10* | *10* | 0 | 10 | 60 | 270 | 410 | 770 |
| 14 | 0 | *10* | *10* | 0 | 10 | 60 | 190 | 280 |
| 15 | 0 | 0 | *10* | 0 | 20 | 90 | 150 | 270 |
| 16 | 0 | 0 | 0 | 0 | 0 | 10 | 30 | 40 |
| 17 | 0 | 0 | 0 | 0 | 0 | 10 | 30 | 40 |
| 18 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 20 |
| 19 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 10 |
| 20 | 0 | 0 | 0 | *20* | 0 | 0 | 0 | 20 |
| Unknown | 220 | 380 | 250 | 290 | 130 | 210 | 120 | 1,600 |
| TOTAL | 780,320 | 697,160 | 626,430 | 361,650 | 435,880 | 393,760 | 352,520 | 3,647,720 |

The cell counts marked with asterisks (italicized in red in the original table) represent implausible parities according to the rule of thumb set out earlier. The values in these cells are summed and this total is added to the number of women in each age group whose parity was missing in Table 1. The original values are then set to zero, resulting in the distribution shown in Table 2.

Table 2 Total children ever born by age group of mother after correcting for implausible parities, Cambodia, 2008 Census

| Parity | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 | Total |
|---|---|---|---|---|---|---|---|---|
| 0 | 743,190 | 426,760 | 191,720 | 58,530 | 46,650 | 36,050 | 28,780 | 1,531,680 |
| 1 | 29,560 | 167,810 | 142,720 | 44,310 | 34,530 | 25,790 | 21,740 | 466,460 |
| 2 | 4,240 | 78,410 | 171,450 | 90,990 | 79,080 | 51,980 | 36,680 | 512,830 |
| 3 | 1,200 | 16,940 | 82,960 | 84,220 | 98,640 | 67,690 | 48,190 | 399,840 |
| 4 | 830 | 4,020 | 26,870 | 48,510 | 79,480 | 70,400 | 56,190 | 286,300 |
| 5 | 430 | 1,340 | 6,910 | 21,010 | 49,250 | 56,980 | 51,500 | 187,420 |
| 6 | 0 | 630 | 2,150 | 8,710 | 26,020 | 37,070 | 41,420 | 116,000 |
| 7 | 0 | 380 | 630 | 3,410 | 12,530 | 23,730 | 29,680 | 70,360 |
| 8 | 0 | 200 | 400 | 1,000 | 5,450 | 12,180 | 18,320 | 37,550 |
| 9 | 0 | 0 | 120 | 350 | 2,410 | 6,030 | 10,040 | 18,950 |
| 10 | 0 | 0 | 140 | 190 | 1,090 | 3,120 | 5,660 | 10,200 |
| 11 | 0 | 0 | 70 | 70 | 360 | 1,420 | 2,010 | 3,930 |
| 12 | 0 | 0 | 20 | 30 | 170 | 670 | 1,350 | 2,240 |
| 13 | 0 | 0 | 0 | 10 | 60 | 270 | 410 | 750 |
| 14 | 0 | 0 | 0 | 0 | 10 | 60 | 190 | 260 |
| 15 | 0 | 0 | 0 | 0 | 20 | 90 | 150 | 260 |
| 16 | 0 | 0 | 0 | 0 | 0 | 10 | 30 | 40 |
| 17 | 0 | 0 | 0 | 0 | 0 | 10 | 30 | 40 |
| 18 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 20 |
| 19 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 10 |
| 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Unknown | 870 | 670 | 270 | 310 | 130 | 210 | 120 | 2,580 |
| TOTAL | 780,320 | 697,160 | 626,430 | 361,650 | 435,880 | 393,760 | 352,520 | 3,647,720 |
| Proportion missing | 0.111% | 0.096% | 0.043% | 0.086% | 0.030% | 0.053% | 0.034% | |
| Proportion childless | 95.24% | 61.21% | 30.61% | 16.18% | 10.70% | 9.16% | 8.16% | |
| Average parities | 0.0604 | 0.5833 | 1.4382 | 2.4035 | 3.1670 | 3.8126 | 4.3184 | |


 

The proportion of women whose parity is unknown after making this adjustment is shown in the third last row of Table 2. In every age group, the proportion of women for whom parity data are missing is trivial. Although the proportion is somewhat higher in younger than in older age groups, even in the 15-19 age group only 0.11 per cent of women’s parities are unknown or implausible. An el-Badry correction is therefore unnecessary and the unknown cases can be excluded from the calculation of average parities, thereby implicitly assuming that women with implausible or missing data have the same average parities as other women of the same age. (The data presented here were chosen because an el-Badry correction is not required. The section of the manual dealing with the el-Badry correction [2] presents data from another country whose parity data are not of as good quality.) 

The proportion of women reported to be childless, shown in the second last line of Table 2, declines rapidly with age: by age 40, less than 10 per cent of women are still childless. As expected, this proportion falls only slightly further between the last two age groups: not many women start their childbearing after age 40. The proportion of women aged 45-49 who are childless (8.2 per cent) is relatively high. The average parities suggest very low levels of fertility in teenage girls, with lifetime fertility increasing to 4.3 children per woman in the 45-49 age group. A plot of the average parities has a sigmoid shape, with the largest parity increments occurring to women in their 20s and early 30s, the ages where fertility is expected to be highest (Figure 1).
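The three summary rows of Table 2 can be checked for a single age group with a few lines of code. The sketch below uses the corrected counts for women aged 15-19; the variable names are illustrative. Following the table, the proportions missing and childless are taken over all women in the age group, while the average parity excludes women of unknown parity from the denominator.

```python
# Corrected counts for women aged 15-19 (Table 2): parity -> number of women
known = {0: 743_190, 1: 29_560, 2: 4_240, 3: 1_200, 4: 830, 5: 430}
unknown = 870                      # 220 originally unknown + 650 implausible
total = sum(known.values()) + unknown

prop_missing = unknown / total
prop_childless = known[0] / total
# Women of unknown parity are left out of the denominator
avg_parity = sum(j * n for j, n in known.items()) / sum(known.values())

print(total)                           # 780320
print(round(prop_missing * 100, 3))    # 0.111 (per cent)
print(round(prop_childless * 100, 2))  # 95.24 (per cent)
print(round(avg_parity, 4))            # 0.0604
```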

Figure 1 Average parities by age group, Cambodia 2008 census, 2005 DHS and 2010 DHS [5]

Figure 1 also shows the average parities by age group according to the 2005 and 2010 Cambodian Demographic and Health Surveys (available from the www.statcompiler.com [6] DHS website). The average parities reported in the census and the 2010 survey are very similar. However, two features suggest one should be wary of concluding that this implies that they are accurate. First, given the timing of the three enquiries, the data from the census should lie approximately half-way between the estimates from the two DHSs. This is not the case. Second, it can be seen that the average parity of women aged 40-44 in the 2005 DHS is a little higher (by 0.2 of a child) than that of women aged 45-49 in the 2010 DHS. While fertility is low among women in their late 40s in Cambodia, and random error cannot be discounted, this result should encourage a little scepticism about the data. However, overall, the average parities from the two DHSs are not fundamentally at odds with those indicated by the 2008 census.

References

Feeney G. 1991. "Child survivorship estimation: Methods and data analysis", Asian and Pacific Population Forum 5(2-3):51-55, 76-87. http://hdl.handle.net/10125/3600 [7].

 

 

 

The el-Badry correction

Author: 
Moultrie TA

Description of the method

The el-Badry correction is a method for correcting errors in data on children ever born caused by the enumerator or respondent failing to record answers of ‘zero’ to questions on lifetime fertility and, instead, leaving the response blank. When this occurs, during data processing the response is coded as ‘missing’ or ‘unknown’, even though it was evident to the enumerator at the time of data collection that the correct answer was ‘zero’. The method apportions the number of women whose parity is recorded as ‘missing’ between those whose parity is regarded as being truly unknown, and those women who should have been recorded as childless but whose responses were left blank. It does this apportionment at an aggregate level and not on an individual basis.

Data required and assumptions

The method requires the number of children ever born, classified by age group of mother, including the count of women with missing data (i.e., where the field was left blank or contained an out-of-range code or a code for not answered or refused).

The method assumes that a constant proportion of women at each age truly did not state their lifetime fertility (i.e. parity) at the time of data collection. The balance of the women with unreported parities is assumed to be erroneously recorded as not stated when the women are, in fact, childless.

Caveats and warnings

The method relies on the existence of a linear relationship between the proportions of women whose parity is not stated, and that of women reported to be childless. If such a linear relationship is observed,  the adjusted denominator used to calculate average parities should exclude those women whose parity (after correction) is still regarded as unknown. This reflects the implicit assumption that these women’s parity distribution is no different from those of women of the same age whose parity is known.

Where the data indicate that a correction is needed because of the large proportion of missing parity information but the method cannot be applied (for example, due to unavailability of data by age, or violation of the assumption of linearity), women of unknown parity should be included in the denominator used to determine average parities. This implicitly assumes that the parity of all such women is zero (i.e. that all women of unknown parity are childless). This will, of course, result in under-estimated average parities, as not all women of unknown parity are indeed childless.

Application of method

We define

$$N_i = {}_5N_a$$

for a = 15, 20, …, 45 and i=a/5-2, to be the number of women in age group i in the population. Thus, N1 represents the number of women aged 15-19 in the population. Denote Ni,j to be the number of women in age group i of parity j, and Ni,u to be the number of women in age group i whose parity is unknown.

Step 1: Determine the proportion of women in each age group whose parity is a) unknown; and b) reported as zero

Extract a table of reported children ever born (j) by women’s age group (i) from the census data to obtain Ni,j. Missing data on parity (i.e. blank fields and invalid codes) should be combined with codes for parity not stated for each age group to produce Ni,u. The proportion of women in age group i with parity unknown is then

$$U_i = \frac{N_{i,u}}{N_i}$$

The proportion of women in age group i who are reportedly childless (i.e. are of parity zero) is given by

$$Z_i = \frac{N_{i,0}}{N_i}$$

If the Ui are small (less than 2 per cent in each age group), it is not worth applying the correction. In such a situation, average parities should be determined by assuming that the parity distribution of women with not stated parity is the same as that of women whose parity is known, by omitting the women with unstated parities from the denominator of the calculation. Thus, if Pi is the average parity of women in age group i,

$$P_i = \frac{\sum_{j=0}^{\omega} j \cdot N_{i,j}}{\sum_{j=0}^{\omega} N_{i,j}}$$

If the proportions of women with parity not stated exceed 2 per cent, it is worth assessing whether the correction can be applied.
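The 2 per cent screening rule can be sketched as below. The function name is an assumption for the example; the two lists are the proportions with parity unknown (as fractions) from the Cambodian and Kenyan worked examples in this section.

```python
def correction_worth_assessing(proportions_unknown, threshold=0.02):
    """True if the proportion with parity unknown exceeds the threshold
    in at least one age group."""
    return any(u > threshold for u in proportions_unknown)

# Proportions with parity unknown, age groups 15-19 to 45-49
cambodia_2008 = [0.00111, 0.00096, 0.00043, 0.00086, 0.00030, 0.00053, 0.00034]
kenya_1989 = [0.338, 0.147, 0.074, 0.055, 0.048, 0.044, 0.046]

print(correction_worth_assessing(cambodia_2008))  # False
print(correction_worth_assessing(kenya_1989))     # True
```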

Step 2: Plot the points (Zi, Ui) and evaluate the data

For the method to work correctly, the series of points (Zi, Ui) should lie on, or very close to, a straight line. In some cases, curvature may be observed in the data points corresponding to either the oldest or the youngest ages. If the curvature affects the older ages only, even if it is quite extreme, it is acceptable to exclude the oldest, or two oldest, age groups from the fitting process and fit a straight line to the remaining points since the method has the greatest absolute impact on the proportions not stated at the youngest ages. If the curvature is most noticeable among the younger women, the method should not be used as exclusion of the data points relating to women aged 15-24 would result in the regression performing an out-of-sample extrapolation, the results of which could suggest illogical adjustments in these age groups.

If a strongly linear relationship cannot be identified, even after excluding one or two data points from older women, the method cannot be applied. In this situation, it is preferable to assume that all women of not stated parity are childless, and to include them in the denominator of the average parity calculation

$$P_i = \frac{\sum_{j=0}^{\omega} j \cdot N_{i,j}}{N_i}$$
Equation 1

The analytical report should note that this has been done, and that, therefore, the average parity values are liable to be underestimated.

Step 3: Determine the slope and intercept of the best straight line fit to the data

The slope (γ) and intercept (β) of the fitted line are found by means of linear regression of Ui on Zi applied to those data points selected for inclusion, that is,

$$U_i = \beta + \gamma Z_i$$

The intercept (β), which is independent of age (i), is the estimate of the proportion of all women in each age group whose parity is deemed to be truly unknown, and not misreported.

Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The adjusted proportion of women in age group i that is estimated to be truly childless is given by

$$Z_i^* = Z_i + U_i - \beta$$

That is, the revised proportion of women of zero parity in any age group is the proportion actually recorded as being of zero parity together with the proportion of women in that age group of not stated parity less the estimated proportion of women whose parity is regarded as being truly unknown. The revised estimate of the number of childless women in age group i is given by

$$N_{i,0}^* = N_i \times Z_i^*$$

Thus, the estimated true number of women in each age group whose parity is unknown is given by

$$N_{i,u}^* = N_i \times \beta$$

The $N_{i,j}^*$ for other parities (j > 0) are unchanged.

Step 5: Calculation of average parities

If an el-Badry correction has been applied to the data, the average parities are given by

$$P_i = \frac{\sum_{j=0}^{\omega} j \cdot N_{i,j}^*}{(1-\beta) N_i}$$
Equation 2

embodying the assumption that the remaining women in age group i of unknown parity, βNi, who are omitted from the denominator, have the same average parity as the women in age group i whose parity is known.

Interpretation and checks

The value of β shows the estimated proportion of women whose parity is truly not stated. Larger values of β are therefore associated with poorer quality data.

Occasionally, the method may have a contrary effect and suggest that the number of women with not-stated parity is understated, and that the number of women of reported parity zero should be reduced. Such a situation will arise if β > Ui. If this is so, the correction should not be applied to that age group. 
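Step 4 and the check just described can be combined in one small function. This is a sketch under stated assumptions: the function name is hypothetical, the counts in the usage lines are invented for illustration only, and β is taken as already estimated in Step 3.

```python
def el_badry_adjust(n_total, n_zero, n_unknown, beta):
    """Return (revised childless, revised unknown) for one age group.

    If beta exceeds U_i the correction would reduce the number of
    childless women, so the age group is returned unchanged.
    """
    u = n_unknown / n_total
    if beta > u:
        return n_zero, n_unknown
    revised_unknown = beta * n_total
    # N*(Z + U - beta) written in terms of counts
    revised_zero = n_zero + n_unknown - revised_unknown
    return revised_zero, revised_unknown

# Invented counts for illustration: 1,000 women, 400 childless, 100 not stated
print(el_badry_adjust(1000, 400, 100, beta=0.03))  # about (470, 30)
print(el_badry_adjust(1000, 400, 100, beta=0.15))  # unchanged: (400, 100)
```

Note that the adjustment moves women between the childless and unknown categories only; the total number of women in the age group is preserved.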

Worked example

The accompanying spreadsheet [8] implements the method using data from the 1989 Kenya Census obtained from IPUMS. The original data are presented in Table 1.

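Table 1 Children ever born, by age group of mother at census date, Kenya, 1989 Census

| Parity | 15-19 (i=1) | 20-24 (i=2) | 25-29 (i=3) | 30-34 (i=4) | 35-39 (i=5) | 40-44 (i=6) | 45-49 (i=7) |
|---|---|---|---|---|---|---|---|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | *140* | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | *160* | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | *240* | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | *40* | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | *20* | *20* | 80 |
| 24 | 0 | 0 | 0 | 0 | *60* | *20* | 80 |
| 25 | 0 | 0 | 0 | 0 | *60* | *40* | 0 |
| 26 | 0 | 0 | 0 | 0 | *60* | *40* | *80* |
| 27 | 0 | 0 | 0 | 0 | *80* | *40* | *60* |
| 28 | 0 | 0 | 0 | 0 | *20* | *40* | *40* |
| 29 | 0 | 0 | 0 | 0 | *20* | 0 | *40* |
| 30 | 0 | 0 | 0 | 0 | *340* | *440* | *360* |
| Not Stated | 402,780 | 147,540 | 61,920 | 31,580 | 20,240 | 15,420 | 12,960 |
| TOTAL | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |

Note: counts marked with asterisks are the implausibly high parities shown in italics in the original table.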

Inspection of the data reveals that they have been edited to disallow the recording of high parities in women aged less than 35. The editing rule applied at the preparatory stage would appear to be stricter than the one suggested in the section on evaluation of parity data [1]. Thus reports of 20-24 year old women have been restricted to parity 6 or less (rather than parity 8), reports for those aged 25-29 are truncated at parity 7 (rather than parity 12) and those of 30-34 year olds at parity 8 (rather than 15). However, implausibly high parities have been allowed to remain at ages 35 and more. Therefore, further light editing of the data highlighted in italics in Table 1 could be undertaken by re-assigning to the unknown category reports of parity 19 and over for age group 35-39, parity 23 and over in the age group 40-44, and parity 26 and over in the last age group, 45-49.

An option can be selected on the Introduction tab of the spreadsheet to set implausible parities to ‘not stated’ prior to the application of the method.

Step 1: Determine the proportion of women in each age group whose parity is a) not stated; and b) equal to zero

Table 2 presents the revised data, together with the calculation of the proportions of women of parity zero, and parity not stated in each age group.

Table 2 Correction of parity data, and calculation of proportion of women of parity zero, and parity not stated, Kenya, 1989 Census

| Parity | 15-19 (i=1) | 20-24 (i=2) | 25-29 (i=3) | 30-34 (i=4) | 35-39 (i=5) | 40-44 (i=6) | 45-49 (i=7) |
|---|---|---|---|---|---|---|---|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | 0 | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | 0 | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | 0 | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | 0 | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Not stated (U) | 402,780 | 147,540 | 61,920 | 31,580 | 21,480 | 16,060 | 13,540 |
| TOTAL | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |
| Ui | 0.338 | 0.147 | 0.074 | 0.055 | 0.048 | 0.044 | 0.046 |
| Zi | 0.501 | 0.198 | 0.071 | 0.040 | 0.032 | 0.030 | 0.033 |

 

The data include high proportions of women with parity not stated at ages 15-19 (402,780 / 1,192,840 = 0.338), 20-24 (0.147) and, to a lesser extent, the older age groups. The proportion of women reported as childless (Zi) falls rapidly, from around 50 per cent in the first age group down to around 3 per cent at the end of the childbearing period. On these grounds, it is worth investigating whether an el-Badry correction can be applied to the data.

Step 2: Plot the points (Zi, Ui) on a set of axes and evaluate the data

The Zi and Ui are plotted against each other (shown by the blue diamonds) in Figure 1. The straight line fitted to the points is shown by the red line. If a point is excluded from the fitting process, the figure in the spreadsheet represents it with an open diamond.

Figure 1 Fitting of el-Badry correction, Kenya 1989 census [9]

There is a clear linear relationship between the plotted points, and all points can be included in the application of an el-Badry correction.

Step 3: Determine the slope and intercept of the best straight line fit

Performing a linear regression of the Ui on the Zi for the selected points gives a value for the intercept (β) of 0.02745. This suggests that around 2.7 per cent of the data on women’s parities can be regarded as truly missing.
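The fit can be reproduced outside the spreadsheet. The sketch below is an illustration rather than the spreadsheet’s own implementation: it computes Ui and Zi from the Table 2 counts and fits an ordinary least-squares line by hand, using only the standard library. The helper name fit_line is an assumption.

```python
# Table 2 counts, Kenya 1989, age groups 15-19 to 45-49
totals = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]
zero = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
not_stated = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]

Z = [z / n for z, n in zip(zero, totals)]
U = [u / n for u, n in zip(not_stated, totals)]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# U is the dependent variable: U_i = beta + gamma * Z_i
beta, gamma = fit_line(Z, U)
print(f"beta = {beta:.5f}, gamma = {gamma:.3f}")
```

The intercept obtained in this way should agree with the value of about 0.0275 quoted above.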

Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The revised number of women of zero parity is given by

$$N_{i,0}^* = N_i (Z_i + U_i - \beta)$$

while the revised numbers with parity unknown are calculated by multiplying the total number of women in each age group by β as shown in Table 3. For example, the number of women aged 20–24 estimated to be truly of an unknown parity is given by 0.02745× 1,005,500 = 27,603. The corrected estimate of the number of childless women aged 15–19 is derived from 1,192,840× (0.501 + 0.338 – 0.027) = 967,594.

Table 3 Revised estimates of numbers of women with parity not stated and childless women by age, Kenya, 1989 Census

| | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---|---|---|---|---|---|---|
| Revised parity not stated | 32,746 | 27,603 | 23,084 | 15,763 | 12,397 | 9,965 | 8,025 |
| Revised zero parity | 967,594 | 318,537 | 98,236 | 38,937 | 23,663 | 17,135 | 15,075 |


Step 5: Calculation of average parities

Since an el-Badry correction has been applied, corrected average parities, presented in Table 4, are then derived using Equation 2.

Table 4 Corrected average parities by age group, Kenya, 1989 Census

|  | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---|---|---|---|---|---|---|
| Average parity | 0.242 | 1.525 | 3.214 | 4.760 | 6.239 | 7.120 | 7.510 |

Note that, relative to the average parities produced if the correction is not applied (and assuming therefore that all women with not stated parity are of parity zero), the correction increases the parities in each age group by a constant factor,

$$\frac{1}{1-\beta}$$
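The constant factor can be seen from a short derivation. This is a sketch that assumes Equation 2 divides the total number of children ever born to women in age group i (written C_i here, a symbol introduced only for this illustration) by the number of women whose parity is treated as stated.

```latex
\bar{P}_i^{\,\text{corrected}}
  = \frac{C_i}{N_i - \beta N_i}
  = \frac{1}{1-\beta}\cdot\frac{C_i}{N_i}
  = \frac{1}{1-\beta}\,\bar{P}_i^{\,\text{uncorrected}}
```

With β = 0.02745, as in the worked example, the factor is about 1.028.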

Detailed description of the method

The method is fully described in el-Badry (1961). El-Badry’s fundamental insight was that, if it could be assumed that:

1)   there is a linear relationship between the proportions of childless women of a given age in a population, and the proportion of women whose parity is not stated; and

2)   the true, unknown, proportion of women whose parity is not known is a constant and independent of age, then

$$U_i = \alpha Z_i^{*} + \beta$$
Equation 3

where αZ*i is the proportion of truly childless women reported as parity not stated, and β is the true, constant, proportion of women with parity not stated.

Hence, if αZ*i have been misclassified as not stated when they are truly childless, then

$$Z_i = Z_i^{*} - \alpha Z_i^{*} = (1-\alpha) Z_i^{*}$$

and therefore:

$$Z_i^{*} = \frac{Z_i}{1-\alpha}$$
Equation 4

and substituting this into Equation 3,

$$U_i = \frac{\alpha}{1-\alpha} Z_i + \beta = \gamma Z_i + \beta$$

where γ = α/(1 − α) can be thought of as the odds of a childless woman being classified as being of unknown parity.

Thus, a regression of Ui on Zi will give estimates of β (as well as γ and α).
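The fitting step can be sketched as an ordinary least squares regression in plain Python. The function name is hypothetical, and the demonstration inputs are synthetic, exactly linear values constructed for illustration; they are not census data.

```python
def fit_el_badry(z, u):
    """Least squares fit of U_i = gamma * Z_i + beta.

    z, u: sequences of the proportions childless (Z_i) and parity
    not stated (U_i) for the age groups retained in the fit.
    Returns (alpha, beta, gamma).
    """
    n = len(z)
    mean_z = sum(z) / n
    mean_u = sum(u) / n
    sxx = sum((zi - mean_z) ** 2 for zi in z)
    sxy = sum((zi - mean_z) * (ui - mean_u) for zi, ui in zip(z, u))
    gamma = sxy / sxx               # slope
    beta = mean_u - gamma * mean_z  # intercept
    alpha = gamma / (1 + gamma)     # inverts gamma = alpha / (1 - alpha)
    return alpha, beta, gamma


# Synthetic inputs (NOT census data), built with gamma = 0.5 and beta = 0.03
z_demo = [0.50, 0.20, 0.10, 0.05]
u_demo = [0.5 * zi + 0.03 for zi in z_demo]
alpha, beta, gamma = fit_el_badry(z_demo, u_demo)
```

Because the synthetic points lie exactly on a line, the fit recovers the slope and intercept used to build them; with real data the points scatter and any age groups excluded from the fit should be dropped from both lists.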

From Equation 3, we then obtain

$$U_i - \beta = \alpha Z_i^{*} = Z_i^{*} - Z_i$$

and hence that

$$Z_i^{*} = \frac{N_{i,0}^{*}}{N_i} = U_i - \beta + Z_i$$

and

$$U_i^{*} = \beta N_i$$

Note that the two expressions for Z*i (the one derived from Equation 3 and the one given by Equation 4) give the same answer only when the fit is exact. By convention, the expression derived from Equation 3 is preferred to Equation 4, on the grounds that it relies on the fitted value of β (the estimated proportion of truly not stated parities) rather than on the value of α, which lacks intuitive interpretability.

After deriving corrected values of Z*i and U*i , average parities can be calculated using Equation 2.

Having applied the correction, care should be taken to ensure that, in every age group, the adjusted number of childless women (that is, of parity zero) is less than the number of women reporting no births in the reference period in response to the question on recent fertility. Hence the revised Z*i can be used to determine the minimum number of women who could not have had a birth in the reference period before the census.

A version of the correction designed for (the now-rare) situations where questions on children ever born are asked only of married women is described in Annex II of Manual X (UN Population Division 1983).

References

el-Badry MA. 1961. “Failure of enumerators to make entries of zero: errors in recording childless cases in population censuses”, Journal of the American Statistical Association 56(296):909–924. doi: http://dx.doi.org/10.1080/01621459.1961.10482134 [10]

UN Population Division. 1983. Manual X: Indirect Techniques for Demographic Estimation. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/SER.A/81. http://www.un.org/esa/population/techcoop/DemEst/manual10/manual10.html [11]

Downloads

El_Badry (17/02/2020) [12]
FE_elBadry.xlsx 45.88 KB
El_Badry (French) (17/02/2020) [13]
FE_elBadry_FR.xlsx 51.22 KB

Evaluation of data on recent fertility from censuses

Author: 
Moultrie TA

Introduction

Before evaluating the data on recent fertility collected in a census, it is important to examine the precise wording of the questions used to capture information on recent births by consulting the questionnaire. Over successive waves of censuses, and in different countries, widely different questions have been used. The wording can influence the validity of the estimates and the direction and magnitude of biases or errors in the data.

The generic forms of the census questions on recent fertility fall into three broad categories:

  • Did you give birth in the last year (or other reference period)?

This question produces a simple binary answer. Multiple births in the same reference period are not captured. These could arise from the birth of twins or triplets from a single pregnancy, or from a very short birth interval separating two different pregnancies. Neither of these outcomes is likely to influence the overall fertility rate to a large extent in that birth intervals shorter than a year are rare, and the probability that a pregnancy will result in multiple births is less than 2 per cent in most settings (sub-Saharan Africa being a possible exception). When faced with data collected in this form, it is recommended that the simplifying assumption be made that all births occurred halfway through the reference period, and that only one live birth resulted from each pregnancy.

  • How many children have you given birth to in the last year (or other reference period)?

This question is more refined than the first form given above. It does not yield information on the timing of birth within the reference period, but it does capture information on multiple births to the same woman, without distinguishing between twins and short birth intervals. Again, it is reasonable for the purposes of calculation to assume that the births occurred halfway through the reference period.

  • What was the date of your last live birth?

This question seeks to identify the timing of the last delivery with a greater degree of accuracy, although typically only the month and year of the last birth are recorded. If there are follow-up questions on the number of births that occurred at that time these give more accurate information on the number of recent births.

Additional questions (for example, on the survival of the last born child; the sex of last born child; or the date of the last-but-one birth) are occasionally encountered. Answers to such questions can be used, for example, to estimate, directly from the data, child mortality rates by sex or a sex ratio at birth.

In evaluating the quality of data on recent fertility, the following checks might be conducted:

  1. Comparison of the total number of births with that expected (for example, against numbers from a vital registration system, or from application of an accurate series of age-specific fertility rates to the enumerated population of women – although in the latter case, systematic under-enumeration of the women might also cause the rates to be underestimated).
  2. Assessment of the plausibility of the distribution of age-specific fertility rates calculated directly from the data. Plausible fertility distributions are almost invariably unimodal, concave, slightly right-skewed, and close to zero at the extremes of the childbearing age range. The distribution should also exhibit a reasonably smooth progression of fertility rates from one age to the next;
  3. Plausibility checks on the reported numbers of births in the reference period. In some censuses (e.g. South Africa 1996), a significant proportion of respondents confused the questions on lifetime and recent fertility, and gave the same answers to both questions. This error manifests itself in a strong diagonal in tabulations of children ever born by children born in the last year by age of mother (Moultrie and Timæus 2002);
  4. If data on the sex of the last born child have been collected, the reported sex ratio at birth should be checked. The sex ratio at birth is usually about 1.05, but could be as low as 0.95 in African populations and up to 1.1 in some Asian populations. Values outside the range of 0.99 to 1.06 should be subjected to careful scrutiny.
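The sex ratio check in the last item can be sketched as follows. The function names and the counts are hypothetical; the thresholds are simply the range quoted in the text, exposed as parameters so that an analyst can adjust them.

```python
def sex_ratio_at_birth(male_births, female_births):
    """Male births per female birth."""
    return male_births / female_births


def needs_scrutiny(ratio, low=0.99, high=1.06):
    """True when the ratio falls outside the range quoted in the text."""
    return not (low <= ratio <= high)


# Hypothetical counts, for illustration only
ratio = sex_ratio_at_birth(10_500, 10_000)
print(ratio, needs_scrutiny(ratio))
```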

In all cases, care must be taken to identify correctly the universe of women required to answer the questions (in particular the ages and marital status of eligible respondents), as well as the rules governing recording and coding of non-response and incorrect data.

Assessment of current fertility data

Before proceeding with an analysis of age-specific fertility rates, it is advisable to investigate the extent to which the data on recent births are missing or implausible. The absence of any missing data almost certainly indicates that the data have been edited. If this is suspected, further investigations into the extent of editing and/or imputation of the data are recommended to the extent that this is possible, for example through examination of the distribution of imputed values where imputation flag variables are included in the data.

The proportion of the data that is missing should also be checked. If this exceeds five per cent of the total number of records relevant for current fertility data, further investigations should be done. In particular, one should examine the age distribution of missing cases. If these are concentrated among young women or women in their forties this would suggest that the missing cases are missing because these mothers did not have a birth in the reference period, and no answer was recorded by the enumerator rather than an entry of zero being made. This is an error very similar to that giving rise to the el-Badry correction [2].

When the data are tabulated by the number of births in the reference period (as opposed to simply whether or not a birth occurred in the reference period), the distribution of single versus multiple births should be investigated. Generally, less than 2 per cent of pregnancies result in multiple births. Triplets and higher order multiple births are exceedingly rare (less than 0.5 per cent of deliveries). If the proportion of multiple births in the reference period seems too high, it is recommended that tabulations of children ever born and births in the last year are produced for each age group of women. If children ever born and births in the last year are equal in a large proportion of cases, even for parities two and over, this may suggest that respondents or enumerators did not understand the distinction between the questions on lifetime and recent fertility. However, it is possible that a large proportion of younger women with only one child ever born gave birth to that child in the reference period and a close match between lifetime reports of just one birth and recent reports of one birth in young women may not indicate reporting errors.
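One way to look for confusion between the lifetime and recent fertility questions is to measure how many women of parity two or more report the same number for both. The sketch below assumes microdata are available as (children ever born, births in the last year) pairs; the function name and the demonstration records are hypothetical.

```python
def share_on_diagonal(records, min_parity=2):
    """Share of women with CEB >= min_parity whose CEB equals
    their reported births in the reference period."""
    eligible = [(ceb, recent) for ceb, recent in records if ceb >= min_parity]
    if not eligible:
        return 0.0
    matches = sum(1 for ceb, recent in eligible if ceb == recent)
    return matches / len(eligible)


# Hypothetical records: (children ever born, births in the last year)
demo = [(3, 3), (2, 0), (4, 1), (1, 1), (5, 5)]
print(share_on_diagonal(demo))
```

Women with one child ever born are excluded by default because, as noted above, a match at parity one among young women need not indicate a reporting error. In practice the share would be tabulated separately for each age group.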

Direct measurement of fertility from census data

When the data are of sufficient quality, it is possible to estimate age-specific fertility rates directly. When the data are of inferior quality, age-specific fertility rates from the direct calculation are used as inputs into various methods that aim to produce more reliable estimates of the level of fertility using indirect techniques.

The exact form of the age-specific fertility rates that can be derived hinges on the nature of the data collected. An age-specific fertility rate at any given age (or in any age group) is the ratio of the number of births to women of that age (in that age group) in a defined period to the number of person-years lived by women of the same age (in the same age group) in that time period. To calculate age-specific fertility rates exactly, one would need to know reliably the exact dates of birth of mothers (to establish the mother’s age) and their children. One can then calculate precisely the age of the mother at the birth of her child, as well as allocate her exposure to risk to the relevant ages or age groups over the period of investigation.

The data required for such precise calculations are not usually available in census microdata records, either because exact dates were not collected in the first place, or because of the potential for breaching confidentiality if full dates of birth are provided to end-users of the data. In addition, census data are often of insufficient quality to warrant the additional precision. Heaping of months of birth (e.g. on January) as well as years of birth (e.g. those ending in 0 or 5) are commonly encountered problems. Extended census enumeration periods can introduce problems with translating a reference period (e.g. within the last year from the interview date) to a calendar time period (e.g. 2008). Furthermore, retrospective questions about recent births asked in a census fail to capture information about births to mothers who have since died or left the country.

Four possible combinations of reporting of mother’s vital information, and recent births, are typically encountered (Table 1).

Table 1 Taxonomy of data on mother and children for estimating recent fertility

| Reporting of children born in the preceding period(*) | Mothers’ vital information: age in completed years at census | Mothers’ vital information: date of birth (at least month and year) |
|---|---|---|
| Number of children born (or simple binary, yes/no) | (1) | (2) |
| Date of birth of last born child (at least month and year) | (3) | (4) |

(*)Typically the preceding period is 12 months, but analysts should be alert to non-standard reference periods, for example based on time elapsed since an important national event or holiday

Even in the fourth case identified in Table 1, which contains the most detailed information, expending effort to calculate accurately the exposure to risk for the purposes of estimating fertility is not generally warranted, as heaping of dates on particular months and other data quality problems could severely distort the resulting estimates. Thus, use of simple approximations for the calculation of fertility rates from census data is usually appropriate. The section on the direct measurement of fertility from survey data [14] describes the more precise calculation of the exposure to risk and estimation of fertility rates from data of good quality.

Cases 1 and 2: Estimation of age-specific fertility rates directly from the data when no information is available on the timing of the child’s birth

In the first two cases identified in Table 1, all that might be known about the mother’s recent fertility is whether or not she gave birth to at least one child in the period before the census. In more informative variants of the recent fertility question, the mother may be asked about the number of live births in the period preceding the census. Such a question allows the identification both of multiple births from the same pregnancy (twins, triplets etc.), as well as instances of more than one pregnancy ending in the defined period.

Since the mother’s age at birth is not known, the approximation usually used is to tabulate the fertility rates by the reported age of the mother at the census date. The additional assumption is then made that all births occurred half-way through the interval in question. This means that mothers are, on average, older by half the interval length at the time of the census, with the implication that the ages to which the fertility rates actually refer are younger than the reported ages at census. Most standard methods of estimating fertility indirectly compensate for the displacement of ages arising from this mismatch.

The additional information (on mother’s month and year of birth) available in the second case is not particularly helpful in refining the estimates of fertility since additional assumptions of uniformity of the distribution of children’s birthdays are still required. Thus, where the data that were collected fall into either the first or the second case identified in Table 1, fertility rates are estimated by dividing the count of children reported born in the reference period (by age of mother at the census date) by the number of women of that age. The total number of births in the reference period reported by women aged x at the census date, Bx, is given by

$$B_x = \sum_{k=0}^{\omega} k \cdot N_{x,k}$$

where k is the reported number of births in the reference period, ω is the maximum value of k in the data and Nx,k is the number of women aged x at the census reporting k births in the last year. If ω is classified as an open interval, e.g. 3+ births in the reference period, women in that category are all assumed to have had the number of births that opens that interval. Again, the error thus introduced is small.

The number of women aged x is given by

$$N_x = \sum_{k=0}^{\omega} N_{x,k}$$

Women whose recent births are unknown or unrecorded must be excluded from both the numerator and denominator, with the implicit assumption that their fertility is no different from that of women whose recent fertility is known. Age-specific fertility rates (ASFRs) at age x are given by

$$f_x = \frac{B_x}{N_x}$$

Using the conventional age range (from 15 to 49, inclusive) as the limits for the summation, the implied Total Fertility (TF) from the single-age data is

$$TF = \sum_{a=15}^{49} f_a$$

Total fertility is a synthetic cohort measure – indicating the number of children a woman would have if she survives to age 50 (deemed to be the end of childbearing) and experiences the age-specific fertility rates currently observed immediately before the census throughout her reproductive life.

Fertility rates by single years of age should be calculated and plotted to check the internal coherence of the data. The ASFRs will tend to be less erratic than either the numerators or the denominators on their own, and may indicate plausible levels and distributions of fertility. A highly erratic series of age-specific fertility rates by age, departing markedly from the anticipated n-shape, offers a strong indication that the recent fertility data are problematic, and suggests that further investigations are required.

Finally, age-specific fertility rates in conventional five-year bands, 5fx, where x = 15, 20,…,45, can be derived:

$$f_i = {}_5f_x = \frac{\sum_{x=5i+10}^{5i+14} B_x}{\sum_{x=5i+10}^{5i+14} N_x}$$

where the index, i, is determined by the relation i=(x/5) - 2. The measure of total fertility is thus

$$TF = 5 \cdot \sum_{i=1}^{7} f_i$$

While the TF is an age-standardized measure of fertility (implicitly assuming that women of childbearing age are uniformly distributed across age groups), the fertility rate in any age group is not standardized within the group. As a result, the TFs derived from calculations using age groups and single years of age will differ to a small degree, typically in the second or third decimal place.

Total fertility should be compared with estimates from other data sources from the same country (e.g. DHS). It is worth remembering, however, that the ASFRs and TF produced using this method do not take into account the true exposure to risk in the derivation of the denominator. In addition, the numerator includes events that took place during the reference period categorized by the age of the mother at the end of the reference period, not by her age at the time the event took place. Most methods of indirect fertility estimation adjust the derived fertility rates to account for this age shift. For purposes of basic comparison (that is, assessing the shape and level of the fertility distributions), the differences in classification by age are not of major importance. However, the F-only variant of the relational Gompertz model [4] provides a method of unshifting fertility rates while smoothing them, should this be desired.

Example: Direct calculation of fertility

In the 2008 Cambodian Census, women were asked about the number of children they gave birth to in the previous year. Mother’s age was classified by age at the census date. The data are shown in Table 2.

Table 2 Recent fertility by age of mother at the census date, Cambodia, 2008 Census

(Columns 0 to 4 and Missing give the number of women by reported births in the last year.)

| Age | 0 | 1 | 2 | 3 | 4 | Missing | Births | Women | ASFR |
|---|---|---|---|---|---|---|---|---|---|
| 15 | 160,980 | 120 | 0 | 0 | 0 | 80 | 120 | 161,180 | 0.0007 |
| 16 | 152,710 | 500 | 0 | 0 | 0 | 50 | 500 | 153,260 | 0.0033 |
| 17 | 144,970 | 1,250 | 10 | 10 | 0 | 20 | 1,300 | 146,260 | 0.0089 |
| 18 | 182,500 | 3,540 | 20 | 0 | 0 | 40 | 3,580 | 186,100 | 0.0192 |
| 19 | 127,840 | 5,640 | 10 | 0 | 0 | 30 | 5,660 | 133,520 | 0.0424 |
| 20 | 147,990 | 8,840 | 80 | 0 | 0 | 90 | 9,000 | 157,000 | 0.0574 |
| 21 | 123,960 | 9,500 | 30 | 0 | 0 | 70 | 9,560 | 133,560 | 0.0716 |
| 22 | 126,030 | 11,600 | 80 | 0 | 0 | 30 | 11,760 | 137,740 | 0.0854 |
| 23 | 123,750 | 11,830 | 70 | 10 | 0 | 110 | 12,000 | 135,770 | 0.0885 |
| 24 | 121,820 | 11,010 | 150 | 10 | 20 | 80 | 11,420 | 133,090 | 0.0859 |
| 25 | 137,460 | 12,420 | 100 | 0 | 0 | 60 | 12,620 | 150,040 | 0.0841 |
| 26 | 115,370 | 11,320 | 110 | 0 | 0 | 80 | 11,540 | 126,880 | 0.0910 |
| 27 | 117,840 | 11,580 | 190 | 0 | 0 | 40 | 11,960 | 129,650 | 0.0923 |
| 28 | 118,270 | 10,690 | 110 | 0 | 10 | 30 | 10,950 | 129,110 | 0.0848 |
| 29 | 82,990 | 7,600 | 120 | 0 | 0 | 40 | 7,840 | 90,750 | 0.0864 |
| 30 | 77,690 | 5,950 | 40 | 10 | 0 | 30 | 6,060 | 83,720 | 0.0724 |
| 31 | 58,800 | 4,820 | 50 | 20 | 0 | 30 | 4,980 | 63,720 | 0.0782 |
| 32 | 67,110 | 4,480 | 150 | 20 | 0 | 110 | 4,840 | 71,870 | 0.0674 |
| 33 | 67,080 | 4,240 | 40 | 0 | 0 | 50 | 4,320 | 71,410 | 0.0605 |
| 34 | 67,010 | 3,800 | 30 | 10 | 10 | 70 | 3,930 | 70,930 | 0.0555 |
| 35 | 90,720 | 4,570 | 60 | 20 | 0 | 30 | 4,750 | 95,400 | 0.0498 |
| 36 | 77,950 | 3,800 | 10 | 10 | 0 | 30 | 3,850 | 81,800 | 0.0471 |
| 37 | 81,320 | 4,070 | 50 | 10 | 10 | 10 | 4,240 | 85,470 | 0.0496 |
| 38 | 92,290 | 3,780 | 30 | 20 | 30 | 30 | 4,020 | 96,180 | 0.0418 |
| 39 | 74,030 | 2,920 | 50 | 0 | 0 | 30 | 3,020 | 77,030 | 0.0392 |
| 40 | 88,940 | 2,720 | 70 | 10 | 10 | 50 | 2,930 | 91,800 | 0.0319 |
| 41 | 71,250 | 2,140 | 0 | 0 | 0 | 20 | 2,140 | 73,410 | 0.0292 |
| 42 | 81,560 | 2,010 | 30 | 0 | 0 | 60 | 2,070 | 83,660 | 0.0248 |
| 43 | 72,930 | 1,270 | 10 | 0 | 0 | 30 | 1,290 | 74,240 | 0.0174 |
| 44 | 69,660 | 930 | 10 | 0 | 0 | 50 | 950 | 70,650 | 0.0135 |
| 45 | 84,290 | 760 | 30 | 10 | 10 | 30 | 890 | 85,130 | 0.0105 |
| 46 | 67,330 | 510 | 0 | 50 | 30 | 40 | 780 | 67,960 | 0.0115 |
| 47 | 66,220 | 270 | 10 | 0 | 10 | 0 | 330 | 66,510 | 0.0050 |
| 48 | 74,790 | 310 | 10 | 10 | 0 | 30 | 360 | 75,150 | 0.0048 |
| 49 | 57,600 | 120 | 0 | 20 | 10 | 20 | 220 | 57,770 | 0.0038 |
| TOTAL | 3,473,050 | 170,910 | 1,760 | 250 | 150 | 1,600 | 175,780 | 3,647,720 | 1.6157 |

The “missing” column shows that only 1,600 women, out of nearly 3.65 million aged between 15 and 49, did not have their recent fertility recorded. This represents 0.04 per cent of all women, and will have no material impact on the estimated fertility of women in Cambodia. A further check on the age distribution of these cases shows no clear age pattern of omission. The number of births is given by the weighted sum of women reporting 1, 2, 3 and 4 births, shown in the last row. This calculation shows that 173,070 women (170,910 + 1,760 + 250 + 150) had a total of 175,780 births (1×170,910 + 2×1,760 + 3×250 + 4×150) during the year preceding the census. Of these women, 98.8 per cent (170,910 / 173,070) experienced a single birth, 1.0 per cent had twins, and 0.2 per cent had triplets or higher-order multiple births. The possibility of quintuplets (or five births in two deliveries over the period) is remote and need not be considered. Had the census not counted the multiple births separately, the crude birth rate would have been under-estimated by a factor of 173,070/175,780 = 0.985. This represents an under-estimate of just 1.5 per cent.

Using the data above, the series of single-age ASFRs is derived by dividing the total number of births to women of each age by the number of women reporting their current fertility, that is, excluding those women who did not report how many births they had in the last year. The rates are shown in Figure 1. Even though the number of women enumerated at each age is erratic, the ASFRs by single years of age are relatively smooth, with a clearly defined fertility pattern and a typical peak in the mid-twenties.
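The single-age calculation can be sketched in Python using the rows for ages 15 to 19 copied from Table 2. The variable and function names are illustrative only. Women with missing responses are left out of the denominator, as described above.

```python
# Women reporting 0, 1, 2, 3 and 4 births in the last year (Table 2, ages 15-19)
counts_by_age = {
    15: [160_980, 120, 0, 0, 0],
    16: [152_710, 500, 0, 0, 0],
    17: [144_970, 1_250, 10, 10, 0],
    18: [182_500, 3_540, 20, 0, 0],
    19: [127_840, 5_640, 10, 0, 0],
}


def births(counts):
    """B_x: sum over k of k * N_{x,k}."""
    return sum(k * n for k, n in enumerate(counts))


def women(counts):
    """N_x: women with a recorded answer (missing cases excluded)."""
    return sum(counts)


# Single-age rates f_x = B_x / N_x
asfr = {age: births(c) / women(c) for age, c in counts_by_age.items()}

# Five-year rate: pooled births over pooled women,
# not the average of the single-age rates
total_births = sum(births(c) for c in counts_by_age.values())
total_women = sum(women(c) for c in counts_by_age.values())
asfr_15_19 = total_births / total_women
```

Rounded to four decimal places, the single-age rates agree with the ASFR column of Table 2; extending the dictionary to all ages from 15 to 49 and summing the rates gives the single-age total fertility.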

Figure 1 Age-specific fertility rates, Cambodia 2008 census [15]

According to these data, total fertility is 1.61 children per woman. Summing births and women in five-year age groups produces the same answer (Table 3), although, as suggested above, the measures do differ in the third decimal place.

Table 3 Age-specific fertility rates in five-year age groups, Cambodia, 2008 Census and 2005 and 2010 Demographic and Health Surveys

| Age group | Women | Missing | Births | ASFR | DHS2005 | DHS2010 |
|---|---|---|---|---|---|---|
| 15-19 | 780,320 | 220 | 11,160 | 0.014 | 0.047 | 0.046 |
| 20-24 | 697,160 | 380 | 53,740 | 0.077 | 0.175 | 0.173 |
| 25-29 | 626,430 | 250 | 54,910 | 0.088 | 0.180 | 0.167 |
| 30-34 | 361,650 | 290 | 24,130 | 0.067 | 0.142 | 0.121 |
| 35-39 | 435,880 | 130 | 19,880 | 0.046 | 0.091 | 0.071 |
| 40-44 | 393,760 | 210 | 9,380 | 0.024 | 0.041 | 0.028 |
| 45-49 | 352,520 | 120 | 2,580 | 0.007 | 0.005 | 0.004 |
| TF |  |  |  | 1.61 | 3.41 | 3.05 |

Source: Census estimates, own calculations; DHS StatCompiler (www.statcompiler.com)
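The census column of Table 3 can be reproduced with a few lines of Python. This is a sketch with illustrative names; it treats the Women column as including the missing cases (which is how the totals in Table 2 add up) and subtracts them before dividing.

```python
# (women, missing, births) by five-year age group, copied from Table 3
table3 = {
    "15-19": (780_320, 220, 11_160),
    "20-24": (697_160, 380, 53_740),
    "25-29": (626_430, 250, 54_910),
    "30-34": (361_650, 290, 24_130),
    "35-39": (435_880, 130, 19_880),
    "40-44": (393_760, 210, 9_380),
    "45-49": (352_520, 120, 2_580),
}

# Rate = births / women with a recorded answer
asfr = {group: b / (w - m) for group, (w, m, b) in table3.items()}

# Each rate applies to five single years of age
tf = 5 * sum(asfr.values())
print({g: round(f, 3) for g, f in asfr.items()}, round(tf, 2))
```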

Even in the absence of external checks, the results from the 2008 Census data suggest implausibly low levels of fertility in Cambodia. The data are also inconsistent with the average parities calculated in the section on assessment of parity data [1]. This suggests that the data on recent fertility collected in this census are seriously deficient. This is confirmed by external checks, in the form of estimates of fertility from two DHSs conducted before and after the census. The data in the last two columns of Table 3 show that the estimate of total fertility in the 2010 DHS (based on births in the three years before the survey) was 3.1 children per woman. The estimate of total fertility from the 2005 DHS was 3.4 children per woman. It appears that only about half the births that occurred in the year before the census were reported to census enumerators.

The left-hand panel of Figure 2 shows the age-specific fertility rates calculated from the 2008 Census and the two DHSs. Clearly the fertility rates implied by the census are out of line relative to the DHSs. The latter, in turn, show a rather strange pattern of fertility change over the five years, driven by almost constant reductions in fertility between ages 25 and 44. The right-hand panel of Figure 2 shows the same rates, but this time standardized to a TF of one child per woman. Despite substantial differences in the implied level of fertility, the shapes of the three fertility distributions are similar, with the only real difference between them being in the 20-24 age group. It is unlikely, therefore, that there were significant differentials in the quality of the reporting of recent fertility in the 2008 Cambodia Census according to the age of women.

This result suggests that, even though the level of fertility implied by the 2008 Census data is seriously flawed, the shape of the fertility distribution is reasonably accurate. This is a prerequisite for applying many of the indirect methods of fertility estimation.
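Standardizing a fertility schedule to a TF of one child per woman amounts to dividing each rate by the total fertility it implies. The sketch below uses the five-year rates published in Table 3; the function name is hypothetical.

```python
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]

# Five-year ASFRs copied from Table 3
rates = {
    "Census 2008": [0.014, 0.077, 0.088, 0.067, 0.046, 0.024, 0.007],
    "DHS 2005": [0.047, 0.175, 0.180, 0.142, 0.091, 0.041, 0.005],
    "DHS 2010": [0.046, 0.173, 0.167, 0.121, 0.071, 0.028, 0.004],
}


def standardize(asfrs, width=5):
    """Rescale ASFRs so that the implied total fertility equals one."""
    tf = width * sum(asfrs)
    return [f / tf for f in asfrs]


standardized = {source: standardize(f) for source, f in rates.items()}
for source, values in standardized.items():
    print(source, [round(v, 4) for v in values])
```

Plotting the standardized series against the age groups allows the shapes to be compared independently of the level of fertility.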

Figure 2 Age-specific fertility rates, and standardized age-specific fertility rates, Cambodia 2008 census, 2005 DHS and 2010 DHS [16]

Cases 3 and 4: Estimation of age-specific fertility rates when information is available on the timing of the child’s birth

If the births are classified by women’s date of last birth, a suitable period for the fertility investigation needs to be chosen. In general, it is advisable not to use a period much longer than a year as longer periods of investigation increase the probability that women might have had more than one pregnancy in that period. This results in births earlier in time being omitted (the requirement being to report on the date of birth of the last child, not all children in the period), meaning that estimates of fertility will systematically exclude births in the more distant past. In addition, if fertility has been changing rapidly, extending the period of investigation over more than a year means that the resulting estimates represent some kind of average of fertility over the period. If the census was conducted fairly early or late in the year, however, there is potentially some advantage to basing the rates on births since the beginning of the previous or current year respectively as this does not require women to remember the month of birth of their child accurately. The number of births reported in the reference period can then be prorated to produce an estimate of annual births. Rates can be calculated both in this way and based on a 12-month reference period and the results compared.

The third scenario in Table 1 does not permit the derivation of a completely accurate measure of fertility, as the age of the mother at the birth of the child cannot be established precisely. However, knowledge of the child’s date of birth does permit the numerator of the age-specific fertility rates to be derived more carefully.

In the commonly-encountered situation where the question asked is about the month and year of the last child’s birth, a more careful approach can be taken to determining the number of births in the last year. Usually a notional census date is defined. The questions on the census questionnaire typically refer to a particular day, even if the actual process of enumeration takes several weeks. A list of census dates for the last three rounds of censuses is maintained by the UN at http://unstats.un.org/unsd/demographic/sources/census/censusdates.htm [17]; a list of census dates for data maintained by IPUMS is available at https://international.ipums.org/international/samples.shtml [18].

In establishing the numerator, all the births reported in the month of the census, and a prorated proportion of births that are reported to have occurred in the equivalent month a year earlier should be included. To extract this information from census data, the date handling capacity of the statistical package being used, or the DHS Century-Month Code (CMC) system can be used.
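The century-month code counts months from the start of 1900, so that January 1900 has the value 1. A minimal sketch of the conversion follows; the function names are illustrative and not taken from any particular package.

```python
def century_month_code(year, month):
    """DHS century-month code: 12 * (year - 1900) + month (January 1900 = 1)."""
    return 12 * (year - 1900) + month


def months_before_census(birth_year, birth_month, census_year, census_month):
    """Whole calendar months between the month of birth and the census month."""
    return (century_month_code(census_year, census_month)
            - century_month_code(birth_year, birth_month))


# A birth reported in August 1998, for a census taken in August 1999
print(century_month_code(1999, 8), months_before_census(1998, 8, 1999, 8))
```

Births with a difference of 0 to 11 months fall wholly within the year before the census month, while those with a difference of exactly 12 months belong to the equivalent month a year earlier and are the ones to be prorated.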

Table 4 Births reported in each month by age of mother at census date (24-25 August 1999), Kenya, 1999 Census

| Month (columns: age of mother at census) | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---|---|---|---|---|---|---|
| August 1998 | 13,240 | 31,300 | 23,120 | 13,940 | 8,940 | 3,220 | 560 |
| September 1998 | 9,800 | 22,900 | 17,260 | 9,560 | 6,180 | 2,080 | 680 |
| October 1998 | 9,240 | 21,580 | 15,520 | 9,600 | 5,880 | 1,880 | 500 |
| November 1998 | 9,040 | 21,940 | 16,060 | 9,880 | 5,280 | 1,660 | 540 |
| December 1998 | 10,200 | 23,700 | 18,000 | 10,580 | 5,940 | 2,080 | 480 |
| January 1999 | 14,720 | 28,620 | 20,620 | 12,260 | 7,300 | 2,180 | 660 |
| February 1999 | 20,740 | 42,140 | 30,860 | 17,400 | 11,220 | 4,560 | 2,060 |
| March 1999 | 15,620 | 31,480 | 21,320 | 12,520 | 7,340 | 2,820 | 520 |
| April 1999 | 18,660 | 33,160 | 24,260 | 12,240 | 7,820 | 2,860 | 720 |
| May 1999 | 19,660 | 33,880 | 22,860 | 13,960 | 7,440 | 2,480 | 760 |
| June 1999 | 20,100 | 32,140 | 23,380 | 12,580 | 7,300 | 2,720 | 560 |
| July 1999 | 21,600 | 32,360 | 23,860 | 13,800 | 7,060 | 2,640 | 520 |
| August 1999 | 15,900 | 25,020 | 16,720 | 9,280 | 5,840 | 1,620 | 360 |
| Estimated births in the last year | 188,269.68 | 355,987.74 | 255,940.65 | 146,807.74 | 86,618.71 | 30,307.10 | 8,486.45 |
| Number of women | 1,700,060 | 1,495,180 | 1,205,060 | 849,620 | 725,780 | 519,740 | 417,500 |
| Age-specific fertility rates | 0.1107 | 0.2381 | 0.2124 | 0.1728 | 0.1193 | 0.0583 | 0.0203 |

In the 1999 Kenya Census, the official census date was the night of 24-25 August 1999. To estimate the births that occurred in the year preceding the census, all births reported between September 1998 and August 1999 would be included, along with 1-24/31 (=7/31) of the births reported in August 1998. This assumes that births are uniformly distributed over the days of a month (Table 4).

The estimated number of births in the year before the census in the 30-34 age group, for example, is then given by

$$\frac{7}{31}(13{,}940) + 9{,}560 + 9{,}600 + \dots + 9{,}280 = 146{,}807.74$$

In the absence of further information about the mother’s date of birth, the data above are tabulated according to the mother’s age at the census date. As noted above, the rates so derived would thus be subject to a half-year shift.

Dividing these births by the number of women in each age group gives the age-specific fertility rates. Summing the rates and multiplying by five (the width of each age group) gives an estimate of total fertility of 4.66 children per woman, which is clearly out of line with other estimates of fertility in the country for around that time. As with Cambodia, this suggests widespread underreporting of births in the year before the census.
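As a rough check on the figures in Table 4, the rates and the total fertility estimate can be recomputed as follows. This is a minimal sketch: the variable names are illustrative, and total fertility is taken as five times the sum of the rates for the seven five-year age groups.

```python
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]

# Estimated births in the year before the census, by age group (Table 4).
BIRTHS = [188_269.68, 355_987.74, 255_940.65, 146_807.74,
          86_618.71, 30_307.10, 8_486.45]

# Number of women enumerated in each age group (Table 4).
WOMEN = [1_700_060, 1_495_180, 1_205_060, 849_620,
         725_780, 519_740, 417_500]

# Age-specific fertility rate: births divided by women, per age group.
asfr = [births / women for births, women in zip(BIRTHS, WOMEN)]

# Each rate applies to a five-year age group, hence the factor of five.
total_fertility = 5 * sum(asfr)

for group, rate in zip(AGE_GROUPS, asfr):
    print(f"{group}: {rate:.4f}")
print(f"Total fertility: {total_fertility:.2f}")  # Total fertility: 4.66
```

Because the births are tabulated by the mother's age at the census date, these rates remain subject to the half-year shift noted above.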

Only in the fourth case, when detailed information is available on both the mother's and the child's dates of birth, is it possible to produce a precise measurement of fertility. However, if there is evidence of extreme heaping of reported dates of birth (for example on 1 January), there is little point in making use of the more refined measures, as they will be distorted by the heaping. Thus, since the quality and internal consistency of the data collected in a census are unlikely to be as good as in a DHS, it is inappropriate to attempt the precise calculation of fertility rates that one would with a DHS. In some situations, however, the extent of heaping in the reported dates of birth and other errors in the data may be sufficiently limited to merit calculation of direct estimates of fertility. In these situations, the principles outlined for the calculation of estimates of recent fertility from survey data [14] should be applied.

References

Moultrie, Tom A. and Ian M. Timæus. 2002. Trends in South African Fertility between 1970 and 1998: An Analysis of the 1996 Census and the 1998 Demographic and Health Survey. Cape Town: Medical Research Council. http://www.mrc.ac.za/bod/trends.pdf [19]. Accessed 1 May 2011.

 


Copyright © IUSSP 2011 - 2013

Source URL (retrieved on 17/01/2025): http://demographicestimation.iussp.org/content/evaluation-and-correction-fertility-data

Links:
[1] http://demographicestimation.iussp.org/content/assessment-parity-data
[2] http://demographicestimation.iussp.org/content/el-badry-correction
[3] http://demographicestimation.iussp.org/content/evaluation-data-recent-fertility-censuses
[4] http://demographicestimation.iussp.org/content/relational-gompertz-model
[5] http://demographicestimation.iussp.org/sites/new-demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/sites/demographicestimation.iussp.org/files/wysiwyg_imageupload/3/FE_DIR_CEN_01_0.png
[6] http://www.statcompiler.com
[7] http://hdl.handle.net/10125/3600
[8] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/FE_elBadry_5.xlsx
[9] http://demographicestimation.iussp.org/sites/new-demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/sites/demographicestimation.iussp.org/files/wysiwyg_imageupload/3/FE_ELBAD_01_0.png
[10] http://dx.doi.org/10.1080/01621459.1961.10482134
[11] http://www.un.org/esa/population/techcoop/DemEst/manual10/manual10.html
[12] http://demographicestimation.iussp.org/sites/new-demographicestimation.iussp.org/files/sites/demographicestimation.iussp.org/files/FE_elBadry_6.xlsx
[13] http://demographicestimation.iussp.org/sites/new-demographicestimation.iussp.org/files/sites/demographicestimation.iussp.org/files/FE_elBadry_FR_0.xlsx
[14] http://demographicestimation.iussp.org/content/direct-estimation-fertility-survey-data-containing-birth-histories
[15] http://demographicestimation.iussp.org/sites/new-demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/sites/demographicestimation.iussp.org/files/wysiwyg_imageupload/3/FE_DIR_CEN_02_0.png
[16] http://demographicestimation.iussp.org/sites/new-demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/sites/demographicestimation.iussp.org/files/wysiwyg_imageupload/3/FE_DIR_CEN_03_0.png
[17] http://unstats.un.org/unsd/demographic/sources/census/censusdates.htm
[18] https://international.ipums.org/international/samples.shtml
[19] http://www.mrc.ac.za/bod/trends.pdf
[20] http://demographicestimation.iussp.org/sites/new-demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/sites/demographicestimation.iussp.org/files/wysiwyg_imageupload/3/df_fig1_1.png