Bureau of Labor Statistics



Each month the U.S. Bureau of Labor Statistics collects approximately 83,500 commodities and services (C&S) price quotes from approximately 26,400 outlets around the United States for the Consumer Price Index (CPI). Because the CPI is computed from a sample of prices rather than from all prices, each published estimate carries a standard error. For example, from January through December 2009, the 1-month changes in the U.S. city average all items index had a median value of 0.23 percent. The standard errors of those 12 estimates had a median value of 0.04 percent. Margins of error are usually expressed as a statistic’s point estimate plus or minus two standard errors, so the margin of error on this CPI’s 1-month change is approximately 0.23 percent plus or minus 0.08 percent.
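
The margin-of-error arithmetic above can be checked with a short sketch. The function name `margin_of_error` and the multiplier parameter `k` are illustrative choices, not BLS terminology; the inputs are the 2009 medians quoted in the text.

```python
def margin_of_error(point_estimate, standard_error, k=2.0):
    """Return (lower, upper): the point estimate plus or minus k standard errors."""
    half_width = k * standard_error
    return point_estimate - half_width, point_estimate + half_width


# Median 1-month change and median standard error for 2009, both in percent.
lower, upper = margin_of_error(0.23, 0.04)
print(f"{lower:.2f} to {upper:.2f} percent")  # 0.15 to 0.31 percent
```

With two standard errors on either side, the interval runs from about 0.15 percent to about 0.31 percent.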

Sources of error

One way of analyzing the error in a survey estimate is to divide the total error into two sources: sampling error and non-sampling error. Sampling error is the uncertainty in the CPI caused by the fact that a sample of retail prices is used to compute the CPI, instead of using the complete universe of retail prices. Non-sampling error is the rest of the error. Non-sampling error includes things such as incorrect information given by survey respondents, data processing errors, and so forth. Non-sampling error arises regardless of whether data are collected from a sample of retail prices or from the complete universe.


Another way of analyzing error is to divide it into variance and bias. The variance of the CPI is a measure of how close different estimates of the CPI would be to each other if it were possible to repeat the survey over and over using different samples. Of course, it is not feasible to repeat the survey multiple times, but statistical theory allows the CPI’s variance to be estimated anyway. A small variance, for example, indicates that multiple independent samples would produce values that are consistently very close to each other. Bias is the difference between the CPI’s expected value and its true value.
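
The difference between variance and bias can be illustrated with a small simulation. This is only a sketch on made-up data, not the BLS variance estimation method: the synthetic `universe`, the sample sizes, and the truncated sampling frame are all assumptions chosen to make the two effects visible.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic "universe" of 1-month price changes, in percent (made-up data).
universe = [random.gauss(0.2, 1.0) for _ in range(50_000)]
true_value = statistics.fmean(universe)


def repeat_survey(frame, sample_size, repetitions=500):
    """Estimate the mean price change from many independent samples of `frame`."""
    return [
        statistics.fmean(random.sample(frame, sample_size))
        for _ in range(repetitions)
    ]


def summarize(estimates):
    """Return (bias, variance) of a list of repeated estimates."""
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    return bias, variance


# Small sample drawn from the whole universe: little bias, larger variance.
small_bias, small_var = summarize(repeat_survey(universe, 50))

# Large sample drawn from a frame that omits the highest price changes:
# smaller variance, but a clear bias.
truncated_frame = [x for x in universe if x < 1.0]
trunc_bias, trunc_var = summarize(repeat_survey(truncated_frame, 2_000))

print(f"small sample, full frame:      bias={small_bias:+.3f}  variance={small_var:.5f}")
print(f"large sample, truncated frame: bias={trunc_bias:+.3f}  variance={trunc_var:.5f}")
```

The first design produces estimates that scatter widely around the true value, while the second produces estimates that cluster tightly around the wrong value.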
A statistic may have a small variance but a large bias, or it may have a large variance but a small bias. For an index to be considered accurate, both its variance and bias need to be small.

The Bureau of Labor Statistics (BLS) is constantly trying to reduce the error in the CPI. Variance and sampling error are reduced by using a sample of retail prices that is as large as possible, given resource constraints. BLS has developed a model that optimizes the allocation of resources by indicating the number of prices that should be observed in each geographic area and each item category, in order to minimize the variance of the U.S. city average all items index. BLS reduces non-sampling error through a series of computerized and professional data reviews, as well as through continuous survey process improvements and theoretical research.

BLS collects CPI data in 38 geographic areas across the United States. These areas consist of 31 self-representing areas and 7 non-self-representing areas. Self-representing areas are large metropolitan areas, such as the Boston, St. Louis, and San Francisco metropolitan areas. Non-self-representing areas are collections of smaller metropolitan areas. For example, one non-self-representing area is a collection of 32 small metropolitan areas in the Northeast region (Buffalo, Hartford, Providence, Bangor, and others), of which 8 were randomly selected to represent the entire set. Within each of the 38 areas, price data are collected for 211 item categories called item strata. Together the 211 item strata cover all consumer purchases. Examples of item strata are bananas, women’s dresses, and electricity.

Multiplying the number of areas by the number of item strata gives 8,018 (= 38 × 211) different area and item combinations for which price indexes need to be calculated. Separate price indexes are calculated for each one of these 8,018 area and item combinations. After all 8,018 of these basic-level indexes are calculated, they are aggregated to form higher-level indexes, using expenditure estimates from the Consumer Expenditure Survey as their weights. Examples of higher-level geographic areas are the four regions (Northeast, Midwest, South, and West); and examples of higher-level item categories are the eight major groups (food & beverages, housing, apparel, transportation, medical care, education and communication, recreation, and other goods and services). The highest level of geographic aggregation is the U.S. city average, and the highest level of item aggregation is all items.
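
The cell count and the idea of expenditure-weighted aggregation can be sketched as follows. The area and item labels are placeholders, the toy index values and weights are made up, and the simple weighted arithmetic mean in `aggregate` is an assumption for illustration; the text does not spell out the exact aggregation formula BLS uses.

```python
from itertools import product

NUM_AREAS = 38
NUM_ITEM_STRATA = 211

# Placeholder labels; the real area and item-stratum codes are not given here.
areas = [f"area_{a:02d}" for a in range(1, NUM_AREAS + 1)]
item_strata = [f"item_{i:03d}" for i in range(1, NUM_ITEM_STRATA + 1)]

# One basic-level index is needed for every (area, item stratum) pair.
basic_cells = list(product(areas, item_strata))
print(len(basic_cells))  # 8018


def aggregate(indexes, weights):
    """Expenditure-weighted average of basic-level indexes.

    `indexes` and `weights` are dicts keyed by (area, item) cell.
    """
    total_weight = sum(weights[cell] for cell in indexes)
    return sum(indexes[cell] * weights[cell] for cell in indexes) / total_weight


# Toy example with three cells and made-up numbers.
toy_indexes = {
    ("area_01", "item_001"): 102.0,
    ("area_01", "item_002"): 98.0,
    ("area_02", "item_001"): 101.0,
}
toy_weights = {
    ("area_01", "item_001"): 2.0,
    ("area_01", "item_002"): 1.0,
    ("area_02", "item_001"): 1.0,
}
print(aggregate(toy_indexes, toy_weights))  # 100.75
```

Restricting the dictionaries to the cells of one region or one major group would give the corresponding higher-level index under the same simplifying assumption.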

