Bureau of Labor Statistics:
Each month the U.S. Bureau of Labor
Statistics collects a sample of approximately 83,500
commodities and services (C&S) price quotes from approximately 26,400 outlets around the United States for the
Consumer Price Index (CPI). From January through December 2009, the 1-month changes
in the U.S. city average all items index had a median value of 0.23 percent. The standard errors
of those 12 estimates had a median value of 0.04 percent. Margins of error are usually expressed as a
statistic’s point estimate plus or minus two standard errors, so the margin of error on the CPI’s 1-month
change is approximately 0.23 percent plus or minus 0.08 percent.
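The margin-of-error arithmetic above can be sketched in a few lines of Python. The function name margin_of_error and its multiplier parameter are illustrative choices, not BLS terminology. The two input values are the 2009 medians quoted in the text.

```python
def margin_of_error(point_estimate, standard_error, multiplier=2.0):
    """Return (lower, upper, half_width) for estimate +/- multiplier * SE."""
    half_width = multiplier * standard_error
    return point_estimate - half_width, point_estimate + half_width, half_width


# Median 1-month change and median standard error for 2009, in percent (from the text).
estimate = 0.23
std_error = 0.04

lower, upper, half_width = margin_of_error(estimate, std_error)
print(f"{estimate:.2f} +/- {half_width:.2f} percent, i.e. {lower:.2f} to {upper:.2f}")
```

With two standard errors the interval runs from roughly 0.15 to 0.31 percent.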
Sources of error
One way of analyzing the error in a survey estimate is to divide the
total error into two sources: sampling error and non-sampling error. Sampling error is the uncertainty in
the CPI caused by the fact that a sample of retail prices is used to compute the CPI, instead of using
the complete universe of retail prices. Non-sampling error is the rest of the error. Non-sampling error
includes things such as incorrect information given by survey respondents, data processing errors, and
so forth. Non-sampling error arises regardless of whether data are collected from a sample of retail
prices or from the complete universe.
Another way of analyzing error is to divide it into variance and
bias. The variance of the CPI is a measure of how close different estimates of the CPI would be to each
other if it were possible to repeat the survey over and over using different samples. Of course, it is
not feasible to repeat the survey multiple times, but statistical theory allows the CPI’s variance to be
estimated anyway. A small variance, for example, indicates that multiple independent samples would produce
values that are consistently very close to each other. Bias is the difference between the CPI’s
expected value and its true value.
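Although the real survey cannot be repeated, the idea of variance and bias can be illustrated with a small simulation. The sketch below is not the BLS variance estimator. It assumes a made-up universe of price changes, draws many independent samples from it, and summarizes how the resulting estimates scatter (variance) and how far their average sits from the true value (bias). All names and numbers are hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical universe of 10,000 monthly price changes (percent); not BLS data.
universe = [random.gauss(0.2, 1.0) for _ in range(10_000)]
true_value = statistics.fmean(universe)


def simulate(estimator, sample_size, repetitions=2_000):
    """Repeat the 'survey' many times; return (bias, variance) of the estimator."""
    estimates = [
        estimator(random.sample(universe, sample_size)) for _ in range(repetitions)
    ]
    bias = statistics.fmean(estimates) - true_value
    variance = statistics.pvariance(estimates)
    return bias, variance


def sample_mean(sample):
    return statistics.fmean(sample)


def shifted_mean(sample):
    # Deliberately biased estimator: always overstates by 0.5 percentage points.
    return statistics.fmean(sample) + 0.5


bias_small_n, var_small_n = simulate(sample_mean, sample_size=25)
bias_large_n, var_large_n = simulate(sample_mean, sample_size=400)
bias_shifted, var_shifted = simulate(shifted_mean, sample_size=400)

print(f"n=25:    bias={bias_small_n:+.3f}, variance={var_small_n:.4f}")
print(f"n=400:   bias={bias_large_n:+.3f}, variance={var_large_n:.4f}")
print(f"shifted: bias={bias_shifted:+.3f}, variance={var_shifted:.4f}")
```

In this toy setup the larger sample lowers the variance, while the shifted estimator keeps a small variance but carries a bias of about 0.5 that no increase in sample size removes.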
A statistic may have a small variance but a large bias, or it may have a large
variance but a small bias. For an index to be considered accurate, both its variance and bias need to be
small. The Bureau of Labor Statistics (BLS) is constantly trying to reduce
the error in the CPI. Variance and sampling error are reduced by using a sample of retail prices that
is as large as possible, given resource constraints. BLS has developed a model that optimizes the allocation
of resources by indicating the number of prices that should be observed in each geographic area and
each item category, in order to minimize the variance of the U.S. city average all items index. BLS
reduces non-sampling error through a series of computerized and professional data reviews, as well as
through continuous survey process improvements and theoretical research.
BLS collects CPI data in 38 geographic areas across the United
States. These areas consist of 31 self-representing areas and 7 non-self-representing areas. Self-representing areas are
large metropolitan areas, such as the Boston, St. Louis, and San Francisco metropolitan areas.
Non-self-representing areas are collections of smaller metropolitan areas. For example, one
non-self-representing area is a collection of 32 small metropolitan areas in the Northeast region (Buffalo,
Hartford, Providence, Bangor, and others), of which 8 were randomly selected to represent the entire set.
Within each of the 38 areas, price data are collected for 211 item categories called item strata. Together the
211 item strata cover all consumer purchases. Examples of item strata are bananas, women’s dresses, and
electricity.
Multiplying the number of areas by the number of item strata gives
8,018 (= 38 × 211) different area and item combinations for which price indexes need to be calculated.
Separate price indexes are calculated for each one of these 8,018 area and item combinations.
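As a quick check of the count, the area and item combinations can be enumerated directly. The labels below are placeholders, since the text does not list the actual area or item-stratum codes.

```python
from itertools import product

NUM_AREAS = 38          # 31 self-representing + 7 non-self-representing areas
NUM_ITEM_STRATA = 211   # item categories priced within each area

# Placeholder labels; the real area and item-stratum codes are not given in the text.
areas = [f"area_{i:02d}" for i in range(1, NUM_AREAS + 1)]
item_strata = [f"item_{j:03d}" for j in range(1, NUM_ITEM_STRATA + 1)]

# One basic-level index is needed for every (area, item stratum) pair.
basic_cells = list(product(areas, item_strata))
print(len(basic_cells))
```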
After all 8,018 of these basic-level indexes are calculated, they are aggregated to form higher-level
indexes, using expenditure estimates from the Consumer Expenditure Survey as their weights. Examples of
higher-level geographic areas are the four regions (Northeast, Midwest, South, and West); and examples of
higher-level item categories are the eight major groups (food & beverages, housing, apparel,
transportation, medical care, education and communication, recreation, and other goods and services). The
highest level of geographic aggregation is the U.S. city average, and the highest level of item aggregation is
all items.
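The aggregation step can be illustrated with a simplified sketch. It assumes a higher-level index is an expenditure-weighted arithmetic mean of the basic-level indexes it covers. The actual BLS aggregation formula is more involved and is not given in the text. The function name aggregate_index, the index values, and the weights are all made up for illustration.

```python
def aggregate_index(basic_indexes, expenditure_weights):
    """Expenditure-weighted arithmetic mean of basic-level indexes.

    Both arguments are dicts keyed by (area, item_stratum). Only the cells
    present in basic_indexes are aggregated.
    """
    total_weight = sum(expenditure_weights[cell] for cell in basic_indexes)
    if total_weight <= 0:
        raise ValueError("expenditure weights must sum to a positive number")
    weighted_sum = sum(
        basic_indexes[cell] * expenditure_weights[cell] for cell in basic_indexes
    )
    return weighted_sum / total_weight


# Made-up numbers for two areas and two item strata; not BLS data.
basic_indexes = {
    ("Boston", "bananas"): 102.0,
    ("Boston", "electricity"): 110.0,
    ("St. Louis", "bananas"): 98.0,
    ("St. Louis", "electricity"): 105.0,
}
expenditure_weights = {
    ("Boston", "bananas"): 1.0,
    ("Boston", "electricity"): 3.0,
    ("St. Louis", "bananas"): 1.0,
    ("St. Louis", "electricity"): 3.0,
}

# Aggregate over every cell (analogous to "all areas, all items").
all_cells = aggregate_index(basic_indexes, expenditure_weights)

# Aggregate one item stratum across areas (analogous to a higher-level geography).
bananas_only = aggregate_index(
    {cell: value for cell, value in basic_indexes.items() if cell[1] == "bananas"},
    expenditure_weights,
)

print(all_cells, bananas_only)
```

The same function serves both directions of aggregation: filtering cells by item gives a geographic roll-up for that item, and filtering by area gives an item roll-up for that area.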