Calculation:
The Gini index is defined as a ratio of the areas on the Lorenz
curve diagram. If the area between the line of perfect equality and the
Lorenz curve is A, and the area under the Lorenz curve is B, then the
Gini index is A/(A+B). Since A+B = 0.5, the Gini index is G = A/(0.5) = 2A
= 1-2B. If the Lorenz curve is represented by the function Y = L(X), the
value of B can be found by integration, and:
$$G = 1 - 2\int_0^1 L(X)\,dX$$
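In some cases, this equation can be applied to calculate the Gini
coefficient without direct reference to the Lorenz curve. For example:
For a population uniform on the values y_i, i = 1 to n, indexed in
non-decreasing order (y_i ≤ y_{i+1}):
$$G = \frac{1}{n}\left(n + 1 - 2\,\frac{\sum_{i=1}^{n}(n+1-i)\,y_i}{\sum_{i=1}^{n} y_i}\right)$$
This may be simplified to:
$$G = \frac{2\sum_{i=1}^{n} i\,y_i}{n\sum_{i=1}^{n} y_i} - \frac{n+1}{n}$$
For a discrete probability function f(y), where y_i, i = 1 to n, are
the points with nonzero probabilities, indexed in increasing order
(y_i < y_{i+1}):
$$G = 1 - \frac{\sum_{i=1}^{n} f(y_i)\,(S_{i-1} + S_i)}{S_n}$$
where
$$S_i = \sum_{j=1}^{i} f(y_j)\,y_j \quad\text{and}\quad S_0 = 0.$$
For a cumulative distribution function F(y) that is piecewise
differentiable, has a mean μ, and is zero for all negative values of y:
$$G = 1 - \frac{1}{\mu}\int_0^{\infty}\bigl(1 - F(y)\bigr)^2\,dy = \frac{1}{\mu}\int_0^{\infty} F(y)\,\bigl(1 - F(y)\bigr)\,dy$$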
Since the Gini coefficient is half the relative mean difference, it
can also be calculated using formulas for the relative mean difference.
For a random sample S consisting of values y_i, i = 1 to n, that are
indexed in non-decreasing order (y_i ≤ y_{i+1}), the statistic:
$$G(S) = \frac{1}{n-1}\left(n + 1 - 2\,\frac{\sum_{i=1}^{n}(n+1-i)\,y_i}{\sum_{i=1}^{n} y_i}\right)$$
is a consistent estimator of the population Gini coefficient, but is
not, in general, unbiased. Like G, G(S) has a simpler form:
$$G(S) = 1 - \frac{2}{n-1}\left(n - \frac{\sum_{i=1}^{n} i\,y_i}{\sum_{i=1}^{n} y_i}\right)$$
There does not exist a sample statistic that is in general an
unbiased estimator of the population Gini coefficient, like the relative
mean difference.
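As a rough illustration of the rank-based calculation, the following Python sketch computes both the population value and the sample statistic. It assumes the usual simplified forms G = 2 Σ i·y_i / (n Σ y_i) − (n+1)/n and G(S) = 1 − 2/(n−1) · (n − Σ i·y_i / Σ y_i); the function names and the income list are made up for the example.

```python
def rank_weighted_ratio(values):
    """Return (n, sum(i * y_i) / sum(y_i)) for values sorted in non-decreasing order."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return n, weighted / sum(y)


def gini_population(values):
    """Gini coefficient of a finite population in which each value is equally likely."""
    n, ratio = rank_weighted_ratio(values)
    return 2.0 * ratio / n - (n + 1.0) / n


def gini_sample(values):
    """Sample statistic G(S): consistent, but not unbiased in general."""
    n, ratio = rank_weighted_ratio(values)
    return 1.0 - 2.0 / (n - 1) * (n - ratio)


incomes = [1, 2, 3, 4]           # hypothetical incomes
print(gini_population(incomes))  # 0.25
print(gini_sample(incomes))      # about 0.333
```

The two values differ by the factor n/(n−1), so the distinction matters mainly for small samples.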
For some functional forms, the Gini index can be calculated
explicitly. For example, if y follows a lognormal distribution with the
standard deviation of logs equal to σ, then
$$G = 2\,\Phi\!\left(\frac{\sigma}{\sqrt{2}}\right) - 1$$
where Φ() is the cumulative distribution function of the standard
normal distribution.
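The lognormal case can be checked numerically. The sketch below assumes the closed form G = 2Φ(σ/√2) − 1 and compares it with the rank-based population formula applied to a grid of lognormal quantiles. The quantile grid, the grid size, and all function names are choices made for this illustration only.

```python
import math
from statistics import NormalDist


def gini_lognormal(sigma):
    """Closed form for a lognormal distribution: 2 * Phi(sigma / sqrt(2)) - 1."""
    return 2.0 * NormalDist().cdf(sigma / math.sqrt(2.0)) - 1.0


def gini_population(values):
    """Rank-based Gini coefficient of equally likely values."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2.0 * weighted / (n * sum(y)) - (n + 1.0) / n


def lognormal_quantiles(sigma, n):
    """n equally likely representative values, taken at quantiles (i - 0.5) / n."""
    std_normal = NormalDist()
    return [math.exp(sigma * std_normal.inv_cdf((i - 0.5) / n))
            for i in range(1, n + 1)]


sigma = 1.0
exact = gini_lognormal(sigma)
approx = gini_population(lognormal_quantiles(sigma, 10_000))
print(round(exact, 3), round(approx, 3))  # both close to 0.52
```

The quantile grid truncates the extreme upper tail, so the numerical value falls slightly below the closed form.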
Sometimes the entire Lorenz curve is not known, and only values at
certain intervals are given. In that case, the Gini coefficient can be
approximated by using various techniques for interpolating the missing
values of the Lorenz curve. If (X_k, Y_k) are the known points on the
Lorenz curve, with the X_k indexed in increasing order (X_{k-1} < X_k),
so that:
X_k is the cumulated proportion of the population variable, for k =
0,...,n, with X_0 = 0, X_n = 1.
Y_k is the cumulated proportion of the income variable, for k =
0,...,n, with Y_0 = 0, Y_n = 1.
Y_k should be indexed in non-decreasing order (Y_k ≥ Y_{k-1}).
If the Lorenz curve is approximated on each interval as a line
between consecutive points, then the area B can be approximated with
trapezoids and:
$$G_1 = 1 - \sum_{k=1}^{n} (X_k - X_{k-1})\,(Y_k + Y_{k-1})$$
is the resulting approximation for G. More accurate results can be
obtained using other methods to approximate the area B, such as
approximating the Lorenz curve with a quadratic function across pairs of
intervals, or building an appropriately smooth approximation to the
underlying distribution function that matches the known data. If the
population mean and boundary values for each interval are also known,
these can also often be used to improve the accuracy of the
approximation.
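The trapezoid approximation can be sketched in a few lines of Python, assuming the formula G ≈ 1 − Σ (X_k − X_{k−1})(Y_k + Y_{k−1}). The quintile shares below are invented for the example and do not describe any real country.

```python
def gini_trapezoid(points):
    """Approximate G from Lorenz-curve points (X_k, Y_k).

    `points` must start at (0, 0), end at (1, 1) and have increasing X_k.
    Each straight segment contributes a trapezoid to the area B under the curve.
    """
    g = 1.0
    for (x_prev, y_prev), (x_k, y_k) in zip(points, points[1:]):
        g -= (x_k - x_prev) * (y_k + y_prev)
    return g


# Hypothetical cumulative income shares at each 20% of the population.
lorenz = [(0.0, 0.0), (0.2, 0.05), (0.4, 0.15),
          (0.6, 0.30), (0.8, 0.55), (1.0, 1.0)]
print(round(gini_trapezoid(lorenz), 4))  # 0.38
```

Because straight segments lie on or above a convex Lorenz curve, this approximation tends to understate G when only a few points are known.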
The Gini coefficient calculated from a sample is a statistic and its
standard error, or confidence intervals for the population Gini
coefficient, should be reported. These can be calculated using bootstrap
techniques but those proposed have been mathematically complicated and
computationally onerous even in an era of fast computers. Ogwang (2000)
made the process more efficient by setting up a “trick regression model”
in which the incomes in the sample are ranked with the lowest income
being allocated rank 1. The model then expresses the rank k (dependent
variable) as the sum of a constant A and a normal error term whose
variance is inversely proportional to y_k:
$$k = A + N(0,\, s^2 / y_k)$$
Ogwang showed that G can be expressed as a function of the weighted
least squares estimate of the constant A and that this can be used to
speed up the calculation of the jackknife estimate for the standard
error. Giles (2004) argued that the standard error of the estimate of A
can be used to derive that of the estimate of G directly without using a
jackknife at all. This method only requires the use of ordinary least
squares regression after ordering the sample data. The results compare
favorably with the estimates from the jackknife with agreement improving
with increasing sample size.
However, it has since been argued that this result depends on the
model’s assumptions about the error distributions (Ogwang 2004) and the
independence of error terms (Reza & Gastwirth 2006) and that these
assumptions are often not valid for real data sets. It may therefore be
better to stick with jackknife methods such as those proposed by
Yitzhaki (1991) and Karagiannis and Kovacevic (2000). The debate
continues.
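For illustration only, the sketch below computes G from the weighted least squares estimate of the constant A and then applies a plain delete-one jackknife to obtain a standard error. It assumes that, with weights proportional to y_k, the estimate of A is the income-weighted mean rank Σ k·y_k / Σ y_k, so that G = 2A/n − 1 − 1/n. The jackknife is the generic textbook version, not the specific procedures of the authors cited above, and the income values are hypothetical.

```python
import math


def gini_via_wls(values):
    """Gini coefficient from the weighted least squares estimate of A."""
    y = sorted(values)
    n = len(y)
    # WLS fit of a constant to the ranks k = 1..n with weights y_k
    # is simply the y-weighted mean of the ranks.
    a_hat = sum(k * yk for k, yk in enumerate(y, start=1)) / sum(y)
    return 2.0 * a_hat / n - 1.0 - 1.0 / n


def jackknife_se(values):
    """Delete-one jackknife standard error of the Gini estimate (O(n^2 log n))."""
    data = list(values)
    n = len(data)
    leave_one_out = [gini_via_wls(data[:i] + data[i + 1:]) for i in range(n)]
    mean = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((g - mean) ** 2 for g in leave_one_out))


incomes = [12.0, 15.0, 20.0, 22.0, 35.0, 60.0, 110.0]  # hypothetical sample
print(gini_via_wls(incomes), jackknife_se(incomes))
```

Recomputing the estimate n times is what makes the naive jackknife slow on large samples, which is the cost the regression-based shortcuts aim to avoid.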
The Gini coefficient can be calculated if you know the mean of a
distribution, the number of people (or percentiles), and the income of
each person (or percentile). Princeton development economist Angus
Deaton (1997, 139) simplified the Gini calculation to one easy formula:
$$G = \frac{N+1}{N-1} - \frac{2}{N(N-1)\,u}\left(\sum_{i=1}^{N} P_i X_i\right)$$
where u is the mean income of the population, N is the number of
people, and P_i is the income rank of person i, with income X_i, such
that the richest person receives a rank of 1 and the poorest a rank of
N. This effectively gives higher weight to poorer people in the income
distribution, which allows the Gini to meet the Transfer Principle.
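A minimal Python sketch of this calculation follows, assuming the formula G = (N+1)/(N−1) − 2/(N(N−1)u) · Σ P_i X_i with rank 1 assigned to the richest person. The function name and the incomes are hypothetical.

```python
def gini_deaton(incomes):
    """Gini coefficient from ranks, with rank 1 for the richest person."""
    n = len(incomes)
    mean = sum(incomes) / n
    ranked = sorted(incomes, reverse=True)  # richest first, so rank 1 is richest
    weighted = sum(rank * x for rank, x in enumerate(ranked, start=1))
    return (n + 1) / (n - 1) - 2.0 / (n * (n - 1) * mean) * weighted


before = [1.0, 2.0, 3.0, 4.0]
after = [1.5, 2.0, 3.0, 3.5]   # 0.5 transferred from the richest to the poorest
print(gini_deaton(before))     # about 0.333
print(gini_deaton(after))      # about 0.233
```

The second call shows the Transfer Principle at work: moving income from a richer to a poorer person, without changing their ranks, lowers the coefficient.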
Disadvantages of Gini coefficient as a measure of inequality
While the Gini coefficient measures inequality of income, it does
not measure inequality of opportunity. For example, the United Kingdom
has a social class structure that may present barriers to upward
mobility; this is not reflected in its Gini coefficient.
The Gini coefficient of different sets of people cannot be averaged
to obtain the Gini coefficient of all the people in the sets: if a Gini
coefficient were to be calculated for each person it would always be
zero. For a large, economically diverse country, a much higher
coefficient will be calculated for the country as a whole than will be
calculated for each of its regions. (The coefficient is usually applied
to measurable nominal income rather than local purchasing power, tending
to increase the calculated coefficient across larger areas.)
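A small numerical example makes the point about averaging concrete. The Python sketch below uses two invented regions whose internal distributions are identical up to scale; the rank-based population formula is assumed for the calculation.

```python
def gini(values):
    """Rank-based Gini coefficient of equally likely values."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2.0 * weighted / (n * sum(y)) - (n + 1.0) / n


poor_region = [10, 12, 14]      # hypothetical incomes
rich_region = [100, 120, 140]   # same shape, ten times larger

g_poor = gini(poor_region)
g_rich = gini(rich_region)
g_country = gini(poor_region + rich_region)

print(round(g_poor, 3), round(g_rich, 3))  # 0.074 0.074
print(round(g_country, 3))                 # 0.446
```

Each region is fairly equal internally, yet the combined population is far more unequal than either regional figure, or their average, would suggest.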
The Lorenz curve may understate the actual amount of inequality if
richer households are able to use income more efficiently than lower
income households or vice versa. From another point of view, measured
inequality may be the result of more or less efficient use of household
incomes.
Economies with similar incomes and Gini coefficients can still have
very different income distributions. This is because the Lorenz curves
can have different shapes and yet still yield the same Gini coefficient.
For example, consider a society where half of individuals had no income
and the other half shared all the income equally (i.e. whose Lorenz
curve is linear from (0,0) to (0.5,0) and then linear to (1,1)). As is
easily calculated, this society has Gini coefficient 0.5 -- the same as
that of a society in which 75% of people equally shared 25% of income
while the remaining 25% equally shared 75% (i.e. whose Lorenz curve is
linear from (0,0) to (0.75,0.25) and then linear to (1,1)).
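The two societies in this example can be verified directly. The sketch below assumes the trapezoid formula G = 1 − Σ (X_k − X_{k−1})(Y_k + Y_{k−1}), which is exact for piecewise-linear Lorenz curves; the variable names are chosen for the illustration.

```python
def gini_from_lorenz(points):
    """Gini coefficient of a piecewise-linear Lorenz curve through `points`."""
    return 1.0 - sum((x1 - x0) * (y1 + y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Half the population has nothing; the other half shares everything equally.
society_a = [(0.0, 0.0), (0.5, 0.0), (1.0, 1.0)]
# 75% of people share 25% of income; the remaining 25% share 75%.
society_b = [(0.0, 0.0), (0.75, 0.25), (1.0, 1.0)]

print(gini_from_lorenz(society_a))  # 0.5
print(gini_from_lorenz(society_b))  # 0.5
```

In both cases the area under the Lorenz curve is 0.25, so the coefficient cannot distinguish the two distributions.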
The Gini coefficient measures current income rather than lifetime
income. A society in which everyone earned the same over a lifetime
would appear unequal because people are at different stages in their
lives; a society in which students study rather than save can never
have a coefficient of 0. However, a Gini coefficient can also be
calculated for any kind of distribution, e.g. for wealth.[9]
Gini coefficients do include investment income; however, the Gini
coefficient based on net income does not accurately reflect differences
in wealth—a possible source of misinterpretation. For example, Sweden
has a low Gini coefficient for income distribution but a significantly
higher Gini coefficient for wealth (for instance 77% of the share value
owned by households is held by just 5% of Swedish shareholding
households).[10] In other words, the Gini income coefficient should not
be interpreted as measuring effective egalitarianism.
Too often only the Gini coefficient is quoted without describing the
proportions of the quantiles used for measurement. As with other
inequality coefficients, the Gini coefficient is influenced by the
granularity of the measurements. For example, five 20% quantiles (low
granularity) will usually yield a lower Gini coefficient than twenty 5%
quantiles (high granularity) taken from the same distribution. This is
an often encountered problem with measurements.
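The effect of granularity can be reproduced with synthetic data. In the sketch below the income list (squares of 1 to 1000) is purely hypothetical, the Lorenz points are built from equal-sized quantile groups, and the trapezoid approximation is assumed for the calculation.

```python
def gini_trapezoid(points):
    """Trapezoid approximation of G from Lorenz-curve points."""
    return 1.0 - sum((x1 - x0) * (y1 + y0)
                     for (x0, y0), (x1, y1) in zip(points, points[1:]))


def lorenz_points(incomes, groups):
    """Lorenz-curve points at the boundaries of `groups` equal-sized quantile groups."""
    y = sorted(incomes)
    n = len(y)
    total = sum(y)
    points = [(0.0, 0.0)]
    for g in range(1, groups + 1):
        cutoff = g * n // groups          # number of people up to this boundary
        points.append((cutoff / n, sum(y[:cutoff]) / total))
    return points


incomes = [i ** 2 for i in range(1, 1001)]  # hypothetical skewed distribution

g5 = gini_trapezoid(lorenz_points(incomes, 5))               # five 20% quantiles
g20 = gini_trapezoid(lorenz_points(incomes, 20))             # twenty 5% quantiles
g_all = gini_trapezoid(lorenz_points(incomes, len(incomes))) # every individual
print(round(g5, 3), round(g20, 3), round(g_all, 3))
```

Coarser grouping discards inequality within each group, so the five-group figure is the lowest of the three and the individual-level figure the highest.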
Care should be taken in using the Gini coefficient as a measure of
egalitarianism, as it is properly a measure of income dispersion. For
example, if two equally egalitarian countries pursue different
immigration policies, the country accepting a higher proportion of
low-income or impoverished migrants will be assessed as less equal (it
will have a higher Gini coefficient).
The Gini coefficient is a point estimate of equality at a certain
time, hence it ignores life-span changes in income. Typically, increases
in the proportion of young or old members of a society will drive
apparent changes in equality. Because of this, factors such as the age
distribution within a population and mobility within income classes can
create the appearance of differences in equality where none exist once
demographic effects are taken into account. Thus a given economy may
have a higher Gini coefficient than another at any one point in time,
while its Gini coefficient calculated over individuals' lifetime incomes
is actually lower than that of the apparently more equal
economy.[11] Essentially, what matters is not just inequality in any
particular year, but the composition of the distribution over time.
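The difference between a snapshot and a lifetime view can be shown with a toy model. In the Python sketch below every person follows the same invented earnings profile, and the population contains one person at each age from 20 to 79; the profile, the age range, and the function names are all assumptions made for the illustration.

```python
def gini(values):
    """Rank-based Gini coefficient of equally likely values."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2.0 * weighted / (n * sum(y)) - (n + 1.0) / n


def income_at_age(age):
    """Hypothetical earnings profile shared by every member of the society."""
    if age < 30:
        return 10.0   # early career
    if age < 60:
        return 50.0   # peak earning years
    return 20.0       # retirement


ages = range(20, 80)
snapshot = [income_at_age(a) for a in ages]           # incomes observed this year
lifetime_total = sum(income_at_age(a) for a in ages)  # identical for everyone
lifetime = [lifetime_total for _ in ages]

print(round(gini(snapshot), 3))       # 0.267
print(round(abs(gini(lifetime)), 3))  # 0.0
```

Everyone in this model is equally well off over a lifetime, yet the single-year coefficient is well above zero purely because of the age structure.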
General problems of measurement
Comparing income distributions among countries may be difficult
because benefits systems may differ. For example, some countries give
benefits in the form of money while others give food stamps, which might
not be counted by some economists and researchers as income in the
Lorenz curve and therefore not taken into account in the Gini
coefficient. Income in the United States is counted before benefits,
while in France it is counted after benefits, which may lead the United
States to appear somewhat more unequal vis-a-vis France. In another
example, the Soviet Union was measured to have relatively high income
inequality: by some estimates, in the late 1970s, the Gini coefficient
of its urban population was as high as 0.38,[12] which is higher than
that of many Western countries today. This number would not reflect those benefits
received by Soviet citizens that were not monetized for measurement,
which may include child care for children as young as two months,
elementary, secondary and higher education, cradle-to-grave medical
care, and heavily subsidized or provided housing. In this example, a
more accurate comparison between the 1970s Soviet Union and Western
countries may require one to assign monetary values to all benefits – a
difficult task in the absence of free markets. Similar problems arise
whenever a comparison between pure free-market economies and partially
socialist economies is attempted. Benefits may take various and
unexpected forms: for example, major oil producers such as Venezuela and
Iran provide indirect benefits to their citizens by subsidizing the
retail price of gasoline.
Similarly, in some societies people may have significant income in
other forms than money, for example through subsistence farming or
bartering. Like non-monetary benefits, the value of these incomes is
difficult to quantify. Different quantifications of these incomes will
yield different Gini coefficients.
The measure will give different results when applied to individuals
instead of households. When different populations are not measured with
consistent definitions, comparison is not meaningful.
As with all statistics, there may be systematic and random errors in
the data. The Gini coefficient becomes less meaningful as the data
become less accurate. Also, countries may collect data differently,
making it difficult to compare statistics between countries.
As one result of this criticism, entropy measures (e.g. the Theil
index and the Atkinson index) are frequently used in addition to, or in
competition with, the Gini coefficient. These measures attempt to compare
the distribution of resources by intelligent agents in the market with a
maximum entropy random distribution, which would occur if these agents
acted like non-intelligent particles in a closed system following the
laws of statistical physics.