Return to Erdős Number Project home page.
There are about 1.6 million authored items in the Math Reviews database, by a total of about 337,000 different authors. (This includes all books and papers in MR except those items, such as some conference proceedings, that do not have authors.) Approximately 65.8% of these items are by a single author, 25.8% by two authors, 6.7% by three authors, 1.3% by four authors, 0.3% by five authors, and 0.1% by six or more authors. The largest number of authors shown for a single item is in the 20s, but sometimes the author list includes et al., whom we did not count as a real person. The fraction of items authored by just one person has steadily decreased over time, starting out above 90% in the 1940s and currently hovering just under 50%.
Let B be the bipartite graph whose vertices are papers and authors, with an edge joining a paper with each author of that paper. Then B has about 2.3 million edges. The average number of authors per paper is 1.45, and the average number of papers per author is 6.87. Click here to see the distribution of the number of papers per author. The median is 2, the mean is 6.87, and the standard deviation is 15.35. It is interesting (for tenure review committees?) to note that the 60th percentile is 3 papers, the 70th percentile is 4, the 80th percentile is 8, the 90th percentile is 17, and the 95th percentile is 30. Indeed, over 42% of all authors in the database have just one paper.
There are eight authors with more than 500 papers: Paul Erdös with 1401 (he actually wrote more papers than that, but these are just the ones treated by Math Reviews), Drumi Bainov with 782, Leonard Carlitz with 730, Lucien Godeaux with 644, SAHARON SHELAH with 600, Hari M. Srivastava with 537, FRANK HARARY with 534, and Richard Bellman with 522. Bainovs Erdös number is 4, Carlitzs is 2, Godeauxs is infinite (actually he wrote only one joint paper), Srivastavas is 2, Bellmans is 2, and Shelah and Harary are both Erdös coauthors.
The collaboration graph C has the roughly 337,000 authors as its vertices, with an edge between every pair of people who have a joint publication (with or without other coauthors but see below for a discussion of Erdös number of the second kind, where we restrict links to just two-author papers). This graph has about 496,000 edges, so the average number of collaborators per person is 2.94. (If we were to view C as a multigraph, with one edge between two vertices for each paper in which they collaborated, then there would be about 950,000 edges, for an average of 5.65 collaborations per person.) In C there is one large component consisting of about 208,000 vertices. Of the remaining 129,000 authors, 84,000 of them have written no joint papers (these are isolated vertices in C). The average number of collaborators for people who have collaborated is 3.92; the average number of collaborators for people in the large component is 4.43; and the average number of collaborators for people who have collaborated but are not in the large component is 1.54.
Click here to see the data on the number of collaborators per author (in other words, the numbers of coauthors mathematicians have). In graph-theoretical terms, this table shows the degrees of the vertices in C. The median is 1, the mean is 2.94, and the standard deviation is 5.50. (If we omit the isolated vertices, then the median degree is 2, the mean is 3.92, and the standard deviation is 6.04.) Recent research (see our research page) has indicated that we should expect the nonzero degrees to follow a power law: the number of vertices with degree x should be proportional to x raised to a power, where the exponent is somewhere around 2 or 3. Indeed, when we fit such a model to our data (grouping the data in the tail), we find the exponent to be about 2.97, with a correlation coefficient for the model of r = 0.97. A slightly more accurate model throws in an exponential decay factor, and with this factor present, the exponent is 2.46, and r = 0.98. Apparently these models are appropriate for our data.
The three people with more than 200 coauthors are Paul Erdös (of course) with 502, Frank Harary with 254, and Yuri Alekseevich Mitropolskii with 240. (Harary is an Erdös coauthor, and Mitropolskiis Erdös number is 3.) The other six sociable mathematicians in this table (all of whom have between 152 and 156 coauthors listed in Mathematical Reviews), with their Erdös numbers shown in parentheses, are NOGA ALON (1), Andrei Nikolaevich Kolmogorov (4), SAHARON SHELAH (1), Sergei Petrovich Novikov (3), Aleksandr Andreevich Samarskii (3), and Hari M. Srivastava (2).
Click here for information on publication habits over time. It is clear from these data that collaboration has increased over the past 60 years, especially so recently. At the present time, less than half of all mathematics papers are by a single author, about a third are by two authors, about an eighth by three authors, and 3% by four or more authors. The table also indicates that the average number of papers per author in a decade has slowly increased over time, now standing at about 5 (although the variance is very large, and the median is only 2).
The radius of the large component of C (as it existed in Mathematical Reviews data as of May, 2000) is 14, and its diameter is 27. There are at least three vertices with eccentricity 14, including NOGA ALON, but not including Paul Erdös. (In other words, there is no one with ALON number greater than 14.) Based on a sample of 66 pairs of vertices in this component, the average distance between two vertices is around 7.73 (between 7.37 and 8.08 with 95% confidence), with a standard deviation of 1.45. The median of the sample was 8, with the quartiles at 6.75 and 9. The smallest and largest distances in the sample were 5 and 12, respectively. The appropriate phrase for C, then, is perhaps "nine degrees of separation", if we wish to account for three quarters of all pairs of mathematicians.
To analyze this another way, we took a sample of 100 vertices in the large component and computed for each of them: the degree, the mean distance to all the other vertices, the standard deviation of the distances to all the other vertices, and the maximum distance to another vertex (the eccentricity). Here are the results from the sample. The mean distance to other vertices varied from 5.65 to 11.61, with an average of 7.52 and a standard deviation of 1.00. (Thus a 95% confidence interval for the average distance between vertices is 7.32 to 7.72, consistent with the estimate in the paragraph above.) The standard deviation of the distances to all the other vertices was remarkably constant, with the numbers varying only between 1.19 and 1.35 (the interquartile range was from 1.23 to 1.27, mean 1.25, standard deviation 0.03). So although the average Jane Doe number varies quite a bit, depending on who Jane Doe is, the distribution of these numbers has pretty much the same shape and spread for everyone. Its as if those people further away from the heart of the graph may take longer to get to the heart, but once there, the fan-out pattern is the same. The eccentricities of the vertices in the sample ranged from 15 to 21, with a mean of 17.46 and a standard deviation of 1.16. We also looked at correlations among Erdös number (n), vertex degree (d), and average distance to the other vertices (l). The associations are as one might predict: The correlation coefficient between d and n is 0.33 (people with a lot of collaborators tend to have smaller Erdös number); the correlation coefficient between d and l is 0.45 (people with a lot of collaborators tend to have shorter paths to other people); and the correlation coefficient between n and l is 0.81 (people with a small Erdös number are closer to the heart of the graph and therefore have shorter paths to others, compared to those out in the fringes).The clustering coefficient of a graph is equal to the fraction of ordered triples of vertices a,b,c in which edges ab and bc are present that have edge ac present. (In other words, how often are two neighbors of a vertex adjacent to each other?) The clustering coefficient of the collaboration graph of the first kind is 913659/6072790 = 0.15. The fairly high value of this figure, together with the fact that average path lengths are small, indicates that this graph is a small world graph (as defined by Duncan Watts see our pages on research on collaboration and related concepts).
We also have some data on the portion of the collaboration graph outside the Erdös component (the one giant component). We are ignoring here the 84,000 isolated vertices and looking only at those authors who have collaborated but do not have a finite Erdös number. There are about 45,000 such vertices. There are about 35,000 edges in these components, so the average degree of these vertices is 1.54. In other words, a person who has collaborated but does not find herself in the Erdös component of C has on the average collaborated with only one or two people. In contrast, the average degree of vertices in the Erdös component is 4.43 (there are about 462,000 edges and 208,000 vertices). Click here for the distribution of component sizes. As would be expected, most of these roughly 17,000 other components are isolated edges (two thirds of them, in fact). The largest component has 39 vertices. Its most collaborating author is Feng Xiang Mei (Department of Applied Mechanics at the Beijing Institute of Technology).
Erdös number 0 --- 1 person
Erdös number 1 --- 502 people
Erdös number 2 --- 5713 people
Erdös number 3 --- 26422 people
Erdös number 4 --- 62136 people
Erdös number 5 --- 66157 people
Erdös number 6 --- 32280 people
Erdös number 7 --- 10431 people
Erdös number 8 --- 3214 people
Erdös number 9 --- 953 people
Erdös number 10 --- 262 people
Erdös number 11 --- 94 people
Erdös number 12 --- 23 people
Erdös number 13 --- 4 people
Erdös number 14 --- 7 people
Erdös number 15 --- 1 person
Erdös number 16 --- 0 people
Thus the median Erdös number is 5; the mean
is 4.69, and the standard deviation is 1.27.
The person with the largest finite Erdös number is R. G. Kamalov, and one shortest path goes like this: Erdös to NOGA ALON (1985-1997) to Vitali D. Milman (1983-1987) to A. D. Myskis (1960-1964) to A. Ja. Hohrjakov (1958) to V. M. Zubov (1968-1971) to G. A. Shinkarenko (1987-1993) to Ya. G. Savula (1977) to Ya. M. Grigorenko (1985) to V. I. Gulyaev (1991) to A. M. Aleksandrov (1972) to Boris Aleksandrovich Kats (1973) to M. Kh. Brenerman (1985) to A. R. Kessel (1994) to G. O. Berim (1980-1984) to Kamalov (1984).
Since Paul Erdös collaborated with so many people, one would
expect this distribution for him to be shifted downward from that of a
random mathematician. Indeed, Grossman numbers have a median
of 6 and a mean of 5.81, and range as high as 17.
Here is what we have found out so far about C' and Erdös
numbers of the second kind. This two-author-only collaboration
graph has about 142,000 isolated vertices (including the 84,000 people who
have written no joint papers, together with another 58,000 people who have
written joint papers but only when three or more authors were involved).
The remaining 195,000 mathematicians in C' account for about 226,000
edges, so the average degree of a nonisolated vertex in C' is about 2.31
(as opposed to 3.92 for C). Click here to see the data
on the distribution of these degrees, i.e., the number of collaborators
per author counting only dual works. The median is 1, the mean is 1.34,
and the standard deviation is 2.66. (If we omit the isolated vertices,
then the
median degree is 2, the mean is 2.31, and the standard deviation is 3.16.)
As with the collaboration graph of
the
first kind, we should expect the nonzero degrees to follow a power
law, and when we fit this a model to our data (again grouping the
data in the tail), we find the exponent to be about 3.26, with a
correlation coefficient for the model of r = 0.97. The model with an
exponential decay factor present gives the exponent as 2.70, with
r
= 0.98.
The three people with 100 or more coauthors of this
type are Paul Erdös (of course) with 229, FRANK HARARY with
123, and SAHARON SHELAH with 100. The only other person with more than 72
is Joseph B. Keller (who has Erdös number of the first kind 3).
There are about 141,000 vertices in the large component of C'
(versus 208,000 in C). The average number of two-author-only
collaborators for people in the large component is 2.74; and the average
number of two-author-only collaborators for people who have written
two-author papers but are not in the large component is 1.22.
The radius of the large component of C' (as it existed in Mathematical
Reviews data as of May, 2000) is either 14 or 15, and its diameter is 28.
As in the case of C, we took a sample of vertices in the large component
of C' and computed for each of them: the degree, the mean distance to all
the other vertices, the standard deviation of the distances to all the
other vertices, and the maximum distance to another vertex (the
eccentricity). Here are the results from the sample of
151 vertices. The mean distance to other vertices varied from
6.60 to 13.65, with an average of 9.56 and a standard deviation of 1.26.
(Thus a 95% confidence interval for the average distance between vertices
is 9.36 to 9.76.) The standard deviation of the distances to all the other
vertices was again remarkably constant, with the numbers varying only
between 1.56 and 1.72 (the interquartile range was from 1.59 to 1.65, mean
1.62, standard deviation 0.04). The eccentricities of the vertices in the
sample ranged from 16 to 24, with a mean of 19.25 and a standard deviation
of 1.42. As for the correlations among Erdös number (n), vertex
degree (d), and average distance to the other vertices (l), the
correlation coefficient between d and n is 0.24; the correlation
coefficient between d and l is 0.26; and the correlation
coefficient
between n and l is 0.86.
The clustering coefficient of the collaboration graph
of the second kind is 34578/1270315 = 0.027. This is actually a fairly
high value (compared to a random graph with this density of edges, where
the clustering coefficient is essentially 0), so again we have a
small world graph. The three mathematicians with at least 25
collaborations among their collaborators whose collaborators most
collaborate with each other are Rosa Alexander, Charles C. Lindner, and
Masahiro Nakamura, each with local clustering coefficients in the 11% to
13% range these are the only ones above 10%. (In other words, for
these people, about 12% of the pairs of their collaborators have
themselves collaborated, where in all of this we are talking only about
two-author papers.)
We also have some data on the portion of the collaboration graph of the
second kind outside the Erdös component (the one
giant component). We are ignoring here the 142,000 isolated
vertices and looking only at those authors who have written two-author
papers but do not have a finite Erdös number of the second kind.
There are about 55,000 such vertices. There are about 33,000 edges in
these components, so the average degree of these vertices is 1.22. (In
contrast, the average degree of vertices in the Erdös component is
2.74 (there are about 193,000 edges and 141,000 vertices). Click here for the
distribution of component sizes. As would be expected, most of
these roughly 21,000 other components are isolated edges (three fourths of
them, in fact). The largest component has 33 vertices.
Paul Erdös asked the following question: Is the
collaboration graph of the second kind planar? Our guess was that surely
it was not, and we now have a proof. If we can find a
homeomorphic copy of the complete graph on five vertices in C', then we
know that the graph cannot be imbedded in a plane. Here is one such copy
we found by hand. We need five vertices, each of which is joined in C' to
each of the others, either by an edge or by a path, and all the paths have
to be disjoint (none of the intermediate authors can occur more than
once). The five vertices are Paul Erdös, Frank Harary, Ron Graham,
Dan Kleitman, and Saharon Shelah. There are edges in C' between Erdös
and Graham, between Erdös and Kleitman, and between Erdös and
Shelah. Although Erdös and Harary have two joint publications, each
of those papers had a third author, so the Erdös-Harary edge is not
in C'. There is a path in C' from Erdös to Harary passing through Leo
Moser, however. There is a path between Harary and Graham passing through
Stefan Burr, a path from Harary to Kleitman passing through Jin Akiyama
and Noga Alon (in that order), and a path from Harary to Shelah passing
through Andreas Blass. There is an edge between Graham and Kleitman, and
there is a path from Graham to Shelah passing through Joel Spencer.
Finally, there is a path from Kleitman to Shelah passing through Eric
Milner.
Recently (December 2000) we have improved this even further. The
"k-core" of a graph is the (unique) largest subgraph all of whose vertices
have degree at least k. (See the article in Social Networks
discussed on the "research" subpage
for references to the notion of core.) It is easy to find the k-core: just
remove all vertices of degree less than k, then repeat again and again
until no such vertices remain. If any vertices remain, then they form the
k-core. It is clear that the 1-core contains the 2-core, which contains
the 3-core, etc. The smallest nonempty k-core (i.e., the one for largest
k) is called the "main core". For the collaboration graph of the
second kind, we found (using electronic data) that the main core is the
5-core, and it has 44 vertices (including Erdös, not surprisingly,
with degree 25) and 162 edges. Click here for the names
of these most social 44 mathematicians (of whom 36 are Erdös
coauthors, and the other 7 have Erdös number 2), here for a picture, and here for the
adjacency matrix of this graph.
It turns out that the main core of the collaboration graph of the
second kind has three complete graphs on five vertices:
COLBOURN-Hartman-Mendelson-PHELPS-ROSA,
COLBOURN-Lindner-Mendelson-PHELPS-ROSA, and
Lindner-MULLIN-ROSA-STINSON-Wallis. It also has 66 copies of the complete
bipartite graph with three vertices in each part (the other canonical
nonplanar graph), such as (FAN CHUNG, RODL, SZEMEREDI)-(RON GRAHAM,
TROTTER, E R D O S). So this graph is certainly nonplanar.
Actually, these are not the only complete graphs on five vertices in
the collaboration graph of the second kind.
Gerald Ludden
has the distinction of being the only person who has exactly four
collaborators of the second kind, each of whom has 2-author collaborated
with each of the others (Koichi Ogiue, Masajumi Okumura, Bang-Yen Chen,
and David E. Blair).
This file
contains a statistical summary of the number of Erdös number 1
coauthors for people with Erdös number 2, the number of Erdös
number 1 coauthors for people with Erdös number 1, the total number
of coauthors for people with Erdös number 1, the number of papers
that Erdöss coauthors have with him, and the number of
new coauthors Paul Erdös added each year.
This is a
textfile giving the adjacency lists for the induced subgraph of the
collaboration graph on all Erdös coauthors.
This file
lists the Erdös number record holders (for example, which person with
Erdös number 2 has the most coauthors with Erdös number 1?).
Erdös numbers of the second kind
The entire discussion so far has been based on linking two mathematicians
if they have written a joint paper, whether or not other authors were
involved. A purer definition of the collaboration graph (in fact, the one
that Paul Erdös himself seemed to favor) would put an edge
between two vertices if the mathematicians have a joint paper by
themselves, with no other authors. Under this definition, for
example, YOLANDA DEBOSE would not have an Erdös number of 1, since
her only joint publication with Erdös was a three-author paper with
ARTHUR M. HOBBS as well. (But Hobbs would still have Erdös number 1,
since some of his joint works are with Paul alone.) Let C' denote the
collaboration graph under this more restrictive definition, and let us
call the associated path lengths Erdös numbers of the second
kind (and therefore call traditional Erdös numbers
Erdös numbers of the first kind when we need to make a
distinction).
The distribution of Erdös numbers of the second kind
The following table shows the number of people with Erdös number 1,
2, 3, ..., according to the electronic data but counting only
coauthorships on papers with just two authors. In addition to these
people with finite Erdös number of the second kind, there are about
55,000 mathematicians who have collaborated but have an infinite
Erdös number of the second kind (this is about 10,000 greater than
the corresponding number for Erdös numbers of the first kind).
these are Erdös numbers of the second kind
Erdös number 0 --- 1 person
Erdös number 1 --- 229 people
Erdös number 2 --- 1969 people
Erdös number 3 --- 8602 people
Erdös number 4 --- 22668 people
Erdös number 5 --- 36112 people
Erdös number 6 --- 34286 people
Erdös number 7 --- 20319 people
Erdös number 8 --- 9880 people
Erdös number 9 --- 4157 people
Erdös number 10 --- 1667 people
Erdös number 11 --- 602 people
Erdös number 12 --- 245 people
Erdös number 13 --- 58 people
Erdös number 14 --- 22 people
Erdös number 15 --- 4 people
Erdös number 16 --- 0 people
Thus the median Erdös number of the second kind is
6; the mean is 5.63, and the standard
deviation is 1.63, a little higher than the corresponding
statistics for Erdös numbers of the first kind, as would be expected.
Surprisingly, the maximum finite Erdös number remains the same.
(However, the person with the maximum Erdös number of the first kind
(15), namely R. G. Kamalov, is not in the Erdös component of the
second kind. The following are the four people with maximum Erdös
number of the second kind (15): S. A. Arakelyan, N. V. Kondrasova, Sunil
Kumar, and N. V. Strakhova.)
Statistical summaries of Erdos1 and Erdos2 lists (numbers of the first
kind)
The data below are based on the 2004 data contained on this site.
NOTE: A paper summarizing some of what is on this page is available here. It appears in
the Proceedings of 33rd Southeastern Conference on Combinatorics
(Congressus Numerantium, Vol. 158, 2002, pp. 201-212).
An abbreviated version appears in SIAM
News 35:9 (November, 2002), pp. 1, 8-9; click here for a
reprint (pdf).
Here
is a file of slides from a recent talk about the publication habits
in the different area of mathematics (Word for Macintosh document).
Here
is a file of slides from a recent talk about the collaboration graph of
papers rather than the collaboration graph of people.