The Erdös Number Project
Mathematics and Science Center, Room 346
146 Library Drive
Rochester, MI 48309-4479
(map)

facts-about-erdos-numbers-and-the-collaboration-graph

 Return to Erd?s Number Project home page.

The following interesting facts about the collaboration graph and Erdös numbers are mostly based on information in the database of the American Mathematical Society’s Mathematical Reviews (MR) as of May, 2000. (For the corresponding more up-to-date page based on 2004 data, please click here .) Internet access to information there is provided by the service MathSciNet . We gratefully acknowledge the assistance of the AMS in making this information available.

Data on the entire collaboration graph

There are about 1.6 million authored items in the Math Reviews database, by a total of about 337,000 different authors. (This includes all books and papers in MR except those items, such as some conference proceedings, that do not have authors.) Approximately 65.8% of these items are by a single author, 25.8% by two authors, 6.7% by three authors, 1.3% by four authors, 0.3% by five authors, and 0.1% by six or more authors. The largest number of authors shown for a single item is in the 20s, but sometimes the author list includes “et al.”, whom we did not count as a real person. The fraction of items authored by just one person has steadily decreased over time, starting out above 90% in the 1940s and currently hovering just under 50%.

Let B be the bipartite graph whose vertices are papers and authors, with an edge joining a paper with each author of that paper. Then B has about 2.3 million edges. The average number of authors per paper is 1.45, and the average number of papers per author is 6.87. Click here to see the distribution of the number of papers per author. The median is 2, the mean is 6.87, and the standard deviation is 15.35. It is interesting (for tenure review committees?) to note that the 60th percentile is 3 papers, the 70th percentile is 4, the 80th percentile is 8, the 90th percentile is 17, and the 95th percentile is 30. Indeed, over 42% of all authors in the database have just one paper.

There are eight authors with more than 500 papers: Paul Erdös with 1401 (he actually wrote more papers than that, but these are just the ones treated by Math Reviews), Drumi Bainov with 782, Leonard Carlitz with 730, Lucien Godeaux with 644, SAHARON SHELAH with 600, Hari M. Srivastava with 537, FRANK HARARY with 534, and Richard Bellman with 522. Bainov’s Erdös number is 4, Carlitz’s is 2, Godeaux’s is infinite (actually he wrote only one joint paper), Srivastava’s is 2, Bellman’s is 2, and Shelah and Harary are both Erdös coauthors.

The collaboration graph C has the roughly 337,000 authors as its vertices, with an edge between every pair of people who have a joint publication (with or without other coauthors — but see below for a discussion of “Erdös number of the second kind”, where we restrict links to just two-author papers). This graph has about 496,000 edges, so the average number of collaborators per person is 2.94. (If we were to view C as a multigraph, with one edge between two vertices for each paper in which they collaborated, then there would be about 950,000 edges, for an average of 5.65 collaborations per person.) In C there is one large component consisting of about 208,000 vertices. Of the remaining 129,000 authors, 84,000 of them have written no joint papers (these are isolated vertices in C). The average number of collaborators for people who have collaborated is 3.92; the average number of collaborators for people in the large component is 4.43; and the average number of collaborators for people who have collaborated but are not in the large component is 1.54.

Click here to see the data on the number of collaborators per author (in other words, the numbers of coauthors mathematicians have). In graph-theoretical terms, this table shows the degrees of the vertices in C. The median is 1, the mean is 2.94, and the standard deviation is 5.50. (If we omit the isolated vertices, then the median degree is 2, the mean is 3.92, and the standard deviation is 6.04.) Recent research (see our research page) has indicated that we should expect the nonzero degrees to follow a power law: the number of vertices with degree x should be proportional to x raised to a power, where the exponent is somewhere around –2 or –3. Indeed, when we fit such a model to our data (grouping the data in the tail), we find the exponent to be about –2.97, with a correlation coefficient for the model of r = 0.97. A slightly more accurate model throws in an exponential decay factor, and with this factor present, the exponent is –2.46, and r = 0.98. Apparently these models are appropriate for our data.

The three people with more than 200 coauthors are Paul Erdös (of course) with 502, Frank Harary with 254, and Yuri Alekseevich Mitropolskii with 240. (Harary is an Erdös coauthor, and Mitropolskii’s Erdös number is 3.) The other six sociable mathematicians in this table (all of whom have between 152 and 156 coauthors listed in Mathematical Reviews), with their Erdös numbers shown in parentheses, are NOGA ALON (1), Andrei Nikolaevich Kolmogorov (4), SAHARON SHELAH (1), Sergei Petrovich Novikov (3), Aleksandr Andreevich Samarskii (3), and Hari M. Srivastava (2).

Click here for information on publication habits over time. It is clear from these data that collaboration has increased over the past 60 years, especially so recently. At the present time, less than half of all mathematics papers are by a single author, about a third are by two authors, about an eighth by three authors, and 3% by four or more authors. The table also indicates that the average number of papers per author in a decade has slowly increased over time, now standing at about 5 (although the variance is very large, and the median is only 2).

The radius of the large component of C (as it existed in Mathematical Reviews data as of May, 2000) is 14, and its diameter is 27. There are at least three vertices with eccentricity 14, including NOGA ALON, but not including Paul Erdös. (In other words, there is no one with ALON number greater than 14.) Based on a sample of 66 pairs of vertices in this component, the average distance between two vertices is around 7.73 (between 7.37 and 8.08 with 95% confidence), with a standard deviation of 1.45. The median of the sample was 8, with the quartiles at 6.75 and 9. The smallest and largest distances in the sample were 5 and 12, respectively. The appropriate phrase for C, then, is perhaps "nine degrees of separation", if we wish to account for three quarters of all pairs of mathematicians.

To analyze this another way, we took a sample of 100 vertices in the large component and computed for each of them: the degree, the mean distance to all the other vertices, the standard deviation of the distances to all the other vertices, and the maximum distance to another vertex (the “eccentricity”). Here are the results from the sample. The mean distance to other vertices varied from 5.65 to 11.61, with an average of 7.52 and a standard deviation of 1.00. (Thus a 95% confidence interval for the average distance between vertices is 7.32 to 7.72, consistent with the estimate in the paragraph above.) The standard deviation of the distances to all the other vertices was remarkably constant, with the numbers varying only between 1.19 and 1.35 (the interquartile range was from 1.23 to 1.27, mean 1.25, standard deviation 0.03). So although the average “Jane Doe” number varies quite a bit, depending on who Jane Doe is, the distribution of these numbers has pretty much the same shape and spread for everyone. It’s as if those people further away from the heart of the graph may take longer to get to the heart, but once there, the fan-out pattern is the same. The eccentricities of the vertices in the sample ranged from 15 to 21, with a mean of 17.46 and a standard deviation of 1.16. We also looked at correlations among Erdös number (n), vertex degree (d), and average distance to the other vertices (l). The associations are as one might predict: The correlation coefficient between d and n is –0.33 (people with a lot of collaborators tend to have smaller Erdös number); the correlation coefficient between d and l is –0.45 (people with a lot of collaborators tend to have shorter paths to other people); and the correlation coefficient between n and l is 0.81 (people with a small Erdös number are closer to the heart of the graph and therefore have shorter paths to others, compared to those out in the fringes).

The “clustering coefficient” of a graph is equal to the fraction of ordered triples of vertices a,b,c in which edges ab and bc are present that have edge ac present. (In other words, how often are two neighbors of a vertex adjacent to each other?) The clustering coefficient of the collaboration graph of the first kind is 913659/6072790 = 0.15. The fairly high value of this figure, together with the fact that average path lengths are small, indicates that this graph is a “small world” graph (as defined by Duncan Watts — see our pages on research on collaboration and related concepts).

We also have some data on the portion of the collaboration graph outside the “Erdös component” (the one giant component). We are ignoring here the 84,000 isolated vertices and looking only at those authors who have collaborated but do not have a finite Erdös number. There are about 45,000 such vertices. There are about 35,000 edges in these components, so the average degree of these vertices is 1.54. In other words, a person who has collaborated but does not find herself in the Erdös component of C has on the average collaborated with only one or two people. In contrast, the average degree of vertices in the Erdös component is 4.43 (there are about 462,000 edges and 208,000 vertices). Click here for the distribution of component sizes. As would be expected, most of these roughly 17,000 other components are isolated edges (two thirds of them, in fact). The largest component has 39 vertices. Its most collaborating author is Feng Xiang Mei (Department of Applied Mechanics at the Beijing Institute of Technology).

Smaller collaboration graphs

It would be interesting to see how much collaboration goes on within one department. In the Department of Mathematics and Statistics at Oakland University there seems to be quite a bit. Click here for a pdf file of their collaboration graph. If other departments produce such graph, please send the link to me.

The distribution of Erdös numbers

The following table shows the number of people with Erdös number 1, 2, 3, ..., according to the electronic data. Note that there are slightly fewer people shown here with Erdös numbers 1 and 2 than in our lists, since our lists are compiled by hand from various sources in addition to MathSciNet. In addition to these people with finite Erdös number, there are about 45,000 mathematicians who have collaborated but have an infinite Erdös number, and 84,000 who have never published joint works (and therefore of course also have an infinite Erdös number).
Erdös number 0 --- 1 person
Erdös number 1 --- 502 people
Erdös number 2 --- 5713 people
Erdös number 3 --- 26422 people
Erdös number 4 --- 62136 people
Erdös number 5 --- 66157 people
Erdös number 6 --- 32280 people
Erdös number 7 --- 10431 people
Erdös number 8 --- 3214 people
Erdös number 9 --- 953 people
Erdös number 10 --- 262 people
Erdös number 11 --- 94 people
Erdös number 12 --- 23 people
Erdös number 13 --- 4 people
Erdös number 14 --- 7 people
Erdös number 15 --- 1 person
Erdös number 16 --- 0 people


Thus the median Erdös number is 5; the mean is 4.69, and the standard deviation is 1.27.

The person with the largest finite Erdös number is R. G. Kamalov, and one shortest path goes like this: Erdös to NOGA ALON (1985-1997) to Vitali D. Milman (1983-1987) to A. D. Myskis (1960-1964) to A. Ja. Hohrjakov (1958) to V. M. Zubov (1968-1971) to G. A. Shinkarenko (1987-1993) to Ya. G. Savula (1977) to Ya. M. Grigorenko (1985) to V. I. Gulyaev (1991) to A. M. Aleksandrov (1972) to Boris Aleksandrovich Kats (1973) to M. Kh. Brenerman (1985) to A. R. Kessel (1994) to G. O. Berim (1980-1984) to Kamalov (1984).

Since Paul Erdös collaborated with so many people, one would expect this distribution for him to be shifted downward from that of a random mathematician. Indeed, “Grossman numbers” have a median of 6 and a mean of 5.81, and range as high as 17.

Erdös numbers of the second kind

The entire discussion so far has been based on linking two mathematicians if they have written a joint paper, whether or not other authors were involved. A purer definition of the collaboration graph (in fact, the one that Paul Erdös himself seemed to favor) would put an edge between two vertices if the mathematicians have a joint paper by themselves, with no other authors. Under this definition, for example, YOLANDA DEBOSE would not have an Erdös number of 1, since her only joint publication with Erdös was a three-author paper with ARTHUR M. HOBBS as well. (But Hobbs would still have Erdös number 1, since some of his joint works are with Paul alone.) Let C' denote the collaboration graph under this more restrictive definition, and let us call the associated path lengths “Erdös numbers of the second kind” (and therefore call traditional Erdös numbers “Erdös numbers of the first kind” when we need to make a distinction).

Here is what we have found out so far about C' and Erdös numbers of the second kind. This two-author-only collaboration graph has about 142,000 isolated vertices (including the 84,000 people who have written no joint papers, together with another 58,000 people who have written joint papers but only when three or more authors were involved). The remaining 195,000 mathematicians in C' account for about 226,000 edges, so the average degree of a nonisolated vertex in C' is about 2.31 (as opposed to 3.92 for C). Click here to see the data on the distribution of these degrees, i.e., the number of collaborators per author counting only dual works. The median is 1, the mean is 1.34, and the standard deviation is 2.66. (If we omit the isolated vertices, then the median degree is 2, the mean is 2.31, and the standard deviation is 3.16.) As with the collaboration graph of the first kind, we should expect the nonzero degrees to follow a power law, and when we fit this a model to our data (again grouping the data in the tail), we find the exponent to be about –3.26, with a correlation coefficient for the model of r = 0.97. The model with an exponential decay factor present gives the exponent as –2.70, with r = 0.98.

The three people with 100 or more coauthors of this type are Paul Erdös (of course) with 229, FRANK HARARY with 123, and SAHARON SHELAH with 100. The only other person with more than 72 is Joseph B. Keller (who has Erdös number of the first kind 3). There are about 141,000 vertices in the large component of C' (versus 208,000 in C). The average number of two-author-only collaborators for people in the large component is 2.74; and the average number of two-author-only collaborators for people who have written two-author papers but are not in the large component is 1.22.

The radius of the large component of C' (as it existed in Mathematical Reviews data as of May, 2000) is either 14 or 15, and its diameter is 28. As in the case of C, we took a sample of vertices in the large component of C' and computed for each of them: the degree, the mean distance to all the other vertices, the standard deviation of the distances to all the other vertices, and the maximum distance to another vertex (the “eccentricity”). Here are the results from the sample of 151 vertices. The mean distance to other vertices varied from 6.60 to 13.65, with an average of 9.56 and a standard deviation of 1.26. (Thus a 95% confidence interval for the average distance between vertices is 9.36 to 9.76.) The standard deviation of the distances to all the other vertices was again remarkably constant, with the numbers varying only between 1.56 and 1.72 (the interquartile range was from 1.59 to 1.65, mean 1.62, standard deviation 0.04). The eccentricities of the vertices in the sample ranged from 16 to 24, with a mean of 19.25 and a standard deviation of 1.42. As for the correlations among Erdös number (n), vertex degree (d), and average distance to the other vertices (l), the correlation coefficient between d and n is –0.24; the correlation coefficient between d and l is –0.26; and the correlation coefficient between n and l is 0.86.

The clustering coefficient of the collaboration graph of the second kind is 34578/1270315 = 0.027. This is actually a fairly high value (compared to a random graph with this density of edges, where the clustering coefficient is essentially 0), so again we have a “small world” graph. The three mathematicians with at least 25 collaborations among their collaborators whose collaborators most collaborate with each other are Rosa Alexander, Charles C. Lindner, and Masahiro Nakamura, each with local clustering coefficients in the 11% to 13% range — these are the only ones above 10%. (In other words, for these people, about 12% of the pairs of their collaborators have themselves collaborated, where in all of this we are talking only about two-author papers.)

We also have some data on the portion of the collaboration graph of the second kind outside the “Erdös component” (the one giant component). We are ignoring here the 142,000 isolated vertices and looking only at those authors who have written two-author papers but do not have a finite Erdös number of the second kind. There are about 55,000 such vertices. There are about 33,000 edges in these components, so the average degree of these vertices is 1.22. (In contrast, the average degree of vertices in the Erdös component is 2.74 (there are about 193,000 edges and 141,000 vertices). Click here for the distribution of component sizes. As would be expected, most of these roughly 21,000 other components are isolated edges (three fourths of them, in fact). The largest component has 33 vertices.

The distribution of Erdös numbers of the second kind

The following table shows the number of people with Erdös number 1, 2, 3, ..., according to the electronic data but counting only coauthorships on papers with just two authors. In addition to these people with finite Erdös number of the second kind, there are about 55,000 mathematicians who have collaborated but have an infinite Erdös number of the second kind (this is about 10,000 greater than the corresponding number for Erdös numbers of the first kind).
these are Erdös numbers of the second kind
Erdös number 0 --- 1 person
Erdös number 1 --- 229 people
Erdös number 2 --- 1969 people
Erdös number 3 --- 8602 people
Erdös number 4 --- 22668 people
Erdös number 5 --- 36112 people
Erdös number 6 --- 34286 people
Erdös number 7 --- 20319 people
Erdös number 8 --- 9880 people
Erdös number 9 --- 4157 people
Erdös number 10 --- 1667 people
Erdös number 11 --- 602 people
Erdös number 12 --- 245 people
Erdös number 13 --- 58 people
Erdös number 14 --- 22 people
Erdös number 15 --- 4 people
Erdös number 16 --- 0 people

Thus the median Erdös number of the second kind is 6; the mean is 5.63, and the standard deviation is 1.63, a little higher than the corresponding statistics for Erdös numbers of the first kind, as would be expected. Surprisingly, the maximum finite Erdös number remains the same. (However, the person with the maximum Erdös number of the first kind (15), namely R. G. Kamalov, is not in the Erdös component of the second kind. The following are the four people with maximum Erdös number of the second kind (15): S. A. Arakelyan, N. V. Kondrasova, Sunil Kumar, and N. V. Strakhova.)


Paul Erdös asked the following question: Is the collaboration graph of the second kind planar? Our guess was that surely it was not, and we now have a proof. If we can find a homeomorphic copy of the complete graph on five vertices in C', then we know that the graph cannot be imbedded in a plane. Here is one such copy we found by hand. We need five vertices, each of which is joined in C' to each of the others, either by an edge or by a path, and all the paths have to be disjoint (none of the intermediate authors can occur more than once). The five vertices are Paul Erdös, Frank Harary, Ron Graham, Dan Kleitman, and Saharon Shelah. There are edges in C' between Erdös and Graham, between Erdös and Kleitman, and between Erdös and Shelah. Although Erdös and Harary have two joint publications, each of those papers had a third author, so the Erdös-Harary edge is not in C'. There is a path in C' from Erdös to Harary passing through Leo Moser, however. There is a path between Harary and Graham passing through Stefan Burr, a path from Harary to Kleitman passing through Jin Akiyama and Noga Alon (in that order), and a path from Harary to Shelah passing through Andreas Blass. There is an edge between Graham and Kleitman, and there is a path from Graham to Shelah passing through Joel Spencer. Finally, there is a path from Kleitman to Shelah passing through Eric Milner.

In December 2000 we have improved this even further. The "k-core" of a graph is the (unique) largest subgraph all of whose vertices have degree at least k. (See the article in Social Networks discussed on the "research" subpage for references to the notion of core.) It is easy to find the k-core: just remove all vertices of degree less than k, then repeat again and again until no such vertices remain. If any vertices remain, then they form the k-core. It is clear that the 1-core contains the 2-core, which contains the 3-core, etc. The smallest nonempty k-core (i.e., the one for largest k) is called the "main core". For the collaboration graph of the second kind, we found (using electronic data) that the main core is the 5-core, and it has 44 vertices (including Erdös, not surprisingly, with degree 25) and 162 edges. Click here for the names of these most social 44 mathematicians (of whom 36 are Erdös coauthors, and the other 7 have Erdös number 2), here for a picture, and here for the adjacency matrix of this graph.

It turns out that the main core of the collaboration graph of the second kind has three complete graphs on five vertices: COLBOURN-Hartman-Mendelson-PHELPS-ROSA, COLBOURN-Lindner-Mendelson-PHELPS-ROSA, and Lindner-MULLIN-ROSA-STINSON-Wallis. It also has 66 copies of the complete bipartite graph with three vertices in each part (the other canonical nonplanar graph), such as (FAN CHUNG, RODL, SZEMEREDI)-(RON GRAHAM, TROTTER, E R D O S). So this graph is certainly nonplanar.

Actually, these are not the only complete graphs on five vertices in the collaboration graph of the second kind. Gerald Ludden has the distinction of being the only person who has exactly four collaborators of the second kind, each of whom has 2-author collaborated with each of the others (Koichi Ogiue, Masajumi Okumura, Bang-Yen Chen, and David E. Blair).

Statistical summaries of Erdos1 and Erdos2 lists (numbers of the first kind)

The data below are based on the 2004 data contained on this site.

This file contains a statistical summary of the number of Erdös number 1 coauthors for people with Erdös number 2, the number of Erdös number 1 coauthors for people with Erdös number 1, the total number of coauthors for people with Erdös number 1, the number of papers that Erdös’s coauthors have with him, and the number of new coauthors Paul Erdös added each year.

This is a textfile giving the adjacency lists for the induced subgraph of the collaboration graph on all Erdös coauthors.

This file lists the Erdös number record holders (for example, which person with Erdös number 2 has the most coauthors with Erdös number 1?).


NOTE: A paper summarizing some of what is on this page is available here. It appears in the Proceedings of 33rd Southeastern Conference on Combinatorics (Congressus Numerantium, Vol. 158, 2002, pp. 201-212). An abbreviated version appears in SIAM News 35:9 (November, 2002), pp. 1, 8-9; click here for a reprint (pdf). Here is a file of slides from a talk about the publication habits in the different area of mathematics (Word for Macintosh document). Here is a file of slides from a talk about the collaboration graph of papers rather than the collaboration graph of people. URL = http://www.oakland.edu/enp/oldtrivia.html
This page was last updated on November 2, 2014. 
Return to Erdös Number Project home page.