Menu Menu
OU Home  >  The Erdös Number Project  >  Facts about Erdös Numbers and the Collaboration Graph

Facts about Erdös Numbers and the Collaboration Graph

 Return to Erdös Number Project home page.

The following interesting facts about the collaboration graph and Erdös numbers are mostly based on information in the database of the American Mathematical Society’s Mathematical Reviews (MR) as of July, 2004. Internet access to MR data is provided by the service MathSciNet. We gratefully acknowledge the assistance of the AMS in making this information available. An article with much of the information contained in this page appears in Geographical Analysis.

[For an older page, with the corresponding facts as of May, 2000, click here. It is interesting to note that over this 4-year period, 64,000 new authors were added to the MR database, but the number of authors who have written only solo-authored papers has DECREASED, from just over 84,000 to just under 84,000. Similarly, the mean number of collaborators per author increased 14%, from 2.94 to 3.36.]

A different standard is used for collaborations here than is used in constructing our Erdös-1 and Erdös-2 lists. First, for our lists we use sources in addition to Mathematical Reviews; the conclusion on this page are based just on the MR data. Second, we generally do not count articles that are not the result of research collaboration as establishing a link. For example, if Jack and Jill wrote a joint obituary article on Humpty Dumpty when he died, that article might appear in the MR database and establish a link between Jack and Jill for the conclusions reached on this page, whereas the traditional definition of the collaboration graph would not suggest putting an edge between them just on this basis. There are also a few author identification problems in the Math Reviews database (primarily prior to 1985), which make the conclusions here only approximate.

Data on the entire collaboration graph

There are about 1.9 million authored items in the Math Reviews database, by a total of about 401,000 different authors. (This includes all books and papers in MR except those items, such as some conference proceedings, that do not have authors.) Approximately 62.4% of these items are by a single author, 27.4% by two authors, 8.0% by three authors, 1.7% by four authors, 0.4% by five authors, and 0.1% by six or more authors. The largest number of authors shown for a single item is in the 20s, but sometimes the author list includes “et al.”, whom we did not count as a real person. The fraction of items authored by just one person has steadily decreased over time, starting out above 90% in the 1940s and currently standing at under 50%.

Let B be the bipartite graph whose vertices are papers and authors, with an edge joining a paper with each author of that paper. Then B has about 2.9 million edges. The average number of authors per paper is 1.51, and the average number of papers per author is 7.21. Click here to see the distribution of the number of papers per author. The median is 2, the mean is 7.21, and the standard deviation is 18.02. It is interesting (for tenure review committees?) to note that the 60th percentile is 3 papers, the 70th percentile is 4, the 80th percentile is 8, the 90th percentile is 18, and the 95th percentile is 32. Indeed, over 42% of all authors in the database have just one paper.

There are four authors with more than 700 papers: Paul Erdös with 1416 (he actually wrote more papers than that, but these are just the ones covered by Math Reviews), Drumi Bainov with 823, SAHARON SHELAH with 760, and Leonard Carlitz with 730. Bainov’s Erdös number is 4, SHELAH’s is 1, and Carlitz’s is 2. The other mathematicians with more than 500 papers listed in MathSciNet (and their Erdös numbers) are Hari M. Srivastava (2), Lucien Godeaux (infinite — actually he wrote only one joint paper), Ravi Agarwal (3), Edoardo Ballico (3), FRANK HARARY (1), Josip E. Pecaric (2), Shigeyoshi Owa (3), and Richard Bellman (2). The most prolific authors listed in the DBLP (dealing with computer science publications) can be found on a list at their website (DBLP), which is definitely worth exploring.

The collaboration graph C has the roughly 401,000 authors as its vertices, with an edge between every pair of people who have a joint publication (with or without other coauthors — but see below for a discussion of “Erdös number of the second kind”, where we restrict links to just two-author papers). Click here for a picture of a small portion of this graph. The entire graph has about 676,000 edges, so the average number of collaborators per person is 3.36. (If we were to view C as a multigraph, with one edge between two vertices for each paper in which they collaborated, then there would be about 1,300,000 edges, for an average of 6.55 collaborations per person.) In C there is one large component consisting of about 268,000 vertices. Of the remaining 133,000 authors, 84,000 of them have written no joint papers (these are isolated vertices in C). The average number of collaborators for people who have collaborated is 4.25; the average number of collaborators for people in the large component is 4.73; and the average number of collaborators for people who have collaborated but are not in the large component is 1.65.

Click here to see the data on the number of collaborators per author (in other words, the numbers of coauthors mathematicians have). In graph-theoretical terms, this table shows the degrees of the vertices in C. The median is 1, the mean is 3.36, and the standard deviation is 6.61. (If we omit the isolated vertices, then the median degree is 2, and the mean is 5.37.) Recent research (see our research page) has indicated that we should expect the nonzero degrees to follow a power law: the number of vertices with degree x should be proportional to x raised to a power, where the exponent is somewhere around –2 or –3. Indeed, when we fit such a model to our data from May, 2000 (grouping the data in the tail), we find the exponent to be about –2.97, with a correlation coefficient for the model of r = 0.97. A slightly more accurate model throws in an exponential decay factor, and with this factor present, the exponent is –2.46, and r = 0.98. Apparently these models are appropriate for our data.

The five people with more than 200 coauthors are Paul Erdös (of course) with 509 (although the MR data actually show only 504, missing some coauthors of very minor works or works before 1940, when MR was started), FRANK HARARY (Erdös number 1) with 268, Yuri Alekseevich Mitropolskii (Erdös number 3) with 244, NOGA ALON (Erdös number 1) with 227, and Hari M. Srivastava (Erdös number 2) with 244.

Click here for information on publication habits over time (1940 to 1999). It is clear from these data that collaboration has increased over the past 60-odd years, especially so recently. By 2000, less than half of all mathematics papers were by a single author, about a third were by two authors, about an eighth by three authors, and 3% by four or more authors. The table also indicates that the average number of papers per author in a decade has slowly increased over time, now standing at about 5 (although the variance is very large, and the median is only 2).

The radius of the large component of C (as it existed in Mathematical Reviews data as of July, 2004) is 12, and its diameter is 23. There are exactly two vertices with eccentricity 12 — Izrail M. Gelfand (Rutgers University) and Yakov Sinai (Princeton University), both of whom have Erdös number 3 — but not including Paul Erdös! (In other words, there is no one with Gelfand number or Sinai number greater than 12, whereas the maximum Erdös number is 13. In all, 1220 people have eccentricity 13.) Erdös does have the distinction of having the smallest mean distance to the other vertices, though: 4.65. There are five other people with means less than 5. In order of increasing mean, they are RONALD GRAHAM, ANDREW ODLYZKO, NOGA ALON, Larry Shepp, and FRANK HARARY. All of them have eccentricity 14 and Erdös number 1 except for Shepp, whose eccentricity is 13 and whose Erdös number is 2. The means for Gelfand and Sinai are slightly higher than 5.

Based on a sample of 100 pairs of vertices in this component, the average distance between two vertices is around 7.64 (between 7.41 and 7.87 with 95% confidence), with a standard deviation of about 1.19. The median of the sample was 7, with the quartiles at 6 and 8. The smallest and largest distances in the sample were 4 and 11, respectively. The appropriate phrase for C, then, is perhaps “eight degrees of separation”, if we wish to account for three quarters of all pairs of mathematicians.

To analyze this another way, we took a sample of 100 vertices in the large component and computed for each of them: the degree, the mean distance to all the other vertices, the standard deviation of the distances to all the other vertices, and the maximum distance to another vertex (the “eccentricity”). Here are the results from the sample. The mean distance to other vertices varied from 5.80 to 10.67, with an average of 7.37 and a standard deviation of 0.86. The standard deviation of the distances to all the other vertices was remarkably constant, with the numbers varying only between 1.14 and 1.28 (mean 1.19, standard deviation 0.03). So although the average “Jane Doe” number varies quite a bit, depending on who Jane Doe is, the distribution of these numbers has pretty much the same shape and spread for everyone. It’s as if those people further away from the heart of the graph may take longer to get to the heart, but once there, the fan-out pattern is the same. The eccentricities of the vertices in the sample ranged from 14 to 19, with a mean of 15.62 and a standard deviation of 1.04. We also looked at correlations among Erdös number (n), vertex degree (d), and average distance to the other vertices (l). The associations are as one might predict: The correlation coefficient between d and n is –0.46 (people with a lot of collaborators tend to have smaller Erdös number); the correlation coefficient between d and l is –0.56 (people with a lot of collaborators tend to have shorter paths to other people); and the correlation coefficient between n and l is 0.78 (people with a small Erdös number are closer to the heart of the graph and therefore have shorter paths to others, compared to those out in the fringes).

The “clustering coefficient” of a graph is equal to the fraction of ordered triples of vertices a,b,c in which edges ab and bc are present that have edge ac present. (In other words, how often are two neighbors of a vertex adjacent to each other?) The clustering coefficient of the collaboration graph of the first kind is 1308045/9125801 = 0.14. The high value of this figure, together with the fact that average path lengths are small, indicates that this graph is a “small world” graph (as defined by Duncan Watts — see our pages on research on collaboration and related concepts).

We also have some data on the portion of the collaboration graph outside the “Erdös component” (the one giant component). We are ignoring here the 84,000 isolated vertices and looking only at those authors who have collaborated but do not have a finite Erdös number. There are about 50,000 such vertices. There are about 41,000 edges in these components, so the average degree of these vertices is 1.65. In other words, a person who has collaborated but does not find herself in the Erdös component of C has on the average collaborated with only one or two people. In contrast, the average degree of vertices in the Erdös component is 4.73 (there are about 634,000 edges and 268,000 vertices). Click here for the distribution of component sizes. As would be expected, most of these roughly 18,000 other components are isolated edges (64% of them, in fact). The largest component has 32 vertices. Its most collaborating author is Yu. A. Shevlyakov (Department of Applied Mathematics, Simferopol State University, Crimea, Ukraine), who has 13 coauthors. The person outside the Erdös component with the most coauthors is Gholam Reza Jahanshahloo (Department of Mathematics, University for Teacher Education, Tehran, Iran), who is in a component with 23 vertices (he has collaborated with all but two of them).

Smaller collaboration graphs

It would be interesting to see how much collaboration goes on within one department. In the Department of Mathematics and Statistics at Oakland University there seems to be quite a bit. Click here for a pdf file of their collaboration graph in 2004 and here for the 2012 graph. If other departments produce such a graph, please send the link to me, and I will list them here. So far we have the University of Georgia mathematics department.

The distribution of Erdös numbers

The following table shows the number of people with Erdös number 1, 2, 3, ..., according to the electronic data. Note that there are slightly fewer people shown here with Erdös numbers 1 and 2 than in our lists, since our lists are compiled by hand from various sources in addition to MathSciNet. In addition to these 268,000 people with finite Erdös number, there are about 50,000 published mathematicians who have collaborated but have an infinite Erdös number, and 84,000 who have never published joint works (and therefore of course also have an infinite Erdös number).

 

 

      Erdös number  0  ---      1 person
      Erdös number  1  ---    504 people
      Erdös number  2  ---   6593 people
      Erdös number  3  ---  33605 people
      Erdös number  4  ---  83642 people
      Erdös number  5  ---  87760 people
      Erdös number  6  ---  40014 people
      Erdös number  7  ---  11591 people
      Erdös number  8  ---   3146 people
      Erdös number  9  ---    819 people
      Erdös number 10  ---    244 people
      Erdös number 11  ---     68 people
      Erdös number 12  ---     23 people
      Erdös number 13  ---      5 people

Thus the median Erdös number is 5; the mean is 4.65, and the standard deviation is 1.21.

One of the five people with the largest finite Erdös number is Arturo Robles, and one shortest path goes like this (year of joint work in parenthese): Erdös to Daniel D. Bonar (1977) to Charles L. Belna (1979) to S. A. Obaid (1983) to Wadie A. Bassali (1981) to Ibrahim H. M. el-Sirafy (1976) to Konstantin Chernous (1977) to Jose Valdes (1980) to B. Dugnol (1980) to P. Suarez Rodriguez (1995) to A. E. Alvarez Vigil (1995) to C. Gonzalez Nicieza (1992) to Jose Angel Huidobro (1986) to Robles (1990).

Since Paul Erdös collaborated with so many people, one would expect this distribution for him to be shifted downward from that of a random mathematician. For example, “Jerry Grossman numbers” have a median of 6, a mean of 5.71 (standard deviation = 1.22), and range as high as 15; and “Arturo Robles numbers” have a median of 15, a mean of 15.06 (standard deviation = 1.21). It turns out that the standard deviation is almost exactly the same for almost everyone in the large component.

Erdös numbers of the second kind

The entire discussion so far has been based on linking two mathematicians if they have written a joint paper, whether or not other authors were involved. A purer definition of the collaboration graph (in fact, the one that Paul Erdös himself seemed to favor) would put an edge between two vertices if the mathematicians have a joint paper by themselves, with no other authors. Under this definition, for example, YOLANDA DEBOSE would not have an Erdös number of 1, since her only joint publication with Erdös was a three-author paper with ARTHUR M. HOBBS as well. (But HOBBS would still have Erdös number 1, since some of his joint works are with Paul alone.) Let C' denote the collaboration graph under this more restrictive definition, and let us call the associated path lengths “Erdös numbers of the second kind” (and therefore call traditional Erdös numbers “Erdös numbers of the first kind” when we need to make a distinction).

Here is what we know about C' and Erdös numbers of the second kind. This two-author-only collaboration graph has about 166,000 isolated vertices (including the 84,000 people who have written no joint papers, together with another 83,000 people who have written joint papers but only when three or more authors were involved — these numbers all rounded to the nearest thousand). The remaining 235,000 mathematicians in C' account for about 284,000 edges, so the average degree of a nonisolated vertex in C' is about 2.41 (as opposed to 4.25 for C). Click here to see the data on the distribution of these degrees, i.e., the number of collaborators per author counting only dual works. The median is 1, the mean is 1.34, and the standard deviation is 2.84. (If we omit the isolated vertices, then the median degree is still 1, the mean is 2.41, and the standard deviation is 3.37.) As with the collaboration graph of the first kind, we should expect the nonzero degrees to follow a power law, and when we fit this a model to our data from May 2000 (again grouping the data in the tail), we find the exponent to be about –3.26, with a correlation coefficient for the model of r = 0.97. The model with an exponential decay factor present gives the exponent as –2.70, with r = 0.98.

The three people with 100 or more coauthors of this type are Paul Erdös (of course) with 230, FRANK HARARY with 124, and SAHARON SHELAH with 121. HARARY’s only papers with Erdös are 3-author works, so his Erdös number of the second kind is 2 (through BOLLOBAS, for example); SHELAH’s is 1.

There are about 176,000 vertices in the large component of C' (versus 268,000 in C). The average number of two-author-only collaborators for people in the large component is 2.82; and the average number of two-author-only collaborators for people who have written two-author papers but are not in the large component is 1.21.

The radius of the large component of C' (as it existed in Mathematical Reviews data as of July, 2004) is 14. The unique center is J. Bryce McLeod (whose Erdös numbers, of both kinds, are 3), and not Paul Erdös, whose eccentricity is 15, as is the eccentricity of 392 other people. The diameter of C' is 26 (this is the distance between the two people with Erdös number of the second kind equal to 15). As is the case with the collaboration graph of the first kind, Erdös has the distinction of having the smallest mean distance to the other vertices, 5.58, and no one else has a mean less than 6.

As in the case of C, we took a sample of vertices in the large component of C' and computed for each of them: the degree, the mean distance to all the other vertices, the standard deviation of the distances to all the other vertices, and the maximum distance to another vertex (the “eccentricity”). Here are the results from the sample of 100 vertices. The mean distance to other vertices varied from 6.87 to 11.99, with an average of 9.18 and a standard deviation of 1.19. (Thus a 95% confidence interval for the average distance between vertices is 8.95 to 9.42.) The standard deviation of the distances to all the other vertices was again remarkably constant, with the numbers varying only between 1.48 and 1.63 (mean 1.54, standard deviation 0.034). The eccentricities of the vertices in the sample ranged from 15 to 21, with a mean of 18.21 and a standard deviation of 1.32. As for the correlations among Erdös number (n), vertex degree (d), and average distance to the other vertices (l), the correlation coefficient between d and n is –0.41; the correlation coefficient between d and l is –0.48; and the correlation coefficient between n and l is 0.86.

The clustering coefficient of the collaboration graph of the second kind is 48132/1738599 = 0.028. This is actually a fairly high value (compared to a random graph with this density of edges, where the clustering coefficient is essentially 0), so again we have a “small world” graph. (The reason it is so much smaller than the clustering coefficient for the collaboration graph of the first kind is that the multi-author collaborations create a lot of triangles.) The three mathematicians with at least 25 two-author collaboration pairs among their collaborators whose collaborators most collaborate with each other are Masatoshi Fujii, Masahiro Nakamura, and Jian She Yu, each with about 30 two-author collaborators and local clustering coefficients in the 11% to 13% range — these are the only ones above 10%. (In other words, for these people, about 12% of the pairs of their two-author collaborators have themselves written a two-author paper. In fact, Fujii and Nakamura are adjacent in C'.)

We also have some data on the portion of the collaboration graph of the second kind outside the “Erdös component” (the one giant component). We are ignoring here the 166,000 isolated vertices and looking only at those authors who have written two-author papers but do not have a finite Erdös number of the second kind. There are about 59,000 such vertices. There are about 36,000 edges in these components, so the average degree of these vertices is 1.21. (In contrast, the average degree of vertices in the Erdös component is 2.82 (there are about 248,000 edges and 176,000 vertices). Click here for the distribution of component sizes. As would be expected, most of these roughly 23,000 other components are isolated edges (three fourths of them, in fact). The largest component has 28 vertices.

The distribution of Erdös numbers of the second kind

The following table shows the number of people with Erdös number 1, 2, 3, ..., according to the electronic data but counting only coauthorships on papers with just two authors. In addition to these 176,000 people with finite Erdös number of the second kind, there are about 59,000 mathematicians who have collaborated but have an infinite Erdös number of the second kind (this is about 9,000 greater than the corresponding number for Erdös numbers of the first kind).

  these are Erdös numbers of the second kind
      Erdös number  0  ---      1 person
      Erdös number  1  ---    230 people
      Erdös number  2  ---   2153 people
      Erdös number  3  ---  10118 people
      Erdös number  4  ---  28559 people
      Erdös number  5  ---  47430 people
      Erdös number  6  ---  44102 people
      Erdös number  7  ---  25348 people
      Erdös number  8  ---  11265 people
      Erdös number  9  ---   4299 people
      Erdös number 10  ---   1570 people
      Erdös number 11  ---    533 people
      Erdös number 12  ---    206 people
      Erdös number 13  ---     61 people
      Erdös number 14  ---     25 people
      Erdös number 15  ---      2 people

Thus the median Erdös number of the second kind is 5; the mean is 5.58, and the standard deviation is 1.55, a little higher than the corresponding statistics for Erdös numbers of the first kind, as would be expected. The two people with maximum Erdös number of the second kind Sunil Kumar-2 and N. V. Silenok.

Paul Erdös asked the following question: Is the collaboration graph of the second kind planar? Our guess was that surely it was not, and we now have a proof. If we can find a homeomorphic copy of the complete graph on five vertices in C', or a copy of the complete bipartite graph with three vertices in each part, then we know that the graph cannot be imbedded in a plane. A natural place to look for such subgraphs would be in a portion of the graph where there are lots of edges. The following concept, apparently introduced not by graph theorists but by sociologist, proved fruitful.

The “k-core” of a graph is the (unique) largest subgraph all of whose vertices have degree at least k. (See the article in Social Networks discussed on the “research” subpage for references to the notion of core.) It is easy to find the k-core: just remove all vertices of degree less than k, then repeat again and again until no such vertices remain. If any vertices remain, then they form the k-core. It is clear that the 1-core contains the 2-core, which contains the 3-core, etc. The smallest nonempty k-core (i.e., the one for largest k) is called the “main core”. For the collaboration graph of the second kind, we found (using electronic data) that the main core is the 5-core, and it has 70 vertices (including Erdös, not surprisingly, with degree 30) and 272 edges. Click here for the names of these most social mathematicians (all of whom have Erdös number of the first kind at most 2, and 50 of whom are Erdös coauthors), and here for the adjacency matrix of this graph.

It turns out that the main core of the collaboration graph of the second kind has four complete graphs on five vertices: ALON-FUREDI-KLEITMAN-WEST-E R D O S, COLBOURN-Hartman-Mendelson-PHELPS-ROSA, COLBOURN-Lindner-Mendelson-PHELPS-ROSA, and Lindner-MULLIN-ROSA-STINSON-Wallis. It also has 125 copies of the complete bipartite graph with three vertices in each part (the other canonical nonplanar graph), such as (FAN CHUNG, RODL, SZEMEREDI)-(RON GRAHAM, TROTTER, E R D O S). So this graph is certainly nonplanar.

Actually, these are not the only complete graphs on five vertices in the collaboration graph of the second kind. For example, Gerald Ludden (Michigan State University) has only four collaborators of the second kind, but each of them has two-author collaborated with each of the others (Koichi Ogiue, Masafumi Okumura, Bang-Yen Chen, and David E. Blair).

Statistical summaries of Erdos1 and Erdos2 lists (numbers of the first kind)

The data below are based on the 2007 data contained on this site (as opposed to the July 2004 MR data).

This file contains a statistical summary of the number of Erdös number 1 coauthors for people with Erdös number 2, the number of Erdös number 1 coauthors for people with Erdös number 1, the total number of coauthors for people with Erdös number 1, the number of papers that Erdös’s coauthors have with him, and the number of new coauthors Paul Erdös added each year.

This is a textfile giving the adjacency lists for the induced subgraph of the collaboration graph on all Erdös coauthors.

This file lists the Erdös number record holders (for example, which person with Erdös number 2 has the most coauthors with Erdös number 1?).


MORE INFORMATION: A paper summarizing some of what is on this page is available in pdf. It appears in the Proceedings of 33rd Southeastern Conference on Combinatorics (Congressus Numerantium, Vol. 158, 2002, pp. 201-212). An abbreviated version appears in SIAM News 35:9 (November, 2002), pp. 1, 8-9; click here for a reprint (pdf). Another article, which also looks at the publication patterns as a function of area of mathematics, appears in the Janaury 2005 issue of the Notices of the American Mathematical Society. Finally, here is a file of slides from a talk about the collaboration graph of papers rather than the collaboration graph of people.


For a Bulgarian translation of this page (by Artem Delik), click here.

 

URL = http://www.oakland.edu/enp/trivia.html
This page was last updated on November 2, 2014.
Return to Erdös Number Project home page.