If People Were Papers and Papers Were People

 

 

 

 

Jerrold W. Grossman

Oakland University

Rochester, Michigan

 

 

 

 

 

 

AMS Annual Meeting

January 10, 2004

Phoenix, Arizona


Let G be a bipartite graph, with

vertex sets A (authors) and

P (papers) Ñ e.g., the Òauthorship graphÓ for mathematics papers.


 

Form the author collaboration graph CA on vertex set A, in which two vertices are adjacent if they have a common neighbor in P. 

Similarly, form CP, the Òpaper collaboration graphÓ on P.


 

People who have studied actual social networks such as CA (Albert-L‡szl— Barab‡si, Fan Chung, J. G., Mark Newman, É), as well as similar large graphs such as power grids, the Internet, and neural networks, have found that in such graphs:

 

There is a giant component containing most of the vertices.

The diameter of the giant component is fairly small.

The clustering coefficient is relatively high.

The vertex degrees follow a power law distribution.


 The Data

 

 

All authored items from

Mathematical Reviews

(MathSciNet)

1940Ð1999

 

 

MR tries hard to identify authors as people, not name strings.

 

 

                340,000  authors (A)

 

             1,600,000  papers (P)

 

             2,300,000  edges

 


Facts About the Underlying Bipartite Graph

(the Authorship Graph)

 

                               authors    papers

mean degree              6.87         1.45

median degree              2              1

max degree               1500          15?


Facts About the Author and Paper Collaboration Graphs, CA and CP

                              authors     papers

# vertices                    337,000        1,598,000

# edges                       496,000      44,685,000

mean degree                  2.94                 56

max degree                    502               2090

 

Ignoring the isolated vertices:

# vertices                    253,000        1,541,000

mean degree                  3.92                 58

median degree                 2                    33

 

giant component            62%              82%

isolated vertices             25%                4%

other vertices                 13%              14%


 

Degree distribution for non-isolated vertices:

 

                            authors     papers

degree      1                   37%             2.9%

degree      2                   22%             2.6%

degree      3                   12%             2.4%

degree      4                     7%             2.2%

degrees  5Ð10                14%             12%

degrees  > 10                  8%             78%


Distribution of Distances in (the Giant Components of) CA and CP

 

Diameters are essentially the same, because the distance between two papers in CP is within 1 of the distance between their authors in CA.  Diameter is around 28.

Pick a ÒcenterÓ vertex and call it e (for Paul Erdšs).  The Erdšs number of vertex v (with respect to vertex e) is the distance from v to e.  What is the distribution of Erdšs numbers in CA and CP?


 
Some Graph Theory Questions

1.  Given a desired CA, what is the minimum number of papers needed in G (elements of P) that will produce it?

 

The edge clique cover number of a graph H is the smallest number of complete subgraphs of H whose union includes all the edges of H.  The intersection number of H is the minimum cardinality of a set S such that H is the intersection graph of a collection of subsets of S.  These two values are the same and give the answer to this question. 

(But itÕs NP-hard to compute.)


2.  Given an authorship graph G, what is the minimum number of papers of G that will produce the same author collaboration graph CA?  In other words, what fraction of the papers can be thrown away without changing the author collaboration graph?

 


 

 

For more information, visit the Erdšs Number Project website:

 

 

www.oakland.edu/~grossman/erdoshp.html

 

 

www.oakland.edu

/~grossman/erdoshp.html