Interview with Author Steven Skiena

Sitting on History
Who doesn’t love lists of figures from history? Historians love compiling them. Most of us enjoy reading them. And many of us enjoy arguing about the people on them. But who decides on the criteria for inclusion on such lists? Who really deserves a place in the pantheon? And how have these criteria changed over time as the scholarly world has become more diverse and more inclusivity-minded?

These are the questions that Steven Skiena and Charles Ward examine in their new book, Who’s Bigger? Where Historical Figures Really Rank. I wrote to Steven Skiena and asked for an interview. Much to my delight, he agreed. This is the result.

Okay, Steve. The first thing I did, as I imagine many of your readers will do, was look at the table of contents of your book. I then turned immediately to the section of the book entitled, “Our 100.” And I looked at that list from the standpoint of a woman who is exceedingly aware that women have long been underrepresented in rankings of historical greatness (which have often been compiled by heavily male groups of historians).

First of all, both Queen Victoria and Elizabeth I are on the list. Can you explain why both women are on your list? And if Queen Victoria, why not Elizabeth II? 

The simplest answer to why any figure ranks where they do is that this is where our data and analysis methods put them. Our book is the result of a computational analysis of over 800,000 people appearing in Wikipedia, and the same methods apply to all of them. Neither Charles nor I had any say in the matter. We developed ranking algorithms based on sound modeling principles, using the best data available.

Of course, we can speculate why certain figures are or are not highly ranked.

In comparing Queen Victoria and Queen Elizabeth II (who ranks 132 on our list), we suspect that the greater role of the United Kingdom on the world scene 150 years ago is the key factor here. No one says we live today in the Elizabethan Era.

And why is Margaret Thatcher not on your list? Or Catherine the Great? You include George W. Bush but not Margaret Thatcher? Seriously? Can you make your case for that?

Our list is made up by algorithms, not us. Catherine the Great ranks 108 and Margaret Thatcher 271, so both are well regarded. We can only fit 100 people into our top 100.

George W. Bush is almost certainly overrated by our methods. He served eight years as president, and was for better or worse the dominant political figure thus far this century. But it is extremely difficult to put contemporary figures into the proper context in a systematic way, and I expect Bush would drop if we repeated our analysis twenty years from now.

And speaking of George W. Bush, why include him but not Lyndon Johnson? In other lists in the book, we do find Thatcher and Lyndon Johnson. Could you tell us a little about the various lists and the criteria you used to compile them?

We have dozens of ranking lists, identifying the most significant people in many different domains: presidents, authors, scientists, criminals, musicians, and even dentists.

Getting back to “Our 100” — How is it that Wagner is included but not Verdi? And Elvis Presley but not Scott Joplin, Louis Armstrong, or Duke Ellington? And Tchaikovsky but not Haydn?

I am personally quite comfortable with all of the decisions you mention here. For example, Elvis Presley is the historical figure most strongly associated with Rock ‘n Roll, which has dominated popular music for the past sixty years. Roll over Beethoven, and tell Tchaikovsky the news!

Did everyone on the list have to be dead? Is that why Lech Wałęsa and Lee Kuan Yew are not on it?

No, living people are certainly eligible for our rankings. Both of these figures are ranked highly by our algorithms. Indeed, we have a table of the most significant leaders of every nation on earth, and these two hold pride of place in Poland and Singapore, respectively.

The list is quite light on Asian political figures. You can’t be serious about not including Mao Zedong. That strikes me as bizarre. You include Mahatma Gandhi, but not Mao? They both affected the lives of comparable numbers of people. And Joan of Arc but not Deng Xiaoping? 

Our statistical analysis is drawn from English-language sources, so we may indeed underestimate the significance of non-Western figures. Mao ranks 151 and Deng 1163. Even with this cultural bias, both rank as strong figures, and Mao is very close to the top 100.

How did you decide to include Grover Cleveland but not Ataturk, Eamon de Valera, Ahmed Ben Bella, Muhammad Ali Jinnah, Ruhollah Khomeini, Nasser, or Theodor Herzl? And King Arthur (and it is not clear he even existed) but not Margaret Sanger, Emmeline Pankhurst, Margaret Fuller, or Simone de Beauvoir? 

Most of the people you cite receive very strong ratings by our method, and typically rank as the most significant leader from their respective countries.

Sanger ranks at 2672, in the 99.5th percentile of all the people in Wikipedia. We think this is evidence that she should be taught in children’s textbooks, for example. Fuller ranks even higher, at 2135. Pankhurst at 2561 and Beauvoir at 3302.

On the literary front why Poe and Wilde but not Baudelaire, Orwell, Tennyson, Wordsworth, T.S. Eliot, Virginia Woolf, James Joyce or Rilke?

This is what the algorithms say. It does seem to favor popular literary figures like Poe and Wilde and Mark Twain (ranked 53) ahead of unreadable ones like Joyce.

Let’s talk about your use of Wikipedia to compile your various lists. As a woman, I found that worrisome. Women are notoriously underrepresented in Wikipedia both as contributors and as subjects of entry. If you rely heavily on Wikipedia for much of your data, your results will be heavily skewed in favor of male subjects. How are you compensating for the systemic pro-male bias of Wikipedia? You seem fairly confident that the representation of women in Wikipedia is improving. Could you explain in a non-wonky fashion why you think so?

Our methods show that for historical figures from the past 300 years, the average significance of women appearing in Wikipedia was substantially greater than that of the average man. This implies that women required greater credentials to get into Wikipedia, analogous to being about 4 IQ points smarter in the mean. Fortunately, this significance gap has closed and been essentially eliminated in modern times.

The strength of our methods is that we make no assumptions of what is biased in favor of what. That the average woman in Wikipedia is historically stronger than the average man is a statistical observation which is revealed because we do not take sides on such issues – we simply quantify effects which are present in our data. Interpreting what it means requires integrity and judgment, but has to start from the data.

How do you compensate for the lack of interest most English speakers show in the history of huge swathes of the world such as Latin America, Southeast Asia (both of which are very underrepresented on your list as are Japan and Eastern and Central Europe), the Middle East, the South Pacific region, the Caribbean and for the heavily white male demographic of Wikipedia contributors? Even Canada, Australia and New Zealand don’t do very well on your list.

Given that we analyze only English-language sources, there likely is a bias against non-Western figures. But Canada, Australia, and New Zealand are treated quite fairly in our methods, with their most significant leaders ranked 159, 1114, and 2282, respectively. This sounds about right to me, given the size of these nations on the world scene.

I was worried by this statement, “Our methods summarize the knowledge of all the authors and readers of the English-language Wikipedia, to order historical figures consistent with the general views of this community…” and I am certain that many in the library community would find that method not confidence inspiring. How would you address the fears of librarians and probably scholars in many fields that you are simply adding to the world’s over-reliance on Wikipedia as a source for determining historical importance?

Our algorithms have an Anglo-centric bias because of our reliance on English-language sources. But we have no shame about relying on Wikipedia as our primary data source. Wikipedia is open, comprehensive enough for statistical analysis in a wide variety of domains, and frankly a hell of a lot better than it has any right to be. That our rankings derived from Wikipedia data so accurately predict poll ratings, expert rankings, painting and autograph prices, and other things is testimony to the general quality.

I have no doubt that many librarians and scholars turn to it as a trusted first source when they want to get acquainted with a new subject. As a tenured professor of Computer Science and author of five books, I count as a scholar, and I am not ashamed to say I read Wikipedia – don’t you?

You also say, “…a mix of famous people, including the major pillars of Western Civilization.” What about those of East Asia? You have hardly any thinkers (other than Buddha) from there or from Latin America, Africa, Central Asia and so on.

The same goes for art in that you include Vincent van Gogh but not Hokusai. 

More interesting than the question of how the leading figures in one culture rank compared to other cultures is how the leading figures in each group rank. We have identified the most significant political leader for every nation on earth, and, generally speaking, natives tell us we have identified the right figure for their country.

In short, could you tell us how your methods help to eliminate the existing Western bias of such lists of historical importance? You say, “The success of our ranking methods is best established by the banality of our results.” Now, that seems like a strange way to try to get people to buy your book. What’s novel here? What is paradigm shifting? Do we learn anything about neglected figures, unsung heroes, brilliant people of whom we never before heard and whom the world needs to know of in order to understand how our world came to be?

A top 100 ranking which shocked readers with who was included would be ridiculous, not informative. The exciting things come from analyzing rankings you generally agree with, so that the surprises come in context. Read the Wikipedia article of Joseph Brant – a Native American who we rank 1328 but who you probably never heard of – and tell me we don’t reveal unsung historical heroes.

The paradigm shifting part of what we do is that we provide a rigorous statistical analysis of historical reputation on a scale never previously attempted. This enables us to answer a variety of interesting questions in ways never before possible: are women really underrepresented in Wikipedia (yes), do children’s textbooks include only the most historically-significant figures (not really), do hall of fame committees generally recognize the right people (by no means).

There don’t seem to be any lists of disabled people. Did you prefer not to use disability as a category of its own?

Disabled people would be an interesting category to study. High ranking people would include Franklin Roosevelt, Helen Keller, and Louis Braille.

You are not at all shy about bringing commercial considerations into your discussion. There is an endearing crassness to this, “…our significance rankings can be used to predict the prices of such diverse commodities as celebrity autographs, baseball cards and modern paintings.” (I can hear the teeth grinding of art historians the world over.) Have you purchased or invested in anything based on your number-crunching for the book?

Popper said that falsifiability is what makes something a science. The degree to which our rankings predict quantifiable phenomena in the world is evidence that they are measuring something important. Predictions, when money is on the line, are generally more serious than those when nothing is at stake.

My previous Cambridge University Press book was Calculated Bets: Computers, Gambling, and Mathematical Modeling to Win, where we developed a betting system for the sport of Jai-alai. All winnings were donated to charity.

You discuss the historical figures who appear and who do not appear in Steve’s daughter’s grade school history textbook and mention that it seems under-populated with modern figures in technology, science, medicine and business. But you don’t suggest any women in those fields that should be included in Bonnie’s textbook. Is she doomed for the rest of her life to read only about Steve Jobs, Bill Gates and Warren Buffett? 

We do explicitly identify several women who are worthy of my daughter’s textbook, including Hillary Clinton, environmentalist Rachel Carson, and poet Emma Lazarus. And of course there are others. But some fields have historically stronger women figures than others.

You don’t seem to address regional bias when it comes to fame in the U.S. As an Oregonian, I get fairly tired of hearing about people in New York or California. Do your methods compensate for the historical cultural hegemony of New York in fields such as publishing, journalism, etc.?

I personally believe that Manhattan is the center of the universe, so you won’t catch me making any such complaints. We do not correct regional biases, although we do identify the most significant Senator, Governor, and Representative for each of the 50 states, so you can judge whether we treat them fairly.

You mention Lou Gehrig. As he fades into memory and football overtakes baseball as the national sport, do you think that ALS will become known more by that name and less as Lou Gehrig’s Disease? How does disease play into the fame game? 

There is no question that martyrdom and other dramatic deaths affect the fame of an individual. Certainly much of Lou Gehrig’s fame outside baseball is because of his death by ALS. We figure that perhaps half of all references in books to Lou Gehrig refer to the disease.

You say Stan Musial was the most respected baseball player of the 1950s. In what respect? How do you measure that? Jackie Robinson was still playing in the 1950s, as was Ted Williams. 

By measures like the votes he received for the Most Valuable Player award. Musial received far more votes over his career for this award than any other player.

You mention several leaders of Israel (Ben-Gurion, Meir, Rabin) but assert their stature without really explaining why they should be considered any more important than leaders in comparably lightly populated countries. Does fluency in English and being closely allied with the US automatically endow leaders with stature? What makes Rabin more important than Raúl Alfonsín of Argentina, say?

Since our rankings are produced by algorithms that work with the best data we could identify, they are not perfect, but nothing is. Rabin won a Nobel Prize and was a martyr to the cause of peace. I am very content with where we rank him.

We confess to an Anglo-centric bias because of analyzing English language sources. For better or worse most of our readers are likely to share this bias, so our rankings will likely resonate with our readership.

Do the enormous fashion and cosmetics industries not count as businesses in your world? Why are entrepreneurs, tycoons or designers like Coco Chanel, Estee Lauder, Helena Rubinstein, and Elizabeth Arden not (nor is any woman) on your list of significant business leaders? (Are you starting to get why some women don’t buy into the male-dominated Wikipedia standard of significance?) 

Coco Chanel ranks as 1736, which sounds about right to me. The issue is not the bias of Wikipedia here. Admittedly, neither Charles nor I are particularly interested in fashion and did not particularly highlight this area.

Longfellow but not Elizabeth Bishop as a significant poet? Could you explain to us how your rankings take into account critical esteem versus baby-boomers’ rosy recollections of poems they enjoyed in grade school?

Elizabeth Bishop ranks 7610. Poets today are simply not as significant as poets of earlier generations. It is a literary form of declining relevance, as measured by our rankings. Today’s leading poets write rock-and-roll lyrics, not poems.

I notice you did not really address the matter of prominence in fields like journalism, academia or publishing. Why is that?

Our publisher already felt that our book was long enough. Journalists would have been an interesting category.

Do you have any predictions of how prominence will be achieved in the age of Twitter? Will your system take into account the rise of social media and the decline of the printed book?

The same people who write tweets and post on Facebook can edit Wikipedia. Our methods will adjust to changing tastes in communication.

What do you hope the results of your book will be vis-à-vis the measurement of greatness from here on out? 

I hope our book will be a respected contribution to the studies of the Digital Humanities and Computational Social Sciences. Certainly we make our data available for further study on our website.

One reviewer claims that our book is a “guaranteed argument-starter.” We can happily live with this. The fun of our book comes from seeing where we agree with your preconceptions and where we differ. We lay out exactly what our methods are, and follow them scrupulously. You are free to disagree, but we dare you to do so with the same level of rigor and fairness that we do.

Why did you choose Cambridge University Press as your publisher?

Cambridge University Press published two of my previous books, and it has always been a pleasure to work with my editor Lauren Cowles.

Thank you for your time.