Friday, March 16, 2012

Annealing Elsevier

Through a bipartisan pair of shills, Elsevier introduced a bill that would have abolished the NIH open-access mandate and prevented other government research-funding agencies from requiring open access to government-sponsored research. In this Research Works Act (RWA) episode, Elsevier showed its hand. Twice. When it pushed for this legislation, and when it withdrew.

Elsevier was one of the first major publishers to support green open access. By pushing RWA, Elsevier confirmed the suspicion that this support is, at most, a short-term tactic to appease the scholarly community. Its real strategy is now in plain sight. RWA was not done on a whim. Elsevier cultivated at least two members of the House of Representatives and their staff; just to get the bill out of committee, it would have needed several more. No one involved could possibly have thought they could sneak RWA through without anyone noticing. Yet, after an outcry from the scholarly community, Elsevier dropped the legislation just as suddenly as it had introduced it. If Elsevier executives had a strategy, it is in tatters.

Elsevier’s RWA move and its subsequent retrenchment have more than a whiff of desperation. I forgive your snickering at this suggestion. After all, by its own accounting, Elsevier’s adjusted operating margin for 2010 was 35.7% and has been growing monotonically at least since 2006. These are not the trend lines of a desperate company. (Create your own Elsevier reports here. Thanks to Nalini Joshi, @monsoon0, for tweeting the link and the graph!)

Paradoxically, its past success is a problem going forward. Elsevier’s shares are priced to reflect the company’s consistently high profitability. If that profitability were to deteriorate, even by a fraction, the share price would tumble. To prevent that, Elsevier must raise revenue from a client base of universities that face at least several more years of extremely challenging budgets. For universities, the combination of price increases and budget cuts puts once-unthinkable options on the table. Consider, for example, the University of California and the California State University systems. These systems have already cut to the bone, and they may face even more dire cuts unless voters approve a package of tax increases. Because of their size, these two university systems by themselves have a measurable impact on Elsevier’s bottom line. This situation is repeated across the country and around the world.

Clearly, RWA was intended to make cancelling site licenses a less viable option for universities, now and in the future. It is an unfortunate fact that, when asked to deposit their publications in institutional repositories, most scholars ignore their own institutions. They cannot ignore their funding agencies. Over time, funder-mandated repositories will become a fairly comprehensive compilation of the scholarly record. They may also erode the prestige factor of journals. After all, what is more prestigious: that two anonymous referees and an editor approved the paper, or that the NIH funded it to the tune of a few million dollars? Advanced web-usage statistics for the open-access literature may further erode the value of the impact factor and other conventional measures. Recently, I expressed some doubts that the open access movement could contribute to reining in journal prices. I may rethink some of that doubt, particularly with respect to funder-mandated open access.

Elsevier’s quick withdrawal from RWA is quite remarkable. Tim Gowers was uniquely effective and deserves a lot of credit. When planning for RWA, Elsevier must have anticipated significant pushback from the scholarly community. It has experience with boycotts and protests; it has survived several. Clearly, the size and vehemence of the reaction were far beyond Elsevier's expectations. One can only speculate how many of its editors were willing to walk away over this issue.

Long ago, publishers figured out how to avoid becoming a low-profit commodity-service business: they put themselves at the hub of a system that establishes a scholarly pecking order. As beneficiaries of this system, current academic leaders and the tenured professoriate assign great value to it. Given the option, they would want everything the same, except cheaper, more open, without restrictive copyrights, and available for data mining. Of course, it is absurd to think that one could completely overhaul scholarly publishing by tweaking the system around the edges and without disrupting scholars themselves. Scholarly publishers survived the web revolution without disruption, because scholars did not want to be disrupted. That has changed.

Because of ongoing budget crises, desperate universities are cutting programs previously considered untouchable. To the dismay of scholars everywhere, radical options are on the table as a matter of routine. Yet, in this environment, publishers like Elsevier are chasing revenue increases. Desperation and anger are creating a unique moment. In Simulated Annealing terms (see a previous blog post): there is a lot of heat in the system, enabling big moves in search of a new global minimum.

Disruption: If not now, when?


Wednesday, February 22, 2012

Annealing the Information Market




When analyzing complex systems, applied mathematicians often turn to Monte Carlo simulations. The concept is straightforward. Change the state of the system by making a random move. If the new state is an improvement, keep it and make the next random move in a direction suggested by extrapolation. Otherwise, discard it and try a random move in a different direction. Repeat until the quantity of interest is optimized.
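
To make this concrete, here is a minimal sketch (not from any particular simulation package) of the simplest variant of such a search: it keeps only improving moves, leaving out the extrapolation step, on an invented cost landscape with two valleys. Started on the wrong side, it settles into the nearer, shallower valley.

```python
import random

def cost(x):
    # Invented landscape: a shallow valley near x = 2, a deeper one near x = -2,
    # separated by a barrier at x = 0.
    return 0.1 * (x - 2) ** 2 * (x + 2) ** 2 + 0.5 * x

def greedy_search(x, step=0.1, iterations=10_000):
    for _ in range(iterations):
        candidate = x + random.uniform(-step, step)  # small random move
        if cost(candidate) < cost(x):                # keep only improvements
            x = candidate
    return x

random.seed(1)
print(greedy_search(3.0))  # settles near the shallow local minimum, x ≈ 1.8
```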

A commodity market is a real-life concurrent Monte Carlo system. Market participants make sequences of moves. Each new move is random, though it incorporates experience gained from previous moves. The resulting system is a remarkably effective mechanism to produce commodities at the lowest possible cost while adjusting to changing market conditions. Adam Smith called it the invisible hand of the free market.

In severely disrupted markets, the invisible hand may take an unacceptably long time, because Monte Carlo systems may remain stuck in local minima. We may understand this point by visualizing a mountain range with many peaks and valleys. An observer inside one particular valley thinks the lowest point is somewhere on that valley’s floor. He is unaware of other valleys at lower altitudes. To see these, he must climb to the rim of the valley, far away from the observed local minimum. This takes a very long time with small random steps that are biased in favor of going towards the observed local minimum.

For this reason, Monte Carlo simulations use strategies that incorporate large random moves. One such strategy, Simulated Annealing, is inspired by a metallurgical technique that improves the crystallographic structure of metals. During the annealing process, the metal is heated and cooled in a controlled fashion. The heat provides energy to change large-scale crystal structures in the metal. As the metal cools, restructuring occurs only at gradually smaller scales. In Simulated Annealing, the simulation is first run “hot”: large random moves are used to optimize the system at coarse granularity. When sufficiently near a global minimum, the system is “cooled”, and smaller moves are used for precision at fine granularity. Note that, from a Monte Carlo perspective, large moves are just as random as small moves. Each individual move may succeed or fail. What matters is the strategy that guides the sequence of moves.
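
As an illustration only (the landscape and cooling schedule are invented for the example), here is the same toy search rewritten as Simulated Annealing. While the temperature is high, uphill moves are sometimes accepted, so the walker can climb out of the shallow valley; as the system cools, it settles into the deeper one.

```python
import math
import random

def cost(x):
    # Same invented two-valley landscape as in the sketch above.
    return 0.1 * (x - 2) ** 2 * (x + 2) ** 2 + 0.5 * x

def anneal(x, temperature=5.0, cooling=0.999, step=0.5, iterations=20_000):
    for _ in range(iterations):
        candidate = x + random.uniform(-step, step)
        delta = cost(candidate) - cost(x)
        # Metropolis rule: always accept improvements; accept uphill moves
        # with probability exp(-delta / temperature), so large excursions are
        # possible while the system is hot.
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            x = candidate
        temperature *= cooling  # gradually cool the system
    return x

random.seed(1)
print(anneal(3.0))  # typically ends near the deeper minimum, x ≈ -2
```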

When major market disruptions occur, resistance to change breaks down and large moves become possible. (The market runs “hot” in the Simulated Annealing sense.) Sometimes, government leaders or tycoons of industry initiate large moves because they believe, rightly or wrongly, that they can take the market to a new global minimum. Politicians enact new laws, or they orchestrate bailouts. Tycoons make large bets that are risky by conventional measures. Sometimes, unforeseen circumstances force markets into making large moves.

The music industry experienced such an event in late 1999, when Napster, the illegal music-sharing site, suddenly became popular. Eventually, this disruption enabled then-revolutionary business models like iTunes, which could compete with illegal downloading. This stopped the hemorrhaging, though not without leaving a disastrous trail. Traditional music retailers, distributors, and other middlemen were forced out. Revenue streams never recovered. With the Stop Online Piracy Act (SOPA), the music industry, joined by the entertainment industry, was trying to undo some of the damage. If enacted, it would have caused significant collateral damage, but it would have done nothing to reduce piracy. This is covered widely in the blogosphere. For example, consider blog posts by Eric Hellman [1] [2] and David Post [3].

While SOPA is dead, other attempts at antipiracy legislation are in the works, and some may be enacted. In the end, however, heavy-handed legislation will fail. The evolution towards ubiquitous information availability (pirated or not) is irreversible. Even the cruelest of dictators cannot contain the flow of information. Why would anyone think democracies could? Eventually, laws follow society’s major trends. They always do.

When Napster became popular, the music industry was unable to fight back, because its existing distribution channels had become technologically obsolete. Napster was the large random move that made visible a new valley at lower altitude. Without Napster, some other event, circumstance, or product would eventually have come along, caused havoc, and been blamed. Antipiracy legislation might have delayed the music industry’s problems in 1999, but it will not solve the entertainment industry’s problems in 2012.

In the new market, piracy may no longer be the problem it once was. Consumers are willing to pay for convenience, quality of service, and security (absence of malware). Piracy may still depress revenues, but there are at least three other reasons for declining revenues. (1) Revenues no longer support many middlemen, and this is reflected in lower music prices through free-market competition. (2) Some consumers are interested in discovering new artists themselves, not in listening to artists discovered on their behalf by record labels. (3) The recession has reduced discretionary income.

It is difficult to assess the relative importance of disintermediation, behavior change, recession, and piracy. But the effect of piracy on legal downloads is probably much smaller than commonly thought. This may be good news for the music industry. After many large and disruptive moves, the music market may be near a new global minimum. Here, it can rebuild and find new profit-making ventures. These are the kinds of conventional “small” moves suited to a normal, non-disrupted market.

Other information markets are not that lucky.



Friday, October 28, 2011

Open Access Doubts


Science embraces the concept of weakly held strong ideas. This was illustrated by the excited reaction of the High-Energy Physics (HEP) community to a recent experiment ("Measurement of the neutrino velocity with the OPERA detector in the CNGS beam", arXiv:1109.4897v1). If confirmed, it would put into doubt the speed of light as an absolute limit. The relevant paper is available through arXiv, which started as a HEP preprint repository and blazed a trail for Open Access. In light of the origins of the Open Access Movement, let us again be inspired by the HEP community and its willingness to follow experiments wherever they may lead. Assessing the ongoing Open Access experiment, where are our doubts? I have three.

Is Affordable Better than Free?

All else being equal, open is better than closed. But… all else is not equal. A robust and user-friendly network of open scholarly systems seems farther away than ever because of inexpertly formatted content and bad, incomplete, and non-public (!) metadata. While there is always room for improvement, pay-walled journals provide professionally formatted and organized content with excellent metadata and robust services. The problem is cost. Unfortunately, we did nothing to reduce cost. We only negotiated prices.

What if we could significantly reduce cost by implementing pay walls differently? The root of the problem is site licenses. For details, see “What if Libraries were the Problem?”, “Libraries: Paper Tigers in a Digital World”, “The Fourth Branch Library”, and “The Publisher’s Dilemma”. Site licenses are market-distorting products that preserve paper-era business processes of publishers, aggregators, and libraries.

Universities can cut the Gordian knot right now by replacing site licenses with direct subsidies to researchers. After a few months of chaos, consumer-oriented services with all kinds of pricing models would emerge. Researchers, empowered to make individual price-value judgments, would become consumers in a suddenly competitive market for content and information services. The inception of a vibrant marketplace is impossible as long as universities mindlessly renew site licenses.

What are the Goals of Institutional Repositories?

Open Access advocates have articulated at least five goals for institutional repositories: (1) release hidden information, (2) rein in journal prices, (3) archive an institution’s scholarly record, (4) enable fast research communication, and (5) provide free access to author-formatted articles.

Institutional repositories are ideal vehicles for releasing hidden information that, until recently, had no suitable distribution platform (1). For example, archives must protect original pieces, but they can distribute the digitized content.

The four remaining goals, all related to scholarly journals, are more problematic. Institutional repositories fall short as a mechanism to rein in journal prices (2), because they are not a credible alternative for the current archival scholarly record. Without (2), goals (3), (4), and (5) are irrelevant. If we pay for journals anyway, we can achieve (3) by maintaining a database of links to the formal literature. Secure in the knowledge that their journals are not in jeopardy, publishers would be happy to provide (4) and (5).

A scenario consistent with this analysis is unfolding right now. The HEP community launched a rescue mission for HEP journals, which lost much of their role to arXiv. The SCOAP3 initiative pools funds currently spent on site-licensing HEP journals. This strikes me as a heavy-handed approach to protect existing revenue streams of established journals. On the other hand, SCOAP3 protects the quality of the HEP archival scholarly record and converts HEP journals to the open-access model.

Are Open-Access Journals a Form of Vanity Publishing?

If a journal’s scholarly discipline loses influence or if its editorial board lowers its standards, the journal’s standing diminishes and various quality assessments fall. In these circumstances, a pay-walled journal loses subscribers and, eventually, fails. An open-access journal, on the other hand, survives as long as it attracts a sufficient number of paying authors (perhaps by lowering standards even further). The financial viability of a pay-walled journal is a crude measure of quality, but it is nonnegotiable and cannot be rationalized away: when subscriptions dry up, the journal fails, its editorial board disappears, its scholarly discipline loses some of its stature, and its authors must publish elsewhere.

We should not overstate this particular advantage of the pay wall. Publishers have kept marginal pay-walled journals alive through bundling and consortium incentives, effectively using strong journals to shore up weak ones. Open-access journals may not be perfect, but we happily ignore some flaws in return for free access to the scholarly record. For now, open-access journals are managed by innovators out to prove a point. Can successive generations maintain quality despite a built-in incentive to the contrary?

Wednesday, October 19, 2011

The Birth of the Open Access Movement


Twelve years ago, on October 21st 1999, Clifford Lynch and Don Waters called to order a meeting in Santa Fe, New Mexico. The organizers, Paul Ginsparg, Rick Luce, and Herbert Van de Sompel, had a modest goal: generalize the High Energy Physics preprint archive into a Universal Preprint Service available to any scholarly discipline. (Currently known as arXiv and hosted by Cornell University, the HEP preprint archive was then hosted at the Los Alamos National Laboratory.)

This meeting constructed the technical foundation for open access: the Open Archives Initiative and the OAI Protocol for Metadata Harvesting (OAI-PMH). It coined the term repository. (Yes, it was a compromise.) It inspired participants. Some went home and developed OAI-compliant repository software. Some built or expanded institutional and disciplinary repositories. Some started initiatives to raise awareness.
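
For readers unfamiliar with the protocol, a harvester's side of OAI-PMH is just an HTTP request with a few standard parameters. Here is a minimal sketch; the repository URL is a placeholder, and a real harvester would also follow resumption tokens to page through large result sets.

```python
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.edu/oai"  # hypothetical endpoint
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Ask for records in the simple Dublin Core format.
url = BASE_URL + "?verb=ListRecords&metadataPrefix=oai_dc"
with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Print the title of each harvested record.
for record in tree.iter(OAI + "record"):
    title = record.find(".//" + DC + "title")
    print(title.text if title is not None else "(no title)")
```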

At the meeting, there were high-flying discussions on the merits of disciplinary versus institutional repositories. Some argued that disciplinary repositories would be better at attracting content. Others (including me) thought institutional repositories were easier to sustain for the long haul, because their costs are distributed. In retrospect, both sides were right and wrong. In the years that followed, even arXiv, our inspirational model, had problems sustaining its funding, but the HEP community rallied to its support. Institutional repositories were funded relatively easily, but never attracted a satisfactory share of research output. (It is too early to tell whether sufficiently strong mandates will be widely adopted.)

There were high hopes for universal free access to the scholarly literature, for open access journals, for lower-priced journals, for access to data, for better research infrastructure. Many of these goals remain high hopes. Yet, none of the unfulfilled dreams can detract from the many significant accomplishments of the Open Access Movement.

Happy Twelfth Birthday to the Open Access Movement!

Friday, September 23, 2011

Information Literacy, Libraries, and Schools

On September 14th, Los Angeles Times columnist Steve Lopez covered the closure and near-closure of libraries in elementary, middle, and high schools. In the best of times, school libraries play second fiddle to issues like improving the student-teacher ratio. In crisis times like today, these libraries do not stand a chance. A week later, he covered the parents’ reaction.

The parents’ efforts to rescue these libraries are laudable, but lack vision and ambition. They are merely trying to retain a terrible status quo. A room of books is not the kind of library where primary literacy skills are learned. The school superintendent, John Deasy, has it basically right: primary literacy skills are learned in the classroom. Critical reading, identifying high-quality information, web-research techniques, and specific sources for particular subject matters are skills that can be learned only if they are incorporated in every class, every day.

At every level in our society, the response to this terrible economic crisis has been one of incremental retrenchment instead of visionary reinvention. The phrase “don’t let a crisis go to waste” may have a bad image, but it applies in this case. California is the birthplace of information technology, and its schools and their infrastructure should reflect this.

Around the same time as the first column, rumors started circulating that Amazon is planning an electronic library available by monthly subscription. This is a technology and a business model that can provide every student with a custom digital library. It may even save money by eliminating the management and warehousing of print books (including text books).

School districts should put out requests for proposals to supply every student with an e-book reader, tablet, or notebook computer that has access to a digital library of books and other resources. Big-name enterprises, such as Amazon, Apple, Barnes & Noble, and Google, would be eager to capture this young demographic. Some philanthropic organizations might be willing to pitch in by buying the rights to some books and putting them in the public domain. A slice of public-library funds should be allocated to this digital library.

Traditional school libraries are inadequate. It is time to shelve twentieth century infrastructure and fund the tools students need in the twenty-first century.

Tuesday, September 13, 2011

ETD 2011 and the Library of the Future

This week, the Networked Digital Library of Theses and Dissertations (NDLTD) holds its annual international conference in Cape Town, South Africa. Founded by Prof. Edward Fox of Virginia Tech, NDLTD is dedicated to making theses and dissertations available on the web. NDLTD is an organization where library and academic-computing professionals coordinate their activities and support each other as they develop programs to improve the quality of multimedia theses and the repositories that hold them.

The good news is that universities from across the globe are adopting electronic-theses mandates at an astonishing rate. Right now, over two million theses are available with a few mouse clicks. Check out the VTLS Visualizer or the SCIRUS ETD Search. By making their research available online, universities increase its impact. This is especially important for developing nations, which are in dire need of thinkers who can solve local problems and contribute to global knowledge. That makes the location of this year’s NDLTD conference crucially important, both from a practical and a symbolic point of view.

The bad news is that thesis repositories are underfunded. Often, a thesis repository is thought of as just an affordable digital service with a fast payoff in research visibility. In fact, it is much more: it is a paradigm shift for the business of university libraries. Paper-era libraries collect information from around the world to be consumed by their communities. This paradigm is largely obsolete and must be turned upside down. As discussed in a previous blog post, “The Fourth Branch Library”, digital-era libraries should focus on the information produced by their communities, collect it, manage it, and make it widely available. Setting up an electronic thesis repository, helping students and faculty develop best practices, and helping universities through policy issues are exactly the kind of activities at the core of the digital library mission.

Repositories should be funded at a level commensurate with their importance to the future of libraries. We need to redouble our efforts to get out of PDF and into structured text, to enable full-text search, to improve reference linking, and to connect scientific formulas and equations to appropriate software for manipulation. We must capture all data underlying thesis research and make it available in raw form as well as through interactive visualizations. We must standardize when appropriate and allow maximum flexibility when feasible. A lot of work is ahead.

I congratulate the organizers of ETD 2011 for putting together a fantastic program. I hope the attendees of ETD 2011 will be inspired to build the foundations for the library of the future.

Sunday, September 4, 2011

The Publisher’s Dilemma

The stinging critique of scholarly publishers by George Monbiot in The Guardian and on his blog describes the symptoms accurately, but misses the diagnosis of the problem. As commercial enterprises, publishers have a duty to their shareholders and employees to extract as much value as possible out of the information they own. If you think they should not own the scholarly record, blame the academics who signed over copyright. If you think site licenses for scholarly journals are too expensive, blame universities for continuing to buy into the system. Scholarly publishers are neither evil nor dishonest. They are capitalists exploiting a market they have created with the eager participation of academia. Academics and librarians have been whining about the cost of scholarly journals for the last twenty years. One more yammering op-ed piece, or a thousand, will not change a dysfunctional scholarly-information market. Only economically meaningful actions can do that. Change the market, and the capitalists will follow.

By making buying decisions on behalf of a community, libraries eliminate competition between journals and create a distorted market. (See my previous blog post “What if Libraries were the Problem?”) The last twenty years were a chaotic period that included inflating and bursting economic bubbles, the worst financial crisis since the Great Depression, several wars, and unprecedented technological advances in the delivery of information. As one would expect under these conditions, most publishers faced an existential crisis. Amazingly, most scholarly publishers thrived. Is it just a coincidence that their main revenue source is libraries?

Researchers need access to scholarly research. This legitimate need is conflated with the necessity of buying site licenses. A site license merely extends a rigid paper-era business model that ignores the unlimited flexibility of digital information. As digital-music consumers, students and faculty will not even buy an album of ten songs if they are interested in only one or two. Yet, on behalf of this same community, their library subscribes to bundles of journals and joins consortia to buy even larger bundles. Pay-per-view systems are expensive and painfully slow, particularly when handled through interlibrary loan. This information-delivery system is out of step with current expectations. The recording industry serves as an example of what happens in these circumstances.

It’s time to face the music. (I could not resist.) For an author, the selection of an appropriate journal and/or publisher is crucially important. For a reader, citations and peer recommendations trump journals’ tables of contents, and book reviews trump publishers’ catalogs. I call on publishers to partner with Apple, Amazon, Thomson Reuters (Web of Knowledge), EBSCO, and others to develop convenient and affordable gateways that provide access to any scholarly article or book, from any publisher, whether open or paid access. Such an initiative might eat into site-license revenue, but it just might keep the system from collapsing and provide a platform for sustainable reader-pays or hybrid models. Publishers have already hedged their bets with sincere, but timid, open-access initiatives. This is just one additional hedge, just in case...

In fact, I suspect many publishers have mixed feelings about site licenses. They generate high revenue, but they also come with high fixed costs. An extensive sales staff keeps track of thousands of libraries and conducts endless negotiations. Middlemen take a bite out of most proceeds. Every special deal must pass through an internal approval process, consuming executives’ time and energy. There are serious technical complications in controlling access to journals covered by site licenses, because publishers must cede authentication processes to libraries and because they have no direct relationship with their readership. Publishers are caught in a vicious circle of increasing costs, more difficult negotiations, more cancellations, and increasing prices. I suspect they want a better system, one in which they can offer more services to more users. Yet, they find it impossible to abandon their only significant business model, even one in danger of collapsing under its own weight.

Change will happen only if universities take economically meaningful actions. Stop buying site licenses, let students and faculty decide their personal information requirements, subsidize them where appropriate, and let the free market run its course. (See my previous blog post “Libraries: Paper Tigers in a Digital World”.) In future blog posts, I intend to discuss methods of subsidizing information that are more effective than buying site licenses, and gradual approaches to get us there. Just as a thought experiment, consider the following: cancel all site licenses, and use the savings to lower student tuition and raise faculty salaries. How long would it take for alternative distribution channels to develop? How would prices evolve? How popular would open access be?

In a web-connected world, the role of libraries as intermediaries between information providers and readers is obsolete. As discussed in “The Fourth Branch Library”, libraries should increase their focus on collecting, managing, and broadcasting the information their communities generate. They should not be concerned with the information their communities consume.