Tuesday, June 27, 2023

The University Library: Closing the Book

At the memorial service, the eulogies expressed deep sadness at the loss of a great institution, once a cornerstone of academia. Everyone blamed The Shelfless Revolution for this sad death. In fact, The University Library had been weak for a long time, and it could not survive any shock.

The Transition from Print to Digital

The transition from print to digital was swift, particularly for scholarly journals. In the 1990s, The University Library, publishers, and the middlemen of the supply chain ramped up their IT infrastructure and adapted their business relationships. The switch to digital was achieved quickly and with little interruption.

Print vs. Digital Lending

Lending books and journals, whether print or digital, is a high-overhead enterprise. Since print and digital lending involve different kinds of work, it is obvious that their overheads are quantitatively different. It is less obvious and easily ignored that they are qualitatively different: Print overhead is an investment. Digital overhead is waste.

Consider print lending. The overhead builds a valuable collection housed in community-owned real estate. Barring disasters, the value of the collection and the infrastructure increases over time. The cumulative effect is most obvious in old libraries, which are showcases of accumulated treasure.

Contrast this with digital lending. Digital overhead pays for short-term operational expenses to acquire site licenses whose value is zero when they expire. Even infrastructure spending has only short-term benefits. Computing and networking hardware must be replaced every few years. Site-licensed software to manage the digital lending library, like site licenses for content, has zero value upon expiration.

The digital lending library never accumulates value. It does not contribute anything to future generations. It only provides services here and now. It just needs to perform current responsibilities in a cost-effective manner. Evaluating The University Library as a digital lender boiled down to a few simple questions: Was The University Library a cost-effective negotiator and content provider? Did it provide a user-friendly service? Could others do better?

The Ineffective Negotiator

While other publishers suffered years of disruption and catastrophic downsizing, scholarly publishers thrived throughout the digital revolution and afterwards. Their profit margins remained sky high. Their new business model was even better than the old. By selling site licenses, they retained control of the content forever. New content provided an immediate revenue stream, and accumulated old content ensured an ever increasing future revenue stream.

The University Library was a predictable customer with a budget that kept pace with inflation. Not satisfied with this, publishers increased their prices at a rate well above inflation. Every so often, The University Library and its funders pressed the panic button. This would start a round of negotiations. Librarians were caught between scholars who wanted to maximize content and administrators who wanted to reduce costs. They negotiated with publishers, each of whom had a monopoly over their island of the literature. Predictably, most negotiations ended with some performative cutbacks by The University Library and a few temporary price concessions by the publishers. Then, the cycle started all over again.

The Market Distorter

The University Library distorted the scholarly communication market, merely by being present in it. Normal economic forces did not apply.

To maintain quality, The University Library acquired content from publishers with a track record. The barriers against new unproven publishers created an oligarchy of publishers that kept prices artificially high.

The University Library also eliminated competition between established publishers. Imagine two competing journals, A and B. A survey among the relevant scholars reveals that 60% prefer A, 40% prefer B, and 20% adamantly insist that they need both A and B. The University Library had no choice but to license both journals for all scholars. By erasing individual preferences, it eliminated competition.

For most textbooks, publishers knew well in advance how many copies The University Library would buy. Given this information, publishers inflated their textbook prices to a level where library sales covered their production costs. All other sales were pure profit from a riskless enterprise.

Providing Access

Under the terms of the site licenses, only authorized users were allowed access, and systematic downloading was prohibited. It was the responsibility of The University Library to protect the content against inappropriate users and use. This work on behalf of publishers was a significant part of digital overhead paid for by The University Library.

Aside from being costly, access controls inconvenienced users. Links to content might stop working without notice because of miscommunication between publishers and library systems. When visiting another campus or when changing jobs, scholars had to adapt to new user interfaces. What had been an asset in the print era, a library built for a local community, had become a liability in the digital era.

Personal Digital Libraries

In their personal lives, scholars subscribed to online newspapers and magazines, to movie and music streaming services, and to various social networks where they posted and consumed content. They easily managed these personal subscriptions. What was so different about scholarly subscriptions? What exactly did The University Library do that they could not do themselves faster and more efficiently?

The University Library no longer accumulated long-term value. It was an ineffective negotiator unable to control costs. It blocked competition from new publishers. It eliminated competition between established publishers. It spent considerable overhead to control access on behalf of publishers while inconveniencing users.

The Shelfless Revolution changed all that. Overnight, scholars were in charge of meeting their own information needs. During the initial period of chaos, scholars were forced to subscribe to each journal individually. Publishers quickly adapted by bundling books and journals into various packages. Third-party service providers, working with all publishers, offered custom personal libraries. The undergraduate pre-med student who loved mystery novels and the assistant professor in chemistry who hiked wilderness trails no longer shared the same library. Their competing interests no longer needed to be balanced.

Many journals did not survive the suddenly competitive market. With fewer journals, publishing a paper became more competitive. Over time, the typical scholar published fewer papers of higher quality. With fewer opportunities to publish in classical peer-reviewed journals, scholars had an incentive to create and/or try out new forms of scholarly communication.

Sticky Digital Lending

Looking back, it is difficult to grasp how controversial a step it was to switch to personal libraries.

Before The Shelfless Revolution, academic administrators would have committed career suicide if they proposed such an outrageous idea. The backlash would have been harsh and immediate. The opposition message would have written itself: They are outsourcing The University Library to the publishers who have been extorting the scholarly community for years. This slogan would have had the benefit of being true. The counterargument would have been the idea that publishers lose their price-setting power when scholars make their own individual purchasing decisions. While standard capitalist theory, the idea was untested in scholarly communication.

The unlikely university where the faculty approved the outrageous proposal would be mired in endless debate. How should the library subscription budget be divided? How much should go to undergraduate students? to graduate students? to postdocs? to faculty? Should they receive these funds in the form of tuition rebates and salary increases or in the form of university accounts? What would be allowable purchases on such accounts?

No single university could have implemented such a change on its own. Accreditation authorities would have expressed doubts or outright opposition. Publishers would not have changed their business models to accommodate one university. It would have required a large coalition of universities.

It took a catastrophic shock to the system, The Shelfless Revolution, to cut this Gordian knot.

Open Access

Many years before The Shelfless Revolution, a few academics started a project to kickstart a revolution in scholarly communication. As this grew into The Open Access Movement, The University Library was called upon to support some of the infrastructure. Many librarians considered this a promising opportunity for a digital future.

The Open Access Movement coalesced around three goals: provide free access to scholarly works, reduce the cost of scholarly communication, and create innovative forms of scholarly communication.

The first goal was quite successful. Three mechanisms were developed to provide free access to scholarly works: institutional repositories, disciplinary repositories, and open access journals. The University Library was primarily responsible for institutional repositories, which contained author-formatted versions of conventionally published papers, unpublished technical reports, theses and dissertations, data sets, and other scholarly material. Several groups of scholars developed disciplinary repositories to collect works in specific areas of research and make them freely available. Finally, various entities created open access journals, which relied on alternative funding mechanisms and did not charge subscription fees.

The second goal, reducing the cost of scholarly communication, was an utter failure. The Open Access Movement had assumed that making a large part of the scholarly literature available for free would put downward pressure on the price of subscription journals. This assumption was proved wrong. Scholars continued to publish in the same journals. The familiar cycle of site license price increases and performative negotiations continued. Repositories were never a threat. Open-access journals were never competition.

Institutional repositories were particularly valuable for scholarly works that were previously hard to find, such as theses, technical reports, data, etc. For author-formatted papers, they evolved into a costly backup for conventional scholarly publishing. They provided a valuable service for those without access to journals. Most scholars would not risk their research by relying on pre-published unofficial versions, and they required the version of record. Besides, repositories were too cumbersome to use.

Disciplinary repositories were more user-friendly, but they needed outside funding. Occasionally, the priorities of the funders would change, and the repository would have to find a new source of funding. Each funding crisis was an opportunity for publishers to buy the repository. To keep the repository under scholars’ control, an interested government agency or philanthropic organization had to step forward every time. To control the repository, publishers had to be lucky just once.

Open access journals just increased the number of scholarly journals. Subscription journals did not suddenly fail because of competing open access journals. At most, subscription journals responded by introducing an open access option. Authors could choose to pay a fee to put their papers outside of the paywall. These authors just trusted publishers not to include these open access papers in the calculation of subscription prices. The publisher’s promise was impossible to verify. This was the level of dysfunction of the scholarly communication market at that time.

The University Library paid ever increasing prices for site licenses and their maintenance. It also paid for the maintenance of institutional repositories. Government and philanthropic funding agencies paid for disciplinary repositories. Scholars used a combination of library funding, research accounts, departmental accounts, and personal resources to pay for open access charges. The scholarly community was spending more than ever on scholarly communication, and no one knew how much.

The Open Access Movement also failed to deliver on its third goal, innovations in scholarly communication. Early stage ventures were too risky for responsible organizations like The University Library. Most ideas failed or remained unexecuted. The Shelfless Revolution changed the environment. Individual scholars in charge of their own budget and confronted with the actual costs of scholarly communication were willing to fund risky but promising experiments.

The Fallout

The Shelfless Revolution killed the digital lending library. This started a chain reaction that affected every service offered by The University Library.

It was immediately obvious that archives had to survive. The print archive was scanned and stored in repositories. In spite of their limitations, repositories became the primary portal into the print archive. Print volumes became museum artifacts virtually untouched by humans. The digital archive mostly contains university-owned scholarly material. Copyright issues created too many obstacles to archive publisher-owned content. New legislative proposals would put the burden on publishers to preserve digital collections of significant cultural, scientific, and/or historical value. This is similar to how we treat protected historical buildings. Publishers will have to store such digital collections in audited standardized archives with government-backed protections against all kinds of calamity.

Print lending died out when most books contained multimedia illustrations and interactive components. Print material of historical importance was moved from the lending library to the nonlending print archive. This killed interlibrary loan services of printed material. Digital interlibrary loans all but disappeared with custom personal libraries.

After losing collection development staff, the reference desk could no longer cover a broad cross-section of scholarly disciplines. It got caught in a downward spiral of decreasing usefulness and declining use.

Long ago, librarians controlled what information was readily available. As technology advanced, their gatekeeping power evaporated. They still nudged publishers towards quality using the power of the purse. This too is now gone. The battle against disinformation seems lost. The profound political differences on where fighting disinformation ends and censorship begins are nowhere near being resolved.

After censorship campaigns wreaked havoc on public school libraries, The University Library braced itself against similar attempts. Before it could engage in that fight, The Shelfless Revolution happened. The switch to personal digital libraries reduced the political heat as universities no longer directly paid for controversial content. Censorship lost the battle, but The University Library lost the war.

Thousands of library projects got caught in the turmoil. Some survived by being moved to other organizations. Most did not. We will never know how much destruction was caused by The Shelfless Revolution.


The University Library made all the right moves. It embraced new technology. It executed the transition from print to digital without major disruption. It was open to new opportunities.

Yet, things went wrong. Open access repositories were supposed to be subversive weapons. Open access journals were supposed to be deadly competitors. Instead, they turned out to be paper tigers, powerless against the oligarchy of the scholarly communication market.

Publishers of newspapers, magazines, music, and video barely survived the disruptive transition to digital. As they rebuilt their businesses from the ruins, they developed business models for the new reality. In contrast, the smooth transition of the scholarly communication market protected existing organizations. It also perpetuated the flaws of old business models, and it let the distorted market grow more dysfunctional every day.

With the benefit of hindsight, the necessary changes could have been implemented more humanely. This was never a realistic option, however. The chaotic and disruptive change of The Shelfless Revolution was inevitable.

#scholcomm #AcademicTwitter #ScienceTwitter #scicomm

Tuesday, June 27, 2017

Forward to the Past

What will academic libraries look like in 2050?

In the early days of the web, librarians had to fight back against the notion that libraries would soon be obsolete. They had solid arguments. Information literacy would become more important. Archiving and managing information would become more difficult. In fact, academic libraries saw an opportunity to increase their role on campus. This opportunity did not materialize. Libraries remain stuck in a horseless-carriage era. They added an IT department. They made digital copies of existing paper services. They continued their existing business relationships with publishers and various intermediaries. They ignored the lessons of the web-connected knowledge economy. Thriving organizations create virtuous cycles of abundance by solving hard problems: better solutions, more users, more revenue, more content, more expertise, and better solutions.

Academic libraries seem incapable of escaping commodity-service purgatory, even when tackling their most ambitious projects. They are eager to manage data archives, but the paper-archive model produces an undifferentiated commodity preservation service. A more appropriate model would be the US National Virtual Astronomical Observatory, where preservation is a happy side effect of extracting maximum research out of existing data. Data archives should be centers of excellence. They focus on a specific field. They are operated by researchers who keep abreast of the latest developments, who adapt data sets to evolving best practices, who make data sets interoperable, who search for inconsistencies between different studies, who detect, flag, and correct errors, and who develop increasingly sophisticated services.

No university can take a center-of-excellence approach to data archiving for every field in which it is active. No archive serving just one university can grow to a sufficiently large scale for excellence. Each field has different needs. How many centers does the field need? How should centers divide the work? What are their long-term missions? Who should manage them? Where are the sustainable sources for funding? Libraries cannot answer these questions. Only researchers have the required expertise and the appropriate academic, professional, and governmental organizations for the decision-making process.

Looking back over the past twenty years, all development of digital library services has been limited by the institutional nature of academic libraries, which receive limited funding to provide limited information and limited services to a limited community. As a consequence, every major component of the digital library is flawed, and none has the foundation to rise to excellence.

General-purpose institutional repositories did not live up to their promise. [Let IR RIP] The center-of-excellence approach of disciplinary repositories, like ArXiv or PubMed, performed better in spite of less stable funding. Geographical distance between repository managers and scholars did not matter. Disciplinary proximity did.

Once upon a time, the catalog was the search engine. Today, it tells whether a printed item is checked out and/or where it is shelved. It is useless for digital information. It is often not even a good option to find information about print material. The catalog, bloated into an integrated library system, wastes resources that should be redirected towards innovation.

Libraries provide access to their site licenses through journal databases, OpenURL servers, and proxy servers. They pay for this expensive system so publishers can perpetuate a business model that eliminates competition, is rife with conflict of interest, and can impose almost unlimited price increases. Scholars should be able to subscribe to personal libraries as they do for their infotainment. [Hitler, Mother Teresa, and Coke] [Where the Puck won't be] [Annealing the Library] [What if Libraries were the Problem?]

In the paper era, the interlibrary-loan department was the gateway to the world's information. Today, it is mostly a buying agent for costly pay-per-view access to papers not covered by site licenses. Personal libraries would eliminate these requests. Digitization and open access can eliminate requests for out-of-copyright material.

Why is there no scholarly app store, where students and faculty can build their own libraries? By replacing site licenses with app-store subsidies, universities would create a competitive marketplace for subscription journals, open-access journals, experimental publishing platforms, and other scholarly services. A library making an institutional decision must be responsible and safe. One scholar deciding where to publish a paper, whether to cancel a journal, or which citation database to use can take a risk with minimal consequence. This new dynamic would kickstart innovation. [Creative Destruction by Social Network]

Libraries seem safe from disruption for now. There are no senior academics sufficiently masochistic to advocate this kind of change. There are none who are powerful enough to implement it. However, libraries that have become middlemen for outsourced mediocre information services are losing advocates within the upper echelons of academic administrations every day. The costs of site licenses, author page charges, and obsolete services are effectively cutting the innovation budget. Unable to attract or retain innovators, stagnating libraries will just muddle through while digital services bleed out. When some services fall apart, others become collateral damage. The print collection will shrink until it is a paper archive of rare and special items locked in a vault.

Postscript: I intended to write about transforming libraries into centers of excellence. This fell apart in the writing. I hesitated. I rewrote. I reconsidered. I started over again.

If I am right, libraries are on the wrong track, and there is no better track. Libraries cannot possibly remain relevant by replicating the same digital services on every campus. There is a legitimate need for advanced information services supported by centers of excellence. However, it is easier to build new centers from scratch than to transform libraries tied up in institutional straitjackets.

Perhaps, paper-era managers moved too slowly and missed the opportunity that seemed so obvious twenty years ago. Perhaps, that opportunity was just a mirage. Whatever the reason, rank-and-file library staff will be the unwitting victims.

Perhaps, I am wrong. Perhaps, academic libraries will carve out a meaningful digital future. If they do, it will be by taking big risks. The conventional options have been exhausted.

Monday, March 13, 2017

Creative Destruction by Social Network

Academia.edu bills itself as a platform for scholars to share their research. As a start-up, it still provides mostly free services to attract more users. Last year, it tried to make some money by selling recommendations for scholarly papers, but the backlash from academics was swift and harsh. That plan was shelved immediately. [Scholars Criticize Academia.edu Proposal to Charge Authors for Recommendations]

All scholarly publishers sell recommendations, albeit artfully packaged in prestige and respectability. Academia.edu's direct approach seemed downright vulgar. If they plan a radically innovative replacement for journals, they will need a subtler approach. At least, they chose the perfect target for an attempt at creative destruction: Scholarly communication is the only type of publishing not disrupted by the web, it has sky-high profit margins, it is inefficient, and it is dominated by relatively few well-connected insiders.

If properly designed (and that is a big if), a scholarly network could reduce the cost of all aspects of scholarly communication, even without radical innovation. It could improve the delivery of services to scholars. It could increase (open) access to research. And it could do all of this while scholars retain control over their own output for as long as feasible and/or appropriate. A scholarly network could also increase the operational efficiency of participating universities, research labs, and funding agencies.

All components of such a system already exist in some form:

Personal archive. Academics are already giving away ownership of their published works to publishers. They should not repeat this historic mistake by giving social networks control over their unpublished writings, data, and scholarly correspondence. They should only participate in social networks that make it easy to pack up and leave. Switching or leaving networks should be as simple as downloading an automatically created personal archive of everything the user shared on the network. Upon death or incapacity, the personal archive and perhaps the account itself should transfer to an archival institution designated by the user.

Marketplace for research tools. Every discipline has its own best practices. Every research group has its preferred tools and information resources. All scholars have their idiosyncrasies. To accomplish this level of customization, a universal platform needs an app store, where scholars could obtain apps that provide reference libraries, digital lab notebooks, data analysis and management, data visualization, collaborative content creation, communication, etc.

Marketplace for professional services. Sometimes, others can do the work better, faster, and/or cheaper. Tasks that come to mind are reference services, editorial and publishing services, graphics, video production, prototyping, etc.

Marketplace for institutional services. All organizations manage some business processes that need to be streamlined. They can do this faster and cheaper by sharing their solutions. For example, universities might be interested in buying and/or exchanging applications that track PhD theses as they move through the approval process, that automatically deposit faculty works into their institutional repositories, that manage faculty-research review processes, that assist the preparation of grant applications, and that manage the oversight of awarded research grants. Funding agencies might be interested in services to accept and manage grant applications, to manage peer review, and to track post-award research progress.

Certificates. When a journal accepts a paper, it produces an unalterable version of record. This serves as an implied certificate from the publisher. When a university awards a degree, it certifies that the student has attended the university and has completed all degree requirements. Incidentally, it also certifies the faculty status of exam-committee members. Replacing implicit with explicit certificates would enable new services, such as CVs in which every paper, every academic position, and every degree is certified by the appropriate authority.

A scholarly network like this is a specialized business-application exchange, a concept pioneered by the AppExchange of Salesforce.com. Every day, thousands of organizations replace internal business processes with more efficient applications. Over time, this creates a gradual cumulative effect: Business units shrink to their essential core. They disappear or merge with other units. Corporate structures change. Whether or not we are prepared for the consequences of these profound changes, these technology-enabled efficiencies advance unrelentingly across all industries.

These trends will, eventually, affect everyone. While touting the benefits of creative destruction in its journals, the scholarly-communication system successfully protected itself. Like PDF, the current system is a digital replication of the paper system. It ignores the flexibility of digital information, while it preserves the paper-era business processes and revenue streams of publishers, middlemen, and libraries.

Most scholars manage several personal digital libraries for their infotainment. Yet, they are restricted by the usage terms of institutional site licenses for their professional information resources. [Where the Puck won't be] When they share papers with colleagues and students, they put themselves at legal risk. Scholarly networks will not solve every problem. They will have unintended consequences. But, like various open-access projects, they are another opportunity for scholars to reclaim the initiative.

Recently, ResearchGate obtained serious start-up funding. [ResearchGate raises $52.6M for its social research network for scientists] I hope more competitors will follow. Organizations and projects like ArXiv, Figshare, Mendeley, Web of Knowledge, and Zotero have the technical expertise, user communities, and platforms on which to build. There are thousands of organizations that can contribute to marketplaces for research tools, professional services, and institutional services. There are millions of scholars eager for change.

Build it, and they will come... Or they will just use Sci-Hub anyway.

Thursday, November 10, 2016

Simpler Times

“The Library of Congress is worried about the exponential growth of the number of journals. By 2025, their shelves will fill up faster than the speed of light. However, a professor of physics assured them there was no problem: exceeding the speed of light is allowed when no information is transmitted.” 

There are references to variations of this joke as far back as 1971. I first heard it in 1983 or 1984, when I was a graduate student. This is how I learned that some academics were concerned about the state of scholarly communication.

In simpler times, the values of publishing and scholarship were well aligned. The number of slots in respected journals was extremely limited, and fierce competition for those slots raised the quality and substance of papers. As publishers became more efficient and savvy, they created more journals and accepted more papers. Scholars competing in the academic job market were always eager to contribute ever more papers. As scholars published more, hiring committees demanded more. A vicious cycle with no end in sight.

It is doubtful that the typical scholar of 2016 produces more good ideas than the typical scholar of 1956. The former certainly writes a lot more papers than the latter. The publish-or-perish culture reduced the scholarly paper to a least publishable unit. The abundance of brain sneeze is correlated with several other issues. Many reported results cannot be reproduced. [A Joke Syllabus With a Serious Point: Cussing Away the Reproducibility Crisis] A growing number of papers are retracted for fraud and serious errors. [Retraction Watch] Clinical trials are hidden when they do not have the desired results. [AllTrials] Fake journals scam honest-but-naive scholars, embellish the scholarly records of fraudulent scholars, and/or provide the sheen of legitimacy to bad research. [Beall's List]

This race to the bottom was financed by universities through their libraries. Every year, they paid higher subscription prices to more journals. In the 1990s, library budgets spiraled out of control and finally caught the attention of university administrators. This was also when the internet grew exponentially. Scholars who realized the web's potential demanded barrier-free online access to research. The Open Access (OA) movement was born.

Good scholarship is elitist: we expect scholars to gain status and influence for getting it right, particularly when they had to fight against majority opinion. Journals are essential components in the arbitration of this elitism. Yet, even well before the OA movement, it was in the publishers' interest to lower the barriers of publishing: every published paper incentivizes its authors to lobby their institutions in favor of a journal subscription.

Gold OA journals [Directory of Open Access Journals] with business models that do not rely on subscription revenue made the problem worse. They were supposed to kill and replace subscription journals. Instead, subscription journals survived virtually intact. Subscriptions did not disappear. Their impact factors did not fall even after competing Gold OA journals scaled the impact-factor ladder. The net result of Gold OA is more opportunities to publish in high-impact-factor journals.

The Green OA strategy had a plausible path to reverse the growth of journals: libraries might be able to drop some subscriptions if scholars shifted their use to Green OA institutional repositories (IRs). [OAI Registered Data Providers] This outcome now seems unlikely. I previously argued that IRs are obsolete, and that the Green OA strategy needs social networks that create a network effect by serving individual scholars, not their institutions. [Let IR RIP] In an excellent response by Poynder and Lynch [Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository?], we learned how some academic libraries are contracting with Elsevier to manage their IRs. They seem to have given up on Green OA as a strategy to reclaim ownership of the scholarly literature from publishers. They have pivoted their IRs towards a different and equally important goal: increasing the visibility and accessibility of theses, archives, technical papers, lab notebooks, oral histories, etc.

The OA movement tried to accomplish meaningful change of the scholarly-communication system with incremental steps that preserve continuity. I called it isentropic disruption. [Isentropic Disruption] However, scholarly publishers have proven extraordinarily immune to any pressure. The transition to digital alone wiped out every other kind of publisher. Scholarly publishers did not even change their business model. They also brushed off reproducibility and fraud scandals. They survived boycotts and editorial-board resignations. They largely ignored Green and Gold OA. Perhaps, the OA movement just needs more time. Perhaps, the OA movement is falling victim to a sunk-cost fallacy.

The current system is not financially sustainable and, worse, is bad for scholarship. Within the shared-governance structure of universities, it is virtually impossible to take disruptive action in the absence of immediate crisis. Universities tend to postpone such decisions until no alternative remains. Then, they inflict maximum pain by implementing unplanned change overnight.

Yet, there are options available right now. With time to plan a transition, there would be much less collateral damage. For example, I proposed replacing library site licenses with personal subscriptions to iTunes-like services for academics. [Where the Puck won't be] Personal digital libraries would be much easier to use than the current site-licensed monstrosities. With scholars as direct customers, the market for these services would be extremely competitive. By configuring and using their personal library, scholars would create market-driven limits on the number of available publication slots. Those willing to consider out-of-the-box crazy approaches can even achieve such limits within an OA context. [Market Capitalism and Open Access]

Academics created the problem. Only academics can solve it. Not libraries. Not publishers. Digital journals are already filling the virtual shelves at the speed of light... The punch line of the joke is in sight.

Sunday, July 24, 2016

Let IR RIP

The Institutional Repository (IR) is obsolete. Its flawed foundation cannot be repaired. The IR must be phased out and replaced with viable alternatives.

Lack of enthusiasm. The number of IRs has grown because of a few motivated faculty and administrators. After twenty years of promoting IRs, there is no grassroots support. Scholars submit papers to an IR because they have to, not because they want to. Too few IR users become recruiters. There is no network effect.

Local management. At most institutions, the IR is created to support an Open Access (OA) mandate. As part of the necessary approval and consensus-building processes, various administrative and faculty committees impose local rules and exemptions. After launch, the IR is managed by an academic library accountable only to current faculty. Local concerns dominate those of the worldwide community of potential users.

Poor usability. Access, copy, reuse, and data-mining rights are overly restrictive or left unstated. Content consists of a mishmash of formats. The resulting federation of IRs is useless for serious research. Even the most basic queries cannot be implemented reliably. National IRs (like PubMed) and disciplinary repositories (like ArXiv) eliminate local idiosyncrasies and are far more useful. IRs were supposed to duplicate their success, while spreading the financial burden and immunizing the system against adverse political decisions. The sacrifice in usability is too high a price to pay.

Low use. Digital information improves with use. Unused, it remains stuck in obsolete formats. After extended non-use, recovering information requires a digital version of archaeology. Every user of a digital archive participates in its crowd-sourced quality control. Every access is an opportunity to discover, report, and repair problems. To succeed at its archival mission, a digital archive must be an essential research tool that all scholars need every day.

High cost. Once upon a time, the IR was a cheap experiment. Today's professionally managed IR costs far too much for its limited functionality.

Fragmented control. Over the course of their careers, most scholars are affiliated with several institutions. It is unreasonable to distribute a scholar's work according to where it was produced. At best, it is inconvenient to maintain multiple accounts. At worst, it creates long-term chaos to comply with different and conflicting policies of institutions with which one is no longer affiliated. In a cloud-computing world, scholars should manage their own personal repositories, and archives should manage the repositories of scholars no longer willing or able.

Social interaction. Research is a social endeavor. [Creating Knowledge] Let us be inspired by the titans of the network effect: Facebook, Twitter, Instagram, Snapchat, etc. Encourage scholars to build their personal repository in a social-network context. Disciplinary repositories like ArXiv and SSRN can expand their social-network services. Social networks like Academia.edu, Mendeley, Zotero, and Figshare have the capability to implement and/or expand IR-like services.

Distorted market. Academic libraries are unlikely to spend money on services that compete with IRs. Ventures that bypass libraries must offer their services for free. In desperation, some have pursued (and dropped) controversial alternative methods of monetizing their services. [Scholars Criticize Academia.edu Proposal to Charge Authors for Recommendations]

Many academics are suspicious of any commercial interests in scholarly communication. Blaming publishers for the scholarly-journal crisis, they conveniently forget their own contribution to the dysfunction. Willing academics, with enthusiastic help from publishers, launch ever more journals. [Hitler, Mother Teresa, and Coke] They also pressure libraries to site license "their" journals, giving publishers a strong negotiation position. Without library-paid site licenses, academics would have flocked to alternative publishing models, and publishers would have embraced alternative subscription plans like an iTunes for scholarly papers. [Where the Puck won't be] [What if Libraries were the Problem?] Universities and/or governments must change how they fund scholarly communication to eliminate the marketplace distortions that preserve the status quo, protect publishers, and stifle innovation. In a truly open market of individual subscriptions, start-up ventures would thrive.

I believed in IRs. I advocated for IRs. After participating in the First Meeting of the Open Archives Initiative (1999, Santa Fe, New Mexico), I started a project that would evolve into Caltech CODA. [The Birth of the Open Access Movement] We encouraged, then required, electronic theses. We captured preprints and historical documents. [E-Journals: Do-It-Yourself Publishing]

I was convinced IRs would disrupt scholarly communication. I was wrong. All High Energy Physics (HEP) papers are available in ArXiv. Being a disciplinary repository, ArXiv functions like an idealized version of a federation of IRs. It changed scholarly communication for the better by speeding up dissemination and improving social interaction, but it did not disrupt. On the contrary, HEP scholars organized what amounted to an authoritarian take-over of the HEP scholarly-journal marketplace. While ensuring open access of all HEP research, this take-over also cemented the status quo for the foreseeable future. [A Physics Experiment]

The IR is not equivalent to Green Open Access. The IR is only one possible implementation of Green OA. With the IR at a dead end, Green OA must pivot towards alternatives that have viable paths forward: personal repositories, disciplinary repositories, social networks, and innovative combinations of all three.

*Edited 7/26/2016 to correct formatting errors.

Tuesday, January 20, 2015

Creating Knowledge

Every scholar is part wizard, part muggle.

As wizards, scholars are lone geniuses in search of original insight. They question everything. They ignore conventional wisdom and tradition. They experiment.

As muggles, scholars are subject to the normal rules of power and influence. They are limited by common sense and groupthink. They are ambitious. They promote and market their ideas. They have the perfect elevator pitch ready for every potential funder of research. They connect their research to hot fields. They climb the social ladder in professional societies. As muggles, they know that the lone voice is probably wrong.

The sad fate of the wizards is that their discoveries, no matter how significant, are not knowledge until accepted by the muggles.

Einstein stood on the shoulders of giants: he needed all of the science that preceded him. First, he needed it to develop special relativity theory. Then, he needed it as a starting point from where to lead the physics community on an intellectual journey. Without that base of prior shared knowledge, they would not have followed.

As a social construct, knowledge moves at a speed limited by the wisdom of the crowd. The real process by which scholarly research moves from the world of the wizard into the world of muggles is murky, complicated, long-winded, and ambiguous. Despising these properties, muggles created a clear and straightforward substitute: the peer-review process.

When only a small number of distinguished scholarly bodies published journals, publishing signaled that the research was widely accepted as valid and important. Today, thousands of scholarly groups and commercial entities publish as many as 28,000 scholarly journals, and publishing no longer functions as a serious proxy for wide acceptance.

Most journals are created when some researchers believe established journals ignore or do not sufficiently support a new field of inquiry. New journals give new fields the time and space to grow and to prove themselves. They also reduce the size of the referee pool. They avoid generalists critical of the new field. Gradually, peer review becomes a process in which like-minded colleagues distribute stamps of approval to each other.

Publishers thrive by amplifying scholarly fractures and by creating scholarly islands. As discussed in previous blog posts, normal free-market principles do not apply to the scholarly-journal market. [What if Libraries were the Problem] Without an effective method to kill off journals, their number and size keep increasing. Unfortunately, the damage to universities and to scholarship far exceeds the cost of journals.

Niche fields use their success in the scholarly-communication market to acquire departmental status, making the scholarly fracture permanent. The economic crisis may have stopped or reversed the trend of ever more specialized, smaller, university departments, but the increased cost structure inherited from the boom years lingers. Creating a new department should be an exceptional event. Universities went overboard, influenced and pressured by commercial interests.

As a quality-control system, the scholarly-communication system should be conservative and skeptical. As a communication system, it should give exposure to new ideas and give them a chance to develop. By simultaneously pursuing two contradictory goals, scholarly journals have become ineffective at both. They are too specialized to be credible validators. They are too slow and bureaucratic for growing new ideas.

Journals survive because universities use them for assessment. Not surprisingly, scholarly papers solidly reside in muggle world. Too many papers are written by Very Serious Intellectuals (VSIs) for VSIs. Too many papers are written in self-aggrandizing pompous prose, loaded with countless footnotes. Too many papers are written to flatter VSIs with too many irrelevant references. Too many papers are written to puff up a tidbit of incremental information. Too many papers are written. Too few papers detail negative results or offer serious critique, because that only makes enemies.

When given the opportunity, scholarly authors produce awe-inspiring presentations. The edutainment universe of TED Talks may not be an appropriate forum for the daily grunt work of the scholar, but is it really too much to ask that the scholarly-communication system let the wizardry shine through?

Universities claim to be society's engines of innovation. They have preached the virtues of creative destruction brought on by technological innovation. Yet, the wizards of the ivory tower resist minor change as much as the muggles of the world.

Open Access is catalyzing reform on the business side of the scholarly-communication system. Will Open Access be enough to push universities into experimentation on the scholarly side?

That is an Open question.

Wednesday, October 1, 2014

The Metadata Bubble

In an ideal world, scholars deposit their papers in an Open Access repository, because they know it will advance their research, support their students, and promote a knowledge-based society. A few disciplinary repositories, like ArXiv, have shown that it is possible to close the virtuous cycle where scholars reinforce each other's Open Access habits. In these communities, no authority is needed to compel participation.

Institutional repositories have yet to build similar broad-based enthusiastic constituencies. Yet, many Open Access advocates believe that the decentralized approach of institutional repositories creates a more scalable system with a higher probability for long-term survival. The campaign to enact institutional deposit mandates hopes to jump-start an Open Access virtuous cycle for all scholarly disciplines and all institutions. The risk of such a campaign is that it may backfire if scholars experience Open Access as an obligation with few benefits. For long-term success, most scholars must perceive their compelled participation in Open Access as a positive experience.

It is, therefore, crucial that repositories become essential scholarly resources, not dark archives to be opened only in case of emergency. The Open Archives Initiative (OAI) repository design provided what was thought to be the necessary architecture. Unfortunately, we are far from realizing its anticipated potential. The Protocol for Metadata Harvesting (OAI-PMH) allows service providers to harvest any metadata in any format, but most repositories provide only minimal Dublin Core metadata, a format in which most fields are optional and several are ambiguous. Extremely few repositories enable Object Reuse and Exchange (OAI-ORE), which allows for complex inter-repository services through the exchange of multimedia objects, not just metadata about them. As a result, OAI-enabled services are largely limited to the most elementary kind of searches, and even these often deliver unsatisfactory results, like metadata-only placeholder records for works restricted by copyright or other considerations.
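For readers who have never seen the protocol at work, here is a minimal sketch of what a harvester deals with. It builds a ListRecords request for the standard oai_dc format, then parses a response to report which of the fifteen Dublin Core elements a record leaves out. The endpoint URL, the sample response, and all function names are invented for illustration; no real repository is queried, and the sample record is trimmed (a real header also carries a datestamp).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint, for illustration only (assumption, not a real repository).
BASE_URL = "https://repository.example.edu/oai"

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The fifteen elements of unqualified Dublin Core; all of them are optional.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Invented, trimmed sample of a ListRecords response.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
          <dc:date>2014</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL of an OAI-PMH ListRecords request."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Return one dict per record: its identifier and its Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        fields = {}
        for element in record.iterfind("oai:metadata/oai_dc:dc/*", NS):
            name = element.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


def missing_elements(record):
    """List the Dublin Core elements a record does not provide."""
    return [name for name in DC_ELEMENTS if name not in record["fields"]]


url = list_records_url(BASE_URL)
records = parse_records(SAMPLE_RESPONSE)
missing = missing_elements(records[0])
```

In this invented record, only the title and date are present. A service that wants to search by author, filter by language, or check reuse rights has nothing to work with, which is the situation described above.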

In a few years, we will entrust our life and limb to self-driving cars. Their programs have just milliseconds to compute critical decisions based on information that is imprecise, approximate, incomplete, and inconsistent: all maps are outdated by the time they are produced, GPS signals may disappear, radar and/or lidar signatures are ambiguous, and video or images provide obstructed views in constantly changing environments. When we can extract so much actionable information from such "dirty" data, it seems quaint to obsess about metadata.

Databases automatically record user interactions. Users fill out forms and effectively crowdsource metadata. Expert systems can extract, from any document in any format and in any language, author information, citations, keywords, DNA sequences, chemical formulas, mathematical equations, etc. Other expert systems have growing capabilities to analyze sound, image, and video. Technology is evaporating the pool of problems that require human intervention at the transaction level. The opportunities for human metadata experts to add value are disappearing fast.

The metadata approach is obsolete for an even more fundamental reason. Metadata are the digital extension of a catalog-centered paper-based information system. In this kind of system, today's experts organize today's information so tomorrow's users may solve tomorrow's problems efficiently. This worked well when technology changed slowly, when experts could predict who the future users would be, what kind of problems they would like to solve, and what kind of tools they would have at their disposal. These conditions no longer apply.

When digital storage is cheap, why implement expensive selection processes for an archive? When search technology does not care whether information is excruciatingly organized or piled in a heap, why spend countless hours organizing and curating content? Why agonize over potential future problems with unreadable file formats? Preserve all the information about current software and standards, and start developing the expert systems to unscramble any historical format. Think of any information-management task. How reasonable is the proposition that this task will require direct human intervention in two years? In five years? In ten years?

For content, more is more. We must acquire as much content as possible, and store it safely.

For content administration, less is more. Expert systems give us the freedom to do the bare minimum and to make a mess of it. While we must make content useful and enable as many services as possible, it is no longer feasible to accomplish that by designing systems for an anticipated future. Instead, we must create the conditions that attract developers of expert systems. This is remarkably simple: Make the full text and all data available with no strings attached.

Real Open Access.