Showing posts with label technology. Show all posts

Tuesday, June 27, 2023

The University Library: Closing the Book

At the memorial service, the eulogies expressed deep sadness at the loss of a great institution, once a cornerstone of academia. Everyone blamed The Shelfless Revolution for this sad death. In fact, The University Library had been weak for a long time, and it could not survive any shock.

The Transition from Print to Digital

The transition from print to digital was swift, particularly for scholarly journals. In the 1990s, The University Library, publishers, and the middlemen of the supply chain ramped up their IT infrastructure and adapted their business relationships. The switch to digital was achieved quickly and with little interruption.

Print vs. Digital Lending

Lending books and journals, whether print or digital, is a high-overhead enterprise. Since print and digital lending involve different kinds of work, it is obvious that their overheads are quantitatively different. It is less obvious and easily ignored that they are qualitatively different: Print overhead is an investment. Digital overhead is waste.

Consider print lending. The overhead builds a valuable collection housed in community-owned real estate. Barring disasters, the value of the collection and the infrastructure increases over time. The cumulative effect is most obvious in old libraries, which are showcases of accumulated treasure.

Contrast this with digital lending. Digital overhead pays for short-term operational expenses to acquire site licenses whose value is zero when they expire. Even infrastructure spending has only short-term benefits. Computing and networking hardware must be replaced every few years. Site-licensed software to manage the digital lending library, like site licenses for content, has zero value upon expiration.

The digital lending library never accumulates value. It does not contribute anything to future generations. It only provides services here and now. It just needs to perform current responsibilities in a cost-effective manner. Evaluating The University Library as a digital lender boiled down to a few simple questions: Was The University Library a cost-effective negotiator and content provider? Did it provide a user-friendly service? Could others do better?

The Ineffective Negotiator

While other publishers suffered years of disruption and catastrophic downsizing, scholarly publishers thrived throughout the digital revolution and afterwards. Their profit margins remained sky high. Their new business model was even better than the old. By selling site licenses, they retained control of the content forever. New content provided an immediate revenue stream, and accumulated old content ensured an ever increasing future revenue stream.

The University Library was a predictable customer with a budget that kept pace with inflation. Not satisfied with this, publishers increased their prices at a rate well above inflation. Every so often, The University Library and its funders pressed the panic button. This would start a round of negotiations. Librarians were caught between scholars who wanted to maximize content and administrators who wanted to reduce costs. They negotiated with publishers, each of whom had a monopoly over their island of the literature. Predictably, most negotiations ended with some performative cutbacks by The University Library and a few temporary price concessions by the publishers. Then, the cycle started all over again.

The Market Distorter

The University Library distorted the scholarly communication market, merely by being present in it. Normal economic forces did not apply.

To maintain quality, The University Library acquired content from publishers with a track record. The barriers against new unproven publishers created an oligarchy of publishers that kept prices artificially high.

The University Library also eliminated competition between established publishers. Imagine two competing journals, A and B. A survey among the relevant scholars reveals that 60% prefer A, 40% prefer B, and 20% adamantly insist that they need both A and B. The University Library had no choice but to license both journals for all scholars. By erasing individual preferences, it eliminated competition.

For most textbooks, publishers knew well in advance how many copies The University Library would buy. Given this information, publishers inflated their textbook prices to a level where library sales covered their production costs. All other sales were pure profit from a riskless enterprise.

Providing Access

Under the terms of the site licenses, only authorized users were allowed access, and systematic downloading was prohibited. It was the responsibility of The University Library to protect the content against inappropriate users and use. This work on behalf of publishers was a significant part of digital overhead paid for by The University Library.

Aside from being costly, access controls inconvenienced users. Links to content might stop working without notice because of miscommunication between publishers and library systems. When visiting another campus or when changing jobs, scholars had to adapt to new user interfaces. What had been an asset in the print era, a library built for a local community, had become a liability in the digital era.

Personal Digital Libraries

In their personal lives, scholars subscribed to online newspapers and magazines, to movie and music streaming services, and to various social networks where they posted and consumed content. They easily managed these personal subscriptions. What was so different about scholarly subscriptions? What exactly did The University Library do that they could not do themselves faster and more efficiently?

The University Library no longer accumulated long-term value. It was an ineffective negotiator unable to control costs. It blocked competition from new publishers. It eliminated competition between established publishers. It spent considerable overhead to control access on behalf of publishers while inconveniencing users.

The Shelfless Revolution changed all that. Overnight, scholars were in charge of meeting their own information needs. During the initial period of chaos, scholars were forced to subscribe to each journal individually. Publishers quickly adapted by bundling books and journals into various packages. Third-party service providers, working with all publishers, offered custom personal libraries. The undergraduate pre-med student who loved mystery novels and the assistant professor in chemistry who hiked wilderness trails no longer shared the same library. Their competing interests no longer needed to be balanced.

Many journals did not survive the suddenly competitive market. With fewer journals, publishing a paper became more competitive. Over time, the typical scholar published fewer papers of higher quality. With fewer opportunities to publish in classical peer-reviewed journals, scholars had an incentive to create and/or try out new forms of scholarly communication.

Sticky Digital Lending

Looking back, it is difficult to grasp how controversial a step it was to switch to personal libraries.

Before The Shelfless Revolution, academic administrators would have committed career suicide if they proposed such an outrageous idea. The backlash would have been harsh and immediate. The opposition message would have written itself: They are outsourcing The University Library to the publishers who have been extorting the scholarly community for years. This slogan would have had the benefit of being true. The counterargument would have been the idea that publishers lose their price-setting power when scholars make their own individual purchasing decisions. While standard capitalist theory, the idea was untested in scholarly communication.

The unlikely university where the faculty approved the outrageous proposal would be mired in endless debate. How should the library subscription budget be divided? How much should go to undergraduate students? to graduate students? to postdocs? to faculty? Should they receive these funds in the form of tuition rebates and salary increases or in the form of university accounts? What would be allowable purchases on such accounts?

No single university could have implemented such a change on its own. Accreditation authorities would have expressed doubts or outright opposition. Publishers would not have changed their business models to accommodate one university. It would have required a large coalition of universities.

It took a catastrophic shock to the system, The Shelfless Revolution, to cut this Gordian knot.


Open Access

Many years before The Shelfless Revolution, a few academics started a project to kickstart a revolution in scholarly communication. As this grew into The Open Access Movement, The University Library was called upon to support some of the infrastructure. Many librarians considered this a promising opportunity for a digital future.

The Open Access Movement coalesced around three goals: provide free access to scholarly works, reduce the cost of scholarly communication, and create innovative forms of scholarly communication.

The first goal was quite successful. Three mechanisms were developed to provide free access to scholarly works: institutional repositories, disciplinary repositories, and open access journals. The University Library was primarily responsible for institutional repositories, which contained author-formatted versions of conventionally published papers, unpublished technical reports, theses and dissertations, data sets, and other scholarly material. Several groups of scholars developed disciplinary repositories to collect works in specific areas of research and make them freely available. Finally, various entities created open access journals, which relied on alternative funding mechanisms and did not charge subscription fees.

The second goal, reducing the cost of scholarly communication, was an utter failure. The Open Access Movement had assumed that making a large part of the scholarly literature available for free would put downward pressure on the price of subscription journals. This assumption was proved wrong. Scholars continued to publish in the same journals. The familiar cycle of site license price increases and performative negotiations continued. Repositories were never a threat. Open-access journals were never competition.

Institutional repositories were particularly valuable for scholarly works that were previously hard to find, such as theses, technical reports, data, etc. For author-formatted papers, they evolved into a costly backup for conventional scholarly publishing. They provided a valuable service for those without access to journals. Most scholars would not risk their research by relying on pre-published unofficial versions, and they required the version of record. Besides, repositories were too cumbersome to use.

Disciplinary repositories were more user-friendly, but they needed outside funding. Occasionally, the priorities of the funders would change, and the repository would have to find a new source of funding. Each funding crisis was an opportunity for publishers to buy the repository. To keep the repository under scholars’ control, an interested government agency or philanthropic organization had to step forward every time. To control the repository, publishers had to be lucky just once.

Open access journals just increased the number of scholarly journals. Subscription journals did not suddenly fail because of competing open access journals. At most, subscription journals responded by introducing an open access option. Authors could choose to pay a fee to put their papers outside of the paywall. These authors just trusted publishers not to include these open access papers in the calculation of subscription prices. The publisher’s promise was impossible to verify. This was the level of dysfunction of the scholarly communication market at that time.

The University Library paid ever increasing prices for site licenses and their maintenance. It also paid for the maintenance of institutional repositories. Government and philanthropic funding agencies paid for disciplinary repositories. Scholars used a combination of library funding, research accounts, departmental accounts, and personal resources to pay for open access charges. The scholarly community was spending more than ever on scholarly communication, and no one knew how much.

The Open Access Movement also failed to deliver on its third goal, innovations in scholarly communication. Early stage ventures were too risky for responsible organizations like The University Library. Most ideas failed or remained unexecuted. The Shelfless Revolution changed the environment. Individual scholars in charge of their own budget and confronted with the actual costs of scholarly communication were willing to fund risky but promising experiments.


The Fallout

The Shelfless Revolution killed the digital lending library. This started a chain reaction that affected every service offered by The University Library.

It was immediately obvious that archives had to survive. The print archive was scanned and stored in repositories. In spite of their limitations, repositories became the primary portal into the print archive. Print volumes became museum artifacts virtually untouched by humans. The digital archive mostly contains university-owned scholarly material. Copyright issues created too many obstacles to archive publisher-owned content. New legislative proposals would put the burden on publishers to preserve digital collections of significant cultural, scientific, and/or historical value. This is similar to how we treat protected historical buildings. Publishers will have to store such digital collections in audited standardized archives with government-backed protections against all kinds of calamity.

Print lending died out when most books contained multimedia illustrations and interactive components. Print material of historical importance was moved from the lending library to the nonlending print archive. This killed interlibrary loan services of printed material. Digital interlibrary loans all but disappeared with custom personal libraries.

After losing collection development staff, the reference desk could no longer cover a broad cross-section of scholarly disciplines. It got caught in a downward spiral of decreasing usefulness and declining use.

Long ago, librarians controlled what information was readily available. As technology advanced, their gatekeeping power evaporated. They still nudged publishers towards quality using the power of the purse. This too is now gone. The battle against disinformation seems lost. The profound political differences on where fighting disinformation ends and censorship begins are nowhere near being resolved.

After censorship had wreaked havoc on public school libraries, The University Library was braced against similar attempts. Before it could engage in that fight, The Shelfless Revolution happened. The switch to personal digital libraries reduced the political heat as universities no longer directly paid for controversial content. Censorship lost the battle, but The University Library lost the war.

Thousands of library projects got caught in the turmoil. Some survived by being moved to other organizations. Most did not. We will never know how much destruction was caused by The Shelfless Revolution.

Conclusion

The University Library made all the right moves. It embraced new technology. It executed the transition from print to digital without major disruption. It was open to new opportunities.

Yet, things went wrong. Open access repositories were supposed to be subversive weapons. Open access journals were supposed to be deadly competitors. Instead, they turned out to be paper tigers, powerless against the oligarchy of the scholarly communication market.

Publishers of newspapers, magazines, music, and video barely survived the disruptive transition to digital. As they rebuilt their businesses from the ruins, they developed business models for the new reality. In contrast, the smooth transition of the scholarly communication market protected existing organizations. It also perpetuated the flaws of old business models, and it let the distorted market grow more dysfunctional every day.

With the benefit of hindsight, the necessary changes could have been implemented more humanely. This was never a realistic option, however. The chaotic and disruptive change of The Shelfless Revolution was inevitable.





#scholcomm #AcademicTwitter #ScienceTwitter #scicomm

Tuesday, June 27, 2017

Forward to the Past

What will academic libraries look like in 2050?

In the early days of the web, librarians had to fight back against the notion that libraries would soon be obsolete. They had solid arguments. Information literacy would become more important. Archiving and managing information would become more difficult. In fact, academic libraries saw an opportunity to increase their role on campus. This opportunity did not materialize. Libraries remain stuck in a horseless-carriage era. They added an IT department. They made digital copies of existing paper services. They continued their existing business relationships with publishers and various intermediaries. They ignored the lessons of the web-connected knowledge economy. Thriving organizations create virtuous cycles of abundance by solving hard problems: better solutions, more users, more revenue, more content, more expertise, and better solutions.

Academic libraries seem incapable of escaping commodity-service purgatory, even when tackling their most ambitious projects. They are eager to manage data archives, but the paper-archive model produces an undifferentiated commodity preservation service. A more appropriate model would be the US National Virtual Astronomical Observatory, where preservation is a happy side effect of extracting maximum research out of existing data. Data archives should be centers of excellence. They focus on a specific field. They are operated by researchers who keep abreast of the latest developments, who adapt data sets to evolving best practices, who make data sets interoperable, who search for inconsistencies between different studies, who detect, flag, and correct errors, and who develop increasingly sophisticated services.

No university can take a center-of-excellence approach to data archiving for every field in which it is active. No archive serving just one university can grow to a sufficiently large scale for excellence. Each field has different needs. How many centers does the field need? How should centers divide the work? What are their long-term missions? Who should manage them? Where are the sustainable sources for funding? Libraries cannot answer these questions. Only researchers have the required expertise and the appropriate academic, professional, and governmental organizations for the decision-making process.

Looking back over the past twenty years, all development of digital library services has been limited by the institutional nature of academic libraries, which receive limited funding to provide limited information and limited services to a limited community. As a consequence, every major component of the digital library is flawed, and none has the foundation to rise to excellence.

General-purpose institutional repositories did not live up to their promise. [Let IR RIP] The center-of-excellence approach of disciplinary repositories, like ArXiv or PubMed, performed better in spite of less stable funding. Geographical distance between repository managers and scholars did not matter. Disciplinary proximity did.

Once upon a time, the catalog was the search engine. Today, it tells whether a printed item is checked out and/or where it is shelved. It is useless for digital information. It is often not even a good option to find information about print material. The catalog, bloated into an integrated library system, wastes resources that should be redirected towards innovation.

Libraries provide access to their site licenses through journal databases, OpenURL servers, and proxy servers. They pay for this expensive system so publishers can perpetuate a business model that eliminates competition, is rife with conflict of interest, and can impose almost unlimited price increases. Scholars should be able to subscribe to personal libraries as they do for their infotainment. [Hitler, Mother Teresa, and Coke] [Where the Puck won't be] [Annealing the Library] [What if Libraries were the Problem?]

In the paper era, the interlibrary-loan department was the gateway to the world's information. Today, it is mostly a buying agent for costly pay-per-view access to papers not covered by site licenses. Personal libraries would eliminate these requests. Digitization and open access can eliminate requests for out-of-copyright material.

Why is there no scholarly app store, where students and faculty can build their own libraries? By replacing site licenses with app-store subsidies, universities would create a competitive marketplace for subscription journals, open-access journals, experimental publishing platforms, and other scholarly services. A library making an institutional decision must be responsible and safe. One scholar deciding where to publish a paper, whether to cancel a journal, or which citation database to use can take a risk with minimal consequence. This new dynamic would kickstart innovation. [Creative Destruction by Social Network]

Libraries seem safe from disruption for now. There are no senior academics sufficiently masochistic to advocate this kind of change. There are none who are powerful enough to implement it. However, libraries that have become middlemen for outsourced mediocre information services are losing advocates within the upper echelons of academic administrations every day. The costs of site licenses, author page charges, and obsolete services are effectively cutting the innovation budget. Unable to attract or retain innovators, stagnating libraries will just muddle through while digital services bleed out. When some services fall apart, others become collateral damage. The print collection will shrink until it is a paper archive of rare and special items locked in a vault.

Postscript: I intended to write about transforming libraries into centers of excellence. This fell apart in the writing. I hesitated. I rewrote. I reconsidered. I started over again.
If I am right, libraries are on the wrong track, and there is no better track. Libraries cannot possibly remain relevant by replicating the same digital services on every campus. There is a legitimate need for advanced information services supported by centers of excellence. However, it is easier to build new centers from scratch than to transform libraries tied up in institutional straitjackets.
Perhaps, paper-era managers moved too slowly and missed the opportunity that seemed so obvious twenty years ago. Perhaps, that opportunity was just a mirage. Whatever the reason, rank-and-file library staff will be the unwitting victims. 
Perhaps, I am wrong. Perhaps, academic libraries will carve out a meaningful digital future. If they do, it will be by taking big risks. The conventional options have been exhausted.

Monday, March 13, 2017

Creative Destruction by Social Network

Academia.edu bills itself as a platform for scholars to share their research. As a start-up, it still provides mostly free services to attract more users. Last year, it tried to make some money by selling recommendations to scholarly papers, but the backlash from academics was swift and harsh. That plan was shelved immediately. [Scholars Criticize Academia.edu Proposal to Charge Authors for Recommendations]

All scholarly publishers sell recommendations, albeit artfully packaged in prestige and respectability. Academia.edu's direct approach seemed downright vulgar. If they plan a radically innovative replacement for journals, they will need a subtler approach. At least, they chose the perfect target for an attempt at creative destruction: Scholarly communication is the only type of publishing not disrupted by the web, it has sky-high profit margins, it is inefficient, and it is dominated by relatively few well-connected insiders.

If properly designed (and that is a big if), a scholarly network could reduce the cost of all aspects of scholarly communication, even without radical innovation. It could improve the delivery of services to scholars. It could increase (open) access to research. And it could do all of this while scholars retain control over their own output for as long as feasible and/or appropriate. A scholarly network could also increase the operational efficiency of participating universities, research labs, and funding agencies.

All components of such a system already exist in some form:

Personal archive. Academics are already giving away ownership of their published works to publishers. They should not repeat this historic mistake by giving social networks control over their unpublished writings, data, and scholarly correspondence. They should only participate in social networks that make it easy to pack up and leave. Switching or leaving networks should be as simple as downloading an automatically created personal archive of everything the user shared on the network. Upon death or incapacity, the personal archive and perhaps the account itself should transfer to an archival institution designated by the user.

Marketplace for research tools. Every discipline has its own best practices. Every research group has its preferred tools and information resources. All scholars have their idiosyncrasies. To accomplish this level of customization, a universal platform needs an app store, where scholars could obtain apps that provide reference libraries, digital lab notebooks, data analysis and management, data visualization, collaborative content creation, communication, etc.

Marketplace for professional services. Sometimes, others can do the work better, faster, and/or cheaper. Tasks that come to mind are reference services, editorial and publishing services, graphics, video production, prototyping, etc.

Marketplace for institutional services. All organizations manage some business processes that need to be streamlined. They can do this faster and cheaper by sharing their solutions. For example, universities might be interested in buying and/or exchanging applications that track PhD theses as they move through the approval process, that automatically deposit faculty works into their institutional repositories, that manage faculty-research review processes, that assist the preparation of grant applications, and that manage the oversight of awarded research grants. Funding agencies might be interested in services to accept and manage grant applications, to manage peer review, and to track post-award research progress.

Certificates. When a journal accepts a paper, it produces an unalterable version of record. This serves as an implied certificate from the publisher. When a university awards a degree, it certifies that the student has attended the university and has completed all degree requirements. Incidentally, it also certifies the faculty status of exam-committee members. Replacing implicit with explicit certificates would enable new services, such as CVs in which every paper, every academic position, and every degree is certified by the appropriate authority.

A scholarly network like this is a specialized business-application exchange, a concept pioneered by the AppExchange of Salesforce.com. Every day, thousands of organizations replace internal business processes with more efficient applications. Over time, this creates a gradual cumulative effect: Business units shrink to their essential core. They disappear or merge with other units. Corporate structures change. Whether or not we are prepared for the consequences of these profound changes, these technology-enabled efficiencies advance unrelentingly across all industries.

These trends will, eventually, affect everyone. While touting the benefits of creative destruction in their journals, the scholarly-communication system successfully protected itself. Like PDF, the current system is a digital replication of the paper system. It ignores the flexibility of digital information, while it preserves the paper-era business processes and revenue streams of publishers, middlemen, and libraries.

Most scholars manage several personal digital libraries for their infotainment. Yet, they are restricted by the usage terms of institutional site licenses for their professional information resources. [Where the Puck won't be] When they share papers with colleagues and students, they put themselves at legal risk. Scholarly networks will not solve every problem. They will have unintended consequences. But, like various open-access projects, they are another opportunity for scholars to reclaim the initiative.

Recently, ResearchGate obtained serious start-up funding. [ResearchGate raises $52.6M for its social research network for scientists] I hope more competitors will follow. Organizations and projects like ArXiv, Figshare, Mendeley, Web of Knowledge, and Zotero have the technical expertise, user communities, and platforms on which to build. There are thousands of organizations that can contribute to marketplaces for research tools, professional services, and institutional services. There are millions of scholars eager for change.

Build it, and they will come... Or they will just use Sci-Hub anyway.

Tuesday, January 20, 2015

Creating Knowledge

Every scholar is part wizard, part muggle.

As wizards, scholars are lone geniuses in search of original insight. They question everything. They ignore conventional wisdom and tradition. They experiment.

As muggles, scholars are subject to the normal rules of power and influence. They are limited by common sense and groupthink. They are ambitious. They promote and market their ideas. They have the perfect elevator pitch ready for every potential funder of research. They connect their research to hot fields. They climb the social ladder in professional societies. As muggles, they know that the lone voice is probably wrong.

The sad fate of the wizards is that their discoveries, no matter how significant, are not knowledge until accepted by the muggles.

Einstein stood on the shoulders of giants: he needed all of the science that preceded him. First, he needed it to develop special relativity theory. Then, he needed it as a starting point from where to lead the physics community on an intellectual journey. Without that base of prior shared knowledge, they would not have followed.

As a social construct, knowledge moves at a speed limited by the wisdom of the crowd. The real process by which scholarly research moves from the world of the wizard into the world of muggles is murky, complicated, long-winded, and ambiguous. Despising these properties, muggles created a clear and straightforward substitute: the peer-review process.

When only a small number of distinguished scholarly bodies published journals, publishing signaled that the research was widely accepted as valid and important. Today, thousands of scholarly groups and commercial entities publish as many as 28,000 scholarly journals, and publishing no longer functions as a serious proxy for wide acceptance.

Most journals are created when some researchers believe established journals ignore or do not sufficiently support a new field of inquiry. New journals give new fields the time and space to grow and to prove themselves. They also reduce the size of the referee pool. They avoid generalists critical of the new field. Gradually, peer review becomes a process in which like-minded colleagues distribute stamps of approval to each other.

Publishers thrive by amplifying scholarly fractures and by creating scholarly islands. As discussed in previous blog posts, normal free-market principles do not apply to the scholarly-journal market. [What if Libraries were the Problem] Without an effective method to kill off journals, their number and size keep increasing. Unfortunately, the damage to universities and to scholarship far exceeds the cost of journals.

Niche fields use their success in the scholarly-communication market to acquire departmental status, making the scholarly fracture permanent. The economic crisis may have stopped or reversed the trend of ever more specialized, smaller, university departments, but the increased cost structure inherited from the boom years lingers. Creating a new department should be an exceptional event. Universities went overboard, influenced and pressured by commercial interests.

As a quality-control system, the scholarly-communication system should be conservative and skeptical. As a communication system, it should give exposure to new ideas and give them a chance to develop. By simultaneously pursuing two contradictory goals, scholarly journals have become ineffective at both. They are too specialized to be credible validators. They are too slow and bureaucratic for growing new ideas.

Journals survive because universities use them for assessment. Not surprisingly, scholarly papers solidly reside in muggle world. Too many papers are written by Very Serious Intellectuals (VSIs) for VSIs. Too many papers are written in self-aggrandizing pompous prose, loaded with countless footnotes. Too many papers are written to flatter VSIs with too many irrelevant references. Too many papers are written to puff up a tidbit of incremental information. Too many papers are written. Too few papers detail negative results or offer serious critique, because that only makes enemies.

When given the opportunity, scholarly authors produce awe-inspiring presentations. The edutainment universe of TED Talks may not be an appropriate forum for the daily grunt work of the scholar, but is it really too much to ask that the scholarly-communication system let the wizardry shine through?

Universities claim to be society's engines of innovation. They have preached the virtues of creative destruction brought on by technological innovation. Yet, the wizards of the ivory tower resist minor change as much as the muggles of the world.

Open Access is catalyzing reform on the business side of the scholarly-communication system. Will Open Access be enough to push universities into experimentation on the scholarly side?

That is an Open question.

Wednesday, October 1, 2014

The Metadata Bubble

In an ideal world, scholars deposit their papers in an Open Access repository, because they know it will advance their research, support their students, and promote a knowledge-based society. A few disciplinary repositories, like ArXiv, have shown that it is possible to close the virtuous cycle where scholars reinforce each other's Open Access habits. In these communities, no authority is needed to compel participation.

Institutional repositories have yet to build similar broad-based enthusiastic constituencies. Yet, many Open Access advocates believe that the decentralized approach of institutional repositories creates a more scalable system with a higher probability for long-term survival. The campaign to enact institutional deposit mandates hopes to jump start an Open Access virtuous cycle for all scholarly disciplines and all institutions. The risk of such a campaign is that it may backfire if scholars should experience Open Access as an obligation with few benefits. For long-term success, most scholars must perceive their compelled participation in Open Access as a positive experience.

It is, therefore, crucial that repositories become essential scholarly resources, not dark archives to be opened only in case of emergency. The Open Archives Initiative (OAI) repository design provided what was thought to be the necessary architecture. Unfortunately, we are far from realizing its anticipated potential. The Protocol for Metadata Harvesting (OAI-PMH) allows service providers to harvest any metadata in any format, but most repositories provide only minimal Dublin Core metadata, a format in which most fields are optional and several are ambiguous. Extremely few repositories enable Object Reuse and Exchange (OAI-ORE), which allows for complex inter-repository services through the exchange of multimedia objects, not just metadata about them. As a result, OAI-enabled services are largely limited to the most elementary kind of searches, and even these often deliver unsatisfactory results, like metadata-only placeholder records for works restricted by copyright or other considerations.
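To make the "minimal Dublin Core" complaint concrete, here is a small sketch of what a harvester sees. It builds a standard OAI-PMH ListRecords request and then checks which of the fifteen Dublin Core elements a record actually carries. The repository address and the sample record are invented for illustration, and the sketch parses an embedded response rather than contacting a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used only to show the request shape.
BASE_URL = "https://repository.example.org/oai"

# XML namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The fifteen elements of simple Dublin Core; all are optional.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Invented sample response: a typical "minimal" record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Return one dict per record: its identifier and its Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        for el in rec.findall(".//oai_dc:dc/*", NS):
            name = el.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
            fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    return records


def missing_elements(record):
    """List the Dublin Core elements that the record leaves empty."""
    return [name for name in DC_ELEMENTS if name not in record["fields"]]


if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], "is missing:", ", ".join(missing_elements(record)))
```

A record like this one is valid, yet it tells a service provider almost nothing beyond a title and a name, which is why harvested search results are often so thin.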

In a few years, we will entrust our life and limb to self-driving cars. Their programs have just milliseconds to compute critical decisions based on information that is imprecise, approximate, incomplete, and inconsistent: all maps are outdated by the time they are produced, GPS signals may disappear, radar and/or lidar signatures are ambiguous, and video or images provide obstructed views in constantly changing environments. When we can extract so much actionable information from such "dirty" information, it seems quaint to obsess about metadata.

Databases automatically record user interactions. Users fill out forms and effectively crowdsource metadata. Expert systems can extract, from any document in any format and in any language, author information, citations, keywords, DNA sequences, chemical formulas, mathematical equations, etc. Other expert systems have growing capabilities to analyze sound, image, and video. Technology is evaporating the pool of problems that require human intervention at the transaction level. The opportunities for human metadata experts to add value are disappearing fast.

The metadata approach is obsolete for an even more fundamental reason. Metadata are the digital extension of a catalog-centered paper-based information system. In this kind of system, today's experts organize today's information so tomorrow's users may solve tomorrow's problems efficiently. This worked well when technology changed slowly, when experts could predict who the future users would be, what kind of problems they would like to solve, and what kind of tools they would have at their disposal. These conditions no longer apply.

When digital storage is cheap, why implement expensive selection processes for an archive? When search technology does not care whether information is excruciatingly organized or piled in a heap, why spend countless hours organizing and curating content? Why agonize over potential future problems with unreadable file formats? Preserve all the information about current software and standards, and start developing the expert systems to unscramble any historical format. Think of any information-management task. How reasonable is the proposition that this task will require direct human intervention in two years? In five years? In ten years?

For content, more is more. We must acquire as much content as possible, and store it safely.

For content administration, less is more. Expert systems give us the freedom to do the bare minimum and to make a mess of it. While we must make content useful and enable as many services as possible, it is no longer feasible to accomplish that by designing systems for an anticipated future. Instead, we must create the conditions that attract developers of expert systems. This is remarkably simple: Make the full text and all data available with no strings attached.

Real Open Access.

Monday, June 30, 2014

Disruption Disrupted?

The professor who books his flights online, reserves lodging with Airbnb, and arranges airport transportation with Uber understands the disruption of the travel industry. He actively supports that disruption every time he attends a conference. When MOOCs threaten his job, when The Economist covers reinventing the university and titles it "Creative Destruction", that same professor may have second thoughts. With or without disruption, academia surely is in a period of immense change. There is the pressure to reduce costs and tuition, the looming growth of MOOCs, the turmoil in scholarly communication (subscription prices, open access, peer review, alternative metrics), the increased competition for funding, etc.

The term disruption was coined and popularized by Harvard Business School Professor Clayton Christensen, author of The Innovator's Dilemma. [The Innovator's Dilemma, Clayton Christensen, Harvard Business Review Press, 1997] Christensen created a compelling framework for understanding the process of innovation and disruption. Along the way, he earned many accolades in academia and business. In recent years, a cooling of the academic admiration became increasingly noticeable. A snide remark here. A dismissive tweet there. Then, The New Yorker launched a major attack on the theory of disruption. [The Disruption Machine, Jill Lepore, The New Yorker, June 23rd, 2014] In this article, Harvard historian Jill Lepore questions Christensen's research by attacking the underlying facts. Were Christensen's disruptive startups really startups? Did the established companies really lose the war or just one battle? At the very least, Lepore is implying that Christensen misled his readers.

As of this writing, Christensen has only responded in a brief interview. [Clayton Christensen Responds to New Yorker Takedown of 'Disruptive Innovation', Bloomberg Businessweek, June 20th, 2014] It is clear he is preparing a detailed written response.

Lepore's critique appears at the moment when disruption may be at academia's door, seventeen years after The Innovator's Dilemma was published, with much of the research almost twenty years old. Perhaps, the article is merely a symptom of academics growing nervous. Yet, it would be wrong to dismiss Lepore's (or anyone else's) criticism based on any perceived motivation. Facts can be and should be examined.

In 1997, I was a technology manager tasked with dragging a paper-based library into the digital era. When reading (and re-reading) the book, I did not question the facts. When Christensen stated that upstart X disrupted established company Y, I accepted it. I assume most readers did. The book was based on years of research, all published in some of the most prestigious peer-reviewed journals. It is reasonable to assume that the underlying facts were scrutinized by several independent experts. Truth be told, I did not care much that his claims were backed by years of research. Christensen gave power to the simple idea that sticking with established technology can carry an enormous opportunity cost.

Established technology has had years, perhaps decades, to mitigate its weaknesses. It has a constituency of users, service providers, sales channels, and providers of derivative services. This constituency is a force that defends the status quo in order to maintain established levels of quality, profit margins, and jobs. The innovators do not compete on a level playing field. Their product may improve upon the old in one or two aspects, but it has not yet had the opportunity to mitigate its weaknesses. When faced with such innovations, all organizations tend to stick with what they know for as long as possible.

Christensen showed the destructive power of this mindset. While waiting until the new is good enough or better, organizations lose control of the transition process. While pleasing their current customers, they lose future customers. By not being ahead of the curve, by ignoring innovation, by not restructuring their organizations ahead of time, leaders may put their organizations at risk. Christensen told compelling disruption stories in many different industries. This allowed readers to observe their own industry with greater detachment. It gave readers the confidence to push for early adoption of inevitable innovation.

I am not about to take sides in the Lepore-Christensen debate. Neither needs my help. As an observer interested in scholarly communication, I cannot help but note that Lepore, a distinguished scholar, launched her critique from a distinctly non-scholarly channel. The New Yorker may cater to the upper crust of intellectuals (and wannabes), but it remains a magazine with journalistic editorial-review processes, quite distinct from scholarly peer-review processes.

Remarkably, the same happened only a few weeks ago, when the Financial Times attempted to take down Piketty's book. [Capital in the Twenty-First Century, Thomas Piketty, Belknap Press, 2014] [Piketty findings undercut by errors, Chris Giles, Financial Times, May 23rd, 2014] Piketty had a distinct advantage over Christensen. The Financial Times critique appeared a few weeks after his book came out. Moreover, he had made all of his data public, including all technical adjustments required to make data from different sources compatible. As a result, Piketty was able to respond quickly, and the controversy quickly dissipated. Christensen has the unenviable task of defending twenty-year-old research. For his sake, I hope he was better at archiving data than I was in the 1990s.

What does it say about the status of scholarly journals when scholars use magazines to launch scholarly critiques? Was Lepore's article not sufficiently substantive for a peer-reviewed journal? Are scholarly journals incapable of handling, or unwilling to handle, academic controversy involving one of their eminent leaders? Is the mainstream press just better at it? Would a business journal even allow a historian to critique business research in its pages? If not, is peer review less about maintaining standards and more about protecting an academic tribe? Is the mainstream press just a vehicle for some scholars to bypass peer review and academic standards? What would it say about peer review if Lepore's arguments should prevail?

This detached observer pours a drink and enjoys the show.


PS (7/15/2014): Reposted with permission at The Impact Blog of The London School of Economics and Political Science.

Friday, June 20, 2014

The Billionaires, Part 1: Elon Musk

Elon Musk did not need a journal to publicize his Hyperloop paper. [Hyperloop Alpha] No journal can create the kind of buzz he creates on his own. He did not need the validation of peer review; he had the credibility of his research teams that already revolutionized travel on earth and to space. He did not need the prestige of a journal's brand; he is his own brand.

Any number of journals would have published this paper by this author. They might even have expedited their review process. Yet, journals could hardly have done better than the public-review process that actually took place. Within days, experts from different disciplines had posted several insightful critiques. By now, there are too many to list. A journal would have insisted that the paper include author(s) and affiliations, a publication date (Aug. 12th, 2013), a bibliography... but those are irrelevant details to someone on a mission to change the world.

Does the Hyperloop paper even qualify as a scholarly paper? Or, is it an engineering-based political pamphlet written to undermine California's high-speed rail project? As a data point for scholarly communication, the Hyperloop paper may be an extreme outlier, but it holds some valuable lessons for the scholarly-communication community.

The gate-keeping role of journals is permanently over.

Neither researchers nor journalists rely on scholarly editors to dismiss research on their behalf.

In many disciplines, day-to-day research relies more on the grey literature (preprints, technical reports, even blogs and mailing lists) than on journal articles. In other words, researchers commit considerable time to refereeing one another, but they largely ignore each other's gate keeping. When it matters, they prefer immediacy over gate keeping and their own gate keeping over someone else's.

The same is true for journalists. If the story is interesting, it does not matter whether it comes from an established journal or the press release of a venture capitalist. Many journalists balance their reports with comments from neutral or adversarial experts. This practice may satisfy a journalistic concept of objectivity, but giving questionable research "equal treatment" may elevate it to a level it does not deserve.

Public review can be fast and effective. 

The web-based debate on Hyperloop remained remarkably professional and civil. Topics that attract trolls and conspiracy theorists may benefit from a more controlled discussion environment, but the public forum worked well for Hyperloop. The many critiques provide skeptical, but largely constructive, feedback that bold new ideas need.

Speculative papers that spark the imagination do not live by the stodgy rules of peer review.

The Hyperloop paper would be a success if its only accomplishment is inspiring a handful of young engineers to research radically different modes of mass transportation. Unfortunately, publishing speculative, incomplete, sloppy, or bad research may cause real harm. The imagined link between vaccines and autism (published in a peer-reviewed journal and later retracted) serves as an unhappy reminder of the latter.

Not all good research belongs in the scholarly record.

This episode points to an interactive future of scholarly communication. After the current public discussion, Hyperloop may gain acceptance, and engineering journals may publish many papers about it. Alternatively, the idea may die a quiet death, perhaps documented by one or more historical review papers (or books).

The ideal research paper solves a significant problem with inspiration (creative bold ideas) and perspiration (proper methodology, reproducibility, accuracy). Before that ideal is in sight, researchers travel long winding roads with many detours and dead ends. Most papers are small incremental steps along that road. A select few represent milestone research.

The de-facto system to identify milestone research is journal prestige. No journal could survive if it advertised itself as a place for routine research. Instead, the number of journals has exploded, and each journal claims high prestige for the narrowest of specializations. All of these journals treat all submissions as if they are milestone research and apply the same costly and inefficient refereeing processes across the board.

The cost of scholarly communication is more than the sum of subscriptions and page charges. While refereeing can be a valuable experience, there is a point of diminishing returns. Moreover, overwhelmed scholars are more likely to conduct only cursory reviews after ignoring the requests for extended periods. The expectation that all research deserves to be refereed has reduced the quality of the refereeing process, introduced inordinate delays, increased the number of journals, and indirectly increased the pressure to publish.

Papers should earn the privilege to be refereed. By channeling informal scholarly communication to social-network platforms, research can gain some scholarly weight based on community feedback and usage-based metrics. Such social networks, perhaps run by scholarly societies, would provide a forum for lively debate, and they could act as submission and screening systems for refereed journals. By restricting refereed journals to milestone research supported and validated by a significant fraction of the profession, we would need far fewer, less specialized journals.

A two-tier system would provide the immediacy and openness researchers crave, while reserving the highest level of scrutiny to research that has already shown significant promise.

Monday, April 14, 2014

The Bleeding Heart of Computer Science

Who is to blame for the Heartbleed bug? Perhaps, it does not matter. Just fix it, and move on. Until the next bug, and the next, and the next.

The Heartbleed bug is different from other Internet scares. It is a vulnerability at the core of the Internet infrastructure, a layer that provides the foundation for secure communication, and it went undetected for years. It should be a wake-up call. Instead, the problem will be patched. Some government and industry flacks will declare the crisis over. We will move on and forget about it.

There is no easy solution. No shortcut. We must redevelop our information infrastructure from the ground up. Everything. Funding and implementing such an ambitious plan may become feasible only after a major disaster strikes that leaves no other alternative. But even if a complete redesign were to become a debatable option, it is not at all clear that we are up to the task.

The Internet is a concurrent and asynchronous system. A concurrent system consists of many independent components like computers and network switches. An asynchronous system operates without a central clock. In synchronous systems, like single processors, a clock provides the heartbeat that tells every component when state changes occur. In asynchronous systems, components are interrupt driven. They react to outside events, messages, and signals as they happen. The thing to know about concurrent asynchronous systems is this: It is impossible to debug them. It is impossible to isolate components from one another for testing purposes. Each successive reduction in the probability of bugs is smaller than the last and costs more testing, so the cost quickly becomes prohibitive. Unfortunately, when a system consists of billions of components, even extremely low-probability events are a daily occurrence. These unavoidable fundamental problems are exacerbated by continual system changes in hardware and software and by bad actors seeking to introduce and/or exploit vulnerabilities.
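A back-of-the-envelope calculation shows why low-probability events become routine at this scale. The sketch below assumes that components fail independently and with the same probability, which real networks do not guarantee, and the numbers are illustrative rather than measured.

```python
import math


def expected_failures(p, n):
    """Expected number of failures among n components, each failing with probability p."""
    return n * p


def prob_at_least_one(p, n):
    """Probability that at least one of n independent components fails.

    Equal to 1 - (1 - p)**n, computed with log1p/expm1 so that tiny values
    of p are not lost to rounding.
    """
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    p = 1e-9           # illustrative: a one-in-a-billion chance per component per day
    n = 5_000_000_000  # illustrative: five billion components
    print(f"expected failures per day: {expected_failures(p, n):.1f}")
    print(f"chance of at least one:    {prob_at_least_one(p, n):.4f}")
```

Under these assumptions, a one-in-a-billion event is expected about five times a day, and a day without any such event is the rare exception.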

When debugging is not feasible, mathematical rigor is required. Current software-development environments are all about pragmatism, not rigor. Programming infrastructure is built to make programming easy, not rigorous. Most programmers develop their programs in a virtual environment and have no idea how their programs really function. Today's computer-science success stories are high-school geniuses who develop multimillion-dollar apps and college dropouts who start multibillion-dollar businesses. These are built on fast prototypes and viral marketing, not mathematical rigor. Who in their right mind would study computer science from people who made a career writing research proposals that never led to anything worth leaving a paltry academic job for?

Rigor in programming is the domain of Edsger W. Dijkstra, the most (in)famous, admired, and ignored computer-science eccentric. In 1996, he laid out his vision of Very Large Scale Application of Logic as the basis for the next fifty years of computer science. Although the examples are dated, his criticism of the software industry still rings true:
Firstly, simplicity and elegance are unpopular because they require hard work and discipline to achieve and education to be appreciated. Secondly we observe massive investments in efforts that are heading in the opposite direction. I am thinking about so-called design aids such as circuit simulators, protocol verifiers, algorithm animators, graphical aids for the hardware designers, and elaborate systems for version control: by their suggestion of power, they rather invite than discourage complexity. You cannot expect the hordes of people that have devoted a major part of their professional lives to such efforts to react kindly to the suggestion that most of these efforts have been misguided, and we can hardly expect a more sympathetic ear from the granting agencies that have funded these efforts: too many people have been involved and we know from past experience that what has been sufficiently expensive is automatically declared to have been a great success. Thirdly, the vision that automatic computing should not be such a mess is obscured, over and over again, by the advent of a monstrum that is subsequently forced upon the computing community as a de facto standard (COBOL, FORTRAN, ADA, C++, software for desktop publishing, you name it).
[The next fifty years, Edsger W. Dijkstra, circulated privately, 1996, Document 1243a of the E. W. Dijkstra Archive, https://www.cs.utexas.edu/users/EWD/ewd12xx/EWD1243a.PDF, or, for fun, a version formatted in the Dijkstra handwriting font]

The last twenty years were not kind to Dijkstra's vision. The hordes turned into horsemen of the apocalypse that trampled, gored, and burned any vision of rigor in software. For all of us, system crashes, application malfunctions, and software updates are daily occurrences. They are built into our expectations.

In today's computer science, the uncompromising radicals that prioritize rigor do not stand a chance. Today's computer science is the domain of genial consensus builders, merchants of mediocrity that promise everything to everyone. Computer science has become a social construct that evolves according to political rules.

A bottom-up redesign of our information infrastructure, if it ever becomes debatable, would be defeated before it even began. Those who could accomplish a meaningful redesign would never be given the necessary authority and freedom. Instead, the process would be taken over by political and business forces, resulting in an effective status quo.

In 1996, Dijkstra believed this:
In the next fifty years, Mathematics will emerge as The Art and Science of Effective Formal Reasoning, and we shall derive our intellectual excitement from learning How to Let the Symbols Do the Work.
There is no doubt that he would still cling to this goal, but even Dijkstra may have started to doubt his fifty-year timeline.

Monday, March 17, 2014

Textbook Economics

The impact of royalties on a book's price, and its sales, is greater than you think. Lower royalties often end up better for the author. That was the publisher's pitch when I asked him about the details of the proposed publishing contract. Then, he explained how he prices textbooks.

It was the early 1990s. I had been teaching a course on Concurrent Scientific Computing, a hot topic then, and several publishers had approached me about writing a textbook. This was an opportunity to structure a pile of course notes. Eventually, I would sign on with a different publisher, a choice that had nothing to do with royalties or book prices. [Concurrent Scientific Computing, Van de Velde E., Springer-Verlag New York, Inc., New York, NY, 1994.]

He explained that a royalty of 10% increases the price by more than 10%. To be mathematical about it: With a royalty rate r, a target revenue per book C, and a retail price P, we have that C = P-rP (retail price minus royalties). Therefore, P = C/(1-r). With a target revenue per book of $100, royalties of 10%, 15%, and 20% lead to retail prices of $111.11, $117.65, and $125.00, respectively.
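The arithmetic is easy to check. The short sketch below evaluates P = C/(1-r) for the three royalty rates mentioned; the function name is mine, and the $100 target is the illustrative figure from the text.

```python
def retail_price(target_revenue, royalty_rate):
    """Retail price P such that P minus royalties (r * P) equals the target revenue C."""
    if not 0 <= royalty_rate < 1:
        raise ValueError("royalty rate must be at least 0 and below 1")
    return target_revenue / (1 - royalty_rate)


if __name__ == "__main__":
    for rate in (0.10, 0.15, 0.20):
        print(f"royalty {rate:.0%}: retail price ${retail_price(100, rate):.2f}")
```

Because the royalty is charged on the retail price rather than on the publisher's revenue, the markup always exceeds the royalty rate: a 20% royalty raises the price by 25%.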

In a moment of candor, he also revealed something far more interesting: how he sets the target revenue C. Say the first printing of 5000 copies requires an up-front investment of $100,000. (All numbers are for illustrative purposes only.) This includes the cost of editing, copy-editing, formatting, cover design, printing, binding, and administrative overhead. Estimating library sales at 1000 copies, this publisher would set C at $100,000/1,000 = $100. In other words, he recovered his up-front investment from libraries. Retail sales were pure profit.
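The same illustrative numbers can be run through a small sketch. It assumes, as the publisher's account suggests, that every production cost sits in the up-front investment, so each copy sold brings in the target revenue C with no further cost. The function names are mine, and the figures remain illustrative.

```python
def target_revenue_per_book(upfront_investment, expected_library_sales):
    """Target revenue C: the up-front investment spread over expected library sales only."""
    if expected_library_sales <= 0:
        raise ValueError("expected library sales must be positive")
    return upfront_investment / expected_library_sales


def publisher_net(copies_sold, upfront_investment, target_revenue):
    """Net position after selling a number of copies, assuming no per-copy cost."""
    return copies_sold * target_revenue - upfront_investment


if __name__ == "__main__":
    investment = 100_000   # illustrative up-front cost of the first printing
    library_sales = 1_000  # illustrative estimate of library sales
    printing = 5_000       # illustrative size of the first printing

    c = target_revenue_per_book(investment, library_sales)
    print(f"target revenue per book: ${c:.2f}")
    print(f"net after library sales: ${publisher_net(library_sales, investment, c):,.2f}")
    print(f"net if the printing sells out: ${publisher_net(printing, investment, c):,.2f}")
```

With these numbers the publisher breaks even on library sales alone, and a sold-out first printing would net $400,000.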

The details are, no doubt, more complicated. Yet, even without relying on a recollection of an old conversation, it is safe to assume that publishers use the captive library market to reduce their business risk. In spite of increasingly recurrent crises, library budgets remain fairly predictable, both in size and in how the money is spent. Any major publisher has reliable advance estimates of library sales for any given book, particularly if published as part of a well-known series. It is just good business to exploit that predictability.

The market should be vastly different now, but textbooks have remained stuck in the paper era longer than other publications. Moreover, the first stage of the move towards digital, predictably, consists of replicating the paper world. This is what all constituents want: Librarians want to keep lending books. Researchers and students like getting free access to quality books. Textbook publishers do not want to lose the risk-reducing revenue stream from libraries. As a result, everyone implements the status quo in digital form. Publishers produce digital books and rent their collections to libraries through site licenses. Libraries intermediate electronic-lending transactions. Users get the paper experience in digital form. Universities pay for site licenses and the maintenance of the digital-lending platforms.

After the disaster of site licenses for scholarly journals, repeating the same mistake with books seems silly. Once again, take-it-or-leave-it bundles force institutions into a false choice between buying too much for everyone or nothing at all. Once again, site licenses eliminate the unlimited flexibility of digital information. Forget about putting together a personal collection tailored to your own requirements. Forget about pricing per series, per book, per chapter, unlimited in time, one-day access, one-hour access, readable on any device, or tied to a particular device. All of these options are eliminated to maintain the business models and the intermediaries of the paper era.

Just by buying/renting books as soon as they are published, libraries indirectly pay for a significant fraction of the initial investment of producing textbooks. If libraries made that initial investment explicitly and directly, they could produce those same books and set them free. Instead of renting digital books (and their multimedia successors), libraries could fund authors to write books and contract with publishers to publish those manuscripts as open-access works. Authors would be compensated. Publishers would compete for library funds as service providers. Publishers would be free to pursue the conventional pay-for-access publishing model, just not with library dollars. Prospective authors would have a choice: compete for library funding to produce an open-access work or compete for a publishing contract to produce a pay-for-access work.

The Carnegie model of libraries fused together two distinct objectives: subsidize information and disseminate information by distributing books to many different locations. In web-connected communities, spending precious resources on dissemination is a waste. Inserting libraries in digital-lending transactions only makes those transactions more inconvenient. Moreover, it requires expensive-to-develop-and-maintain technology. By reallocating these resources towards subsidizing information, libraries could set information free without spending part of their budget on reducing publishers' business risk. The fundamental budget questions that remain are: Which information should be subsidized? What is the most effective way to subsidize information?

Libraries need not suddenly stop site licensing books tomorrow. In fact, they should take a gradual approach, test the concept, make mistakes, and learn from them. A library does not become a grant sponsor and/or publisher overnight. Several models are already available: from grant competition to crowd-funded ungluing. [Unglue.it for Libraries] By phasing out site licenses, any library can create budgetary space for sponsoring open-access works.

Libraries have a digital future with almost unlimited opportunities. Yet, they will miss out if they just rebuild themselves as a digital copy of the paper era.

Monday, January 20, 2014

A Cloud over the Internet

Cloud computing could not have existed without the Internet, but it may make Internet history by making the Internet history.

Organizations are rushing to move their data centers to the cloud. Individuals have been using cloud-based services, like social networks, cloud gaming, Google Apps, Netflix, and Aereo. Recently, Amazon introduced WorkSpaces, a comprehensive personal cloud-computing service. The immediate benefits and opportunities that fuel the growth of the cloud are well known. The long-term consequences of cloud computing are less obvious, but a little extrapolation may help us make some educated guesses.

Personal cloud computing takes us back to the days of remote logins with dumb terminals and modems. Like the one-time office computer, the cloud computer does almost all of the work. Like the dumb terminal, a not-so-dumb access device (anything from the latest wearable gadget to a desktop) handles input/output. Input evolved beyond keystrokes and now also includes touch-screen gestures, voice, image, and video. Output evolved from green-on-black characters to multimedia.

When accessing a web page with content from several contributors (advertisers, for example), the page load time depends on several factors: the performance of computers that contribute web-page components, the speed of the Internet connections that transmit these components, and the performance of the computer that assembles and formats the web page for display. By connecting to the Internet through a cloud computer, we bypass the performance limitations of our access device. All bandwidth-hungry communication occurs in the cloud on ultra-fast networks, and almost all computation occurs on a high-performance cloud computer. The access device and its Internet connection just need to be fast enough to process the information streams into and out of the cloud. Beyond that, the performance of the access device hardly matters.
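The argument above can be made concrete with a back-of-the-envelope model. The sketch below is only an illustration: the function names, the assumption that components are fetched in parallel (so the slowest one dominates), and every number in it are hypothetical, chosen to show the shape of the argument rather than to measure anything.

```python
# Hypothetical model of page load time; all figures are illustrative assumptions.

def component_time(server_time_s, size_mb, bandwidth_mbps):
    """Seconds to produce one page component and transmit it over a link."""
    return server_time_s + size_mb * 8 / bandwidth_mbps


def page_load_time(component_times_s, assemble_time_s):
    """Assume components arrive in parallel, so the slowest one dominates,
    after which the page is assembled and formatted for display."""
    return max(component_times_s) + assemble_time_s


sizes_mb = [0.5, 2.0, 1.0]  # three contributors, e.g. content plus two advertisers

# Direct access: components cross a 10 Mbps home connection and are
# assembled on a modest access device.
direct = page_load_time(
    [component_time(0.1, s, bandwidth_mbps=10) for s in sizes_mb],
    assemble_time_s=0.8,
)

# Via a cloud computer: components cross a 1000 Mbps network inside the cloud
# and are assembled on a fast machine; the access device only receives the
# finished display stream (assumed here to add 0.1 s).
via_cloud = page_load_time(
    [component_time(0.1, s, bandwidth_mbps=1000) for s in sizes_mb],
    assemble_time_s=0.05,
) + 0.1

print(f"direct: {direct:.2f} s, via cloud: {via_cloud:.2f} s")
```

Under these assumed numbers, the access device and its connection matter only through the final streaming term, which is the point made above.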

Because of economies of scale, the cloud-enabled net is likely to be a highly centralized system dominated by a small number of extremely large providers of computing and networking. This extreme concentration of infrastructure stands in stark contrast to the original Internet concept, which was designed as a redundant, scalable, and distributed system without a central authority or a single point of failure.

When a cloud provider fails, it disrupts its own customers, and the disruption immediately propagates to the customers' clients. Every large provider is, therefore, a systemic vulnerability with the potential of taking down a large fraction of the world's networked services. Of course, cloud providers are building infrastructure of extremely high reliability with redundant facilities spread around the globe to protect against regional disasters. Unfortunately, facilities of the same provider all have identical vulnerabilities, as they use identical technology and share identical management practices. This is a setup for black-swan events, low-probability large-scale catastrophes.
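A rough probability sketch shows why identical facilities undermine redundancy. It assumes, hypothetically, that each facility fails on its own with probability `p_local`, and that a shared flaw (identical technology or management practice) takes down all facilities at once with probability `p_common`. Both parameters and their values are illustrative assumptions, not estimates for any real provider.

```python
# Hypothetical common-mode failure model; the probabilities are made up.

def p_total_outage(p_local, n_facilities, p_common):
    """Probability that every facility of one provider is down.

    Either the shared flaw triggers (taking down all facilities together),
    or it does not and all facilities happen to fail independently.
    """
    all_fail_independently = p_local ** n_facilities
    return p_common + (1 - p_common) * all_fail_independently


# Three redundant facilities, each with a 1% chance of a local failure.
no_shared_flaw = p_total_outage(0.01, 3, p_common=0.0)
shared_flaw = p_total_outage(0.01, 3, p_common=0.001)

print(f"independent facilities only: {no_shared_flaw:.2e}")
print(f"with a 0.1% shared flaw:     {shared_flaw:.2e}")
```

In this sketch, adding more facilities shrinks only the independent term. The shared-flaw term sets a floor that redundancy within one provider cannot lower.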

The Internet is overseen and maintained by a complex international set of authorities. [Wikipedia: Internet Governance] That oversight loses much of its influence when most communication occurs within the cloud. Cloud providers will be tempted to deploy more efficient custom communication technology within their own facilities. After all, standard Internet protocols were designed for heterogeneous networks. Much of that design is not necessary on a network where one entity manages all computing and all communication. Similarly, any two providers may negotiate proprietary communication channels between their facilities. Step by step, the original Internet will be relegated to the edges of the cloud, where access devices connect with cloud computers.

Net neutrality is already on life support. When cloud providers compete on price and performance, they are likely to segment the market. Premium cloud providers are likely to attract high-end services and their customers, relegating the rest to second-tier low-cost providers. Beyond net neutrality, there may be a host of other legal implications when communication moves from public channels to private networks.

When traffic moves to the cloud, telecommunication companies will gradually lose the high-margin retail market of providing organizations and individuals with high-bandwidth point-to-point communication. They will not derive any revenue from traffic between computers within the same cloud facility. The revenue from traffic between cloud facilities will be determined by a wholesale market with customers that have the resources to build and/or acquire their own communication capacity.

The existing telecommunication infrastructure will mostly serve to connect access devices to the cloud over relatively low-bandwidth channels. When TV channels are delivered to the cloud (regardless of technology), users select their channel on the cloud computer. They do not need all channels delivered to the home at all times; one TV channel at a time per device will do. When phones are cloud-enabled, a cloud computer intermediates all communication and provides the functional core of the phone.

Telecommunication companies may still come out ahead as long as the number of access devices keeps growing. Yet, they should at least question whether it would be more profitable to invest in cloud computing instead of ever higher bandwidth to the consumer.

The cloud will continue to grow as long as its unlimited processing power, storage capacity, and communication bandwidth provide new opportunities at irresistible price points. If history is any guide, long-term and low-probability problems at the macro level are unlikely to limit its growth. Even if our extrapolated scenario never completely materializes, the cloud will do much more than increase efficiency and/or lower cost. It will change the fundamental character of the Internet.

Monday, December 16, 2013

Beall's Rant

Jeffrey Beall of Beall's list of predatory scholarly publishers recently made some strident arguments against Open Access (OA) in the journal tripleC (ironically, an OA journal). Beall's comments are part of a non-refereed section dedicated to a discussion on OA.

Michael Eisen takes down Beall's opinion piece paragraph by paragraph. Stevan Harnad responds to the highlights/lowlights. Roy Tennant has a short piece on Beall in The Digital Shift.

Beall takes a distinctly political approach in his attack on OA:
“The OA movement is an anti-corporatist movement that wants to deny the freedom of the press to companies it disagrees with.”
“It is an anti-corporatist, oppressive and negative movement, [...]”
“[...] a neo-colonial attempt to cast scholarly communication policy according to the aspirations of a cliquish minority of European collectivists.”
“[...] mandates set and enforced by an onerous cadre of Soros-funded European autocrats.”
This is the rhetorical style of American extremist right-wing politics that casts every problem as a false choice between freedom and – take your pick – communism or totalitarianism or colonialism or slavery or... European collectivists like George Soros (who became a billionaire by being a free-market capitalist).

For those of us more comfortable with technocratic arguments, politics is not particularly welcome. Yet, we cannot avoid the fact that the OA movement is trying to reform a large socio-economic system. It would be naïve to think that that can be done without political ideology playing a role. But is it really too much to ask to avoid the lowest level of political debate, politics by name-calling?

The system of subscription journals has an internal free-market logic to it that no proposed or existing OA system has been able to replace. In a perfect world, the subscription system uses an economic market to assess the quality of editorial boards and the level of interest in a particular field. Economic viability acts as a referee of sorts, a market-based minimum standard. Some editorial boards deserve the axe for doing poor work. Some fields of study deserve to go out of business for lack of interest. New editorial boards and new fields of study deserve an opportunity to compete. Most of us prefer that these decisions are made by the collective and distributed wisdom of free-market mechanisms.

Unfortunately, the current scholarly-communication marketplace is far from a free market. Journals hardly compete directly with one another. Site licenses perpetuate a paper-era business model that forces universities to buy all content for 100% of the campus community, even those journals that are relevant only to a sliver of the community. Site licenses limit competition between journals, because end users never get to make the price/value trade-offs critical to a functional free market. The Big Deal exacerbates the problem. Far from providing a service, as Beall contends, the Big Deal gives big publishers a platform to launch new journals without competition. Consortial deals are not discounts; they introduce peer networks to make it more difficult to cancel existing subscriptions. [What if Libraries were the Problem?] [Libraries: Paper Tigers in a Digital World]

If Beall believes in the free market, he should support competition from new methods of dissemination, alternative assessment techniques, and new journal business models. Instead, he seems to be motivated more by a desire to hold onto his disrupted job description:
“Now the realm of scholarly communication is being removed from libraries, and a crisis has settled in. Money flows from authors to publishers rather than from libraries to publishers. We've disintermediated libraries and now find that scholarly system isn't working very well.”
In fact, it is the site-license model that reduced the academic library to the easy-to-disintermediate dead-end role of subscription manager. [Where the Puck won't Be] Most librarians are apprehensive about the changes taking place, but they also realize that they must re-interpret traditional library values in light of new technology to ensure long-term survival of their institution.

Thus far, scholarly publishing has been the only type of publishing not disrupted by the Internet. In his seminal work on disruption [The Innovator's Dilemma], Clayton Christensen characterizes the defenders of the status quo in disrupted industries. Like Beall, they are blinded by traditional quality measures, dismiss and/or denigrate innovations, and retreat into a defense of the status quo.

Students, researchers, and the general public deserve a high-quality scholarly-communication system that satisfies basic minimum technological requirements of the 21st century. [Peter Murray-Rust, Why does scholarly publishing give me so much technical grief?] In the last 20 years of the modern Internet, we have witnessed innovation after innovation. Yet, scholarly publishing is still tied to the paper-imitating PDF format and to paper-era business models.

Open Access may not be the only answer [Open Access Doubts], but it may very well be the opportunity that this crisis has to offer. [Annealing the Library] In American political terms, Green Open Access is a public option. It provides free access to author-formatted versions of papers. Thereby, it serves the general public and the scholarly poor. It also serves researchers by providing a platform for experimentation without having to go through onerous access negotiations (for text mining, for example). It also serves as an additional disruptive trigger for free-market reform of the scholarly market. Gold Open Access in all its forms (from PLOS to PeerJ) is a set of business models that deserve a chance to compete on price and quality.

The choice is not between one free-market option and a plot of European collectivists. The real choice is whether to protect a functionally inadequate system or whether to foster an environment of innovation.

Monday, December 2, 2013

Amazon Floods the Information Commons

Amazon is bringing cloud computing to the masses. Any individual with access to a browser now has access to almost unlimited computing power and storage. This may be the moment that marks the official beginning of the end of the desktop computer, which was already on a downward slide because of the rise of notebooks, netbooks, tablets, and smartphones.

For managers of computer labs, this technology eliminates a slew of nitty-gritty management problems without good solutions. When a shared computer is idle, do you take action after 5, 10, or 15 minutes? If you wait too long, you annoy users who are waiting for their turn, and you invite unauthorized users to sneak into someone else's session. If you act too soon, you ruin the experience for the current user. Should you immediately log off an idle user or do you lock the screen for a while before logging off? Again, you balance the interests of the current user against those of the next user. Which software do you install where? Installing all software on every computer is usually too expensive. But if each computer in the lab has its own configuration, how do you communicate those differences to the users? The ultimate challenge of the shared computer is how to let students install software that they themselves are developing while keeping the computer relatively secure, usable to others, and free from pirated software.

Amazon has solved all of this and more. With cloud-based computers, there is no such thing as an idle computer, only idle screens. Shutting down a screen and turning it over to another user does not ruin a session in progress. It is more like turning over a printer. The cloud-based personal computer is configured for one user according to his or her requirements. Students and faculty can install whatever software they need, including their own research software. As to the usual suite of standard applications, cloud services like Adobe Creative Cloud, Google Apps, and Windows Azure have eliminated software installation and maintenance entirely.

The potential of cloud computing in the Information Commons is more than substituting one technology with another. Students and faculty suddenly have their own custom computing laboratory with an unlimited number of computers over which they have complete control. One can imagine projects in which cloud-based computers harvest measurements from sensors across the globe (weather-related, for example), read and analyze the news, and data mine social networks. All of this data can then be fed to high-performance servers running research software for analysis and visualization.

Currently, retail pricing for a cloud-based personal computer starts at $35 per month. This is already a very good price point, considering that it eliminates the hardware replacement cycle, software maintenance, security issues, etc. One can also add and drop computers as needed. Moreover, this is a price point established before competitors have even entered the market. 

When computing and storage become relatively inexpensive on-demand commodity services, computing labs are no longer in the business of sharing computing devices, storage, and software; they are in the business of sharing visualization devices. Currently, Information Commons provide large-screen high-resolution monitors attached to a computer. As large-scale, high-performance, big-data projects grow in popularity across many disciplines, there will be increasing demand for more advanced equipment to visualize and render the results. Today's computing labs will morph into advanced visualization labs. They will provide the capacity to use multiple large high-resolution screens. They may provide access to CAVEs (CAVE Automatic Virtual Environment) and/or additive-manufacturing equipment (which includes 3-D printing). The support requirements for such equipment are radically different from those for current computer labs. CAVEs need large rooms with no windows, multiple projectors, and a sound system. Additive manufacturing may be loud and may require specialized venting systems.

For managers of Information Commons, it is not too early to start planning for this transition. They may look forward to getting rid of the nitty-gritty unsolvable problems mentioned above, but integrating these technologies into the real estate currently used for computing labs and libraries will require all of the organizational and management skills they can muster.

Tuesday, May 21, 2013

Turow vs Everyone

According to celebrated author, lawyer, and president of the Authors Guild Scott Turow, the legal and technological erosion of copyright endangers writers. (New York Times, April 7th, 2013) His enemy list is conspiratorial in length and breadth. It includes the Supreme Court, publishers, search engines, HathiTrust, Google, academics, libraries, and Amazon. Nevertheless, Turow makes compelling arguments that deserve scrutiny.

The Supreme Court decision on re-importation. (Kirtsaeng v. John Wiley & Sons, Inc.)
This 6-3 decision merely reaffirmed the first sale doctrine. It is highly unlikely that this will significantly affect book prices in the US. If it does, any US losses will be offset by price increases in foreign markets. More importantly, the impact will be negligible because paper books will soon be a niche market in the US.

Publishers restrict royalties on e-books.
Publishers who manage the technology shift by making minor business adjustments, such as transferring costs to authors, libraries, and consumers, underestimate the nature of current changes. Traditional publishers built their business when disseminating information was difficult. Once they built their dissemination channels, making money was relatively easy. In our current world, building dissemination channels is easy and cheap. Making money is difficult. Authors may need new partners who built their business in the current environment; some of them are on Turow's list of enemies.

Search engines make money off referring users to pirate sites.
Turow has a legitimate moral argument. However, politicizing search engines by censoring search results is as wrong as it is ineffective. Pirate sites also spread through social networks. Cutting off pirate sites from advertising networks, while effective, is difficult to achieve across international borders and requires unacceptable controls on information exchange. iTunes and its competitors have shown it is possible to compete with pirate sites by providing a convenient user interface, speed, reliability, quality, and protection against computer viruses.

HathiTrust and Google scanned books without authorization.
HathiTrust and Google were careless. Authors and publishers were rigid. Experimentation gave way to litigation.

Some academics want to curtail copyright.
Scholarly publishers like Elsevier have profit margins that exceed 30%. Yet, Turow claims that “For many academics today, their own copyrights hold little financial value because scholarly publishing has grown so unprofitable.”

Academics' research is often funded in part by government, and it is always supported by universities. Universities have always been committed to research openness, and they use published research as a means of assessment. This is why academics forgo royalties when they publish research. The concept of research openness is changing, and many academics are lobbying for the idea that research should be freely available to all. The idea of Open Access was recently embraced by the White House. Open Access applies only to researchers funded by the government and/or employed by participating universities and research labs. It only covers research papers, not books. It does not apply to independent authors. Open Access does not curtail copyright.

Legal academics like Prof. Lawrence Lessig have argued for stricter limits on traditional copyright and for alternative forms of copyright. Pressured by industry lobbyists, Congress has repeatedly increased the length of copyright. If this trend continues, recent works may never enter into the public domain. Legislation must balance authors' intellectual property rights and everyone's (including authors') freedom to produce derivative works, commentaries, parodies, etc.

Amazon patents a scheme to re-sell used e-books.
This patent is a misguided attempt to monetize the human frailty of carrying familiar concepts from old technology senselessly into the new. It is hardly the stuff that made this forward-looking company formidable.

Libraries expand paper lending into digital lending.
Turow demands more money from libraries for digital lending privileges. He is too modest; he should demand their whole budget.

When a paper-based library acquires a book, it permanently increases the value of its collection. This cumulative effect over many years created the world's great collections. When a community spends resources on a digital-lending library, it rents information from publishers and provides a fleeting service for only as long as the licenses last. When the license ends, the information disappears. There is no cumulative effect. That digital-lending library only adds overhead. It will never own or contribute new information. It is an empty shell.

Digital lending is popular with the public. It gives librarians the opportunity to transition gradually into digital space. It continues the libraries' billion-dollar money stream to publishers. Digital lending has a political constituency, but it does not stand up to rational scrutiny. Like Amazon's scheme to resell used e-books, digital-lending programs are desperate attempts to hang on to something that simulates the status quo.

Lending is the wrong paradigm for the digital age. Instead, libraries should use their budgets to accumulate quality open-access information. They should sponsor qualified authors to produce open-access works of interest to the communities they serve. This would give authors a choice. They could either produce their work commercially behind a pay wall, or they could produce library-funded open-access works.

Monday, April 22, 2013

The Sibyl of Cumae


“The seventh was of Cumae, by name Amalthaea, who is termed by some Herophile, or Demophile and they say that she brought nine books to the king Tarquinius Priscus, and asked for them three hundred philippics, and that the king refused so great a price, and derided the madness of the woman; that she, in the sight of the king, burnt three of the books, and demanded the same price for those which were left; that Tarquinius much more considered the woman to be mad; and that when she again, having burnt three other books, persisted in asking the same price, the king was moved, and bought the remaining books for the three hundred pieces of gold: and the number of these books was afterwards increased, after the rebuilding of the Capitol; because they were collected from all cities of Italy and Greece, and especially from those of Erythraea, and were brought to Rome, under the name of whatever Sibyl they were.”
The myth of the Sibyl of Cumae from: The Divine Institutes, by Lactantius (b. ca. A.D. 250), Book I, Chapter VI.

Publishers select, prepare, market, and disseminate information. They developed their selection processes at a time when it was expensive to prepare and disseminate information. As these costs decreased, they could publish more and be less selective. However, the selection process endows information with gravitas, a valuable commodity for marketing. Today's publishers must balance two conflicting interests: increase revenue by publishing as much as possible vs. increase profit margins by selectively publishing high-value information. Scholarly publishers found a way to do both.

Where the Sibyl of Cumae burned some books to increase the value of the remaining books, a scholarly journal rejects a certain number of papers for each paper it publishes. Many of the rejected papers may be interesting, but they do not fit the journal's mission. For the publisher, this is an opportunity to spawn new journals in the wake of its successful journals. Such portfolios of journals are less selective than their individual journals. Of course, if one considers the scholarly publishing industry at the macro level, the notion of selectivity virtually vanishes. Papers are submitted and re-submitted until an outlet is found.
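A small calculation illustrates how selectivity vanishes at the macro level. As a deliberately simplified assumption, suppose every journal accepts the same fraction of submissions, and each decision is independent of the previous ones. The function name and the 25% acceptance rate are hypothetical, chosen only for illustration.

```python
# Hypothetical model: identical journals, independent decisions.

def eventual_acceptance(acceptance_rate, max_submissions):
    """Fraction of papers published somewhere if authors resubmit a rejected
    paper to a new journal, up to max_submissions attempts in total."""
    rejected_everywhere = (1 - acceptance_rate) ** max_submissions
    return 1 - rejected_everywhere


# Each journal rejects three papers for every one it publishes.
for attempts in (1, 3, 5, 10):
    share = eventual_acceptance(0.25, attempts)
    print(f"{attempts:2d} attempts: {share:.1%} of papers eventually published")
```

Each individual journal in this sketch remains selective, yet the share of papers that find an outlet approaches 100% as the number of attempts grows.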

The Sibyls of Scholarly Publishing perform an elaborate dance with pyrotechnic effects that give the illusion they burn papers. In fact, each Sibyl takes in new and rejected papers, packages some of them in a journal, and pretends to burn the rest before handing them off to her sisters. Each Sibyl maximizes the price in her respective corner of the universe. Academia repeatedly acts like King Tarquinius, who thinks the woman mad and pays the price she demands.

It may take years and several turnovers of the editorial board before an established journal that covers a large domain accepts papers in an emerging field. This has created a seemingly insatiable demand for new highly specialized journals. Each successful journal serves its publisher by raising revenue, its editorial-board members by raising their research prestige, and its authors by providing an avenue for dissemination of material without a natural home in existing journals. Many of these journals cater to such a small cadre of specialists that they subvert the single largest scholarly benefit of the refereeing process: a critical reading by someone with a different point of view and background. Even when run with the best of intentions, these narrow journals are echo chambers for group think. Emerging fields need some breathing room, particularly in the early developmental stages, but they should not be immune from outside criticism. Do these journals really serve the cause of good scholarship? Are they worth the super-inflationary cost increases, which they help create?

Open Access may not reduce the cost of scholarly communication as originally hoped. A large-scale conversion to Gold Open Access would shift the costs from universities to governments. Once university administrations no longer feel the budgetary pain and the costs are baked into government budgets, publishers would be free to continue the super-inflationary trajectory. There would not be any market forces that limit the introduction of new journals, the growth of existing journals, or the price charged per paper published. The access problem would be resolved by hiding, compounding, and postponing the cost problem. In the end, the scholarly-communication market would remain as dysfunctional as ever.

Technology has eroded the foundation of the current scholarly-communication system. It assumes that there is a scarcity of dissemination, and it uses that scarcity for the purpose of gatekeeping. In fact, dissemination is abundant and nearly free. The scarcity and associated gatekeeping are marketing illusions.

The reluctance to change is understandable. A scholarly-communication system is a delicate balancing act. It must be fair, but critical. It must discourage poor research, yet be supportive of new ideas, including ideas that challenge established views. Because scholarly communication is tied to research assessment, any changes to the system must gain wide institutional acceptance.

Ultimately, we have little choice but to accept today's reality. Anyone has the power to disseminate any information, regardless of quality. No one has the power to be a gatekeeper. At most, editorial boards have the power of influence in their respective communities; they can highlight important achievements and developments. But even this power to influence may soon be challenged by crowd-sourced quality labels of alternative metrics. (Perhaps not.)

We should be elated about the recent successes of the Open Access movement. We should also recognize that Open Access is not an end point. It is only the first step in the reinvention of scholarly communication.