In the early days of the Trump administration, when the White House was “flooding the zone” with immigration actions, ending diversity programs, and yanking security clearances, a quieter change started to worry scientists around the country.
A decades-long survey about behavioral risk factors, normally posted on the Centers for Disease Control and Prevention website, went offline. So did the U.S. Agency for International Development’s Demographic and Health Surveys and the CDC’s Environmental Justice Index. Even the U.S. Census website went down for a time.
A team from the University of North Carolina at Chapel Hill launched into action to try to save some of the data—and, ironically, they were positioned to do so because of a federal government directive.
Many of the affected repositories are back online, but ProPublica and The Associated Press reported that President Donald Trump’s cuts to federal offices have halted other long-term data collection projects, including statistics on drug use, maternal health, job-related injuries, and more. Scientists remain concerned about the long-term availability and integrity of information that is crucial to understanding a wide array of topics, from public health trends and crime rates to weather patterns and traffic accidents.
“The erasure of vital public data has far-reaching consequences for researchers, policymakers, and everyday citizens who rely on these resources to make informed decisions,” Stacy Woods, a research director at the Union of Concerned Scientists, wrote in a blog post. “This loss of essential data jeopardizes the health and prosperity of communities across the country and directly undermines our nation’s pursuit of opportunity and justice for all.”
The UNC-CH team trying to back up the data so it can’t go missing again is known as the Research Data Management Core. It was formed in 2023 to provide a central repository for researchers’ studies across the university, coordinate university-wide data infrastructure, and offer expertise about sharing and finding data.
Jonathan Crabtree, the director of the Research Data Management Core, explained the need by pointing to a famous assignment a Harvard University professor would give his classes: read a single scientific article, obtain the authors’ data, and redo their work.
“Turns out it’s impossible,” said Crabtree.
He did a similar assignment as a student at UNC-CH. Many of his classmates weren’t able to access the data at all. He could, but the data he found was full of errors. “My point is, this is an ongoing issue that everyone in the scientific community knows about,” he added. “This is not unusual. It’s been happening for a long time.”
Without data, studies can’t be reproduced—a crucial step in the scientific process. So for the past few decades, federal agencies have been trying to solve the problem by pushing to make the data behind research accessible and usable to other scientists. Both the National Science Foundation and National Institutes of Health require grantees to include data management and sharing plans in their proposals.
But even with those policies, data often wasn’t stored or shared over the long term. A research project would end, funding would run out, or researchers would move to new schools. When compliance officers searched for the data, they often couldn’t find it.
In 2020, the NIH announced that it was updating its policy, requiring that by 2023 researchers not only share their data, but that they do so in a way that was FAIR—that is, findable, accessible, interoperable, and reusable. Two years later, the Office of Science and Technology Policy announced that all federal agencies had to adopt similar policies.
UNC-CH—home to one of the country’s most acclaimed library and information science programs—responded by launching Crabtree’s team. Crabtree, who had been at the Odum Institute for Research in Social Science at UNC-CH for more than 30 years, was named the inaugural director.
Ctrl+S
The first problem to solve was the sheer diversity of information types: There are qualitative social science surveys, DNA sequences, images from cytometry machines, and more. There are also metadata and lab notebooks, as well as pieces of national datasets that were combined with others.

“Imagine a world where you put your scientific data in WordPerfect or WordStar—let me show you how old I am,” Crabtree said, referring to older word processors. “You put it in one of these old systems, and now they don’t exist, so you can’t use those data anymore. Even though you have the file, it’s just unusable.”
It’s like having a collection of VHS tapes but only a Blu-ray player.
The Research Data Management Core works with researchers to create and follow data-sharing plans, and it audits projects to ensure they abide by the regulations. It also offers specialized services for more particular needs, like big data projects or those involving sensitive personal information.
Then came the Trump administration.
In addition to their own data, researchers suddenly worried about access to external data they needed—all those federal repositories and surveys that the Trump administration removed. “They had been clicking on the button they clicked for many years, and it didn’t exist,” Crabtree said.
So researchers reached out to the data-saving experts.
Crabtree’s team was plugged into national networks already concerned with open data access, including the Internet Archive and the International Association for Social Science Information Service and Technology. In concert, the 14-person team began backing up data that UNC-CH’s research community needed, saving it to a central repository called the UNC Dataverse.
The goal was never to back up everything—“We’re not becoming a new federal repository because other people are doing that,” Crabtree said—but to save what UNC-CH’s researchers needed so their work could continue unimpeded.
The Dataverse now has California polling data from the 1980s, anti-slavery petitions from 1800s Massachusetts, information from Raleigh’s Dorothea Dix Hospital patient ledger, and tens of thousands of other datasets.
“If I had Elon Musk in my office right here, I can explain to him exactly what we’re doing,” Crabtree said. “We’re being transparent while saving the data. The business case for it is that it saves more money when we have the data and don’t have to recollect that data.”
Other schools have taken note. Crabtree said they’ve received calls from Boise State University, Duke University, and others asking for advice about implementing similar teams.
The Office of Science and Technology Policy update that helped motivate the Research Data Management Core required compliance by December 31, 2025. In a perhaps unsurprising twist, the policy was deleted from the White House website after Trump took office, though it was never rescinded.
“It is a very fluid environment,” Crabtree said, “but all signs point to this continuing.”
Anyone who wants to view the policy is in luck: “It is archived in the National Archives,” Crabtree said.
Matt Hartman is a higher education reporter at The Assembly. He’s also written for The New Republic, The Ringer, Jacobin, and other outlets. Contact him at matt@theassemblync.com.