File structure in CORA.RDR / Dataverse
The main goal of CORA.RDR is to store and preserve digital data.
Digital data are stored in datasets. You can think of each dataset in CORA.RDR as a computer folder that can contain files or other folders, just like any folder on your computer.
Digital data are stored in software-specific file formats. The choice of software and format depends on the intended use. For example:
- Spreadsheets support formulas, sorting, and filtering — features not preserved in word processors.
- Word processors support complex formatting such as page numbers and tables of contents.
However, saving a file in a program’s default format does not guarantee long-term usability. Risks include:
- Dependency on specific software or software versions
- Obsolescence of proprietary formats
- Exclusive or expensive software required to access files
- Loss of significant characteristics if the format does not support them
To minimize these risks and improve long-term accessibility, use file formats with a high likelihood of remaining usable for many years.
Here are some guidelines on which file formats are accepted and on the recommended folder structure for a dataset. We encourage you to follow these guidelines for the datasets you upload to CORA.RDR. They are not a technical restriction, however, so in some cases they can be bypassed for a justified reason.
For example, we recommend CSV for tabular data, but many users store their tabular data in Excel files. Converting such a file to CSV can discard information the Excel file contains, such as formulas, cell formatting, or additional sheets, which can affect the quality of the data. In this and similar cases, it may be reasonable not to follow the recommended file formats.
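The kind of loss described above is easy to reproduce. CSV stores every cell as plain text, so a value's type, the formula that produced it, and any cell formatting are not part of the file. The following minimal sketch uses only Python's standard `csv` module; the table contents are invented for illustration.

```python
import csv
import io

# Invented example table. In a spreadsheet, "total" could be a formula
# such as =B2*C2 and "unit_price" could carry currency formatting.
rows = [
    ["item", "quantity", "unit_price", "total"],
    ["pencil", 3, 0.5, 1.5],
]

# Write the table as CSV into an in-memory text buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

# Read it back: every cell comes back as a string, and nothing records
# how "total" was calculated or how any cell was formatted.
restored = list(csv.reader(io.StringIO(csv_text)))
print(restored)
```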
Preferred and Non-Preferred File Formats
Preferred formats are file formats that Data Archiving and Networked Services (DANS), the Dutch national centre of expertise and repository for research data, recommends based on international standards and long-term digital preservation practices.
DANS is an institute of the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Dutch Research Council (NWO). It manages certified research data repositories and develops policies to ensure that research data remain accessible, usable, and sustainable over time. Because of its national mandate and alignment with international agreements on digital preservation, DANS defines which file formats offer the strongest long-term guarantees.
Deposits of research data in preferred formats will always be accepted by DANS.
Non-preferred formats are also widely used and may still be accepted; however, they are expected to remain only moderately to reasonably usable, accessible, and robust in the long term compared to preferred formats.
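Because these guidelines are advisory rather than enforced by CORA.RDR, it can help to review a dataset folder before uploading it. The sketch below is a hypothetical helper, not a CORA.RDR, Dataverse, or DANS tool. It classifies files by extension using only a small, illustrative subset of the tables in the file format overview, and it flags extensions such as `.pdf` and `.txt` for manual inspection, because the extension alone cannot tell PDF/A from other PDFs or Unicode from non-Unicode text.

```python
from pathlib import Path
from typing import Dict

# Hypothetical lookup sets: only a small sample of the extensions listed
# in the file format overview, not a complete or authoritative list.
PREFERRED = {".odt", ".csv", ".ods", ".sql", ".siard", ".png", ".jp2",
             ".svg", ".flac", ".wav", ".mkv", ".gml", ".gpkg", ".ttl"}
NON_PREFERRED = {".xls", ".xlsx", ".mdb", ".accdb", ".sav", ".dta", ".mp3",
                 ".avi", ".mov", ".shp", ".dwg", ".ai", ".eps", ".fbx"}
# The same extension can be preferred or not depending on the content
# (PDF/A vs. other PDF, Unicode vs. non-Unicode text, DXF version).
NEEDS_INSPECTION = {".pdf", ".txt", ".dxf"}


def classify(path: Path) -> str:
    """Classify a single file by its (case-insensitive) extension."""
    suffix = path.suffix.lower()
    if suffix in NEEDS_INSPECTION:
        return "inspect content"
    if suffix in PREFERRED:
        return "preferred"
    if suffix in NON_PREFERRED:
        return "non-preferred"
    return "not listed"


def report(dataset_dir: str) -> Dict[str, str]:
    """Classify every file under a local folder prepared for upload."""
    root = Path(dataset_dir)
    return {str(p.relative_to(root)): classify(p)
            for p in sorted(root.rglob("*")) if p.is_file()}
```

For example, `report("my_dataset")` (a hypothetical folder name) returns a mapping from each file's relative path to a label such as `"preferred"` or `"inspect content"`.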
General Guidelines
DANS considers the file formats best suited for long-term sustainability and accessibility to be those that:
- Are frequently used
- Have open specifications
- Are independent of specific software, developers, or vendors
In practice, it is not always possible to use formats that satisfy all criteria.
It may be desirable to deposit certain original data in non-preferred formats because these formats are widely used (e.g., Esri Shapefiles, Microsoft Access databases, SPSS .sav files). In those cases, DANS requests that you deposit:
- The original format and
- A preferred format for long-term sustainability.
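This request can be checked mechanically before upload. The sketch below is a hypothetical helper (not a CORA.RDR, Dataverse, or DANS tool) that assumes the preferred copy sits in the same folder with the same file name and a different extension. The extension mapping is a small illustrative subset of the tables below.

```python
from pathlib import Path
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from a non-preferred extension to preferred
# extensions that could accompany it. Pairing files by matching name
# (stem) in the same folder is an assumption made for this sketch.
COUNTERPARTS: Dict[str, Set[str]] = {
    ".xls": {".ods", ".csv"},
    ".xlsx": {".ods", ".csv"},
    ".mdb": {".sql", ".siard", ".csv"},
    ".accdb": {".sql", ".siard", ".csv"},
    ".shp": {".gml", ".mif", ".json", ".gpkg"},
}


def missing_counterparts(filenames: List[str]) -> List[str]:
    """Return non-preferred files that have no preferred file with the same stem."""
    paths = [Path(name) for name in filenames]
    # Collect the extensions present for each (folder, stem) pair.
    present: Dict[Tuple[Path, str], Set[str]] = {}
    for p in paths:
        present.setdefault((p.parent, p.stem), set()).add(p.suffix.lower())
    missing = []
    for p in paths:
        wanted = COUNTERPARTS.get(p.suffix.lower())
        if wanted and not (wanted & present[(p.parent, p.stem)]):
            missing.append(str(p))
    return missing
```

With the file list `["gis/roads.shp", "gis/roads.gpkg", "db/inventory.accdb"]`, only the Access database is reported, because the shapefile has a GeoPackage counterpart.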
File Format Overview
This information is based on this article, with some modifications.
Text Documents
| Type | Preferred Formats | Non-Preferred Formats |
|---|---|---|
| Text documents | PDF/A (.pdf), ODT (.odt), Microsoft Word (.doc), Office Open XML (.docx), Rich Text Format (.rtf) | PDF other than PDF/A (.pdf) |
| Plain text | Unicode text (.txt) | Non-Unicode text (.txt) |
| Markup languages | XML (.xml), HTML (.html) + related (.css, .xslt, .js, .es), Markdown (.md) | — |
| Programming languages | MATLAB, NetCDF, Text-Fabric, Python | — |
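The table does not say which Unicode encoding is expected for plain text; UTF-8 is a common choice and is assumed in the sketch below. A successful check does not prove the file was written as UTF-8 (a pure ASCII file also passes, which is harmless because ASCII is a subset of UTF-8). Converting requires knowing or correctly guessing the original encoding; `cp1252` is only an example.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode text whose original encoding is known, e.g. "cp1252"."""
    return data.decode(source_encoding).encode("utf-8")


# A legacy Windows-1252 encoded word is not valid UTF-8 until converted.
legacy = "Zürich".encode("cp1252")
print(is_utf8(legacy), is_utf8(to_utf8(legacy, "cp1252")))
```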
Spreadsheets
| Preferred | Non-Preferred |
|---|---|
| ODS (.ods), CSV (.csv) | Microsoft Excel (.xls), Office Open XML (.xlsx), PDF/A (.pdf) |
Databases
| Preferred | Non-Preferred |
|---|---|
| SQL (.sql), SIARD (.siard), CSV (.csv) | Microsoft Access (.mdb, .accdb), dBase (.dbf), HDF5 (.hdf5, .he5, .h5) |
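Microsoft Access files cannot be read with Python's standard library, so the sketch below uses an SQLite database as a stand-in to show what a plain-text `.sql` export looks like. `sqlite3.Connection.iterdump()` yields the SQL statements needed to recreate the database. Those statements use SQLite's SQL dialect, so loading them into another database system may need adjustments. The table and values are invented.

```python
import sqlite3

# Invented example database held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, site TEXT, depth_m REAL)"
)
conn.executemany(
    "INSERT INTO samples (site, depth_m) VALUES (?, ?)",
    [("A", 1.5), ("B", 2.0)],
)
conn.commit()

# iterdump() yields the schema and data as SQL statements, one per item.
dump = "\n".join(conn.iterdump())
print(dump)

# The dump is enough to recreate an equivalent database from scratch.
restored = sqlite3.connect(":memory:")
restored.executescript(dump)
print(restored.execute("SELECT site, depth_m FROM samples ORDER BY id").fetchall())
```

To keep the export, write `dump` to a text file such as `samples.sql` (a hypothetical name) using UTF-8 encoding.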
Statistical Data
| Preferred | Non-Preferred |
|---|---|
| SPSS (.dat/.sps), STATA (.dat/.do), JASP (.csv/.html), R | SPSS Portable (.por), SPSS (.sav), STATA (.dta), SAS (.7dat, .sd2, .tpt), JASP (.jasp) |
Raster Images
| Preferred | Non-Preferred |
|---|---|
| JPEG (.jpg, .jpeg), TIFF (.tif, .tiff), PNG (.png), JPEG 2000 (.jp2), DICOM (.dcm) | — |
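A file's extension does not always match its content, for example a PNG renamed to `.jpg`. The sketch below checks the leading bytes ("magic numbers") defined by each format; it is a hypothetical helper that covers only the formats in this table and ignores variants such as BigTIFF.

```python
from typing import Optional

# Leading byte signatures defined by the respective format specifications.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000"),
]


def sniff_raster(header: bytes) -> Optional[str]:
    """Guess the raster format from the first 132 bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # DICOM files have a 128-byte preamble followed by the bytes "DICM".
    if header[128:132] == b"DICM":
        return "DICOM"
    return None
```

Usage: `with open("figure.png", "rb") as f: kind = sniff_raster(f.read(132))`, where the file name is hypothetical.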
Vector Images
| Preferred | Non-Preferred |
|---|---|
| SVG (.svg) | Adobe Illustrator (.ai), EPS (.eps), WMF/EMF (.wmf, .emf), CDR (.cdr) |
Audio
| Preferred | Non-Preferred |
|---|---|
| BWF (.bwf), MXF (.mxf), Matroska (.mka), FLAC (.flac), Opus (.opus), WAVE (.wav) | MP3 (.mp3), AAC (.aac, .m4a), AIFF (.aif, .aiff), OGG (.ogg) |
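WAVE files typically hold uncompressed PCM audio. The sketch below writes one with Python's standard `wave` module and reads the header back; the tone, sample rate, and duration are arbitrary example values.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # frames per second
DURATION_S = 0.5
FREQ_HZ = 440.0  # arbitrary test tone

# 16-bit signed PCM samples of a sine tone at half of full scale.
n_frames = int(SAMPLE_RATE * DURATION_S)
samples = [
    int(0.5 * 32767 * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE))
    for i in range(n_frames)
]
frames = struct.pack(f"<{n_frames}h", *samples)  # little-endian int16

# Write a mono WAVE file into memory (pass a filename to write to disk).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # bytes per sample, i.e. 16-bit
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(frames)

# Read the header back to confirm the parameters.
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    params = wav.getparams()
print(params)
```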
Video
| Preferred | Non-Preferred |
|---|---|
| MXF (.mxf), Matroska (.mkv), MPEG-4 (.mp4, .m4a, .m4v, …), MPEG-2 (.mpg, .mpeg, .m2v, .mpg2, …) | AVI (.avi), QuickTime (.mov, .qt) |
CAD (Computer-Aided Design)
| Preferred | Non-Preferred |
|---|---|
| AutoCAD DXF R12 (ASCII) (.dxf), SVG (.svg) | AutoCAD DXF (other versions), DWG (.dwg), DGN (.dgn) |
GIS (Geographical Information Systems)
| Preferred | Non-Preferred |
|---|---|
| GML (.gml), MIF/MID (.mif/.mid), GeoJSON (.json), GeoPackage (.gpkg) | Esri Shapefiles (.shp + related), MapInfo (.tab + related), KML (.kml, .kmz), Esri Geodatabase (.gdb), Project files (.mxd, .wor, .qgs) |
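GeoJSON is plain JSON, so it can be produced with any standard JSON library. A minimal sketch with Python's `json` module; the coordinates and property names are invented. Per RFC 7946, positions are written as longitude, then latitude, in WGS 84.

```python
import json

# One invented point feature; coordinates are [longitude, latitude].
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [2.1734, 41.3851]},
            "properties": {"name": "Sampling site 1"},
        }
    ],
}

# ensure_ascii=False keeps non-ASCII property values readable in UTF-8.
text = json.dumps(feature_collection, indent=2, ensure_ascii=False)
print(text)
```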
Georeferenced Images
| Preferred | Non-Preferred |
|---|---|
| GeoTIFF (.tif, .tiff), TIFF World File (.tfw + .tif), JPEG World File (.jgw + .jpg) | ERDAS IMAGINE (.img) |
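A world file (.tfw, .jgw) is a six-line plain-text file holding the affine transformation from pixel (column, row) positions of the paired image to map coordinates. It does not record the coordinate reference system, so that still has to be documented separately. The sketch below parses one and converts a pixel position; the sample values are invented.

```python
from typing import NamedTuple, Tuple


class WorldFile(NamedTuple):
    a: float  # x size of a pixel in map units
    d: float  # rotation term
    b: float  # rotation term
    e: float  # y size of a pixel in map units, usually negative
    c: float  # x coordinate of the centre of the upper-left pixel
    f: float  # y coordinate of the centre of the upper-left pixel


def parse_world_file(text: str) -> WorldFile:
    """Parse the six numbers of a world file, in file order A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values, got {len(values)}")
    return WorldFile(*values)


def pixel_to_map(wf: WorldFile, col: float, row: float) -> Tuple[float, float]:
    """Map a pixel (column, row) to map coordinates with the affine transform."""
    x = wf.a * col + wf.b * row + wf.c
    y = wf.d * col + wf.e * row + wf.f
    return x, y


# Invented world file: 0.5 map units per pixel, no rotation.
tfw = "0.5\n0.0\n0.0\n-0.5\n430000.25\n4580000.75\n"
print(pixel_to_map(parse_world_file(tfw), 10, 20))
```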
Raster GIS
| Preferred | Non-Preferred |
|---|---|
| ASCII GRID (.asc, .txt) | Esri GRID (.grd), Surfer Grid (.grd, .srf), ERDAS IMAGINE (.img) |
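An ASCII GRID file (the Esri ASCII raster format) consists of a short header of keyword-value pairs followed by the cell values as plain text, row by row from the top (north) down. The parser below is a minimal sketch that assumes a well-formed file; the sample grid is invented.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[Optional[float]]]


def parse_ascii_grid(text: str) -> Tuple[Dict[str, float], Grid]:
    """Parse an ASCII grid into its header and rows (first row = northernmost)."""
    tokens = text.split()
    header: Dict[str, float] = {}
    pos = 0
    # Header lines are "keyword value" pairs such as "ncols 3".
    while pos < len(tokens) and tokens[pos][0].isalpha():
        header[tokens[pos].lower()] = float(tokens[pos + 1])
        pos += 2
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header.get("nodata_value")
    values = [float(t) for t in tokens[pos:]]
    if len(values) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} values, got {len(values)}")
    # Replace the NODATA marker with None so it cannot be mistaken for data.
    return header, [
        [None if v == nodata else v for v in values[r * ncols:(r + 1) * ncols]]
        for r in range(nrows)
    ]


sample = """ncols 3
nrows 2
xllcorner 430000
yllcorner 4580000
cellsize 25
NODATA_value -9999
1.0 2.0 3.0
4.0 -9999 6.0
"""
header, grid = parse_ascii_grid(sample)
print(header, grid)
```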
3D
| Preferred | Non-Preferred |
|---|---|
| OBJ (.obj), PLY (.ply), X3D (.x3d), glTF 2.0 (.gltf, .glb), COLLADA (.dae), LAS (.las, .laz), IFC (.ifc) | Autodesk FBX (.fbx), Blender (.blend), glTF 1.0, 3D PDF (.pdf), Google Draco (.drc), Artec (.a3d), Agisoft Metashape (.psx, .psz), STL (.stl), VRML (.wrl, .wrz, .vrml) |
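Several preferred 3D formats are simple text. For example, a Wavefront OBJ file lists vertices on `v` lines and faces on `f` lines that refer to vertices by 1-based index. The sketch below writes geometry only (no normals, texture coordinates, or materials); the helper name and the tetrahedron are illustrative.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def to_obj(vertices: Sequence[Vertex], faces: Sequence[Sequence[int]]) -> str:
    """Serialise a polygon mesh as Wavefront OBJ text (geometry only)."""
    lines: List[str] = ["# minimal OBJ: vertices and faces only"]
    lines += [f"v {x} {y} {z}" for x, y, z in vertices]
    # OBJ face indices start at 1, so shift Python's 0-based indices.
    lines += ["f " + " ".join(str(i + 1) for i in face) for face in faces]
    return "\n".join(lines) + "\n"


# A tetrahedron as an invented example mesh (0-based vertex indices).
tetra_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tetra_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
obj_text = to_obj(tetra_vertices, tetra_faces)
print(obj_text)
```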
RDF
| Preferred | Non-Preferred |
|---|---|
| RDF/XML, TriG (.trig), Turtle (.ttl), N-Triples (.nt), JSON-LD | — |
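N-Triples is the simplest of the listed RDF serializations: one statement per line, with subject, predicate, and object separated by spaces and ending with a full stop. The sketch below writes such lines by hand to show the structure. The dataset IRI is hypothetical, `http://purl.org/dc/terms/title` is the Dublin Core title term, and IRIs are not validated; for real data, a dedicated RDF library would handle validation and escaping more completely.

```python
from typing import Optional


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets (no validation is done)."""
    return f"<{value}>"


def literal(value: str, lang: Optional[str] = None) -> str:
    """Quote a string literal, escaping the characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: subject, predicate, object, then a full stop."""
    return f"{subject} {predicate} {obj} ."


line = triple(
    iri("https://example.org/dataset/1"),  # hypothetical dataset IRI
    iri("http://purl.org/dc/terms/title"),
    literal('Soil samples "2023"', "en"),
)
print(line)
```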
CAQDAS (Qualitative Data)
| Preferred | Non-Preferred |
|---|---|
| REFI-QDA | ATLAS.ti copy bundle, NVivo project file |