SF Nexus is an open access project. Data and resources provided here are free for everyone, including:
- Extracted Features: Disaggregated feature sets from copyrighted literature, available for research purposes
- Python Notebooks: Custom Jupyter notebooks in Google Colab environments for easy exploration of our data
- Documentation: Descriptions of pipelines used to digitize and analyze our dataset, from OCR cleaning to topic modeling and visualization
- Visualizations: Output generated from analyses of our dataset, including topic modeling and word embeddings
Overview of the SF Nexus
The SF Nexus is a collaborative network of research and public libraries with collections of SF, dedicated to making science fiction available online, including as data. While the SF Nexus project is based at Temple University’s Charles Library, we are committed to growing our collaborations with an SF-focused collective research community. This project presents a prototype of what could be developed as a large-scale collaborative digitization effort among the dozens of science fiction collections across England and North America, including but not limited to the members of the Science Fiction Collecting Libraries Consortium.
The current phase of this website presents a demonstration of how libraries can digitize their copyrighted cultural collections and make them available as data. Our focus so far has been on sharing extracted features of the data and on documenting the corpus’ ingestion and curation in the HathiTrust Research Center. Additional projects at Temple Libraries involve developing localized data capsules for confidential computing access to copyrighted corpora, as well as novel ways of digitizing corpora under controlled circumstances.
Explore the Project
- About — the project’s history, the Paskow Science Fiction Collection, and how the corpus was digitized and ingested into HathiTrust
- Data — freely available extracted-features datasets drawn from our 403-text corpus
- OCR and Models — our digitization pipeline and topic-modeling analyses
- Scholarship — related projects, datasets, and digital archives of science fiction as data
Ultimately, the SF Nexus seeks to build and share a comprehensive dataset of science fiction literature. Due to limitations imposed by copyright, this project explores speculative approaches to data curation that can make elements of each book (extracted features) available to scholars seeking to engage in large-scale analysis of text as data.
For an overview of our approach, see Alex Wermer-Colan and James Kopaczewski’s article, “The New Wave of Digital Collections: Speculating on the Future of Library Curation” (2022).