Offline Catalogs and Feeds

This page tells you how to find and get Project Gutenberg eBooks if:

you want notifications as new books become available, or
you don’t want to use a browser to download eBooks but prefer other software like an ftp-client or wget, or
you are on a slow or limited internet connection, or
you’d rather have a handy book catalog to consult offline, or
you would like to make your own listings or derivatives from this information.

Contents

Feeds of new books
1. RSS
2. Email
3. Social media
4. OPDS
List of Sites Hosting Project Gutenberg EBooks
The GUTINDEX Listings of EBooks
1. GUTINDEX Listings by Year
Affiliate sites
Directory/Folder Listings
The Project Gutenberg Catalog Metadata in Machine-Readable Format

Feeds of new books

RSS

Find our RSS feed in the cache/feeds location. Updated daily after 2am U.S. Eastern time.
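
Once the feed has been downloaded, its entries can be listed with a few lines of Python. The following is a minimal sketch, assuming the feed is plain RSS 2.0 with un-namespaced item, title, and link elements. The function name parse_rss_items and the sample document are made up for illustration and are not taken from the real feed.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of (title, link) pairs, one per <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


# Made-up sample standing in for a downloaded feed (assumption, not real data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Example Book by A. Author</title>
      <link>https://example.org/ebooks/1</link>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for title, link in parse_rss_items(SAMPLE_RSS):
        print(title, link)
```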

Email

The “posted” list announces every new eBook as it is uploaded to the Project Gutenberg servers. New books are typically available for download within 2 hours of the announcement. The list offers a once-daily digest option and public online archives.

News feeds of new eBooks:

OPDS

The Open Publication Distribution System (OPDS) is one way Project Gutenberg makes new eBooks discoverable. OPDS is primarily intended for machine-to-machine communication, for use in applications that present or manage lists of content.

To use Project Gutenberg’s OPDS feed, start at https://www.gutenberg.org/ebooks/search.opds/.
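
OPDS catalogs are Atom documents, so a client can read one with a standard XML parser. The sketch below is illustrative only: the helper names fetch_opds and parse_opds_entries are hypothetical, the sample feed is made up, and the exact elements Project Gutenberg includes in each entry are not described on this page.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by OPDS catalogs
OPDS_START = "https://www.gutenberg.org/ebooks/search.opds/"


def fetch_opds(url=OPDS_START, timeout=30):
    """Download a catalog page and return its raw bytes (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_opds_entries(xml_data):
    """Return one dict per Atom <entry>, with its title and link targets."""
    root = ET.fromstring(xml_data)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="").strip()
        links = [link.get("href") for link in entry.findall(ATOM + "link")
                 if link.get("href")]
        entries.append({"title": title, "links": links})
    return entries


# Made-up sample standing in for a fetched catalog page (assumption).
SAMPLE_OPDS = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample catalog</title>
  <entry>
    <title>Example Book</title>
    <link href="/ebooks/0.opds" type="application/atom+xml"/>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(parse_opds_entries(SAMPLE_OPDS))
```

To read the live catalog, pass the result of fetch_opds() to parse_opds_entries() and follow the returned links one page at a time rather than requesting many pages at once.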

List of Sites Hosting Project Gutenberg EBooks

The Project Gutenberg collection is available from dozens of sites offering access via http/https, ftp, rsync, and a few other methods. See our listing of mirror sites to choose the location, access method, or speed. Mirrors generally do not have a friendly Web-based front end, but do have the collection. See the mirroring how-to for details.

The GUTINDEX Listings of EBooks

Updated at least monthly. These plain text files provide the basic information about each eBook, and are good for searching from your own system (for example, use control-F in a Web browser or word processor). They are the accession lists for Project Gutenberg. Note that these files are not recommended for automation (that is, to use as input to generate a computerized database). Instead, use one of the catalog files mentioned below.

GUTINDEX.ALL
GUTINDEX.zip (same as above zipped)
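
For searching a downloaded copy from your own system without a browser, a case-insensitive line search is enough. This is a minimal sketch: the function name search_listing and the sample lines are hypothetical, and it only finds matching lines. It does not try to parse the listing into records, which the note above advises against.

```python
def search_listing(lines, term):
    """Return (line_number, text) for every line containing term, ignoring case."""
    needle = term.casefold()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.casefold()
    ]


# Made-up lines standing in for the downloaded listing (assumption).
SAMPLE_LINES = [
    "Example Title, by Jane Doe                 12345\n",
    "Another Work, by John Roe                  12344\n",
]

if __name__ == "__main__":
    for number, text in search_listing(SAMPLE_LINES, "jane doe"):
        print(number, text)
```

With a real download, open the file and pass the file object instead of the sample list, for example open("GUTINDEX.ALL", encoding="utf-8", errors="replace"). The encoding given here is an assumption.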

GUTINDEX Listings by Year

If GUTINDEX.ALL is too big for you or you prefer separate annual lists, you can download GUTINDEX files by year.

Affiliate sites

These sites are not part of Project Gutenberg. Check the laws of the country where you are before accessing or redistributing any eBooks.

GUTINDEX.AUS, Project Gutenberg of Australia

Directory/Folder Listings

You can navigate the directory/folder contents starting at /dirs, although this is not very user-friendly.

The Project Gutenberg Catalog Metadata in Machine-Readable Format

XML/RDF/CSV

All Project Gutenberg metadata are available digitally in the XML/RDF format. This is updated daily (other than the legacy format mentioned below). Please use one of these files as input to a database or other tools you may be developing, instead of crawling the website with automated tools.

Note that the same metadata are also available as per-eBook .rdf files. These are found in the cache/epub (i.e., cache/generated) directory, accessible by mirroring or by the directory/folder listings above. The large XML/RDF file is simply a concatenation of all the per-eBook metadata.
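
A per-eBook .rdf file can be read with a namespace-aware XML parser. The sketch below rests on assumptions not stated on this page: that each file has a pgterms:ebook element carrying dcterms:title and dcterms:issued children, and that the namespace URIs are the ones shown. The function name read_ebook_rdf and the sample record are made up, so check a real file before relying on any of these names.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions; compare them with a real .rdf file.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}


def read_ebook_rdf(xml_data):
    """Return a small dict of fields from one per-eBook record, or None."""
    root = ET.fromstring(xml_data)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return None
    return {
        "about": ebook.get("{%s}about" % NS["rdf"]),
        "title": ebook.findtext("dcterms:title", default=None, namespaces=NS),
        "issued": ebook.findtext("dcterms:issued", default=None, namespaces=NS),
    }


# Made-up record for illustration only.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/0">
    <dcterms:title>Example Title</dcterms:title>
    <dcterms:issued>2000-01-01</dcterms:issued>
  </pgterms:ebook>
</rdf:RDF>"""

if __name__ == "__main__":
    print(read_ebook_rdf(SAMPLE_RDF))
```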

Project Gutenberg metadata does not include the original print source publication date(s). Because Project Gutenberg eBooks are substantially different from the source book(s), we track the Project Gutenberg publication date (“release date”), but do not include print source information in the metadata. Differences almost always include dehyphenation, removing page headers/footers, changes to typography during markup, and sometimes relocation of images, footnotes, captions, etc. In addition, Project Gutenberg eBooks sometimes come from multiple print editions.

Many eBooks include scans of the title page or other pages, which may indicate original print publication. If matching a Project Gutenberg eBook to a particular print edition is important to you, it is likely this will need to be done by direct comparison of a print source with the eBook.

An Excel-compatible CSV spreadsheet of eBook metadata is also available. This file is updated once a week.
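
The CSV file can also be read without a spreadsheet program. This is a minimal sketch using Python's csv module. The column names in the sample (Id, Title, Language) are hypothetical, so inspect the header row of the real file and substitute its actual column names. The function name filter_rows is likewise made up.

```python
import csv
import io


def filter_rows(csv_file, column, value):
    """Return the rows (as dicts) whose given column equals value exactly."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError("no such column: %r" % column)
    return [row for row in reader if (row.get(column) or "") == value]


# Made-up data with hypothetical column names (assumption).
SAMPLE_CSV = (
    "Id,Title,Language\n"
    '1,"Example Title, Volume 1",en\n'
    "2,Autre Exemple,fr\n"
)

if __name__ == "__main__":
    for row in filter_rows(io.StringIO(SAMPLE_CSV), "Language", "en"):
        print(row["Id"], row["Title"])
```

When reading a real file from disk, open it with newline="" as the csv module documentation recommends, so that quoted fields containing line breaks are handled correctly.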

MARC Records (MAchine Readable Cataloging)

MARC is the lingua franca of library catalogs. It is a way of representing data about items like books. The data about items is known as metadata. Thanks to a partnership with the Free Ebook Foundation, Project Gutenberg metadata are available as a downloadable MARC file.

Project Gutenberg’s MARC records, like the collection and most of its titles, are freely available. Libraries and others are encouraged to incorporate Project Gutenberg into their collections, in order to make Project Gutenberg’s titles more widely available at little or no cost.

Note that the MARC metadata include all of the textual titles, but not non-textual titles such as audio, maps, and data sets. Metadata for those other items are available in the XML/RDF metadata described elsewhere on this page.

To read more about the initiative to create the MARC records, including links to the software, issue reporting, and more, see this announcement from the Free Ebook Foundation.

Find the MARC records in the feeds collection. The MARC records are regenerated weekly on Sundays.

A Local, Browsable Copy on your own Computer or Mobile Device

Kiwix is an application that lets you download a large collection and use it locally. A copy of the Project Gutenberg content was made available in November 2018, and may be updated periodically.

All books as plain text

A zipped tar file of all .txt files, updated weekly: xt-files.tar.zip.
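
The archive can be inspected without unpacking everything to disk. The sketch below assumes that "zipped tar file" means a zip archive containing one or more .tar members, which is worth confirming against the actual download. The function name list_txt_members is hypothetical.

```python
import io
import tarfile
import zipfile


def list_txt_members(zip_source):
    """Return names of .txt files found in any .tar member of a zip archive.

    zip_source may be a path or a binary file object.
    """
    names = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if not info.filename.endswith(".tar"):
                continue
            # Read the tar as a stream so it is never extracted to disk.
            with archive.open(info) as tar_stream:
                with tarfile.open(fileobj=tar_stream, mode="r|") as tar:
                    for member in tar:
                        if member.isfile() and member.name.endswith(".txt"):
                            names.append(member.name)
    return names


def build_sample_archive():
    """Build a tiny in-memory zip-of-tar for demonstration (made-up content)."""
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
        for name, data in [("a/1.txt", b"hello"), ("a/readme.md", b"x")]:
            entry = tarfile.TarInfo(name)
            entry.size = len(data)
            tar.addfile(entry, io.BytesIO(data))
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w") as archive:
        archive.writestr("files.tar", tar_buffer.getvalue())
    zip_buffer.seek(0)
    return zip_buffer


if __name__ == "__main__":
    print(list_txt_members(build_sample_archive()))
```

To read a book's text rather than just its name, call tar.extractfile(member) inside the loop. In stream mode each member has to be read before moving on to the next one.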