Today we decided to look at the FictionBook2 format, better known as FB2, and its "successor", FB3.
The emergence of the format
In the mid-1990s, enthusiasts began digitizing Soviet books. They converted and preserved literature in a variety of formats. One of the first libraries on the Runet, Maksim Moshkov's Library, used formatted plain-text (TXT) files.
TXT was chosen because it is resilient to byte corruption and universal: a TXT file opens on any operating system. However, the format made the stored text hard to process. For example, to reach the thousandth line, a program had to read through the 999 lines before it. Books were also stored as Word documents and PDFs. PDF was difficult to convert into other formats, and low-powered computers opened and rendered PDF documents with noticeable delays.
HTML was also used to store electronic literature. It simplified indexing, conversion to other formats, and document creation (marking up text with tags), but it brought its own drawbacks. One of the most significant was the looseness of the standard: it allowed certain liberties in how tags were written. Some tags had to be closed, while others (such as <br>) did not. Tags could also be nested in any order.
Although writing files this way was discouraged, and such documents were considered invalid, the standard required reading software to try to display the content anyway. This is where difficulties arose, because each application implemented this "second-guessing" in its own way. At the same time, the devices and reading apps on the market each understood only one or two specialized formats. If a book was available in a different format, it had to be converted before it could be read. FictionBook2, or FB2, was designed to address all of these shortcomings by taking care of the initial cleanup of the text and its conversion.
Note that the format had a first version, FictionBook1, but it was purely experimental and short-lived, is not supported today, and offers no backward compatibility. For this reason, "FictionBook" usually refers to its successor, the FB2 format.
FB2 was created by a group of developers led by Dmitry Gribov, the technical director of LitRes, and Mikhail Matsnev, the creator of Haali Reader. The format is based on XML, which is stricter than HTML and defines how unclosed and nested tags must be handled. An XML document is accompanied by an XML schema: a special file that lists all the tags and describes the rules for using them (order, nesting, which are mandatory and which optional, and so on). In FictionBook the schema lives in the file FictionBook2.xsd. An example of the schema, as used by the LitRes ebook store, is available online.
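The difference in strictness can be seen in a minimal sketch that uses only Python's standard library. The snippet and the helper names (`html_tags`, `is_well_formed_xml`) are made up for illustration: a lenient HTML parser accepts an unclosed <br>, while an XML parser rejects the same markup outright.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative markup: <br> is never closed, which HTML tolerates and XML forbids.
LOOSE_MARKUP = "<p>First line<br>Second line</p>"
STRICT_MARKUP = "<p>First line<br/>Second line</p>"


class TagCollector(HTMLParser):
    """Records every opening tag the lenient HTML parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup):
    """Return the opening tags found by the HTML parser, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True only if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

With XML, a reading app does not need to guess what the author meant: a document either parses or it is rejected.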
Structure of an FB2 document
The text in the document is stored in special tags, the paragraph-type elements <p>, <v>, and <subtitle>. There is also an <empty-line> element, which has no content and is used to insert blank lines.
All documents begin with the root tag <FictionBook>, below which <stylesheet>, <description>, <body>, and <binary> may appear.
The <stylesheet> tag contains style sheets that ease conversion to other formats. The <binary> tags hold base64-encoded data, such as images, that may be needed to render the document.
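As a rough illustration of how embedded data might be read back, the sketch below decodes every <binary> element in a tiny hand-written document. The sample is an assumption made for this example: its payload is the base64 encoding of the bytes `hello`, not a real image, and `extract_binaries` is a hypothetical helper name.

```python
import base64
import xml.etree.ElementTree as ET

FB2_NS = {"fb": "http://www.gribuser.ru/xml/fictionbook/2.0"}

# Hand-written sample; the payload is a placeholder, not real JPEG data.
BINARY_SAMPLE = """<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <binary id="cover.jpg" content-type="image/jpeg">
    aGVsbG8=
  </binary>
</FictionBook>"""


def extract_binaries(fb2_text):
    """Map each <binary> id to its decoded bytes."""
    root = ET.fromstring(fb2_text)
    decoded = {}
    for node in root.findall("fb:binary", FB2_NS):
        # b64decode ignores surrounding whitespace and newlines by default.
        decoded[node.get("id")] = base64.b64decode(node.text or "")
    return decoded
```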
The <description> element contains all the necessary information about the book: genre, list of authors (full name, email address, and website), title, a block of keywords, and an annotation. It can also contain information about changes made to the document and about the book's publisher, if it was published in hard copy.
This is what part of the FictionBook description looks like for A Study in Scarlet by Arthur Conan Doyle, taken from Project Gutenberg:
<?xml version="1.0" encoding="iso-8859-1"?>
<FictionBook xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <description>
    <title-info>
      <genre match="100">detective</genre>
      <author>
        <first-name>Arthur</first-name>
        <middle-name>Conan</middle-name>
        <last-name>Doyle</last-name>
      </author>
      <book-title>A Study in Scarlet</book-title>
      <annotation> </annotation>
      <date value="1887-01-01">1887</date>
    </title-info>
  </description>
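A reading app could pull the title, author, and genre out of such a description roughly as follows. This is a sketch under stated assumptions: the sample repeats the fragment above and closes it with a minimal <body> so that it parses, and `read_title_info` is a hypothetical helper, not part of any FB2 toolkit.

```python
import xml.etree.ElementTree as ET

FB2_NS = {"fb": "http://www.gribuser.ru/xml/fictionbook/2.0"}

# The fragment from the article, completed with an assumed minimal body.
FB2_SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<FictionBook xmlns:xlink="http://www.w3.org/1999/xlink"
             xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <description>
    <title-info>
      <genre match="100">detective</genre>
      <author>
        <first-name>Arthur</first-name>
        <middle-name>Conan</middle-name>
        <last-name>Doyle</last-name>
      </author>
      <book-title>A Study in Scarlet</book-title>
      <annotation></annotation>
      <date value="1887-01-01">1887</date>
    </title-info>
  </description>
  <body>
    <section><p>Sample paragraph.</p></section>
  </body>
</FictionBook>"""


def read_title_info(fb2_bytes):
    """Return the basic catalog fields from the <title-info> block."""
    root = ET.fromstring(fb2_bytes)
    info = root.find("fb:description/fb:title-info", FB2_NS)
    author = info.find("fb:author", FB2_NS)
    # Join whichever name parts are present, skipping missing ones.
    name_parts = [
        author.findtext(f"fb:{part}", default="", namespaces=FB2_NS)
        for part in ("first-name", "middle-name", "last-name")
    ]
    return {
        "title": info.findtext("fb:book-title", namespaces=FB2_NS),
        "author": " ".join(part for part in name_parts if part),
        "genre": info.findtext("fb:genre", namespaces=FB2_NS),
        "date": info.find("fb:date", FB2_NS).get("value"),
    }
```

Because every tag lives in the FictionBook namespace, the lookups have to be namespace-qualified; a bare `find("description")` would return nothing.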
The key component of a FictionBook document is <body>. It contains the text of the book itself. A document can contain several of these tags: the additional blocks hold footnotes, comments, and notes.
FictionBook also provides several tags to handle hyperlinks. They are based on the XLink specification, developed by the W3C consortium specifically for linking between different resources in XML documents.
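The sketch below shows one way a reader might follow such a link from the main text to a footnote stored in a separate body. The sample document, the note id `n1`, and the helper name `resolve_notes` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

FB = "http://www.gribuser.ru/xml/fictionbook/2.0"
XLINK = "http://www.w3.org/1999/xlink"

# Hand-written sample: a main body with one link and a second body for notes.
NOTES_SAMPLE = """<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0"
             xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <section>
      <p>Holmes examined the room<a xlink:href="#n1" type="note">[1]</a>.</p>
    </section>
  </body>
  <body name="notes">
    <section id="n1"><p>Hypothetical footnote text.</p></section>
  </body>
</FictionBook>"""


def resolve_notes(fb2_text):
    """Map each internal link target to the text of the section it points at."""
    root = ET.fromstring(fb2_text)
    # Index every section that carries an id, regardless of which body holds it.
    targets = {
        section.get("id"): section
        for section in root.iter(f"{{{FB}}}section")
        if section.get("id")
    }
    resolved = {}
    for link in root.iter(f"{{{FB}}}a"):
        href = link.get(f"{{{XLINK}}}href", "")
        if href.startswith("#") and href[1:] in targets:
            resolved[href] = "".join(targets[href[1:]].itertext()).strip()
    return resolved
```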
Advantages of the format
The FB2 standard includes only the minimum necessary set of tags (enough to lay out fiction), which simplifies processing by reading apps. Moreover, when a reading app works with FB2 directly, the user can customize almost all display parameters.
The strict structure of the document makes it possible to automate conversion from FB2 to any other format. The same structure makes it possible to work with individual document elements and to set up filters by author, title, genre, and so on. For this reason, FB2 became popular on the Runet and turned into the default standard in Russian electronic libraries and in libraries across the CIS countries.
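To give a feel for why a strict structure makes conversion easy to automate, here is a minimal sketch that turns the paragraph-type elements of an FB2 body into plain text. It is not a full converter: the sample is hand-written, `fb2_to_text` is a hypothetical helper, and only <p>, <v>, <subtitle>, and <empty-line/> are handled.

```python
import xml.etree.ElementTree as ET

# Hand-written sample body with the most common paragraph-type elements.
BODY_SAMPLE = """<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <section>
      <subtitle>Part One</subtitle>
      <p>First <emphasis>paragraph</emphasis>.</p>
      <empty-line/>
      <p>Second paragraph.</p>
    </section>
  </body>
</FictionBook>"""


def fb2_to_text(fb2_text):
    """Flatten paragraph-type elements into newline-separated plain text."""
    root = ET.fromstring(fb2_text)
    lines = []
    for node in root.iter():  # document order
        tag = node.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        if tag in ("p", "v", "subtitle"):
            # itertext() also picks up text inside inline tags such as <emphasis>.
            lines.append("".join(node.itertext()).strip())
        elif tag == "empty-line":
            lines.append("")
    return "\n".join(lines)
```

Since the parser guarantees well-formed input, the converter needs no recovery logic for broken markup.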
Disadvantages of the format
The simplicity of the FB2 format is both an advantage and a disadvantage. It limits what can be done for complex text layout (for example, notes in the margins). The format has no vector graphics and no support for numbered lists. For this reason, it is not well suited to textbooks, reference books, and technical literature (even the format's name, FictionBook, points to this). In addition, to display a book's minimal information (title, author, and cover), a program has to process practically the whole XML document. This is because the metadata is placed at the beginning of the file and the images at the end.
FB3 – format development
Because of growing demands on book formatting (and to address some of FB2's drawbacks), Gribov began work on the FB3 format. Development later stalled but resumed in 2014. According to the authors, they studied the real needs of technical publishing, looked at textbooks, reference books, and manuals, and defined a more specific set of tags capable of displaying any book.
In the new specification, a FictionBook file is a zip archive that stores metadata, images, and text in separate files. The requirements for the zip file format and the conventions for organizing it are set out in ECMA-376, the standard that defines Open XML.
Several formatting improvements were made (emphasis, underline), and a new object, the "block", was added: it shapes an arbitrary fragment of the book as a rectangle and can be embedded in the text with the text wrapping around it. Support for numbered and bulleted lists also appeared.
FB3 is distributed under a free license and is open source, so publishers and users have access to all the tools: converters, cloud editors, and readers. The current version of the format, reader, and editor can be found in the project's repository on GitHub.
Overall, FictionBook3 is still less common than its older sibling, but several electronic libraries already offer books in this format. A couple of years ago, LitRes announced its intention to move its entire catalog to the new format. Some reading devices already support all of the necessary FB3 functionality. For example, all current ONYX models, such as the Darwin 3 or Cleopatra 3, work with the format out of the box.
Wider adoption of FictionBook3 would make it possible to build an ecosystem focused on complete and efficient work with text on any device with limited resources: a black-and-white or small display, little memory, and so on. According to the developers, a book built once will be as usable as possible in any environment.
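The practical benefit of storing metadata, images, and text as separate files in a zip archive can be sketched with Python's `zipfile` module. The part names used here (`meta/description.xml`, `text/body.xml`, `img/cover.jpg`) and their contents are hypothetical placeholders chosen for the demo and are not the names required by the FB3 specification.

```python
import io
import zipfile


def build_demo_package():
    """Create an in-memory zip with separate metadata, text, and image parts.

    All part names and contents are hypothetical, not taken from the FB3 spec.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "meta/description.xml",
            "<description><title>Demo</title></description>",
        )
        archive.writestr("text/body.xml", "<body><p>Long text...</p></body>")
        archive.writestr("img/cover.jpg", b"placeholder image bytes")
    return buffer.getvalue()


def list_parts(package_bytes):
    """Return the names of all parts stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.namelist()


def read_part(package_bytes, part_name):
    """Read a single part without decompressing the rest of the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.read(part_name)
```

A library app that only needs the title can call `read_part` on the metadata file and never touch the book text, which is exactly what a single-file FB2 document does not allow.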
Based on the blog of the company MacCenter