E-book format DJVU

Many document formats aim to reproduce printed books in electronic form as faithfully as possible. The DjVu format stands out among them because it can store a very large amount of information in a comparatively small file. DjVu (pronounced like the French “déjà vu”) is designed for storing scanned documents that contain large tables, special characters, handwritten symbols, complex formulas, pictures, diagrams, and other elements typical of scientific literature.

The DjVu technology was developed at AT&T Labs at the end of the 20th century (1996-2001) by Yann LeCun, Patrick Haffner, and Léon Bottou. Thanks to the new format, several libraries specializing in scientific literature came into existence. Today the DjVu format is in demand all over the world. It owes its popularity to a number of advantages over other formats, such as small file size, faithful reproduction of information-dense pages, a customizable display, and more.

Over the past five years, the Internet has become a recognized channel for distributing all kinds of text and graphic information. Electronic newspapers and magazines have become as common as traditional ones, and many publications appear electronically sooner than on paper. This has been helped by the widespread adoption of desktop publishing and by Adobe’s PDF format, which has become the de facto standard for distributing electronic publications, including amateur radio circuit diagrams, reference tables, and so on. The alternative to PDF is to archive graphic files and send them over the Net. However, downloading archived graphics files, especially technical ones, takes a long time even on good connections. A file must be downloaded before it can be viewed, and until it is unpacked there is no way to tell for sure whether it is what you need or something completely different. You may also be dissatisfied with the image quality or the completeness of the material, not to mention the file size and the time spent downloading it. Anyone who has often had to scan black-and-white diagrams and send them over the Internet has probably noticed how poorly such images compress. Software developers looked for a way to increase the compression ratio of graphic information, and the DjVu format was proposed as a solution.

DJVU format

The new graphic format DjVu, developed by AT&T, is intended primarily for placing scanned images on the Internet. These can be reference books, manuscripts, or circuit diagrams of televisions, radios, amplifiers, and other devices. DjVu compresses black-and-white (monochrome) images by a ratio of about 500:1, and files are on average 20 times smaller than in GIF format. The essence of the technology is that an image is automatically divided into several regions (for example, text, a company logo, and a bitmap picture), and the compression algorithm best suited to each region is chosen. The right to use DjVu technology commercially was sold to the company LizardTech. The new compression technology solves the problem of publishing circuit diagrams, blueprints, and graphs on the Internet, which previously took too long to download.

To view diagrams in the new format, you need to install a special plug-in in Internet Explorer or Netscape Navigator. The plug-in works in an interesting way. Unlike ordinary viewers, it does not decompress the entire file, only the part that is currently displayed. This allows you to view files of huge size and resolution even on very weak computers. The plug-in can also display a page progressively as it downloads: within a couple of seconds you can see the full layout of the page, a couple of seconds later you can read the text, and after a little longer the pictures appear. Of course, an ordinary web page loads the same way, first the text and then gradually the pictures, but what DjVu shows is entirely graphics, not a combination of recognized text and pictures.

The DjVu format therefore makes it possible to look through the material as soon as it is opened and then decide whether it is worth saving. You can evaluate the content right away and simply view it without saving the file on your computer. Considering that an A4 page of black-and-white graphics with text takes about 30 KB in DjVu format, and about 60 KB in color, the savings in time and money are evident.
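The idea of decompressing only the displayed part of a page can be illustrated with a small sketch. This is not how the DjVu plug-in is actually implemented: the tile size, the function names, and the use of zlib in place of DjVu's own coders are all assumptions made for the example. The page is stored as independently compressed tiles, and the viewer unpacks only the tiles that intersect the viewport.

```python
import zlib

TILE = 64  # tile edge in pixels; a hypothetical value chosen for this sketch


def compress_page(pixels):
    """Compress a grayscale page (rows of ints 0..255) as independent tiles."""
    height, width = len(pixels), len(pixels[0])
    tiles = {}
    for top in range(0, height, TILE):
        for left in range(0, width, TILE):
            block = b"".join(
                bytes(row[left:left + TILE]) for row in pixels[top:top + TILE]
            )
            # zlib stands in for the real DjVu coders (assumption).
            tiles[(left // TILE, top // TILE)] = zlib.compress(block)
    return tiles


def visible_tiles(x, y, w, h):
    """Return (column, row) indices of the tiles that intersect the viewport."""
    cols = range(x // TILE, (x + w - 1) // TILE + 1)
    rows = range(y // TILE, (y + h - 1) // TILE + 1)
    return [(c, r) for r in rows for c in cols]


def decode_viewport(tiles, x, y, w, h):
    """Decompress only the tiles needed to draw the viewport."""
    return {
        key: zlib.decompress(tiles[key])
        for key in visible_tiles(x, y, w, h)
        if key in tiles
    }


# A synthetic 256 x 256 page: 16 tiles in total.
page = [[(x + y) % 256 for x in range(256)] for y in range(256)]
tiles = compress_page(page)

# A 100 x 100 viewport in the top-left corner touches only 4 of the 16 tiles.
shown = decode_viewport(tiles, 0, 0, 100, 100)
print(len(tiles), len(shown))
```

Memory use in such a scheme depends on the size of the window rather than the size of the page, which is why a weak computer can still scroll through a very large scan.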

Compare PDF and DJVU

DjVu also has advantages over the popular PDF format. Users usually have only Acrobat Reader installed. A file opened in Reader from the web can be viewed but not saved. The alternative is the “save the object as…” option, but then the file can be viewed only after a complete download, which is not very convenient. DjVu combines both of these features: with the free plug-in you can view the file first and then right-click to save it if you need to. Using the DjVu format means saving money while maintaining sufficient quality. A fairly objective comparison with established formats shows that the slight quality loss in color images is more than repaid by the degree of compression, and in black-and-white images the loss is not noticeable at all. Possible competitors such as TIFF, GIF, and JPEG lose out on file size. It seems that in the coming years this method will take its rightful place on the Internet. Radio amateurs can use it to send circuit diagrams, photos, and so on, and save a lot of time and money.

The basic idea of the format, around which its other capabilities grew, is that text and pictures are not equivalent parts of a document. Many compression methods exist for text, and they achieve high compression ratios, but unfortunately these methods are not suited to pictures. DjVu uses a special technology that separates all text from the scanned image and compresses it while preserving the original quality. Pictures are converted to 100 dpi and subjected to wavelet compression, a popular method that is well suited to decompressing data on the fly. The background parts of the image are processed as well, so that image fragments that are simply invisible (for example, behind the pictures or behind the text) are excluded from the final file.
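To give a feeling for why wavelet compression suits pictures, here is a minimal sketch. It does not reproduce the wavelet coder used in DjVu; it swaps in the much simpler one-dimensional Haar transform, and the function names and sample values are made up for the illustration. Each pair of neighboring pixels is replaced by its average and half its difference. In smooth areas the differences are close to zero and compress very well, while the averages alone already form a half-resolution preview of the row.

```python
def haar_step(signal):
    """One level of the 1-D Haar transform (signal length must be even)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the original samples from averages and details."""
    out = []
    for avg, det in zip(averages, details):
        out.extend([avg + det, avg - det])
    return out


# One row of gray levels (illustrative values only).
row = [100, 102, 104, 104, 200, 202, 50, 50]
averages, details = haar_step(row)

print(averages)  # half-resolution preview: [101.0, 104.0, 201.0, 50.0]
print(details)   # small corrections:       [-1.0, 0.0, -1.0, 0.0]
print(haar_inverse(averages, details) == row)  # True: nothing is lost yet
```

A lossy coder would then round or drop the smallest details. Sending the averages first and the details later is also one way to obtain a picture that sharpens gradually as it downloads.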

Each picture undergoes several transformations designed to reduce the file size. First of all, it is divided into several layers from which it can later be reconstructed. Most often these are the background, the mask, and the foreground. A fairly simple algorithm is used for this: the raster file is examined pixel by pixel, and all light pixels are automatically assigned to the background, while dark pixels go to the mask or foreground. Each pixel displayed on the screen gets its color from a logical combination of the corresponding values in all the layers. This separation makes it possible to compress the graphics in the most efficient way. The mask, which usually has only one color, is compressed with JB2, a method of the kind used for fax documents. The idea of such separation is not new. It was first proposed by Xerox Corporation, which uses a similar approach in its XIFF format.

The legality of publishing scanned text on a website deserves a separate mention. Most likely, a legal resolution of this question is still a thing of the future.
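The pixel-by-pixel separation can be sketched as follows. This is only a minimal illustration under stated assumptions: the fixed threshold of 128, the luminance formula, and the function names are hypothetical, and a real DjVu encoder uses a considerably more elaborate segmentation. Light pixels go to the background, dark pixels set the mask and keep their color in the foreground, and the page is rebuilt by letting the mask choose between the two color layers.

```python
THRESHOLD = 128  # hypothetical cut-off between "light" and "dark" pixels


def luminance(rgb):
    """Approximate brightness of an (r, g, b) pixel, 0..255."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) // 1000


def separate(image):
    """Split an image (rows of (r, g, b) tuples) into mask, foreground, background."""
    mask, foreground, background = [], [], []
    for row in image:
        dark = [luminance(p) < THRESHOLD for p in row]
        mask.append([1 if d else 0 for d in dark])
        # None marks a "don't care" pixel that is never visible in this layer.
        foreground.append([p if d else None for p, d in zip(row, dark)])
        background.append([None if d else p for p, d in zip(row, dark)])
    return mask, foreground, background


def reconstruct(mask, foreground, background):
    """Where the mask is 1 show the foreground color, otherwise the background."""
    return [
        [fg if m else bg for m, fg, bg in zip(m_row, fg_row, bg_row)]
        for m_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


image = [
    [(255, 255, 255), (0, 0, 0)],
    [(20, 20, 120), (250, 240, 200)],
]
mask, foreground, background = separate(image)
print(mask)  # [[0, 1], [1, 0]]
print(reconstruct(mask, foreground, background) == image)  # True
```

The `None` entries are the pixels hidden under the other layer. Because they never reach the screen, an encoder is free to fill them with whatever value compresses best.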

DJVU Technologies

The DjVu format is based on several technologies developed at AT&T Labs: an algorithm for separating text from the background in a scanned image, the IW44 wavelet algorithm for compressing the background, the powerful JB2 algorithm for compressing black-and-white images, the efficient general-purpose compression algorithm ZP, an algorithm for unpacking “on demand”, and an algorithm for “masking” images. The first four algorithms provide an extremely high compression ratio. A typical example is the conversion of a 25 MB TIFF file (A4 format, scanned with a 300 dpi color scanner) into an 80 KB DjVu file with no loss of quality noticeable to the eye. For black-and-white images the DjVu file can be even smaller, about 30 KB. The compression ratio can be increased further, up to 1000:1, but the quality loss then becomes rather noticeable. Thus, 15-20 high-quality images fit on a standard 1.44 MB diskette. Note also that the distortions introduced by wavelet compression are much less noticeable than the distortions in JPEG files.

The unpacking algorithms make it possible to show part of an image without unpacking the whole picture in RAM, and to scale the image easily. This allows you to view a file quickly even on a relatively weak machine, such as a computer with a 486 processor and 16 MB of RAM. Another interesting feature of the unpacking algorithms is incremental restoration of the image. When a document is viewed over the Internet, only the text is displayed first, then the low-resolution background, and only then the high-resolution background. This allows you to evaluate a document quickly without downloading all of it. Separating the text from the background makes the text extremely legible, especially if it is printed on colored paper or placed over a picture. The background can also be viewed separately, with the “masking” algorithm restoring those parts of the background that were obscured by the text.

Images without text can be converted into the IW44 format, which corresponds to the part of the DjVu format responsible for storing the background.
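The figures quoted above can be checked with simple arithmetic. The sketch below assumes an uncompressed scan at 3 bytes per pixel, an A4 page of 210 x 297 mm, and a diskette holding 1440 KB; the function name is made up for the example.

```python
MM_PER_INCH = 25.4


def raw_size_bytes(width_mm, height_mm, dpi, bytes_per_pixel):
    """Size of an uncompressed scan of a page with the given dimensions."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


# A4 page, 300 dpi, 24-bit color (3 bytes per pixel).
a4_raw = raw_size_bytes(210, 297, 300, 3)
djvu_size = 80 * 1024        # the 80 KB DjVu file from the example
floppy = 1440 * 1024         # capacity of a "1.44 MB" diskette

ratio = a4_raw / djvu_size
pages_per_floppy = floppy // djvu_size

print(a4_raw)                # 26099520 bytes, i.e. about 25 MB
print(round(ratio))          # about 319, so roughly 300:1
print(pages_per_floppy)      # 18 pages, within the quoted 15-20
```

So the 25 MB figure is simply the raw size of a 2480 x 3508 pixel color scan, and an 80 KB result corresponds to a compression ratio of roughly 300:1.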

Features of the DJVU format

AT&T has announced the further development of the format, DjVu 2.0. The new version makes it possible to combine several images into one file and “flip” through the pages, and to add so-called “hot spots,” sections of the image that serve as hyperlinks. Readers who are not yet comfortable with the DjVu format can be patient and download the original image instead, although this is hardly worthwhile. Out of respect for the followers of traditional methods, our site will present materials in two formats (GIF and DjVu), except when the original file was already in DjVu format. With the appropriate software it is always possible to “decompress” DjVu into standard graphic formats (BMP, for example), but the resulting files will be as large as 30-40 MB. Especially noteworthy is navigation by hyperlinks: links let you move around within a file as well as go to an address on the Internet. It is also possible to publish a book in DjVu format as separate pages, which reduces the amount of data transferred to the user because the entire file does not have to be sent.
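A “hot spot” can be thought of as a region of the page paired with a link target. The sketch below is only an illustration of that idea, not the way DjVu stores its links: the class, its fields, the `#page=` notation, and the example URL are all hypothetical, and only rectangular regions are handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HotSpot:
    """A rectangular clickable region of a page (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int
    target: str  # a URL, or an in-document reference such as "#page=12"

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def link_at(hotspots: List[HotSpot], px: int, py: int) -> Optional[str]:
    """Return the target of the first hot spot under the point, if any."""
    for spot in hotspots:
        if spot.contains(px, py):
            return spot.target
    return None


page_links = [
    HotSpot(50, 40, 200, 30, "#page=12"),               # jump inside the file
    HotSpot(50, 500, 300, 30, "http://example.com/"),   # go to the Internet
]

print(link_at(page_links, 60, 50))    # #page=12
print(link_at(page_links, 100, 510))  # http://example.com/
print(link_at(page_links, 5, 5))      # None
```

A viewer would run such a lookup on every mouse click and then either turn to the referenced page or open the address in the browser.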

Software

A large amount of software is available for the DjVu format. The most popular comes from LizardTech, in particular the DjVu Solo product and the browser plug-ins. DjVu Solo lets you create DjVu files with little effort. Solo supports all common image file formats (JPEG, GIF, TIFF, BMP). It lets you create links for navigating within a document and links to external URLs. Any part of a page (an image, a table, a chart) can become a link, and the link area can have any size and shape. Adding a page to an existing DjVu file is a simple matter of pasting it in and saving the document. Compared with most existing software, Solo is more automated.

Plugins

LizardTech offers the DjVu Browser Plug-in for Internet Explorer and Netscape Navigator for viewing the created files. As options, the company also offers the DjVu iFilter plug-in for searching inside DjVu documents and the ExpressView for PocketPC plug-in for viewing files on PocketPC devices.
