Principia Discardia

September 24, 2007

dslibris: Preparing books for reading

Filed under: Projects — by Ray Haleblian @ 1:22 pm

dslibris understands books stored as XHTML in UTF-8 encoding, with special characters written as numeric entities (for example &#160; rather than &nbsp; for a non-breaking space). The files must end with the extension ‘.xhtml’ or ‘.xht’ and be saved in the ‘book’ folder on your media.
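Before copying a file to the DS, a small script can check the requirements above. The following is a minimal Python sketch and is not part of dslibris. The function names are hypothetical. It assumes that "numeric entities" means every named entity other than XML's five built-ins (&amp; &lt; &gt; &quot; &apos;) should be rewritten. It checks only four things: the extension, UTF-8 decoding, stray named entities, and XML well-formedness. A file that passes can still display oddly on the DS.

```python
import re
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

# XML predefines only these five named entities. Anything else, such as
# &nbsp;, is assumed to need rewriting as a numeric reference (&#160;).
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def check_book_data(filename, data):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if not filename.endswith((".xhtml", ".xht")):
        problems.append("extension must be .xhtml or .xht")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8: {err}")
        return problems
    # Heuristic: this also matches entity-like text inside comments or CDATA.
    named = sorted(set(NAMED_ENTITY.findall(text)) - XML_ENTITIES)
    if named:
        problems.append("named entities: " + ", ".join(named))
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems


def check_book(path):
    """Hypothetical helper: read a file from disk and check it."""
    path = Path(path)
    return check_book_data(path.name, path.read_bytes())


if __name__ == "__main__":
    for name in sys.argv[1:]:
        for problem in check_book(name):
            print(f"{name}: {problem}")
```

Pass it one or more file names on the command line. It prints nothing for files where it found no problems.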

Converting from HTML

Use HTML Tidy to clean up HTML and convert it to XHTML. An online Tidy service at http://infohound.net/tidy lets you upload an HTML file and get XHTML back.

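If you’re using command-line Tidy, here’s an example. Here, -asxhtml converts the output to XHTML, -utf8 reads and writes UTF-8, -numeric writes numeric rather than named entities, and -o names the output file: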

tidy -asxhtml -utf8 -numeric -o book.xhtml book.html
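If you have a whole folder of HTML files, you can run the same tidy command over each one. Below is a minimal Python sketch. It assumes the tidy program is on your PATH. tidy_command and convert_all are hypothetical helpers. The status check assumes Tidy's usual exit codes: 0 for clean output, 1 for warnings only, 2 for errors.

```python
import subprocess
import sys
from pathlib import Path


def tidy_command(src):
    """Hypothetical helper: build the tidy command above for one .html file."""
    src = Path(src)
    dst = src.with_suffix(".xhtml")  # book.html -> book.xhtml
    cmd = ["tidy", "-asxhtml", "-utf8", "-numeric", "-o", str(dst), str(src)]
    return cmd, dst


def convert_all(folder):
    """Hypothetical helper: run tidy on every *.html file in folder."""
    for src in sorted(Path(folder).glob("*.html")):
        cmd, dst = tidy_command(src)
        result = subprocess.run(cmd)
        # Assumed Tidy convention: 0 = clean, 1 = warnings only, 2 = errors.
        status = "ok" if result.returncode < 2 else "FAILED"
        print(f"{status}: {src} -> {dst}")


if __name__ == "__main__":
    convert_all(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Warnings are treated as success because Tidy warns about many harmless things in real-world HTML. Still, skim its messages for anything that looks wrong.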

People have also used these programs to save documents as XHTML:

  • Microsoft Word
  • Amaya
  • AbiWord
  • OpenOffice Writer

Converting from PDF

This generally doesn’t work. PDF files are laid out for a fixed page size, so their text can’t be reliably converted into a form that reflows properly on the DS screen. If you’re willing to massage the text in a text editor after copying it out of a PDF, you can sometimes get a reasonable result.
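Much of that massaging is rejoining lines. Text copied from a PDF usually has a hard line break at the end of every printed line. Here is a rough Python sketch of that step. It assumes paragraphs are separated by blank lines, and unwrap is a hypothetical helper. Page numbers, running headers and footnotes still have to be removed by hand.

```python
import re


def unwrap(text):
    """Hypothetical helper: rejoin hard-wrapped lines, one line per paragraph.

    A line ending in a hyphen is joined to the next with no space. This
    also merges genuine hyphenated compounds that were split across lines,
    so review the result.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        para = ""
        for line in (l.strip() for l in block.splitlines()):
            if not line:
                continue
            if para.endswith("-"):
                para = para[:-1] + line  # "stor-" + "my" -> "stormy"
            elif para:
                para += " " + line
            else:
                para = line
        paragraphs.append(para)
    return "\n\n".join(paragraphs)
```

The result is still plain text. You will need to turn it into HTML before it can go through Tidy.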

Converting from TXT

As with PDF, plain ASCII text files lack the structural information needed to reformat them. If your source is from Project Gutenberg, the Gutenmark project provides programs that generate reasonable HTML from Gutenberg’s plain-text files. That HTML can then be run through Tidy as described above.

Of course, if you can write HTML, you can mark up a text file as HTML by hand.
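You can also automate the crudest part of that markup. The following Python sketch is a much simpler stand-in for Gutenmark, not a reimplementation of it. It wraps each blank-line-separated paragraph in <p> and escapes &, < and >. It also writes non-ASCII characters as numeric entities, so the output is plain ASCII and therefore valid UTF-8. text_to_xhtml is a hypothetical name, and the sketch does not detect chapter headings, italics and the like. It emits a minimal document with no DOCTYPE. Whether dslibris needs anything more is an assumption worth checking on the DS.

```python
import re
from xml.sax.saxutils import escape


def text_to_xhtml(text, title):
    """Hypothetical helper: turn blank-line-separated paragraphs into XHTML."""

    def encode(s):
        # escape() handles &, < and >. xmlcharrefreplace then turns every
        # non-ASCII character into a numeric reference, e.g. é -> &#233;.
        return escape(s).encode("ascii", "xmlcharrefreplace").decode("ascii")

    # Collapse each paragraph's internal line breaks and spaces to single spaces.
    paragraphs = [" ".join(b.split()) for b in re.split(r"\n\s*\n", text.strip())]
    body = "\n".join(f"<p>{encode(p)}</p>" for p in paragraphs if p)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{encode(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
```

Save the returned string with a ‘.xhtml’ extension in the ‘book’ folder, or run it through Tidy first as a sanity check.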

What a pain! Is there relief in sight?

There are efforts afoot to provide Gutenberg texts in ePub format, and Feedbooks provides ePub material. ePub support is on my wish list for dslibris. Cross-platform tools for generating XHTML and ePub from other formats are also in the works.

If you’re having problems with converting a book or getting it to work, please post in the Help forum on Sourceforge:

https://sourceforge.net/forum/forum.php?forum_id=739965
