Hairy Sun

Matt's Blog on Some Geeky Topics

E-books: It's Just Text

I’m volunteering at my kid’s elementary school to teach eight 45-minute classes. There are various suggested pre-canned courses that one can offer to teach, but I chose to teach a course of my own creation, How to Create E-books. So, I’m currently in the middle of teaching ten 3rd, 4th and 5th graders HTML and CSS. I’ve also taught them how to count in hex so they can use colors like the cool kids, but that is another post.

We aren’t using fancy tools in this class like desktop publishing systems or even word processors. We are using perhaps one of the poorest text editors around, Notepad.

The Dirty Secret

Most e-books are just HTML. While this is probably obvious to technical people, most people don’t have a clue and don’t care. I find it interesting that we live in a world where 3-year-olds know how to navigate a smartphone, yet most elementary kids have a very rudimentary understanding of computers and how to use them. Still, they can master the basic concepts of XML (or HTML) and CSS relatively quickly. Most if not all of my problems have come up when they have to do tasks that I thought would be straightforward: things like saving files, opening them in a text editor, or understanding why files need an extension when the file explorer hides it from them. Ok, maybe not completely straightforward, but enough to make me wonder if most computer labs are not just excuses for letting the teacher recharge while kids play online games.

Under the guise of teaching writing, I’m subverting their little minds to learn something deeper that I doubt they will have a chance to learn in any traditional class during their K-12 years. Once you understand HTML, you can create web pages and e-books, understand how word processors might work, and have a notion of encapsulation and the engineering principle of building on top of existing technology. The black box of the web should be a little clearer to them.

The Basics

The basic rules of XML are easy for a 3rd grader to understand.

  • There are start tags and end tags. A start tag is just a tag name enclosed in angle brackets. Here’s a start tag for a paragraph, <p>. End tags have a / before the tag name. Here’s the corresponding end tag for a paragraph, </p>.
  • If you open a tag, close it. This is good:

      <p>A paragraph with <em>emphasized</em> text.</p>

    This is bad, because the <em> tag is not closed:

      <p>A paragraph with <em>emphasized text.</p>

  • Sometimes start tags and end tags are combined. For example, an image tag normally looks like this:

      <img src='cow.jpg'/>

    The / before the > indicates that this image tag is closed.

  • Tags can have attributes. The src='cow.jpg' in the image tag above is an attribute. src is the name of the attribute, and in this case it indicates that the value is where the image will be found. The value of the attribute always follows an = and is enclosed in quotes. Another example might be:

      <p class="opening">Treat opening paragraphs differently.</p>

    The above attribute has the name class and the value opening. Class attributes are usually combined with CSS to style certain elements differently from elements lacking the attribute.

And that’s about all there is. 3rd graders understand this in five minutes after you give them some examples and ask them some questions that make them think.
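The "if you open a tag, close it" rule can also be checked by a machine. Here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser (neither is used in the class, and the helper name is_well_formed is made up for illustration). It feeds the good and bad paragraphs from above to an XML parser and then reads the attributes back out:

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, i.e. every opened tag is closed."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The examples from the list above.
good = "<p>A paragraph with <em>emphasized</em> text.</p>"
bad = "<p>A paragraph with <em>emphasized text.</p>"
image = "<img src='cow.jpg'/>"
opening = '<p class="opening">Treat opening paragraphs differently.</p>'

print(is_well_formed(good))                 # True
print(is_well_formed(bad))                  # False: <em> is never closed
print(ET.fromstring(image).attrib)          # {'src': 'cow.jpg'}
print(ET.fromstring(opening).get("class"))  # opening
```

A strict XML parser rejects the unclosed <em>, whereas web browsers are far more forgiving with HTML, so this is a check on the classroom rule rather than on what a browser will display.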

Digging In

Epub files are just zip files, so you can open them and peek around (if they lack DRM).
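Peeking around does not require any special tool. As a rough sketch, assuming Python and its standard zipfile module (the function name list_epub_entries and the stand-in file names are invented for this example), the contents of a DRM-free book can be listed like this:

```python
import io
import zipfile


def list_epub_entries(epub) -> list:
    """Return the names of the files stored inside an EPUB (a zip archive).

    `epub` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as archive:
        return archive.namelist()


# Build a tiny stand-in archive in memory so the sketch runs without a real book.
# The entries below are made up for illustration; a real e-book has many more.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("chapter1.html", "<html><body><p>Hello</p></body></html>")

for name in list_epub_entries(buffer):
    print(name)
```

With a real book, pass the path to the file on disk instead of the in-memory buffer.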

I examined a recent technical e-book from a well-known publisher, and it contained only 20 HTML tags:

  • a
  • body
  • code
  • dd
  • div
  • dl
  • dt
  • em
  • h1
  • h3
  • head
  • html
  • li
  • link
  • meta
  • p
  • span
  • strong
  • title
  • ul

With a suitable utility, one can learn all of these tags pretty quickly. Most e-book readers support basic HTML, but don’t try anything too fancy if you want to work across different platforms and devices. These tags are probably sufficient for most texts that are meant for reading.
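A tag inventory like the one above can be gathered with a few lines of code. This is only a sketch of one possible approach, not necessarily how the list in this post was produced. It assumes Python with the standard zipfile and html.parser modules, and the names TagCollector and tags_in_epub are hypothetical:

```python
import io
import zipfile
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Remember the name of every start tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def tags_in_epub(epub) -> list:
    """Return the sorted, distinct tag names used by the HTML files in an EPUB."""
    found = set()
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                collector = TagCollector()  # fresh parser for each file
                collector.feed(archive.read(name).decode("utf-8", errors="replace"))
                collector.close()
                found |= collector.tags
    return sorted(found)


# Stand-in book built in memory; the chapter text is made up for illustration.
chapter = (
    "<html><head><title>T</title></head>"
    "<body><h1>Cows</h1><p class='opening'>A <em>fine</em> cow.</p></body></html>"
)
book = io.BytesIO()
with zipfile.ZipFile(book, "w") as archive:
    archive.writestr("chapter1.html", chapter)

print(tags_in_epub(book))
# ['body', 'em', 'h1', 'head', 'html', 'p', 'title']
```

The sketch only looks at files whose names end in .html, .xhtml or .htm, which is an assumption about how the book is packaged.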

A Challenge

If you are writing an e-book (or a website/blog) and have never created content by hand, give it a try. You don’t have to use Notepad. You will realize that it is not black magic. And if you happen to view the generated output from a fancy WYSIWYG editor, you will see all the cruft that such editors create, and you will learn why people make a living removing and cleaning it up.