Keep it Clean: Your Blog and Clean HTML

I frequently work with content that gets moved from one place to another. Sometimes I get guest posts here on Web Teacher that have to be formatted from one source or another into WordPress. Most often, however, I work with content from a BlogHer network blogger who is putting content from their own blog on BlogHer.com.

Sometimes it can take me as long as an hour to clean up the HTML that gets pasted in from one blog to another. It is possible to write clean HTML in a blog, but it doesn’t always happen.

Why is Clean HTML Desirable?

  • It moves easily from one place to another and looks good in both places.
  • It doesn’t have a lot of inline styles that worked on one blog, but don’t make sense anywhere else.
  • It fits right in with the look and feel of its new location, because the HTML is uncluttered with presentational information.

What is Clean HTML?

Clean HTML is the bare essentials. It is content with just the tags that format it as headings, paragraphs, lists, etc. There isn’t any style information added to the tags about alignment or margins or text colors. Here is an example of clean HTML.

[Image: example of clean HTML]
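In case the image doesn’t come through, here is a minimal sketch of what clean markup along those lines might look like. The headings, wording, and link addresses are made up for illustration; only the tags matter.

```html
<!-- Hypothetical post content: structural tags only, no style information -->
<h3>Planning the Garden</h3>
<p>I started with a list of vegetables from the <a href="https://example.com/seeds">seed catalog</a>.</p>
<p>Then I sketched out where each bed would go.</p>

<h3>Planting Day</h3>
<p>The <a href="https://example.com/tomatoes">tomatoes</a> went in first.</p>
```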

Clean HTML is content formatted with an appropriate HTML tag that describes the content. The example above has some paragraphs, a couple of headings, and some links. The tags (p, h3, a) describe exactly what the content is semantically. There is nothing added to the HTML that affects alignment, spacing or anything that is related to the appearance of the content. There’s another name for clean HTML. It’s called POSH, or plain old semantic HTML.

Here’s an example of HTML that is not clean or semantic.

[Image: example of HTML that is not clean]
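As a rough stand-in for the image, cluttered markup of this kind might look something like the sketch below. The class name, the style values, and the text are invented for illustration.

```html
<!-- Hypothetical cluttered markup: generic divs, each carrying its own style rule -->
<div class="post-text" style="text-align: left; margin-bottom: 12px; color: #333333;">I started with a list of vegetables.</div>
<div class="post-text" style="text-align: left; margin-bottom: 12px; color: #333333;">Then I sketched out where each bed would go.</div>
```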

In this example, the tags do not describe the content. A div is a generic container, so we don’t know if what is enclosed in the div is a paragraph or something else. The style rule in every div should not be in the HTML at all. Style rules belong in the style sheet, not the HTML. When you move that blog post with its added style rules to another blog, the styles in the HTML may be completely inappropriate in the new location. It’s okay to have a class assigned to an element in your HTML if that class is really needed. In this example, I don’t think it is. Both the class and the style rules could be eliminated by formatting the text properly as paragraphs.
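Here is a small sketch of that kind of cleanup, assuming a line of text that was wrapped in a styled div. The class name, style value, and wording are hypothetical.

```html
<!-- Before (hypothetical): <div class="post-text" style="margin-bottom: 12px;">Then I sketched out where each bed would go.</div> -->

<!-- After: the tag now says what the content is, and nothing about how it looks -->
<p>Then I sketched out where each bed would go.</p>
```

Any spacing the paragraph needs would then come from the style sheet of whichever blog it lands on.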

How Can a Blogger Keep the HTML Clean?

A big part of it is what you touch and what you don’t touch.

I’ve outlined the items you do touch regularly in WordPress in this image. Other blogs look approximately the same.

[Image: WordPress editor toolbar with the icons you do touch outlined]

  • Use the media icons to upload media.
  • Use the B icon to create <strong> tags for strong emphasis. This normally makes the text display in bold, which can make it look a little larger. Just because it’s bold and looks a little larger, that does NOT make it a heading. It only makes it have strong emphasis.
  • Use the I icon to create <em> tags for emphasis. This normally makes the text display in italics. Just because it’s in italics that does NOT make it a book or newspaper title, nor does it make it represent a foreign language. It only makes it emphasized. See the note on bold and italic in the next section.
  • Use the Paragraph menu to format paragraphs. The pull-down menu next to Paragraph also allows you to create headings from h1 to h6. Something formatted with an h1, h2, h3, etc. is a real heading.
  • Use the list icons to create lists. You can make something that looks like a list out of a paragraph with line breaks, but it’s not a list – it’s a paragraph.
  • Use the link icons to create links.
  • Use the quote icon to create a blockquote.
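To make the list above concrete, here is a sketch of the kind of markup those icons produce. The text and the link address are made up, and the exact output can vary a little from one blog platform to another.

```html
<!-- Paragraph menu: a real heading and a real paragraph -->
<h3>Weekend Reading</h3>
<!-- B and I icons: strong emphasis and emphasis -->
<p>This one is <strong>not to be missed</strong>, and I <em>really</em> mean it.</p>

<!-- List icon: a real list, not a paragraph with line breaks -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>

<!-- Link icon -->
<p>Read more at <a href="https://example.com/">the example site</a>.</p>

<!-- Quote icon -->
<blockquote>
  <p>A quoted passage goes here.</p>
</blockquote>
```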

Touching the other icons, such as the alignment icons or the color and size icons, generally adds unwanted style rules to your HTML.
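For instance, centering a line and changing its color typically leaves something like the first paragraph below in your post. This is a sketch; the exact attributes depend on the editor, and the text and color value are invented.

```html
<!-- Typical result of the alignment and color icons (exact output varies by editor) -->
<p style="text-align: center;"><span style="color: #ff0000;">Big news this week!</span></p>

<!-- The same content without the presentational extras -->
<p>Big news this week!</p>
```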

The indent and outdent icons will create a blockquote if you apply them to ordinary text. (Use the quote icon if you want a blockquote.) The indent and outdent icons are for making nested lists.
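A nested list is simply a list placed inside a list item. Here is a minimal sketch with made-up items, showing roughly what indenting two items inside a list produces.

```html
<ul>
  <li>Vegetables
    <!-- Indenting these two items nests them under "Vegetables" -->
    <ul>
      <li>Tomatoes</li>
      <li>Peppers</li>
    </ul>
  </li>
  <li>Herbs</li>
</ul>
```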

What if I Actually Want Bold or Italic?

There are tags that create text that is either bold or italic. They are the <b> and <i> tag sets. If you actually want something to be bold instead of <strong> or italic instead of <em>, you can do it in the HTML pane of your blog. The HTML button is in the upper right corner of your blog post window. When you click it you see the HTML you’re creating.

[Image: the HTML pane]

You can type anything you want in this window. HTML works with opening and closing tags that turn formatting on and off. So <b> says start bold here. Then </b> says stop the bold here.

Suppose you wrote this sentence:

I love my dog, Buster.

If you want the word Buster to be bold, click into the HTML pane, find the word Buster, and put the <b> tags around it, like this:

I love my dog, <b>Buster</b>.

You can do the same with book titles or foreign words using the <i> tag.
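Here is a quick sketch of the <i> version. The sentences are made up for illustration.

```html
<!-- A book title set in italics -->
<p>I just finished reading <i>Moby-Dick</i>.</p>

<!-- A foreign phrase set in italics -->
<p>As they say, <i>c'est la vie</i>.</p>
```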

How Often are you “Fixing” the Appearance of your Posts?

If you find yourself fussing around with margins or borders around images, or the colors of text, or the alignment of text and images, or the spacing between things each and every time you enter something in a blog post, then you are muddying up the HTML. Appearance should be taken care of in the style sheet. The blog post window should be used only to enter content. All those things you fuss with can be in the style sheet so they can apply every time you insert an image or create a heading.

I’m not going to explain how to modify your CSS rules in this post, but I may write a separate post about it at some point. I did talk about how to hunt down the rule you need to modify in the CSS in a presentation from 2009.

Less fussing with appearance is really the key. Making your style sheet do all the fussing for you is the goal. Then you won’t need to do anything with your post but add the content and mark it up with the proper formatting using semantic tags for headings, paragraphs, lists, images, and blockquotes.