Basic Page Structure

Introduction

The inventors of the Web wanted to make sure that their content could be read by machines and therefore focused much of their attention on the logical structure of the page. This is referred to as "semantic structure". You will often hear people talking about "the Semantic Web", which is a continuation of that policy. Machines such as search robots and assistive software tools can only work properly if they have clear, unambiguous instructions. By following these semantic rules you can present your page information to the user in a logical format regardless of which platform or automated tool they use.

The main structure that these tools are looking for is as follows:

DOCTYPE - Which version of HTML/XHTML is used
TITLE - The title of the document
METADATA - Key information about the document
H1 - Main Heading of the page
H2 - Sub heading
H3 - Sub-sub heading
H4 - Sub-sub-sub heading
H5 - Sub-sub-sub-sub heading
H6 - Sub-sub-sub-sub-sub heading

Most web pages will only use the top two or three heading levels. Between each of the heading levels the automated software expects to find blocks of text (paragraphs) that expand upon (explain) the heading above. Each lower level of heading will provide more detail of the relevant topic until, at the lowest level (H6), we are dealing with the minutiae of the subject.

Doctype

The <!DOCTYPE> declaration is the very first thing in your document, even before the opening <html> tag. This declaration tells the browser which version of HTML or XHTML is being used. It is important for the browser or assistive software to know this so that it can interpret your code correctly. The example of a !DOCTYPE declaration given below tells the browser that the code complies with the transitional (loose) standard of HTML 4.01. The EN in the declaration indicates that the definition of the standard is written in English. The declaration also gives the address of the document type definition (DTD) that describes that standard.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

More examples of !DOCTYPE declarations can be found on the W3C web site. It is best not to use XHTML unless you are using an XML base, and understand the restrictions of XHTML, because it is less forgiving than standard HTML.
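
For comparison, the sketch below shows the strict variant of the HTML 4.01 declaration in its place at the very top of a document. The title and paragraph text are placeholders added for illustration and are not part of the original lesson.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<!-- Placeholder title, for illustration only -->
<title>Placeholder title</title>
</head>
<body>
<!-- Placeholder content, for illustration only -->
<p>Placeholder text.</p>
</body>
</html>
```

The only difference from the transitional declaration is the public identifier and the address of the DTD. Everything after the first line is written in the same way.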

The Title Element

The Title of a page is the first thing that a search engine looks at. It is also the item that a browser publishes in its top border area and the first thing that a blind person hears when a new page opens. The TITLE is so important that it is embedded into the "head" section of the page code. Given this importance it is rather surprising that Google lists over 27 million pages that have the title "untitled document" and a further 2 million pages with the title "No Title". That's 29 million documents that search engines will not catalogue correctly!

Most web authoring software asks the author to provide a title for each new document it creates. The W3C states that "every HTML document must have a TITLE element in the HEAD section." This title should be "context rich" (i.e. not a single word or phrase such as "Introduction" but rather "Introduction to Medieval Bee-Keeping") so that the visitor (and computer software) has a clear, initial impression of what the document is about. It is also important to use only the standard set of ASCII characters for this title. Avoid characters like &, $, @ and ? that software such as PHP might interpret as code. Also avoid using the | (bar) character (bottom left of keyboard) or the hyphen (minus) character to separate parts of the title. The | character is read out by screen readers as "bar" and the hyphen (-) character as "minus". These can cause confusion for blind people when they hear the title read aloud. To separate phrases in the TITLE use a comma (,).

The title element should ideally be less than 64 characters in length. Titles that are longer than 64 characters may be truncated by the browser. Whilst there are some techniques for improving search engine rankings within a title, their overall impact is minimal. From an accessibility point of view it is more important that the title makes sense to the reader and clearly indicates the content of the document.

Avoid using the same word or phrase at the beginning of each page's title, as it is harder for blind people to navigate a site if all the page titles start with the same phrase. "Ways to contact Mycompany" is also more user-friendly than "Mycompany, contact us".
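
As a rough illustration of the advice in this section, a title might be written as in the sketch below. The wording after the comma is an invented example, not taken from a real site.

```html
<head>
<!-- Context rich, under 64 characters, phrases separated by a comma -->
<!-- The distinguishing words come first, not the company name -->
<title>Ways to contact Mycompany, telephone and email</title>
</head>
```

The title uses only plain ASCII characters and contains no | or hyphen separators, so a screen reader should read it out as an ordinary phrase.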

The meta elements

The <meta> element provides meta-information about your page, such as descriptions and keywords for search engines. Assistive software can also be set to read some of these elements out so that blind users get a better idea of the page content. Please do not use these elements to attract search engines by using irrelevant key words or phrases (it won't work and will confuse blind users).

Some meta elements that should be on every page include:

  • Specifying the content type and character encoding
  • Providing a description of the page contents
  • Specifying relevant keywords that might be useful to search engines
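
A minimal sketch of these three meta elements is given below. The UTF-8 encoding, the description and the keywords are assumed values chosen for illustration. Use whatever matches your own page.

```html
<head>
<!-- Content type and character encoding (UTF-8 is an assumed choice) -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- Short description of the page contents (placeholder wording) -->
<meta name="description" content="An introduction to medieval bee-keeping methods and equipment.">
<!-- Relevant keywords only (placeholder values) -->
<meta name="keywords" content="bee-keeping, medieval, hives, honey">
<title>Introduction to Medieval Bee-Keeping</title>
</head>
```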

The meta element can also be used to set refresh rates so that the page reloads itself after a set time. Please do not use this refresh attribute as it causes real problems for many disabled users. If the user is slow to work down the page it may automatically refresh itself before the user gets to the bottom. The user is returned to the top of the page each time it refreshes. This can be very frustrating. If you are running a live news feed page that requires refreshing to get the latest news then provide a "refresh" button for the user to click on.
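
One possible way to provide such a button without any scripting is a small form that requests the page again, as sketched below. The file name news.html is a hypothetical name standing in for the address of the news page itself. Whether the very latest content appears may also depend on how the browser and server handle caching.

```html
<!-- "news.html" is a hypothetical name for the page that contains this form -->
<form action="news.html" method="get">
<p><input type="submit" value="Refresh the news"></p>
</form>
```

The user stays in control because the page only reloads when the button is pressed.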

More details on meta data can be found on the W3C web site at http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4

For the rest of this lesson we shall concentrate upon the body of the page as it is shown in the browser window.

Typical semantic structure of a page

The correct use of headings makes it much easier for blind people to read and understand your web page. The underlying code allows screen readers to present the page in such a way as to allow the user to skip up and down the page picking out just the sections of interest in a similar fashion to that used by sighted users.

Screen readers load the page headings into a cached memory area so that they can be processed and presented to the user whenever required. To work reliably these headings need to be short and context rich. Try to limit headings to just one line and do not include images within the heading element. Even quite modern screen reader software can seize up if the memory cache gets overloaded with images. If your web site causes problems that require the user to re-boot their computer they are most unlikely to return!

Headings

The heading elements structure the page into meaningful blocks of information. These help assistive software (screen readers) and search robots understand the structure of the page and what is important within the context of the page. For blind users these elements can be vital as their screen reader can list just the section headings and enable the user to skip to a particular section of interest. For sighted users the section headings stand out from the body text because they are formatted differently (using bold or large text for example). Blind users cannot see these stylistic attributes and therefore have to rely upon the underlying HTML code to identify section headings.

Every page must start with a single top level heading element. This top-level heading (H1) must reflect the overall content of the page and will complement the page title explained earlier. For this reason only one top level heading is allowed per page.

Section headings and their respective subsection headings need to follow a "tree" structure so that the user knows where they are within the structure of the page (see below). In order to interpret the page correctly it is important to apply heading levels sequentially. If you skip a level of heading (for example from h2 to h4 without an intervening h3) you break the sequence and thus the logic of the content.

<head>
<title>Page Title</title>
</head>
<body>
<h1>Page Heading</h1>
<p>Some introductory text.</p>
<h2>First section heading</h2>
<p>A paragraph or two of text.</p>
<h2>Next section heading</h2>
<p>Some text.</p>
<h3>Subsection heading</h3>
<p>Some text.</p>
<h3>Next subsection</h3>
<p>More text.</p>
<h2>Site Navigation</h2>
<ul>
<li>first link</li>
<li>second link</li>
<li>third link</li>
</ul>
</body>
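
To make the point about skipped levels concrete, the fragment below contrasts a broken heading sequence with a repaired one. The heading texts are placeholders.

```html
<!-- Broken: jumps from h2 to h4, so the level 3 step is missing -->
<h2>Section heading</h2>
<h4>Detail heading</h4>

<!-- Repaired: the same detail heading moved up to level 3 -->
<h2>Section heading</h2>
<h3>Detail heading</h3>
```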

 

Why sequencing headings helps blind users

Imagine that you are a blind user and you are interrupted whilst reading a document (or perhaps just not paying full attention at the time). You hear the screen reader read out aloud "heading level 3 i minus sensys 5360". This makes no sense to you. If you were sighted you could quickly scan the page to remind yourself of the context: there might be a picture nearby or you could see the page title at the top. A blind person only has his or her memory to help them put this heading into context. If the page has been constructed correctly a blind person can press a button and hear the preceding level heading (in this case a level 2 heading). They would hear the screen reader say "heading level 2 laser printers for home and business" and immediately remember that they were looking for a printer to buy as a present for their daughter going to University. Without the preceding heading they would have to cursor up and down the page listening to the surrounding text until something jogged their memory.
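
The markup behind this scenario might look something like the sketch below. The level 1 heading and the paragraph text are invented for illustration, and the spelling "i-SENSYS 5360" is an assumption inferred from the way the screen reader speaks the heading.

```html
<!-- Hypothetical top level heading for the page -->
<h1>Printers</h1>
<!-- The level 2 heading the user steps back to for context -->
<h2>Laser printers for home and business</h2>
<p>Some text about laser printers.</p>
<!-- Assumed spelling; read out as "i minus sensys 5360" -->
<h3>i-SENSYS 5360</h3>
<p>Some text about this model.</p>
```

Because the level 3 heading sits directly under its level 2 parent, stepping back one heading level is enough to recover the context.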

Paragraphs

The paragraph element surrounds a block of text that has a common theme or purpose within the context of the preceding heading. Ideally a paragraph should contain no more than four or five sentences so that the reader or listener has a chance to digest the content before proceeding to the next paragraph. As explained in lesson 2, each sentence should be short, perhaps no more than one or two lines long. As a result you should be aiming for paragraphs that are normally between four and ten lines long.

Soft Line Breaks

If you need to create a special line ending within a paragraph you can use the <br> (soft break) element without breaking out of the paragraph block. This is particularly useful if you are writing a poem or song and want to make sure that each verse stays as a single paragraph. However, the use of this soft return code is purely for styling and has no effect on the semantics of the page. For this reason you should never use two consecutive soft returns to create a new paragraph. Screen readers and search engines do not stop for soft returns in the same way that they do for paragraph endings.
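
The sketch below illustrates both points: a verse kept together as one paragraph with soft breaks, and the difference between faking a new paragraph and marking up a real one. The verse and the placeholder sentences are chosen purely for illustration.

```html
<!-- One verse kept as a single paragraph, with a soft break after each line -->
<p>
Twinkle, twinkle, little star,<br>
How I wonder what you are.<br>
Up above the world so high,<br>
Like a diamond in the sky.
</p>

<!-- Avoid: two soft breaks used to fake a new paragraph -->
<p>First paragraph.<br><br>Second paragraph.</p>

<!-- Prefer: two real paragraphs -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
```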