In this lesson
Page Semantics
Aim
To ensure that assistive software and robots understand the content and context of the page, it is important to lay it out in a logical fashion. This lesson explains the importance of using appropriate structural (semantic) codes so that blind users can gain an initial overview of the page and its structure. These semantic codes are also used by search robots, such as Google, and academic research tools, so their correct application helps all web users.
Basic Page Structure
Introduction
The inventors of the Web wanted to make sure that their content could be read by machines and therefore focused much of their attention on the logical structure of the page. This is referred to as a "Semantic Structure". You will often hear people talking about "The Semantic Web", which is a continuation of that policy. Machines such as search robots and assistive software tools can only work properly if they have these clear, unambiguous instructions. By following these semantic rules you are able to present your page information to the user in a logical format regardless of which platform or automated tool they use.
The main structure that these tools are looking for is as follows:
DOCTYPE – Which version of HTML/XHTML is used
TITLE – The title of the document
METADATA – Key information about the document
H1 – Main Heading of the page
H2 – Sub heading
H3 – Sub-sub heading
H4 – Sub-sub-sub heading
H5 – Sub-sub-sub-sub heading
H6 – Sub-sub-sub-sub-sub heading
Most web pages will only use the top two or three heading levels. Between each of the heading levels the automated software expects to find blocks of text (paragraphs) that expand upon (explain) the heading above. Each lower level of heading will provide more detail of the relevant topic until, at the lowest level (H6), we are dealing with the minutiae of the subject.
Doctype
The <!DOCTYPE> declaration is the very first thing in your document, before the <html> tag. This declaration tells the browser which version of HTML or XHTML is being used. It is important for the browser or assistive software to know this so that it can interpret your code correctly. When this lesson was written the W3C recommended XHTML as being more “future proof”; the current HTML standard uses the much shorter <!DOCTYPE html>. The example of a <!DOCTYPE> declaration given below tells the browser that the code complies with the strict standard of HTML 4.01, English version. The declaration also tells the browser where the document type definition (DTD) for that standard can be found.
Sample code follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
More examples of <!DOCTYPE> declarations can be found on the W3C web site. Some details about the difference between HTML and XHTML are given in lesson 12 (testing your web site).
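The rule that the declaration must come before the <html> tag can be checked mechanically. The following is a minimal sketch using Python's standard html.parser module; the class and function names (DoctypeFinder, find_doctype) are illustrative assumptions, not part of any accessibility tool.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Records the first DOCTYPE declaration and whether a tag came before it."""

    def __init__(self):
        super().__init__()
        self.doctype = None          # text of the declaration, if one is found
        self.tag_seen_first = False  # True if any tag appeared before it

    def handle_starttag(self, tag, attrs):
        if self.doctype is None:
            self.tag_seen_first = True

    def handle_decl(self, decl):
        # decl is the declaration text without the leading "<!" and trailing ">"
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def find_doctype(html):
    """Return the DOCTYPE text, or None if it is missing or not placed first."""
    finder = DoctypeFinder()
    finder.feed(html)
    finder.close()
    if finder.tag_seen_first:
        return None
    return finder.doctype


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Example</title></head><body></body></html>"
)
print(find_doctype(page))
```

A page that opens directly with <html>, or that places the declaration after a tag, makes find_doctype return None.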
The <TITLE>
The Title of a page is the first thing that a search engine looks at. It is also the item that a browser publishes in its top border area and the first thing that a blind person hears when a new page opens. The TITLE is so important that it is embedded into the <head> section of the page code. Given this importance it is rather surprising that Google lists over 27 million pages that have the title “untitled document” and a further 2 million pages with the title “No Title”. That's 29 million documents that search engines will not catalogue correctly.
Most web authoring software asks the author to provide a title for each new document it creates. The W3C states that “every HTML document must have a TITLE element in the HEAD section.” This title should be “context rich” (i.e. not a single word or phrase such as “Introduction” but rather “Introduction to Medieval Bee-Keeping”) so that the visitor (and computer software) has a clear initial impression of what the document is about. It is also important to use only the standard set of ASCII characters for this title. Avoid using characters like &, $, @ and ? that might be interpreted as code by programmes such as PHP. Also avoid using the | (bar) character (bottom left of the keyboard) or the hyphen (minus) character to separate parts of the title. The | character is read out by screen readers as “bar” and the hyphen (-) character as “minus”. These can cause confusion for blind people when they hear the title read aloud. To separate phrases in the TITLE use a comma (,).
The title element should ideally be fewer than 64 characters in length. Titles that are longer than 64 characters may be truncated by the browser. Whilst there are some techniques for improving search engine rankings within a title, their overall impact is minimal. From an accessibility point of view it is more important that the title makes sense to the reader and clearly indicates the content of the document.
Avoid using the same word or phrase at the beginning of each page’s title as it makes it harder for blind people to navigate a site if all the page titles start with the same phrase. “Ways to contact Mycompany” is also more user-friendly than “Mycompany, contact us”.
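The title guidance above can be turned into a simple automated check. The sketch below uses Python's standard html.parser module. The thresholds and character lists are taken from this lesson, while the names (TitleReader, check_title) and the choice to treat a hyphen as a separator only when it has a space on both sides are assumptions made for illustration.

```python
from html.parser import HTMLParser

PLACEHOLDER_TITLES = {"untitled document", "no title"}  # examples from this lesson
MAX_TITLE_LENGTH = 64                                   # limit given in this lesson
DISCOURAGED_CHARS = "&$@?|"                             # characters listed in this lesson


class TitleReader(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_title(html):
    """Return a list of problems found with the page title (empty if none)."""
    reader = TitleReader()
    reader.feed(html)
    reader.close()
    if reader.title is None or not reader.title.strip():
        return ["page has no title"]
    title = " ".join(reader.title.split())  # collapse runs of whitespace
    problems = []
    if title.lower() in PLACEHOLDER_TITLES:
        problems.append("placeholder title")
    if len(title) >= MAX_TITLE_LENGTH:
        problems.append(f"title is {len(title)} characters long")
    found = [char for char in DISCOURAGED_CHARS if char in title]
    if found:
        problems.append("discouraged characters: " + " ".join(found))
    if " - " in title:
        # Assumption: only a spaced hyphen counts as a separator,
        # so "Bee-Keeping" is not flagged.
        problems.append("hyphen used as a separator")
    return problems


print(check_title("<title>Introduction to Medieval Bee-Keeping</title>"))
print(check_title("<title>Mycompany | Contact us</title>"))
```

A check like this only catches mechanical problems. Whether a title is context rich still needs a human reader.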
The meta elements
The <meta> element provides meta-information about your page, such as descriptions and keywords for search engines. Assistive software can also be set to read some of these elements out for blind users to get a better idea of the page content. Please do not use these elements to attract search engines by using irrelevant key words or phrases (it won’t work and will confuse blind users).
Some meta elements that should be on every page include:
- Specifying the content type and character encoding using - <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
- Providing a description of the page contents using - <meta name="description" content="some descriptive text"/>
- Specifying relevant keywords that might be useful to search engines using - <meta name="keywords" content="individual, keywords, or phrases, separated by a comma"/>
- Providing a short abstract using - <meta name="abstract" content="Short abstract just two lines that search engines might display"/>
The meta element can also be used to set refresh rates so that the page reloads itself after a set time. Please do not use this refresh attribute as it causes real problems for many disabled users. If the user is slow to work down the page it may automatically refresh itself before the user gets to the bottom. The user is returned to the top of the page each time it refreshes. This can be very frustrating. If you are running a live news feed page that requires refreshing to get the latest news then provide a “refresh” button for the user to click on.
More details on meta data can be found on the W3C web site at http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4
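As a rough illustration of the points above, the following sketch reports which of the recommended meta elements a page contains and whether it uses the discouraged refresh setting. It relies only on Python's standard html.parser module; the names MetaCollector and review_meta are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the name and http-equiv values of every <meta> element."""

    def __init__(self):
        super().__init__()
        self.names = set()
        self.http_equivs = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name"):
            self.names.add(attributes["name"].lower())
        if attributes.get("http-equiv"):
            self.http_equivs.add(attributes["http-equiv"].lower())


def review_meta(html):
    """Summarise which meta elements are present on the page."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return {
        "has_content_type": "content-type" in collector.http_equivs,
        "has_description": "description" in collector.names,
        "has_keywords": "keywords" in collector.names,
        "uses_refresh": "refresh" in collector.http_equivs,
    }


sample_head = """
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="some descriptive text"/>
<meta http-equiv="refresh" content="30" />
</head>
"""
print(review_meta(sample_head))
```

For the sample above the summary shows that a description is present, keywords are missing, and the page uses an automatic refresh that should be replaced with a “refresh” button.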
For the rest of this lesson we shall concentrate upon the body of the page as it is shown in the browser window.
Headings
The heading elements, <h1> to <h6>, structure the page into meaningful blocks of information. These help assistive software (screen readers) and search robots understand the structure of the page and what is important within the context of the page. For blind users these elements can be vital as their screen reader can list just the section headings and enable the user to skip to a particular section of interest. For sighted users the section headings stand out from the body text because they are formatted differently (using bold or large text for example). Blind users cannot see these stylistic attributes and therefore have to rely upon the underlying HTML code to identify section headings.
Every page must start with a single <h1> element. This top-level heading (H1) must reflect the overall content of the page and will complement the page <TITLE> explained earlier. For this reason only one top level heading is allowed per page.
Section headings and their respective subsection headings need to follow a “tree” structure so that the user knows where they are within the structure of the page (see below). In order to interpret the page correctly it is important to apply heading levels sequentially. If you skip a level of heading (for example from <h2> to <h4> without an intervening <h3>) you break the sequence and thus the logic of the content.
Figure: Typical semantic structure of a page.
The correct use of headings makes it much easier for blind people to read and understand your web page. The underlying code allows screen readers to present the page in such a way as to allow the user to skip up and down the page picking out just the sections of interest in a similar fashion to that used by sighted users.
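The heading rules described in this section (start with a single <h1>, and never skip a level on the way down) lend themselves to an automated check. The following is a minimal sketch using Python's standard html.parser module; HeadingLevels and check_heading_sequence are illustrative names, not an existing tool.

```python
import re
from html.parser import HTMLParser

HEADING_TAG = re.compile(r"h([1-6])")


class HeadingLevels(HTMLParser):
    """Records the level of every heading in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        match = HEADING_TAG.fullmatch(tag)
        if match:
            self.levels.append(int(match.group(1)))


def check_heading_sequence(html):
    """Return a list of heading-structure problems (empty if none)."""
    parser = HeadingLevels()
    parser.feed(html)
    parser.close()
    levels = parser.levels
    problems = []
    if not levels or levels[0] != 1:
        problems.append("page does not start with an h1")
    if levels.count(1) > 1:
        problems.append("more than one h1")
    # Moving back up the tree (h3 to h2) is fine; only skipped
    # levels on the way down (h2 to h4) break the sequence.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current}")
    return problems


print(check_heading_sequence("<h1>A</h1><h2>B</h2><h3>C</h3><h2>D</h2>"))
print(check_heading_sequence("<h1>A</h1><h2>B</h2><h4>C</h4>"))
```

The first call reports no problems. The second reports that an h2 is followed directly by an h4.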
Screen readers load the page headings into a cached memory area so that they can be processed and presented to the user whenever required. To work reliably these headings need to be short and context rich. Try to limit headings to just one line and do not include images within the heading element. Even quite modern screen reader software can seize up if the memory cache gets overloaded with images. If your web site causes problems that require the user to re-boot their computer they are most unlikely to return!
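To give a feel for what a screen reader extracts, the sketch below builds a simple heading list from a page and notes which headings contain an image. It is an approximation written with Python's standard html.parser module, not a description of how any particular screen reader works; the names and the “text: level” output format are assumptions.

```python
import re
from html.parser import HTMLParser

HEADING_TAG = re.compile(r"h([1-6])")


class HeadingCollector(HTMLParser):
    """Collects (level, text, has_image) for every heading on the page."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._open = None  # [level, text parts, has_image] while inside a heading

    def handle_starttag(self, tag, attrs):
        match = HEADING_TAG.fullmatch(tag)
        if match:
            self._open = [int(match.group(1)), [], False]
        elif tag == "img" and self._open is not None:
            self._open[2] = True

    def handle_endtag(self, tag):
        if HEADING_TAG.fullmatch(tag) and self._open is not None:
            level, parts, has_image = self._open
            text = " ".join("".join(parts).split())  # tidy whitespace
            self.headings.append((level, text, has_image))
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[1].append(data)


def collect_headings(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


def heading_list(html):
    """Format the headings as 'text: level', one entry per heading."""
    return [f"{text}: {level}" for level, text, _ in collect_headings(html)]


page = (
    "<h1>Creating a semantic structure</h1>"
    "<h2>Page semantics</h2>"
    "<h3>Aim</h3>"
    "<h3><img src='logo.png' alt=''> Headings</h3>"
)
for entry in heading_list(page):
    print(entry)
```

The third item of each tuple returned by collect_headings shows whether a heading contains an image, which makes it easy to find headings that should be simplified.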
Why sequencing headings helps blind users
Imagine that you are a blind user and you are interrupted whilst reading a document (or perhaps just not paying full attention at the time). You hear the screen reader read out aloud "heading level 3 i minus sensys 5360 ". This makes no sense to you. If you were sighted you could quickly scan the page to remind yourself of the context: there might be a picture nearby or you could see the page title at the top. A blind person only has his or her memory to help them put this heading into context. If the page has been constructed correctly a blind person can press a button and hear the preceding level heading (in this case a level 2 heading). They would hear the screen reader say "heading level 2 laser printers for home and business" and immediately remember that they were looking for a printer to buy as a present for their daughter going to university. Without the preceding heading they would have to cursor up and down the page listening to the surrounding text until something jogged their memory.
Screen reader demonstration
The YouTube video below illustrates the importance of using the HTML semantic structure. It shows a screen reader arriving at this page, selecting the headings and moving to the section using the list of HTML headings. The user hears the page title and the number of headings. He then presses a key to obtain the list of headings and is able to cursor down the list until he hears the heading he wants. By pressing the ENTER key he can jump straight to that section of the page.
Transcript
00:00 - "Internet explorer, Creating a semantic structure for the page . Creating a semantic structure for the web page, lesson 3, session 1 of 2"
00:15 - " Explains the appropriate structure left parenthesis or semantic right parenthesis code so that blind users can obtain an initial overview of the page. One hundred percent. "
00:24 - "Page has seventeen headings and seven links"
00:29 - "Heading list dialogue, heading list view. Creating a semantic structure, colon one, one of seventeen. To move through list use arrow keys"
00:41 - "Page semantics, colon two, Aim, colon three. Basic page structure, colon 2. Introduction, colon 3. Doctype, colon 3. The title, colon three. The meta elements, colon 3. Headings, colon three. Why sequencing helps, screen demonstration colon four"
00:57 - "Why sequencing headings helps blind users colon three. Headings colon three. ENTER"
01:02 - "Headings, heading level three"
01:05 - "Blank. The heading elements less h1 to h6 greater structure the page into meaningful blocks of information. These help assistive software left parent screen readers right parent and search"
01:19 - " robots understand the structure of the page and what is important within the context of the page. For blind users these elements can be vital as their screen"
01:25 - "reader can list just the section headings and enable the user to skip to a particular section of interest. For sighted users the section headings stand"
01:32 - "out from the body text because they are formatted differently left parent using bold or large text for example right parent. Blind users cannot see these stylistic attributes"
Because these heading codes (H1 to H6) are used to define the structure of the page they must never be used within a body of text as a formatting (styling) command.
Paragraphs
The paragraph element <p> surrounds a block of text that has a common theme or purpose within the context of the preceding heading. Ideally a paragraph should contain no more than four or five sentences so that the reader or listener has a chance to digest the content before proceeding to the next paragraph. As explained in lesson 2, each sentence should be short, perhaps no more than one or two lines long. As a result you should be aiming for paragraphs that are normally between four and ten lines long.
Soft Line Breaks
If you need to create a special line ending within a paragraph you can use the <br/> (soft break) element without breaking out of the paragraph block. This is particularly useful if you are writing a poem or song and want to make sure that each verse stays as a single paragraph. However, the use of this soft return code is purely for styling; it has no effect on the semantics of the page. For this reason you should never use two consecutive soft returns to create a new paragraph. Screen readers and search engines do not stop for soft returns in the same way that they do for paragraph endings.
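Pages that use two soft returns in place of a real paragraph break can be found automatically. The sketch below counts each <br> that immediately follows another <br> with only whitespace between them. It uses Python's standard html.parser module, and the names DoubleBreakFinder and count_double_breaks are illustrative assumptions.

```python
from html.parser import HTMLParser


class DoubleBreakFinder(HTMLParser):
    """Counts <br> elements that directly follow another <br>."""

    def __init__(self):
        super().__init__()
        self.previous_was_br = False
        self.double_breaks = 0

    def handle_starttag(self, tag, attrs):
        # Both <br> and <br/> arrive here.
        if tag == "br":
            if self.previous_was_br:
                self.double_breaks += 1
            self.previous_was_br = True
        else:
            self.previous_was_br = False

    def handle_endtag(self, tag):
        if tag != "br":
            self.previous_was_br = False

    def handle_data(self, data):
        # Whitespace between two breaks does not separate them.
        if data.strip():
            self.previous_was_br = False


def count_double_breaks(html):
    finder = DoubleBreakFinder()
    finder.feed(html)
    finder.close()
    return finder.double_breaks


verse = "<p>Line one<br/>Line two<br/>Line three</p>"
fake_paragraphs = "<p>First block<br/><br/>Second block</p>"
print(count_double_breaks(verse))
print(count_double_breaks(fake_paragraphs))
```

The verse, which uses single soft returns inside one paragraph, produces a count of zero. The second sample, which fakes a paragraph break, produces a count of one and should be rewritten as two <p> elements.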
In the next session (2) we will look at the list structure, then in session 3 we explore data tables.
The following guidelines are relevant to this session
Principle 1: Perceivable - Information and user interface components must be presentable to users in ways they can perceive
1.3.1 Info and Relationships: Information, structure, and relationships conveyed through presentation can be programmatically determined or are available in text. (Level A)
1.3.2 Meaningful Sequence: When the sequence in which content is presented affects its meaning, a correct reading sequence can be programmatically determined. (Level A)
Principle 2: Operable - User interface components and navigation must be operable
2.4.2 Page Titled: Web pages have titles that describe topic or purpose. (Level A)
2.4.6 Headings and Labels: Headings and labels describe topic or purpose. (Level AA)
2.4.10 Section Headings: Section headings are used to organize the content. (Level AAA)