I am trying to do my part in entering tags for my webpage. Basic
stuff such as
* keywords
* description
* revisit-after
* rating
Good on you! It's of doubtful usefulness, but not harmful. Some of
them (description) are more likely to appear in practice than others.
Some (ratings) may be especially useful in some narrow contexts.
Many of these meta- properties are most useful to yourself. You can
track "editorial" workload through them, which can be as simple as
last-changed dates, or very sophisticated indeed (if that's worth
doing for your purposes).
There are several layers to the web metadata problem. Conveniently, we
can think about it as three layers:
* encoding and syntax
* semantics
* content
* Encoding and syntax
Encoding can be done easily with the HTML <meta> element, using syntax
like this:
<meta name=" ... property name ... " content=" ... property value ... ">
This defines one property (of a potentially large number) as a name-
value pair. It doesn't fix what the name is (or what it means) and it
doesn't restrict the values. It's just a simple syntax for associating
a set of labelled strings with a web page.
This is all that HTML defines. It doesn't define any meanings beyond
that; that's not the job of this layer.
Syntax is pretty fluid, according to the rules of HTML. name="" is the
same as NAME="". Ordering doesn't matter either. The quotes will
usually be required: not always, but it's simplest to assume so and
just use them.
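To see how little these syntax details matter to software reading the
page, here is a minimal Python sketch (my own illustration, not part of
any standard tool) that collects name/content pairs with the standard
library's html.parser. The names MetaCollector, read_meta and
SAMPLE_PAGE are made up for the example, and lower-casing the property
names is an assumption of the sketch rather than something this post or
HTML 4 prescribes.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us,
        # so <META NAME=...> and <meta name=...> arrive looking identical.
        if tag != "meta":
            return
        attributes = dict(attrs)  # attribute order is irrelevant once in a dict
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            # Assumption of this sketch: treat property names case-insensitively.
            self.properties[name.strip().lower()] = content


def read_meta(html_text):
    """Return a {property name: property value} dict for one page."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.properties


# Hypothetical page: the second element uses upper case and reversed order.
SAMPLE_PAGE = """
<html><head>
<title>Example page</title>
<meta name="description" content="A short summary of the page.">
<META CONTENT="metadata, html, example" NAME="keywords">
</head><body></body></html>
"""

if __name__ == "__main__":
    for name, value in read_meta(SAMPLE_PAGE).items():
        print(name, "=", value)
```

Both elements end up in the same dictionary, whatever their case or
attribute order. <meta http-equiv=...> elements are skipped because
they carry no name attribute.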
<meta name="" content="" /> with the closing slash is XML or XHTML
syntax and so is wrong for HTML use. It's not badly wrong and it won't
break much, but it's still wrong. The worst thing about it is that it
will confuse an HTML validator, which I hope you're using as part of
good editing practice anyway, and then confuse you with a misleading
error report.
<meta http-equiv="" ... > is something different. Don't use it; set
the HTTP headers properly instead. If you host on Apache (a good
idea), a .htaccess tutorial will explain better ways to do what you
might want. Just about the only good reason to use this is IF you're
using non-ASCII characters and you're writing "web pages" that will be
distributed on a CD-ROM etc. without a web server. Then it MIGHT be
justifiable to use:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Otherwise avoid these http-equiv hacks and send the proper header out,
properly.
Instead of <meta>, you could use RDFa. It is more powerful, more
complicated and less widely recognised. Stick with HTML <meta>.
* Semantics
Semantics take the "property-name" we placed in name="" and use it to
map the "property-value" onto something with a meaning and a data
type. They explain what the properties are describing. A property named
"description" can reasonably be assumed to be an unstructured
human-readable text string that describes the overall page.
We're limited to using a small number of "well-known" properties,
because their semantics are recognisable. Anything that cares is going
to know what a "description" means and how to use it. It's impractical
to create new property names, because nothing out there will recognise
them or know how to use them. This is a very common mistake! A
property of "film-review-rating" might be really useful on a film
review site, but who will recognise that it's there or what it means?
Is "4" a good film with 4/5 stars, or a bad film with 4/10 stars?
The lists of "well-known" properties are obscure. To quote the HTML
spec, "This specification does not list legal values for this
attribute." [1]
Some de facto meta properties are defined by their established common
usage. A reasonable list of these (which I'm not even going to try to
reference, as that would be pointless) might be:
description, keywords, author, copyright, date, robots
Note that these define the meaning for the property, but not the data
type. For date in particular this is a problem. Best practice is to
use the W3C-recommended ISO 8601 format[2], but RFC 822 is just as
likely to be encountered on other pages, as is the local format.
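Anything reading date values has to cope with that mix of formats. As
a rough sketch only (the function name parse_meta_date and the format
labels are my own inventions), this is how a reader might try the W3C
/ ISO 8601 form first, then RFC 822, and refuse to guess at anything
else, using only the Python standard library:

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_meta_date(value):
    """Return (format_label, datetime) for a date string, or None if unrecognised."""
    text = value.strip()
    try:
        # ISO 8601 / W3C date-time, e.g. 2008-01-15 or 2008-01-15T10:30:00+00:00
        return "iso8601", datetime.fromisoformat(text)
    except ValueError:
        pass
    try:
        # RFC 822 style, e.g. Tue, 15 Jan 2008 10:30:00 +0000
        return "rfc822", parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Local formats such as 15/01/2008 are ambiguous (day first or
        # month first?), so this sketch refuses to guess.
        return None


if __name__ == "__main__":
    for sample in ("2008-01-15", "Tue, 15 Jan 2008 10:30:00 +0000", "15/01/2008"):
        print(repr(sample), "->", parse_meta_date(sample))
```

Note that datetime.fromisoformat covers only part of the W3C profile:
reduced-precision values such as a bare year or year-month are not
accepted, so a real reader would need extra handling for those.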
In general, these are the only <meta> metadata properties that are
likely to be widely recognised or processed further.
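In practice a consumer behaves roughly like the following Python
sketch: it keeps the names it knows and silently ignores the rest. The
set below is just the de facto list given above, and the function name
split_by_recognition and the sample values are hypothetical.

```python
# De facto property names taken from the list in this post.
WELL_KNOWN = {"description", "keywords", "author", "copyright", "date", "robots"}


def split_by_recognition(properties):
    """Split a {name: value} mapping into (recognised, unrecognised) dicts."""
    recognised, unrecognised = {}, {}
    for name, value in properties.items():
        known = name.strip().lower() in WELL_KNOWN
        (recognised if known else unrecognised)[name] = value
    return recognised, unrecognised


if __name__ == "__main__":
    page = {
        "description": "Review of a film.",
        "film-review-rating": "4",  # home-made name: nothing will act on it
    }
    used, ignored = split_by_recognition(page)
    print("used:", used)
    print("ignored:", ignored)
```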
The W3C HTML specification does suggest a few values for <meta>
properties, the "link types", originally defined for <link>. [3] These
have good authority, but aren't particularly useful.
One set of properties that does have good authority and usefulness is
Dublin Core. [4] This defines a small core set of
"elements" ("properties" in our terminology) that cover most metadata
requirements for most resources, most of the time. Credible tools that
read any metadata are very likely to understand Dublin Core as well.
Dublin Core is particularly useful because of the refinement mechanism
for its terms[5], which supports both date.created and date.modified
(and others) as sub-classes of its core date element. Its
"dumbing-down" principle means that applications which understand date,
but not the more specific date.modified, can still use the value as a
plain date.
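The dumbing-down idea can be sketched in a few lines of Python. This is
only an illustration under my own assumptions: property names are
written in the dotted form used in this post, the function name
dumb_down is invented, and the "understood" set stands in for whatever
vocabulary a given application happens to know.

```python
def dumb_down(property_name, understood):
    """Return the most specific understood form of a dotted name, or None."""
    parts = property_name.strip().lower().split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in understood:
            return candidate
        parts.pop()  # drop the most specific refinement and try again
    return None


if __name__ == "__main__":
    simple_app = {"date"}                   # knows only the core element
    richer_app = {"date", "date.modified"}  # also knows one refinement
    print(dumb_down("date.modified", simple_app))
    print(dumb_down("date.modified", richer_app))
    print(dumb_down("film-review-rating", simple_app))
```

An application that knows only the core element still gets a usable
date, while a name with no recognisable core falls through to None.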
Some specific target markets have vocabularies of metadata properties
that are highly important, even if limited in their application. UK
government-related work is likely to require eGMS [6][7], even though
this largely duplicates the superior Dublin Core. Anything wishing to
be indexed by ht://Dig would do well to use their vocabulary. [8]
Some bad and incorrect references to <meta> properties exist too. [9]
"revised" isn't likely to be recognised by anything. Page-refresh is an
HTTP header (and better done that way), not a <meta> property.
* Content
In the informal world of web page metadata, there is little definition
of content, the "value" part of the name-value pairs. What definition
exists is expressed through the same "well-known" vocabularies as the
semantics.
Some of these, such as date, do define data typing.[2] Content-rating
schemes, such as PICS or RSAC, may define quite complex structures.
Some Dublin Core elements use the scheme qualifier to indicate the
subject identifiers are from a particular vocabulary or taxonomy, such
as Dewey Decimal Classification, MeSH etc.
Where the property content expresses some copyright restriction, then
Creative Commons markup is useful.
Most other property value types, though, are merely string values, or
sometimes a URI, depending on the property. As with semantics,
metadata at this level relies on recognising the well-known standards,
not on referring to machine-readable schema languages to indicate
data type.
References
1. The META element. W3C.
http://www.w3.org/TR/html4/struct/global.html#edef-META
2. W3C Date-time format.
http://www.w3.org/TR/NOTE-datetime
3. 6.12 Link types. W3C.
http://www.w3.org/TR/html4/types.html#type-links
4. Dublin Core.
http://dublincore.org/documents/dcq-html/
5. Dublin Core Terms.
http://dublincore.org/documents/dcmi-terms/#H5
6. eGMS (more readable, but unofficial). ESD.
http://www.esd.org.uk/standards/egms/
7. eGMS V2.0. UK Government.
http://www.govtalk.gov.uk/schemasstandards/metadata_document.asp?docnum=768
8. ht://Dig recognized META information in HTML documents.
http://www.htdig.org/meta.html
9. HTML <meta> tag. W3 Schools.
http://www.w3schools.com/tags/tag_meta.asp