semantic markup

Gordon Freeman

I was just looking at a semantic markup scheme proposed by the major search
engines at schema.org, but ISTM there is a problem that if you insert the
tags they propose (see example below) then your pages will fail the W3C
validation since the attributes itemscope, itemtype, etc are not
recognised.

I note however that there are a number of other markup schemes around, some
of which reuse existing attributes like "class" or "rel" to avoid
validation problems. Of course this could lead to other problems depending
on how you name your CSS classes!
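For comparison, here is a minimal sketch of the class-based approach, assuming the hCard microformat. The person and date are reused from the schema.org example; the markup itself is illustrative, not taken from any real page.

```html
<!-- hCard microformat: the semantics ride on ordinary class names,
     so the markup validates under older doctypes as well. -->
<div class="vcard">
  <span class="fn">James Cameron</span>
  (born <abbr class="bday" title="1954-08-16">August 16, 1954</abbr>)
</div>
```

The clash mentioned above would arise if a stylesheet already used a class such as "fn" for something unrelated.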

So I wondered if anyone is using semantic markup much and if so which type
and whether it leads to better search engine listings or other benefits?

schema.org example:
<div itemscope itemtype="http://schema.org/Movie">
<h1 itemprop="name">Avatar</h1>
<div itemprop="director" itemscope itemtype="http://schema.org/Person">
Director: <span itemprop="name">James Cameron</span> (born <span
itemprop="birthDate">August 16, 1954</span>)
</div>
<span itemprop="genre">Science fiction</span>
<a href="../movies/avatar-theatrical-trailer.html"
itemprop="trailer">Trailer</a>
</div>
 
Adrienne Boswell

Gordon Freeman said:
I was just looking at a semantic markup scheme proposed by the major
search engines at schema.org, but ISTM there is a problem that if you
insert the tags they propose (see example below) then your pages will
fail the W3C validation since the attributes itemscope, itemtype, etc
are not recognised.

I note however that there are a number of other markup schemes around,
some of which reuse existing attributes like "class" or "rel" to avoid
validation problems. Of course this could lead to other problems
depending on how you name your CSS classes!

So I wondered if anyone is using semantic markup much and if so which
type and whether it leads to better search engine listings or other
benefits?


The only thing I'm using is the hrecipe format on my cooking blog in my
signature below. It's a WordPress plugin, so it's pretty easy.
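As a rough sketch of what hrecipe markup looks like (a hypothetical recipe using class names from the hRecipe microformat draft, not the actual output of the plugin):

```html
<!-- Hypothetical recipe; only the class names come from hRecipe. -->
<div class="hrecipe">
  <h2 class="fn">Plain pancakes</h2>
  <p class="summary">A quick weekday breakfast.</p>
  <ul>
    <li class="ingredient">2 cups flour</li>
    <li class="ingredient">2 eggs</li>
    <li class="ingredient">1 cup milk</li>
  </ul>
  <p>Serves <span class="yield">4</span></p>
  <div class="instructions">Mix everything, then fry in a hot pan.</div>
</div>
```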
 
Jonathan N. Little

Adrienne said:
The only thing I'm using is the hrecipe format on my cooking blog in my
signature below. It's a WordPress plugin, so it's pretty easy.

<li><a class="active"
href="http://the-good-plate.com"><span><span>Home</span></span></a></li>

Why the double nested spans throughout?
 
Jukka K. Korpela

I was just looking at a semantic markup scheme proposed by the major search
engines at schema.org,

They have pretty much decided on it. The main reason for semantic markup
is search engines, so whatever they do becomes the norm.

but ISTM there is a problem that if you insert the
tags they propose (see example below) then your pages will fail the W3C
validation since the attributes itemscope, itemtype, etc are not
recognised.

Validation is overrated, and it is formal. You can make pretty much
anything validated (provided you follow some very generic markup
conventions) if you write a suitable DTD. And in this case, you don't
even need to do that; just slap
<!doctype html>
at the start of your document, and the ghastly W3C Markup Validator
transmogrifies to a nice little HTML5 linter (though they still call it
a "validator"). And in HTML5, the itemscope etc. attributes are OK.
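A minimal sketch of the doctype approach described above, cut down from the schema.org example. The title, charset declaration and lang value are assumptions added only to make a complete page, and the result should pass the HTML5 checker:

```html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Avatar</title>
</head>
<body>
<!-- With the HTML5 doctype, the microdata attributes are accepted. -->
<div itemscope itemtype="http://schema.org/Movie">
<h1 itemprop="name">Avatar</h1>
<span itemprop="genre">Science fiction</span>
</div>
</body>
</html>
```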
I note however that there are a number of other markup schemes around, some
of which reuse existing attributes like "class" or "rel" to avoid
validation problems. Of course this could lead to other problems depending
on how you name your CSS classes!

Indeed. And such problems were one of the reasons why the gods... the
search engines decided to make microdata the favored rite... method.

So I wondered if anyone is using semantic markup much and if so which type
and whether it leads to better search engine listings or other benefits?

Semantic markup surely works in many cases - some of the special Google
searches like recipe search or shopping search utilize semantic markup.
The bad news is that they only do this for large sites (and in a rather
US-centric way). They don't say this very explicitly, but neither do
they say explicitly, or at all, what you _really_ need to do to make
their systems recognize your site "semantically".
 
Adrienne Boswell

Jonathan N. Little said:
Why the double nested spans throughout?

You know, I don't know. Normally, I would be jumping up and down,
pointing fingers, screaming at the top of my lungs - but, hey, it's
WordPress, and it is what it is. One of these days, I'll delve into it
and figure out what's doing what, and clean it up.
 
Jonathan N. Little

Adrienne said:
You know, I don't know. Normally, I would be jumping up and down,
pointing fingers, screaming at the top of my lungs - but, hey, it's
WordPress, and it is what it is. One of these days, I'll delve into it
and figure out what's doing what, and clean it up.

One of my issues with frameworks. In this case the "blackbox" produces
something harmless, but when it's a security issue... yikes!
 
Gus Richter

I was just looking at a semantic markup scheme proposed by the major search
engines at schema.org, but ISTM there is a problem that if you insert the
tags they propose (see example below) then your pages will fail the W3C
validation since the attributes itemscope, itemtype, etc are not
recognised.

Those attributes are provided with "Microdata" feature in HTML5. Use it,
it's a fait accompli.

So I wondered if anyone is using semantic markup much and if so which type
and whether it leads to better search engine listings or other benefits?

Search engines/web crawlers love assistance - be nice to them and
they'll be nice in turn. Web crawlers/Search engines love assistance
with semantic Markup, etc.:

HTML5 provides "outlines" with new elements <section>, <article>,
<nav> and <aside>.
HTML5 provides "Microdata" (also see schema.org) with the new <time>
element and its datetime attribute and attributes (item groups);
itemscope, itemtype, itemprop, itemid, itemref.
HTML5 provides "ARIA" semantic (role and aria-) attributes to aid Web
Accessibility e.g. role="main". (WAI-ARIA = Web Accessibility Initiative
- Accessible Rich Internet Applications)
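The three HTML5 features listed above could be combined roughly as follows. This is only a sketch that reuses the schema.org example; the headings, link and page layout are invented for illustration.

```html
<nav><a href="/">Home</a></nav>
<!-- ARIA landmark role marking the main content -->
<div role="main">
  <!-- Outline elements: an article containing one section -->
  <article itemscope itemtype="http://schema.org/Movie">
    <h1 itemprop="name">Avatar</h1>
    <section>
      <h2>Director</h2>
      <div itemprop="director" itemscope itemtype="http://schema.org/Person">
        <span itemprop="name">James Cameron</span>, born
        <!-- The datetime attribute carries the machine-readable value -->
        <time itemprop="birthDate" datetime="1954-08-16">August 16, 1954</time>
      </div>
    </section>
  </article>
</div>
<aside>Related films</aside>
```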

"Section 508" of the Rehabilitation Act of 1973 (USA only and
excepting private web sites unless receiving federal funds or under
contract with a federal agency) is a law to provide usability and
accessibility by the visually impaired using assistive technology such
as screen readers and refreshable Braille displays. Caveat: Very
condensed - not an authority - read Section 508.

"Sitemap" (sitemaps.org) is an XML file that lists URLs for a site
along with additional metadata about each URL - what content you would
like indexed.
robots.txt file (robotstxt.org) or robots meta tag. To exclude content
(don't want indexed) from search engines.
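For the robots meta tag, a minimal sketch placed in the head of a page that should stay out of the index (the title is a placeholder, and whether a given crawler honors the tag is up to that crawler):

```html
<head>
<title>Draft page</title>
<!-- Asks crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
</head>
```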
 
Jukka K. Korpela

Search engines/web crawlers love assistance - be nice to them and
they'll be nice in turn.

That's a nice diplomatic way of expressing the situation. We could also
say that the state loves assistance - pay your taxes and they might not
put you in jail.

Web crawlers/Search engines love assistance
with semantic Markup, etc.:

However, they don't tell us what they really want and what they really
do. We're supposed to listen to abstract descriptions that may change
without notice and may or may not relate to the reality.

HTML5 provides "outlines" with new elements <section>, <article>, <nav>
and <aside>.

No search engine has been reported to pay the least attention to them.
We've only seen discussion statements about what _might be_ done.

HTML5 provides "Microdata" (also see schema.org) with the new <time>
element and its datetime attribute and attributes (item groups);
itemscope, itemtype, itemprop, itemid, itemref.

HTML5 is not the thing here. Schema.org is. But they're not really
telling us how widely they use microdata. Before spending the next month
adding microdata to all pages of yours, run a simple test with one page
(to see that nothing happens unless your site happens to be a large
company or community site that search engines appreciate).

HTML5 provides "ARIA" semantic (role and aria-) attributes

For some values of "HTML5", but I don't see what this has to do with
crawlers or search engines. They could learn from ARIA semantics, but
they don't care, partly because only a small fraction of pages use ARIA
attributes, and who knows whether they do it properly?

"Sitemap" (sitemaps.org) is an XML file that lists URLs for a site along
with additional metadata about each URL - what content you would like
indexed.
robots.txt file (robotstxt.org) or robots meta tag. To exclude content
(don't want indexed) from search engines.

Neither of these is semantic markup.
 
Gus Richter

2012-01-08 5:04, Gus Richter wrote:

........... but I don't see what this has to do with
crawlers or search engines.

Neither of these is semantic markup.

The OP queried regarding "Semantic Markup" and "Search Engines" so I
tried in the preamble to extend from Semantic Markup with "etc." - in
any case, they are all related and he can choose his poison.

"Outlines" and "Microdata" relate to Semantic Markup for Search Engines.
"ARIA" relates to Semantic Markup for Accessibility (as does Section 508
for USA).
"sitemaps.org" and "robotstxt.org" relate to Search Engines.

~~~~~~~~~~~~~~~

Re: HTML5 document Validation, I have come across an oddity.
WAI Validation (Cynthia) demands: <meta name=language
content="English">
HTML Validation, however, states that keyword language is not
registered and therefore rejects this META.
(The META in fact is redundant due to <html lang="en"> as per HTML5)
 
Jukka K. Korpela

The OP queried regarding "Semantic Markup" and "Search Engines" so I
tried in the preamble to extend from Semantic Markup with "etc."

Such as robots.txt? Such references to completely different topics just
confuse, and the issue does not really require any added confusion.

"Outlines" and "Microdata" relate to Semantic Markup for Search Engines.

Do you have any actual evidence of the effect of "outlines" on search
engines or anything? (Besides, they are structure rather than semantics
in the sense discussed here. Being a header group is part of structure
and does not say a word about the _meaning_ of header texts.)

Re: HTML5 document Validation, I have come across an oddity.

It's usually a good idea to start a new thread when you have a new
question. Language markup is tangentially related to semantic markup,
but if the issue you raise is essential, it would deserve a new heading
and a new thread.

WAI Validation (Cynthia)

Cynthia is fake and probably causes more harm than good.

demands: <meta name=language content="English">

That's an example of the bogosity of Cynthia. They've just invented
rules and made software that runs some checks against their rules.

HTML Validation, however, states that keyword language is not registered
and therefore rejects this META.

No, it's the HTML5 linter, called "validator", which is another
subjective checker, though much more useful and sensible.

The linter also tells you what to do to have your meta names registered,
but I wouldn't bother. Registering it would not help anyone and might
even hide part of the bogosity. The idea is not to register whatever
meta names someone makes up but to register names with well defined
meaning and relevant support in software in the sense that the meta
information is _used_ for something.

Besides, if Cynthia were under reasonable maintenance, the people
responsible for it would have done something about this if they had
evidence that the tag they require is of some use. But Cynthia is
apparently without maintenance - you might draw some conclusion from the
fact that it claims to check WCAG 1.0 conformance. WCAG 1.0 was
succeeded by WCAG 2.0 over two years ago.

(The META in fact is redundant due to <html lang="en"> as per HTML5)

The lang attribute is nothing new in HTML5, but it is indeed the way to
declare content language.
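A minimal sketch of declaring content language with the lang attribute, both for the whole document and for an inline phrase in another language (the page content is invented for illustration):

```html
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Language example</title>
</head>
<body>
<!-- lang on a nested element overrides the document language -->
<p>Microdata is more or less a <span lang="fr">fait accompli</span>.</p>
</body>
</html>
```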
 
Gus Richter

2012-01-08 14:24, Gus Richter wrote:

Such as robots.txt? Such references to completely different topics just
confuse, and the issue does not really require any added confusion.

I showed how robots.txt relates. The OP was asking for methods to
achieve "better search engine listings or other benefits". Sorry to
cause you any confusion.

Do you have any actual evidence of the effect of "outlines" on search
engines or anything? (Besides, they are structure rather than semantics
in the sense discussed here. Being a header group is part of structure
and does not say a word about the _meaning_ of header texts.)

My tests have led me to believe that "outlines" achieve code bloat with
additional Semantic Markup and no apparent further gain except for
syndication in the case of <article>. Reading several sources draws
attention to the importance for accessibility, but "aside from
accessibility, building good HTML5 outlines is also useful for SEO
(search engine optimization). A good outline can be read by a search
engine robot and make your page easier to scan and add to the index. And
pages that can be added more quickly get indexed better and that can
help you rank higher in the results."
<http://webdesign.about.com/od/html5tutorials/a/html5-outlines.htm>
 
Jukka K. Korpela

Gus Richter

No, the question was whether and how semantic markup achieves that.

And it was my way of saying "yes" and "that's how".

Do you have any actual evidence of the effect of "outlines" on search
engines or anything?
[...]
<http://webdesign.about.com/od/html5tutorials/a/html5-outlines.htm>

I think the quotation, which presents no actual evidence, was your way
of saying "No."

I think that using this approach to negate means that everything that
you have said in this thread is presented with no actual evidence and
was just so much hot air.
 
