dashes in URLs

j · Jan 16, 2013

The preferred method of replacing spaces in URLs is with a dash for
SEO. So, a while back, I switched over from underscores.

Now, I have a product that naturally has a "-" in it. Not that I
couldn't see this coming, but I don't make the "rules".

What do I replace a - with? URL encoding is no help because %2D gets
converted back to a - before the software gets a hold of it.
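A minimal Python sketch of why percent-encoding does not help here. It assumes the server decodes the path with something equivalent to `urllib.parse.unquote`; the software actually used on the site is not named in this thread.

```python
from urllib.parse import quote, unquote

# "-" is an unreserved URL character, so "%2D" and "-" are equivalent:
# decoding gives back the plain hyphen, and encoding never escapes it.
encoded = "foo%2Dbar"
decoded = unquote(encoded)
reencoded = quote("foo-bar")

print(decoded)    # foo-bar
print(reencoded)  # foo-bar
```

Once the path has been decoded, the application can no longer tell an encoded hyphen from a literal one.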

And just why are "-" so great for SEO?

Jeff

Jukka K. Korpela · Jan 16, 2013

The preferred method of replacing spaces in URLs is with a dash for
SEO.

Virtually everything you have read about "SEO" is a mix of hearsay and
speculation. You seem to propagate the cargo cult tradition well.

So, a while back, I switched over from underscores.

And the odds are that you just destroyed much of your "SEO" rating if
you had any, because you probably did not redirect the old addresses.
Few sites do that, partly because it usually means a lot of work.

And you probably can't undo that. Once your old URLs have been removed,
as 404 Not Found, from search engine databases, you might be able to get
them back in some months, but they will most probably appear as new
URLs, so they start from the bottom. In "SEO", you can fake many things,
but not age.

Now, I have a product that naturally has a "-" in it. Not that I
couldn't see this coming, but I don't make the "rules".

What do I replace a - with?

How about the Unicode HYPHEN U+2010? Not that it would be such a great
idea, but it would be different from HYPHEN-MINUS U+002D, and this seems
to be the immediate requirement. And it will make typing the URL almost
mission impossible to most users, but who cares about such things when
you have SEO to believe in?

And just why are "-" so great for SEO?

You are asking that *now*?

j · Jan 16, 2013

Virtually everything you have read about "SEO" is a mix of hearsay and
speculation. You seem to propagate the cargo cult tradition well.

I have no opinion about SEO, it's not what I do. I was informed that SEO
order of preference is -,+ and then _ trailing badly. This was paid for
advice, not by myself, but by the site owner.

If you look at URLs for major sites you see that "-" is ubiquitous. It
is the way it is done.

And the odds are that you just destroyed much of your "SEO" rating if
you had any, because you probably did not redirect the old addresses.
Few sites do that, partly because it usually means a lot of work.

These were all set up as 301 in the .htaccess.

And you probably can't undo that. Once your old URLs have been removed,
as 404 Not Found, from search engine databases, you might be able to get
them back in some months, but they will most probably appear as new
URLs, so they start from the bottom. In "SEO", you can fake many things,
but not age.

I just checked the site on Google for some common keywords and it seemed
to do rather well. Not the top listing but page one and two.

How about the Unicode HYPHEN U+2010? Not that it would be such a great
idea, but it would be different from HYPHEN-MINUS U+002D, and this seems
to be the immediate requirement. And it will make typing the URL almost
mission impossible to most users, but who cares about such things when
you have SEO to believe in?

Few people type in URLs for products.

You are asking that *now*?

Because I'd like to know.

Jeff

Jukka K. Korpela · Jan 16, 2013

I have no opinion about SEO, it's not what I do. I was informed that SEO
order of preference is -,+ and then _ trailing badly. This was paid for
advice, not by myself, but by the site owner.

So it indeed is cargo cult "information".

If you look at URLs for major sites you see that "-" is ubiquitous. It
is the way it is done.

Using "-" is just a simple way. You can't use a space (except as
%-encoded, which looks bad), so what *can* you use?

Do you think Wikipedia is not a "major site"? Does it rank badly in
search engines? Anyway, they use underline characters "_", as in
http://en.wikipedia.org/wiki/Search_engine_optimization

These were all set up as 301 in the .htaccess.

Fine, so you did the wrong thing (useless change of URLs) the right way,
minimizing the damage. Most people don't do that. I know of rather few
major site reorganizations that handled the redirection right.

Because I'd like to know.

Your client or boss paid for the advice, and now you are asking for
better advice (that is, advice backed up with facts and arguments) for free.

Lewis · Jan 16, 2013

The preferred method of replacing spaces in URLs is with a dash for
SEO. So, a while back, I switched over from underscores.

Now, I have a product that naturally has a "-" in it. Not that I
couldn't see this coming, but I don't make the "rules".

What do I replace a - with? URL encoding is no help because %2D gets
converted back to a - before the software gets a hold of it.

Why do you replace it at all?

And just why are "-" so great for SEO?

Because -'s are word boundaries and _'s are not.

Gene Wirchenko · Jan 16, 2013

On Wed, 16 Jan 2013 13:22:55 +0000 (UTC), Lewis

[snip]

Because -'s are word boundaries and _'s are not.

Huh?

In "co-ordinate", "-" is not a word boundary. "_" is often used
for emphasis, like this: _like this_. Can you show me a *word* that
contains "_"?

Sincerely,

Gene Wirchenko

Lewis · Jan 16, 2013

On Wed, 16 Jan 2013 13:22:55 +0000 (UTC), Lewis

[snip]

Because -'s are word boundaries and _'s are not.

Huh?

In "co-ordinate", "-" is not a word boundary. "_" is often used
for emphasis, like this: _like this_. Can you show me a *word* that
contains "_"?

That's nice. We're not talking about English, we are talking about parsing URLs.

"This_is_one_word" to a computer, "this-is-not-one-word" to a computer.

Jonathan N. Little · Jan 16, 2013

Lewis said:
On Wed, 16 Jan 2013 13:22:55 +0000 (UTC), Lewis

[snip]

Because -'s are word boundaries and _'s are not.

Huh?

In "co-ordinate", "-" is not a word boundary. "_" is often used
for emphasis, like this: _like this_. Can you show me a *word* that
contains "_"?

That's nice. We're not talking about English, we are talking about parsing URLs.

"This_is_one_word" to a computer, "this-is-not-one-word" to a computer.

Please explain that assertion!

Jukka K. Korpela · Jan 16, 2013

"This_is_one_word" to a computer, "this-is-not-one-word" to a computer.

That depends on what the computer has been programmed to do. There is
little really reliable information about what search engines actually do.
Since they primarily deal with text in human languages, it is natural to
expect that "foo-bar" is taken as one word, though possibly as more or
less synonymous with the word pair "foo bar", because many languages
have hyphenated compound words. And since human languages do not use the
underline "_", what should search engines do with "foo_bar"?

But, fair enough, URLs may be an exception. Google says:

"Consider using punctuation in your URLs. The URL
http://www.example.com/green-dress.html is much more useful to us than
http://www.example.com/greendress.html. We recommend that you use
hyphens (-) instead of underscores (_) in your URLs."
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=76329

They don't say why, and they don't say that underscores do something bad
to search engines - even though this would be the right place to say
such things.

So, by all means, use foo-bar rather than foo_bar in URLs. But changing
existing URLs is normally a bad idea.

Jonathan N. Little · Jan 16, 2013

Jukka said:
They don't say why, and they don't say that underscores do something bad
to search engines - even though this would be the right place to say
such things.

I think the only reasons Google and SEO "experts" advise dashes over
underscores are 1) with the HTML link underline decoration the underscore
is indistinguishable from a space, and 2) some newbies have no idea that
the SHIFT + HYPHEN key is an underscore!

Denis McMahon · Jan 17, 2013

The preferred method of replacing spaces in URLs is with a dash for SEO.

Do you:

1) Know this to be an absolute fact?

2) Think this because someone you paid for advice told you so?

3) Suspect this might be true because you heard it on the internet?

4) Believe this because you worked it out yourself based on something you
read somewhere?

5) Have some other basis for this statement (and if so, what)?

...........

And just why are "-" so great for SEO?

You tell us. You just stated it as if it were a solid fact.

Gene Wirchenko · Jan 17, 2013

On Wed, 16 Jan 2013 13:22:55 +0000 (UTC), Lewis

[snip]

Because -'s are word boundaries and _'s are not.

Huh?

In "co-ordinate", "-" is not a word boundary. "_" is often used
for emphasis, like this: _like this_. Can you show me a *word* that
contains "_"?

That's nice. We're not talking about English, we are talking about parsing URLs.

Yes, I read the posts. URLs often have words in them, English
words.

"This_is_one_word" to a computer, "this-is-not-one-word" to a computer.

It depends on the language. For example, hyphens can be part of
variable names in COBOL as can underlines.

Sincerely,

Gene Wirchenko

j · Jan 17, 2013

On 1/16/2013 7:24 PM, Denis McMahon wrote:

Do you:

1) Know this to be an absolute fact?

See below.

2) Think this because someone you paid for advice told you so?

This is paid for advice (not by me). It is from someone who has been in
the SEO business for years. I had no reason not to believe it and
assumed until this thread that it was so. The dash usage is widespread,
certainly it is preferred by many many webmasters.

My own preference is an underscore, but what do I know? I have in fact
modified the site software to use any delimiter specified, and I set the
replacement as requested.

3) Suspect this might be true because you heard it on the internet?
No.

4) Believe this because you worked it out yourself based on something you
read somewhere?

5) Have some other basis for this statement (and if so, what)?

You tell us. You just stated it as if it were a solid fact.

No, I said it was the "preferred method", and just looking about would
confirm that. Whether this actually affects SEO rankings I have not a clue.

In fact, I have seen no proof either way, either that dashes improve
ranking over underscores *or* that they don't. Do you have any proof?

I should probably care more about this, but whenever I have occasion to
post here, I lose desire to come back. What do we do here other than
argue over pointless arcanity? I just want to move on...

Lewis · Jan 17, 2013

2013-01-16 22:52, Lewis wrote:

That depends on what the computer has been programmed to do. There is
little really reliable information about what search engines actually do.
Since they primarily deal with text in human languages, it is natural to
expect that "foo-bar" is taken as one word, though possibly as more or
less synonymous with the word pair "foo bar", because many languages
have hyphenated compound words. And since human languages do not use the
underline "_", what should search engines do with "foo_bar"?

The issue is that in computer languages, _ is not a word boundary
character, so the theory for SEO goes that "search-term" will be parsed
properly and "search_term" will not.

I am not saying this is how Google works. It IS how AltaVista worked last century.

But, fair enough, URLs may be an exception. Google says:

"Consider using punctuation in your URLs. The URL
http://www.example.com/green-dress.html is much more useful to us than
http://www.example.com/greendress.html. We recommend that you use
hyphens (-) instead of underscores (_) in your URLs."

There you go.

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=76329

They don't say why, and they don't say that underscores do something bad
to search engines - even though this would be the right place to say
such things.

So, by all means, use foo-bar rather than foo_bar in URLs. But changing
existing URLs is normally a bad idea.

That too.

Jukka K. Korpela · Jan 17, 2013

It depends on the language. For example, hyphens can be part of
variable names in COBOL as can underlines.

Or, to take a slightly more modern language, CSS. People often get
confused with hyphenated CSS identifiers. They have "white space: no
wrap" in their mind but can't remember whether it's "whitespace" or
"white-space", "nowrap" or "no-wrap", or maybe "no wrap" (two values, as
a CSS property may have) and then get frustrated when the correct answer
is the seemingly inconsistent "white-space: nowrap".

It's similar in human languages, too, especially in English - do I write
"half-brother", "half brother", or "halfbrother"? In searching and
indexing, such alternatives often need to be treated as more or less
equivalent, or at least similar, and it's easy to see that this is what
Google does with content.

j · Jan 17, 2013

2013-01-16 22:52, Lewis wrote:

I am not saying this is how Google works. It IS how AltaVista worked last century.

There you go.

We have a winner!

Is Google's own recommendation Cargo Cult?

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=76329

That too.

As for non-database URLs, I've always locked the URL in at the time
of creation. So, no problem there.

That still leaves open the question of what to replace the "-" with in
product name URLs. For the moment, I am just forbidding their usage.
Although -dash- is looking tempting.

Jeff

Denis McMahon · Jan 17, 2013

The issue is that in computer languages, _ is not a word boundary
character, so the theory for SEO goes that "search-term" will be parsed
properly and "search_term" will not.

Outside of PCRE, and possibly other forms of regex, is a "word boundary"
ever defined?

Technically in regex the word boundary is not the character, it is the
boundary between a non word character and a word character. From the PCRE
man page:

A word boundary is a position in the subject string where the
current character and the previous character do not both match
\w or \W (i.e. one matches \w and the other matches \W), or
the start or end of the string if the first or last character
matches \w, respectively.

Historically, the issue may be rooted in the fact that underscore has
been allowed in variable names for a long time, and when people developed
regex, one of their parsing requirements was computer code, so they
developed regex in which the underscore character was part of the "word"
characters rather than the "not word" characters.

I don't know if "underscore as word character" is a compile time switch
in PCRE or not, but even if not, I imagine that changing underscore from
being a word character to a non word character in the regex
implementation that any particular search engine uses is as simple as
editing the relevant header file and recompiling the regex library
object / dll file.

However, it is probably an even simpler exercise to change the definition
of a "word" in your regex from "\w+" (or "[[:word:]]+") to "[[:alnum:]]+"
if you want to include letters and digits but exclude underscores, or
"[[:alpha:]]+" if you just want words consisting of upper and lower case
letters.
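The difference between the two "word" definitions can be sketched in Python. This illustrates regex behaviour only and says nothing about what any search engine actually does. Python's `re` module has no POSIX classes such as `[[:alnum:]]`, so the ASCII class `[A-Za-z0-9]` stands in for it here, and the helper name `words` is made up for the example.

```python
import re

def words(text, pattern=r"\w+"):
    """Return the 'words' a regex-based tokenizer would find in text."""
    return re.findall(pattern, text)

# With \w+, the underscore counts as a word character, the hyphen does not.
hyphenated = words("this-is-not-one-word")
underscored = words("This_is_one_word")

# Swapping the word definition makes the underscore a separator too.
underscored_alnum = words("This_is_one_word", r"[A-Za-z0-9]+")

# \b only matches where a \w character meets a \W character (or an edge),
# so "term" is not a separate word inside "search_term".
in_hyphenated = re.search(r"\bterm\b", "search-term") is not None
in_underscored = re.search(r"\bterm\b", "search_term") is not None

print(hyphenated)         # ['this', 'is', 'not', 'one', 'word']
print(underscored)        # ['This_is_one_word']
print(underscored_alnum)  # ['This', 'is', 'one', 'word']
print(in_hyphenated, in_underscored)  # True False
```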

Jukka K. Korpela · Jan 17, 2013

Is Google's own recommendation Cargo Cult?

The cargo cult item here is the hearsay that "-" vs. "_" in URL
has an impact on search engine ranking. Google has not presented
any such statement, and the paid advice you referred to did not
apparently present any facts either.

Denis McMahon · Jan 17, 2013

We have a winner!

Is Google's own recommendation Cargo Cult?

Google's recommendation says nothing about SEO. Google's recommendations
seem to be more geared to being friendly for the crawler, which simply
grabs pages for the search engine to index, rather than making any
assertions about how the features discussed may affect page rankings.

Personally I would be very surprised if _ vs - in the URL affected
PageRank at all!

Ben Bacarisse · Jan 17, 2013

j said:
That still leaves open the question of what to replace the "-" with in
product name URLs. For the moment, I am just forbidding their
usage. Although -dash- is looking tempting.

My gut reaction is to leave it alone. If it occurs in the middle of
product name, you'd get sensible-looking URLs. For a product called
"foo-bar":

foo-bar-details.html
foo-bar-summary.html

One case for writing it as "dash" might be if the product name uses the
character literally: for example, a smiley t-shirt with product name
"t-shirt :-)". Here, the two uses of "-" are different. The first I'd
leave alone, but the second might be rendered as a word:

t-shirt-colon-dash-close-bracket-details.html

but

t-shirt-smiley-details.html

is probably better overall. And there might be considerations that apply
specifically in your product area. Do your customers (and others) talk
about the "dash x" and the "dash y" versions of a product, for example?
If so, there might be an argument for translating it into a word in the
URL.

I think the best answer depends on the role that the "-" plays in the
product name and it's hard to come up with a general answer.
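One way to sketch the "leave it alone" approach in Python. The function `make_slug` and the `SYMBOL_WORDS` table are hypothetical names invented for this illustration and are not taken from the site's software. The sketch assumes product names are ASCII and that the URL is stored with the product rather than reversed back into a name.

```python
import re

# Hypothetical table: literal symbols in a product name and the word
# that should stand in for them in the URL.
SYMBOL_WORDS = {":-)": "smiley"}

def make_slug(name, suffix="details"):
    """Build a URL file name, keeping hyphens inside the name as they are."""
    for symbol, word in SYMBOL_WORDS.items():
        name = name.replace(symbol, " " + word + " ")
    # Turn every run of characters other than letters, digits and "-"
    # into a single "-", then trim hyphens left at either end.
    slug = re.sub(r"[^A-Za-z0-9-]+", "-", name.lower()).strip("-")
    # Collapse any doubled hyphens produced by the step above.
    slug = re.sub(r"-{2,}", "-", slug)
    return f"{slug}-{suffix}.html"

print(make_slug("foo-bar"))             # foo-bar-details.html
print(make_slug("foo-bar", "summary"))  # foo-bar-summary.html
print(make_slug("t-shirt :-)"))         # t-shirt-smiley-details.html
```

Because a hyphen from the name and a hyphen from a replaced space look the same in the result, the slug cannot be turned back into the product name. Looking the product up by its stored slug avoids needing a separate replacement character for "-".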
