Everything you did not want to know about Unicode in Python 3

Steven D'Aprano

You do not need any statements at all; copyright is automatically assigned
to anything you create (at least that is the case in UK law), although
proving the creation date may be difficult.

(1) In my lifetime, that wasn't always the case. Up until the 1970s or
thereabouts, you had to explicitly register anything you wanted
copyrighted, a much more sensible system which weeded out the meaningless
copyrights on economically worthless content. If we still had that
system, orphan works would be a lesser problem.

With the current system, all of us here are technically violating
copyright every time we reply to an email and quote more than a small
percentage of it. Not to mention all the mirror sites that violate
copyright by mirroring our posts in their entirety without permission.

(Authors' moral rights not to be misquoted or plagiarised are a different
kettle of fish, separate from their ownership rights over the work. That
should be automatic.)

(2) You don't have to just prove copyright. You also have to *identify*
who the work is copyrighted by, and it needs to be an identifiable legal
person (actual person or corporation), not necessarily the author. In the
absence of a statement otherwise, copyright is assumed to be held by the
author, but that's not always the case -- it might be a work for hire, or
copyright might have been transferred to another person or entity. Or the
author is unidentifiable. Hence the orphan work problem: it's presumed to
be copyrighted, but since nobody knows who owns the copyright, there's no
way to get permission to copy that work. It might as well be lost, even
when the original is sitting right there in front of you mouldering away.
 
Chris Angelico

With the current system, all of us here are technically violating
copyright every time we reply to an email and quote more than a small
percentage of it.

Oh wow... so when someone quotes heaps of text without trimming, and
adding blank lines, we can complain that it's a copyright violation -
reproducing our work with unauthorized modifications and without
permission...

I never thought of it like that.

ChrisA
 
Steven D'Aprano

Because Python 3 presents stdin and stdout as text streams however, it
makes them more difficult to use with binary data, which is why Armin
sets up all that extra code to make sure his file objects are binary.

What surprises me is how hard that is. Surely there's a simpler way to
open stdin and stdout in binary mode? If not, there ought to be.
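For what it's worth, the text streams in Python 3 expose their underlying binary streams through a `buffer` attribute, so a rough sketch of a bytes-only `cat` might look like the following. The function name `cat_binary` and the sample payload are made up for illustration, and this is not Armin's code:

```python
import io
import shutil
import sys


def cat_binary(src=None, dst=None):
    """Copy raw bytes from src to dst with no decoding step at all."""
    # sys.stdin.buffer / sys.stdout.buffer are the binary streams that sit
    # underneath the default text-mode stdin and stdout.
    src = sys.stdin.buffer if src is None else src
    dst = sys.stdout.buffer if dst is None else dst
    shutil.copyfileobj(src, dst)
    dst.flush()


# Demonstration with in-memory streams; the payload is not valid UTF-8.
payload = b"\xff\xfe\x00abc"
source = io.BytesIO(payload)
sink = io.BytesIO()
cat_binary(source, sink)
```

Calling `cat_binary()` with no arguments copies standard input to standard output. One caveat: if `sys.stdin` or `sys.stdout` has been replaced by an object without a `buffer` attribute (an `io.StringIO`, say), this sketch will fail, which may be part of why more defensive set-up code gets written.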
 
Dave Angel

You've never needed to copyright something? Copyright © Roy Smith 2014...
I know some people use (c) instead, but that actually has no legal
standing. (Not that any reasonable judge would invalidate a copyright
based on a technicality like that, not these days.)


(c) has no standing whatsoever, as it's properly spelled (copr)
 
wxjmfauth

On Tuesday, 13 May 2014 at 10:08:45 UTC+2, Johannes Bauer wrote:
He's correct about file name encodings. Which can be fixed really easily
without messing everything up (sys.argv binary variant, open accepting
binary filenames). But that he suggests that Go would be superior:

Is just a horrible idea. An obviously horrible idea, too.

Having dealt with the UTF-8 problems on Python2 I can safely say that I
never, never ever want to go back to that freaky hell. If I deal with
strings, I want to be able to sanely manipulate them and I want to be
sure that after manipulation they're still valid strings. Manipulating
the bytes representation of unicode data just doesn't work.

And I'm very very glad that some people felt the same way and
implemented a sane, consistent way of dealing with Unicode in Python3.
It's one of the reasons why I switched to Py3 very early and I love it.

Cheers,
Johannes

--
Ah, the latest and to this day most brilliant stroke of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <[email protected]>

===========

A Rob 'Commander' Pike will never put utf16 and
ebcdic in the same basket, when discussing coding
of characters.

jmf
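As a small illustration of the point quoted from Johannes above, that manipulating the bytes representation of Unicode data does not reliably give back valid strings, here is a minimal sketch. The sample text and variable names are arbitrary choices for the example:

```python
# "naive cafe" with accented characters, written with escapes for clarity
text = "na\u00efve caf\u00e9"
encoded = text.encode("utf-8")

# Slicing the str works on code points, so the result is still a valid string.
first_three_chars = text[:3]

# Slicing the bytes can cut through a multi-byte character: U+00EF takes
# two bytes in UTF-8, and this slice keeps only the first of them.
first_three_bytes = encoded[:3]

try:
    first_three_bytes.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
```

Here `still_valid` ends up `False`, because the byte slice ends in the middle of an encoded character.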
 
Chris Angelico

I think I could make a very strong case that anything sent to a public
forum with the intention of being broadcast has been placed into the
public domain by this action.

I don't think so. One can reasonably assume that anything sent to a
public forum is permissible to read, and to copy verbatim (although
there may be "presumed limits" on the copying, but probably not with
python-list). But if I quote your text and edit it, then you would
rightly complain, which is not the case with public domain text. The
question is whether or not it's fair to try to scare people with that
when they repeatedly use buggy software that inserts blank lines
everywhere :)

In case it's not obvious, I am NOT seriously contemplating pursuing
anything like this legally. It's just funny to contemplate.

ChrisA
 
Ian Kelly

Oh wow... so when someone quotes heaps of text without trimming, and
adding blank lines, we can complain that it's a copyright violation -
reproducing our work with unauthorized modifications and without
permission...

I never thought of it like that.

I'd be surprised if this doesn't fall under fair use.
 
Robin Becker

On 13/05/2014 17:08, Ian Kelly wrote:
..........
And since it's so simple, it shouldn't be hard to see that the use of
the shutil module has nothing to do with the Unicode woes here. The
crux of the issue is that a general-purpose command like cat typically
can't know the encoding of its input and can't assume anything about
it. In fact, there may not even be an encoding; cat can be used with
binary data. The only non-destructive approach then is to copy the
binary data straight from the source to the destination with no
decoding steps at all, and trust the user to ensure that the
destination will be able to accommodate the source encoding. Because
Python 3 presents stdin and stdout as text streams however, it makes
them more difficult to use with binary data, which is why Armin sets
up all that extra code to make sure his file objects are binary.

Doesn't this issue also come up wherever bytes are being read, i.e. in sockets,
pipe file handles etc.? Some sources may have well-defined encodings and so allow
use of Unicode strings, but surely not all. I imagine all of the problems
associated with a broken encoding promise for stdin can also occur with sockets
and other sources, e.g. error messages failing to be printable. Since bytes
in Python 3 are not equivalent to the old str (Python 3 bytes != Python 2 str),
using bytes everywhere has its own problems.
 
Ian Kelly

Doesn't this issue also come up wherever bytes are being read, i.e. in sockets,
pipe file handles etc.? Some sources may have well-defined encodings and so
allow use of Unicode strings, but surely not all. I imagine all of the
problems associated with a broken encoding promise for stdin can also occur
with sockets and other sources, e.g. error messages failing to be printable.
Since bytes in Python 3 are not equivalent to the old str (Python 3
bytes != Python 2 str), using bytes everywhere has its own problems.

Sockets send and receive bytes, and pipes created by the subprocess
module are opened in binary mode. Pipes inherited as stdin are still
assumed to be unicode, though.
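A quick way to see the subprocess behaviour is the sketch below. It assumes `subprocess.run` is available (Python 3.5 or later); the child command and the choice of Latin-1 are invented for the example:

```python
import subprocess
import sys

# Child process that writes raw bytes (not valid UTF-8) to its stdout.
child_code = "import sys; sys.stdout.buffer.write(b'caf\\xe9\\n')"
result = subprocess.run([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE)

# No text mode or encoding was requested, so the pipe hands back bytes.
raw = result.stdout

# Decoding is an explicit, separate step, done only once the encoding is
# known (Latin-1 here is purely an assumption made for this example).
text = raw.decode("latin-1")
```

The parent never has to guess an encoding just to read the pipe; it only commits to one at the `decode` call.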
 
Grant Edwards

I think I could make a very strong case that anything sent to a public
forum with the intention of being broadcast has been placed into the
public domain by this action.

At least in the US, there doesn't seem to be such a thing as "placing
a work into the public domain". The copyright holder can transfer
ownership to somebody else, but there is no "public domain" to which
ownership can be transferred. IIRC, there is a way under German
copyright law to release certain rights. The mere act of widely
distributing something does not in any way relinquish
copyrights.
 
Steven D'Aprano

At least in the US, there doesn't seem to be such a thing as "placing a
work into the public domain". The copyright holder can transfer
ownership to somebody else, but there is no "public domain" to which
ownership can be transferred.

That's factually incorrect. In the US, sufficiently old works, or works
of a certain age that were not explicitly registered for copyright, are
in the public domain. Under a wide range of circumstances, works created
by the federal government go immediately into the public domain.

It is true that under the Mickey Mouse Copyright Grab Act[1] of <insert
years here>, every time Mickey Mouse is about to reach the end of
copyright, Congress retroactively extends copyright terms for another few
decades, but that's another story.




[1] Not the real name of the act.
 
Marko Rauhamaa

Steven D'Aprano said:
That's factually incorrect. In the US, sufficiently old works, or works
of a certain age that were not explicitly registered for copyright, are
in the public domain. Under a wide range of circumstances, works created
by the federal government go immediately into the public domain.

Steven, you're not disputing Grant. I am. The sole copyright holder can
simply state: "this work is in the Public Domain," or: "all rights
relinquished," or some such. Ultimately, everything is decided by the
courts, of course.


Marko
 
Mark Lawrence

The sole copyright holder can
simply state: "this work is in the Public Domain," or: "all rights
relinquished," or some such. Ultimately, everything is decided by the
courts, of course.

For examples see all the Python PEPs.
 
Robert Kern

That's factually incorrect. In the US, sufficiently old works, or works
of a certain age that were not explicitly registered for copyright, are
in the public domain. Under a wide range of circumstances, works created
by the federal government go immediately into the public domain.

There is such a thing as the public domain in the US, and there are works in it,
but there isn't really such a thing as "placing a work" there voluntarily, as
Grant says. A work either is or isn't in the public domain. The author has no
choice in the matter.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Chris Angelico

There is such a thing as the public domain in the US, and there are works in
it, but there isn't really such a thing as "placing a work" there
voluntarily, as Grant says. A work either is or isn't in the public domain.
The author has no choice in the matter.

Then what's copyright status on PEPs?

The nearest thing to "assigning to public domain" that works across
legislatures is probably CC0:

http://creativecommons.org/about/cc0

ChrisA
 
Robert Kern

Steven, you're not disputing Grant. I am. The sole copyright holder can
simply state: "this work is in the Public Domain," or: "all rights
relinquished," or some such. Ultimately, everything is decided by the
courts, of course.

One can state many things, but that doesn't mean they have legal effect. The US
Code has provisions for how works become copyrighted automatically, how they
leave copyright automatically at the end of specific time periods, how some
works automatically enter the public domain on their creation (i.e. works of the
US federal government), but has nothing at all for how a private creator can
voluntarily place their work into the public domain when it would otherwise not
be. It used to, but does not any more.

For a private individual to say about a work they just created that "this work
is in the Public Domain" is, under US law, merely an erroneous statement of
fact, not a speech act that effects a change in the legal status of the work.
For another example of this distinction, saying "I am married" when I have not
applied for, received, and solemnized a valid marriage license is just an
erroneous statement of fact and does not make me legally married.

Relinquishing your rights can have some effect, but not all rights can be
relinquished, and this is not the same as putting your work into the public
domain. Among other things, your heirs can sometimes reclaim those rights in
some circumstances if you are not careful (and if they are valuable enough to
bother reclaiming).

If you wish to do something like this, I highly recommend (though IANAL and
TINLA) using the CC0 Waiver from Creative Commons. It has thorough legalese for
relinquishing all the rights that one can relinquish for the maximum terms that
one can do so in as many jurisdictions as possible and acts as a license to
use/distribute/etc. without restriction even if some rights cannot be
relinquished. Even if US law were to change to provide for dedicating works to
the public domain, I would probably still use the CC0 anyways to account for the
high variability in how different jurisdictions around the world treat their own
public domains.

http://creativecommons.org/about/cc0
http://wiki.creativecommons.org/CC0_FAQ

Note how they distinguish the CC0 Waiver from their Public Domain Mark: the
Public Domain Mark is just a label for things that are known to be free of
copyright worldwide but does not make a work so. The CC0 *does* have an
operative effect that is substantially similar to the work being in the public
domain.

 
