Using filepath method to identify an .html page

Michael Torrie

I'm sorry you are getting so frustrated. There's obviously a language
barrier here, but also your frustration is preventing you from thinking
clearly. You need to take a step back, breathe, and re-read everything
that's been written to you on this thread. All your questions that can
be answered have been answered. If you are being paid to develop this,
then we don't want to do your work for you since we're not the ones
being paid. If you're doing this for a class assignment, then again we
don't want to do it for you because that would defeat the purpose of
your education. But if you're willing to learn, then I think what the
others have said on this thread will help you learn it.

You can't learn Python by developing CGI scripts and running them on the
server. You need to try out snippets of code in an interactive way.
You've been told how to do this, and you don't need IDLE. Although
nothing prevents you from installing IDLE on your local machine. I hope
you have the python interpreter on your local workstation. If not,
download it and install it. You will need it. Use the python standard
library reference online (or download it). You will need it.

Why the hell

pin = int ( '/home/nikos/public_html/index.html' )

fails? because it has slashes in it?

That line fails because the string you passed it simply cannot be parsed
into a number. Just for simplicity's sake here, suppose we define a
number as any number of digits, 0-9, followed by a '.' or a ','
depending on locale, and some more digits, 0-9. This is very simplistic
but it will serve our purpose for this example. So given that a number
is defined as above, we expect that we can parse the following string
(in real Python, int() itself only accepts integer strings such as
'123', so a string with a decimal point has to go through float() first):

int(float('123.433')) == int(123.433) == 123

but '/home/nikos/public_html/index.html' contains nothing that is
recognizable as a number. There are no 0-9 digits in it, no periods or
commas. So rather than returning 0, which would be absolutely
incorrect, int() throws an exception because you've passed it a string
which does not contain a recognizable number.
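To see this for yourself, here is a minimal sketch you could paste into an interactive interpreter. The helper name try_int is made up for illustration; the only real API used is the built-in int() and the ValueError it raises.

```python
def try_int(text):
    """Return int(text), or None if text is not an integer literal.

    try_int is a hypothetical helper, used here only for illustration.
    """
    try:
        return int(text)
    except ValueError:
        # int() refuses anything that is not an optional sign plus digits
        # (surrounding whitespace is tolerated).
        return None


print(try_int("1234"))                                # a digit string parses
print(try_int("/home/nikos/public_html/index.html"))  # a path does not
```

The second call prints None because int() raised ValueError. The slashes are not special; any non-digit text fails the same way.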

If you really want to get a number to identify a string you'll have to
create a hash of some kind. You were on the right track with hashlib.
Except that int() again cannot work with the hash object because nothing
in the hash object's string representation looks like a number. If you
would follow the link you've already been given on the documentation for
hashlib you'd find that the object returned by md5 has methods you can
call to give you the hash in different forms, including a large number.
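As a rough sketch of what that could look like (Python 3 syntax; the function name path_to_number and the choice of md5 plus a modulo are my assumptions, not something prescribed in this thread): hexdigest() gives the hash as a hexadecimal string, which int() can parse when told the base is 16.

```python
import hashlib


def path_to_number(path, digits=4):
    """Map a path to a number with at most `digits` decimal digits.

    Hypothetical helper: the result is NOT guaranteed to be unique,
    since many different paths share the same remainder.
    """
    # hexdigest() returns 32 hex characters for md5.
    hex_digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    # Base 16 tells int() how to read the hex string as one big number.
    big_number = int(hex_digest, 16)
    return big_number % (10 ** digits)


pin = path_to_number("/home/nikos/public_html/index.html")
print("%04d" % pin)
```

The same path always gives the same number, but two different paths can give the same number too, so this alone cannot serve as a unique key.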
 
John Gordon

In said:
And the .html files are not even close 10.000

You said you wanted a 4-digit number. There are 10,000 different 4-digit
numbers.

0000
0001
0002
....
9999
 
Michael Torrie

a) I'am a reseller, i have unlimited ftp quota, hence database space

Space doesn't even come into the equation. There's virtually no
difference between a 4-digit number and a 100-character string. Yes
there is an absolute difference in storage space, but the difference is
so minuscule that there's no point even thinking about it. Especially
if you are dealing with fewer than a million database rows.
b) I'am feeling compelled to do it this way

Why? Who's compelling you? Your boss?
c) i DO NOT want to use BIG absolute paths to identify files, just
small numbers , shich they are easier to maintain.

No it won't be easier to maintain. I've done my share of web
development over the years. There's no difference between using a
string index and some form of number index. And if you have to go over
the database by hand, having a string is infinitely easier for your
brain to comprehend than a magic number. Now don't get me wrong. I've
done plenty of tables linked by index numbers, but it's certainly harder
to fix the data by hand since an index number only has meaning in the
context of a query with another table.
Your solution i know it works and i thank you very much for
providing it to me!

Can you help please on the errors that http://superhost.gr gives?

Sorry I cannot, since I don't have access to your site's source code, or
your database.
 
MRAB

On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
I just need a way to CONVERT a string(absolute path) to a 4-digit unique number with INT!!! That's all i want!! But i cannot make it work :(

Either you are deliberately trolling, or you have a major
comprehension problem. Please go back and read, carefully, all the
remarks you've been offered in this thread. Feel free to ask for
clarification of anything that doesn't make sense, but be sure to read
all of it. You are asking something that is fundamentally
impossible[1]. There simply are not enough numbers to go around.

ChrisA

[1] Well, impossible in decimal. If you work in base 4294967296, you
could do what you want in four "digits".

Fundamentally impossible?
Yes.

Well....

OK: How about this in Perl:

$ cat testMD5.pl
use strict;

foreach my $url(qw@ /index.html /about/time.html @){
hashit($url);
}

sub hashit {
my $url=shift;
my @ltrs=split(//,$url);
my $hash = 0;

foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
}
printf "%s: %0.4d\n",$url,$hash

}


which yields:
$ perl testMD5.pl
/index.html: 1066
/about/time.html: 1547
That shortens the int to 4 digits.

A hash isn't guaranteed to be unique. A hash is an attempt to make an
int which is highly sensitive to a change in the data so that a small
change in the data will result in a different int. If the change is big
enough it _could_ give the same int, but the hope is that it probably
won't. (Ideally, if the hash has 4 decimal digits, you'd hope that the
chance of different data giving the same hash would be about 1 in
10000.)
 
Michael Torrie

which yields:
$ perl testMD5.pl
/index.html: 1066
/about/time.html: 1547

Well, do the same in Python then. Just read the docs on hashlib so you
know what kind of object it returns and how to call methods on that
object to get a big number that you can then take % 10000 of. Note
that your Perl code is guaranteed to have collisions in the final
number generated.

If you're comfortable with Perl, maybe you should use it rather than
fight a language that you are not comfortable with and do not understand.
 
Dave Angel

sub hashit {
my $url=shift;
my @ltrs=split(//,$url);
my $hash = 0;

foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
}
printf "%s: %0.4d\n",$url,$hash

}


which yields:
$ perl testMD5.pl
/index.html: 1066
/about/time.html: 1547

If you use that algorithm to get a 4 digit number, it'll look good for
the first few files. But if you try 100 files, you've got almost 40%
chance of a collision, and if you try 10001, you've got a 100% chance.
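Those figures can be checked with the usual birthday-problem calculation. The sketch below assumes an ideal hash that spreads files evenly over the 10,000 possible values, which the additive algorithm above does not do, so in practice it fares worse. The function name collision_probability is made up for illustration.

```python
def collision_probability(n_items, n_slots=10000):
    """Chance that at least two of n_items share a slot.

    Assumes each item lands in a uniformly random slot (an idealised
    hash); hypothetical helper for illustration only.
    """
    if n_items > n_slots:
        return 1.0  # pigeonhole: more items than slots forces a clash
    p_all_unique = 1.0
    for already_placed in range(n_items):
        # The next item must avoid every slot that is already taken.
        p_all_unique *= (n_slots - already_placed) / float(n_slots)
    return 1.0 - p_all_unique


for n in (10, 100, 10001):
    print(n, round(collision_probability(n), 3))
```

For 100 files this comes out at roughly 0.39, and for 10001 files it is exactly 1.0.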


So is it really okay to reuse the same integer for different files?

I tried to help you when you were using the md5 algorithm. By using
enough digits/characters, you can make the likelihood of a collision
quite small. But 4 digits? Don't be ridiculous.
 
Peter Otten

Ferrous said:
On Tuesday, 22 January 2013 6:11:20 PM UTC+2, Chris Angelico wrote:
all of it. You are asking something that is fundamentally
impossible[1]. There simply are not enough numbers to go around.
Fundamentally impossible?

Well....

OK: How about this in Perl:

$ cat testMD5.pl
use strict;

foreach my $url(qw@ /index.html /about/time.html @){
hashit($url);
}

sub hashit {
my $url=shift;
my @ltrs=split(//,$url);
my $hash = 0;

foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
}
printf "%s: %0.4d\n",$url,$hash

}


which yields:
$ perl testMD5.pl
/index.html: 1066
/about/time.html: 1547

$ cat clashes.pl
use strict;

foreach my $url(qw@
/public/fails.html
/large/cannot.html
/number/being.html
/hope/already.html
/being/really.html
/index/breath.html
/can/although.html
@){
hashit($url);
}

sub hashit {
my $url=shift;
my @ltrs=split(//,$url);
my $hash = 0;

foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
}
printf "%s: %0.4d\n",$url,$hash

}
$ perl clashes.pl
/public/fails.html: 1743
/large/cannot.html: 1743
/number/being.html: 1743
/hope/already.html: 1743
/being/really.html: 1743
/index/breath.html: 1743
/can/although.html: 1743

Hm, I must be holding it wrong...
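For anyone following along in Python rather than Perl, here is a rough port of the same additive algorithm (the name additive_hash is mine, not from the Perl script). Because it only adds up character codes, any two paths whose characters sum to the same total clash, whatever order the characters come in.

```python
def additive_hash(url):
    """Sum of the character codes, reduced modulo 10000.

    Hypothetical Python port of the Perl hashit() sub above.
    """
    return sum(ord(ch) for ch in url) % 10000


for url in ("/index.html", "/public/fails.html", "/large/cannot.html"):
    print("%s: %04d" % (url, additive_hash(url)))
```

This prints 1066 for /index.html and 1743 for both of the other two paths, matching the Perl output.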
 
Dennis Lee Bieber

I cannot.
I don't even know yet if hashing needs to be used for what i need.

The only thing i know is that:

a) i only need to get a number out of string(being an absolute path)
b) That number needs to be unique, because "that" number is an indicator to the actual html file.

So you want to generate a number that is used to look up another
file?

How many files are you referring to here? By "indicator to the actual
html file", do you mean "indicator to the COUNTER"? Otherwise it is
just an indicator to the same path you started with.

How is this "indicator" to be used? Is it PART OF "the actual html
file"? Is it a key in a database used to retrieve "the actual html
file"?

Pseudo-code:

con = DB.connection()
cur = con.cursor()

# query parameters go in a tuple, even when there is only one
cur.execute("select ID from PATHLINK where PATH = %s", (thePath,))
data = cur.fetchone()    # there should only be ONE
if not data:             # thePath doesn't exist, must be new file
    cur.execute("insert into PATHLINK (PATH) values (%s)", (thePath,))
    id = cur.lastrowid
else:
    id = data[0]

con.commit()
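A runnable version of the same idea, as a sketch only: it swaps in the standard-library sqlite3 module and an in-memory database so it can be tried without a server, and the table layout and the function name get_path_id are assumptions on my part. With MySQL the placeholder would be %s rather than ?.

```python
import sqlite3


def get_path_id(con, path):
    """Return the ID for path, inserting a new row if it is not known yet.

    Hypothetical helper; table and column names are illustrative.
    """
    row = con.execute(
        "SELECT id FROM pathlink WHERE path = ?", (path,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = con.execute("INSERT INTO pathlink (path) VALUES (?)", (path,))
    con.commit()
    return cur.lastrowid  # the auto-assigned integer key


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE pathlink ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " path TEXT UNIQUE NOT NULL)"
)

first = get_path_id(con, "/home/nikos/public_html/index.html")
again = get_path_id(con, "/home/nikos/public_html/index.html")
other = get_path_id(con, "/home/nikos/public_html/about.html")
print(first, again, other)
```

The database hands out the small numbers, so the same path always gets the same ID and two different paths never share one.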


Would you help me write this in python?

Why the hell

pin = int ( '/home/nikos/public_html/index.html' )

fails? because it has slashes in it?

It fails because it is NOT A DECIMAL INTEGER STRING
Traceback (most recent call last):

All something you could work out by running a Python interpreter in
a command shell and entering one-liner statements.

E:\UserData\Wulfraed\My Documents>python
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 12:21:10) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
 
Dennis Lee Bieber

No, because i DO NOT WANT to store LOTS OF BIGS absolute paths in the database.
Why not? What is "BIG"...

10,000 paths of 255 characters each (presuming ASCII, 1 byte per
character) means you have 2,550,000 characters -- that's LESS THAN
THREE MB for all the file paths. Add in a 2-byte short integer ID and
you've got 20,000 bytes of IDs. Creating unique indices (ID should
already be a unique auto-increment column) doubles the data usage, plus
maybe 160,000 bytes for the pointers from the index to the data record.

2,550,000 + 20,000 => 2,570,000 raw data
2,570,000 + 160,000 => 2,730,000 indices

2,570,000 + 2,730,000 => 5,300,000 5MB maximum

I could store all that on my ancient PDA!

We've probably generated that much text in the two discussion
threads alone!


The safest way to generate your four digit integer, without running
the risk of collision from hashing, is a simple database table with
unique ID column and unique filepath column.
 
Steven D'Aprano

No, because i DO NOT WANT to store LOTS OF BIGS absolute paths in the
database.

They are not big. They are tiny.

Please stop being too arrogant to listen to advice from people who have
been programming for years or decades.
 
Steven D'Aprano

I'm sorry you are getting so frustrated. There's obviously a language
barrier here,

I don't think there is. The OP's posts have been written in excellent
English.

I think we've been well and truly trolled, by somebody who even uses the
name of a troll as his user name.

http://redwing.hutman.net/~mreed/warriorshtm/ferouscranus.htm


Thanks to Alan Spence for linking to the "Flame Warriors" web site. I
can't believe that it took so long for anyone to realise that we were
being trolled.

I hate to admit it, but I kind of have to admire somebody who can play
dumb so well for so long for the lulz. Well played Ferrous Cranus, well
played. Now please go and play your silly games elsewhere.
 
