URL Connection Keeps Rereading the same page

Hal Vaughan

I'm working on a program that will scan the names of old time radio shows at
Archive.org and give me a listing of the shows and later, a listing of the
episodes available. It's the Java version of a quick version I wrote in
Perl.

To get a list of the pages, I create a URL from the String form of the page
and add the parameters that are to be POSTed. Here's a typical URL for
this search:

<http://www.archive.org/search.php?page=2&query=collection:oldtimeradio&sort=title>

In this case I specify page=2, but the page starts at 1 and currently ends
at 25 (which would change if there were more search results). I check the
data that gets POSTed each time and the page number keeps increasing, but I
keep getting the first page in the search.

I have a loop that goes through each page and calls a method in another
class that creates the URLConnection, then uses a PrintWriter to post the
parameters. I know that's working because I do get the first page of the
search instead of a default search page. The only problem is that even
when I change the POST parameters, I still get page 1.

The URLConnection is made, as I said, with a method in a different class so
it should be creating new buffers and i/o streams each time it's called.

I've included the code of the method that does the URL connection work at
the end of this post. My guess is I'm not re-initializing something or
changing something each time through so it's likely a glaring error I've
just overlooked. I can include the loop that specifies which page, but
it's just a simple loop that keeps increasing the page number until it sees
there is no "Next" link on the page.
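
For readers following along, the page loop described above could be sketched roughly as follows. This is only an illustration, not the actual program: `PageWalker`, `pageUrl`, `fetchAll`, the `fetcher` parameter, and the `maxPages` cap are hypothetical names introduced here, and spotting the "Next" link with a plain substring check is an assumption about the page markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PageWalker {
    // Builds the search URL for one page, following the example URL in the post.
    static String pageUrl(int page) {
        return "http://www.archive.org/search.php?page=" + page
                + "&query=collection:oldtimeradio&sort=title";
    }

    // Fetches pages 1, 2, 3, ... until a page contains no "Next" link.
    // `fetcher` is a hypothetical stand-in for the real download method.
    // `maxPages` is a safety cap: if the server keeps returning page 1
    // (which always has a "Next" link), the loop still terminates.
    static List<String> fetchAll(Function<String, String> fetcher, int maxPages) {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            String html = fetcher.apply(pageUrl(page));
            pages.add(html);
            if (!html.contains("Next")) {
                break; // last page reached
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Fake fetcher for demonstration: pretends the search has three pages.
        Function<String, String> fake = url ->
                url.contains("page=3&") ? "last page" : "results <a>Next</a>";
        System.out.println(fetchAll(fake, 100).size()); // prints 3
    }
}
```

Passing the fetcher in as a parameter keeps the loop testable without a network connection, and the cap guards against exactly the symptom described here, where every request returns the first page.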

Thanks for any help on this!

Hal
-----------------
Code of URL Connection and Reading Method:

//webPage is a String defined elsewhere since the class keeps track
//of the data from the last page.
//formData is a HashMap subclass that works only with strings (created
//well before generics were introduced). It's just used to get key/value
//pairs to be used for parameters.

private String getURL(String pageURL, boolean postData) {
    int x;
    String sLine;
    String[] formKeys = formData.keySet();
    URL uPage;
    URLConnection ucPage = null;
    PrintWriter outPrint;
    BufferedReader inRead;
    StringBuffer sBuff;

    //Build the "key=value&key=value" parameter string.
    sLine = "";
    for (x = 0; x < formKeys.length; x++) {
        sLine = sLine + formKeys[x] + "=" + formData.get(formKeys[x]);
        if (x < formKeys.length - 1) {
            sLine = sLine + "&";
        }
    }

    try {
        System.out.println("URL: " + pageURL);
        uPage = new URL(pageURL);
        ucPage = uPage.openConnection();
        ucPage.setDoOutput(postData);
        ucPage.setDoInput(true);
        if (postData) {
            outPrint = new PrintWriter(ucPage.getOutputStream());
            System.out.println("\tOutgoing data: " + sLine);
            outPrint.print(sLine);
            //Flush before closing; a flush after close() has no effect.
            outPrint.flush();
            outPrint.close();
        }
        inRead = new BufferedReader(
                new InputStreamReader(ucPage.getInputStream()));
        sBuff = new StringBuffer();
        webPage = "";
        while ((sLine = inRead.readLine()) != null) {
            sBuff.append(sLine + "\n");
        }
        inRead.close();
        webPage = sBuff.toString();
    } catch (Exception e) {
        webPage = "error: connection incomplete";
        e.printStackTrace();
    }
    return webPage;
}
 
Hal Vaughan

I should add to this that when I try the same URLs my Java code uses in
Firefox, it pulls up the correct page each time instead of the same page
over and over.

 
Andrew Thompson

Hal Vaughan wrote:
...
Code of URL Connection and Reading Method:

Care to make that an SSCCE that others can experiment with
easily? Or are you only interested in answers from people that
a) know the answer off the top of their head.
b) would be willing to write in the 10-15 lines you could not
be bothered posting?

(I fit category "c) - give us an SSCCE, or stop wastin' our
bandwidth with code snippets".)

--
Andrew Thompson
http://www.physci.org/

 
Hal Vaughan

Andrew said:
Hal Vaughan wrote:
..

Care to make that an SSCCE that others can experiment with
easily? Or are you only interested in answers from people that
a) know the answer off the top of their head.
b) would be willing to write in the 10-15 lines you could not
be bothered posting?

(I fit category "c) - give us an SSCCE, or stop wastin' our
bandwidth with code snippets".)

I've often found that, due to a learning disability I'm not going to go into
details about, when I've gone over code a number of times and think I've
included every step possible or gone over the details, there's a good chance
it's something quite obvious and I essentially missed an association.
Many times that's the case, and when that happens, someone who wants to help
instead of just giving people a hard time (as I've noticed you like to do)
can often tell me what I've missed at a glance. It's not "common
sensical", but generally the more detailed the problem or the more obscure
the issue, the more likely it is I've found out what I was doing wrong.

Hal
 
Steven Simpson

Hal said:
To get a list of the pages, I create a URL from the String form of the page
and add the parameters that are to be POSTed. Here's a typical URL for
this search:

<http://www.archive.org/search.php?page=2&query=collection:oldtimeradio&sort=title>

This appears to work, so why not just GET it instead of POSTing?

Hal said:
In this case I specify page=2, but the page starts at 1 and currently ends
at 25 (which would change if there were more search results). I check the
data that gets POSTed each time and the page number keeps increasing, but I
keep getting the first page in the search.

Maybe the server only expects POSTs from the initial search form, and so
assumes that 'page' is always '1'...?
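
As a rough illustration of the GET approach suggested here, the parameters can be appended to the URL as a query string instead of being written to the connection's output stream. This is a minimal sketch, not code from the thread: `GetUrlBuilder` and `buildGetUrl` are hypothetical names, and a plain `Map` stands in for the poster's `formData` class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class GetUrlBuilder {
    // Appends the parameters to baseUrl as "?key=value&key=value",
    // URL-encoding each key and value.
    static String buildGetUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                sb.append(sep)
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
                sep = '&';
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("page", "2");
        params.put("query", "collection:oldtimeradio");
        params.put("sort", "title");
        System.out.println(buildGetUrl("http://www.archive.org/search.php", params));
        // prints http://www.archive.org/search.php?page=2&query=collection%3Aoldtimeradio&sort=title
    }
}
```

Note that `URLEncoder` turns the colon into `%3A`; a server would normally decode that back to the same value, though that is an assumption about this particular site. Opening the resulting URL with `openConnection()` and reading from `getInputStream()` without calling `setDoOutput(true)` sends a GET request.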
 
Hal Vaughan

Steven said:
This appears to work, so why not just GET it instead of POSTing?

I had never tried using a GET with URLConnection before. I had always
thought there was a reason why the examples I had looked at were done the
way they were. I altered my routine so I could specify whether to append
the parameters and use the full URL as a GET or just use it as a POST and
now it's working fine. Since I could use the URLs in Firefox with no
problem, then you may be right. It may be expecting a GET instead.

Steven said:
Maybe the server only expects POSTs from the initial search form, and so
assumes that 'page' is always '1'...?

I guess, since the "regular" pages in the browser always have the parameters
in the URL, you may be right. It's working just fine now. I don't have
much time to work on it, but soon I'll have a program that makes it
easy to download a lot of old time radio shows without having to click on
links and wait for downloads to complete (their server only allows 2
downloads at a time).

Thanks for your help! I was sure it was something more or less obvious that
I just wasn't seeing. It's frustrating when that happens.

Hal
 
