Hal Vaughan
I'm working on a program that will scan the names of old time radio shows at
Archive.org and give me a listing of the shows and later, a listing of the
episodes available. It's the Java version of a quick version I wrote in
Perl.
To get a list of the pages, I create a URL from the String form of the page
and add the parameters that are to be POSTed. Here's a typical URL for
this search:
<http://www.archive.org/search.php?page=2&query=collection:oldtimeradio&sort=title>
In this case I specify page=2, but the page starts at 1 and currently ends
at 25 (which would change if there were more search results). I check the
data that gets POSTed each time and the page number keeps increasing, but I
keep getting the first page in the search.
I have a loop that goes through each page and calls a method in another
class that creates the URLConnection, then uses a PrintWriter to post the
parameters. I know that's working because I do get the first page of the
search instead of a default search page. The only problem is that even
when I change the POST parameters, I still get page 1.
The URLConnection is made, as I said, by a method in a different class, so
it should be creating new buffers and I/O streams each time it's called.
I've included the code of the method that does the URL connection work at
the end of this post. My guess is I'm not re-initializing something or
changing something each time through so it's likely a glaring error I've
just overlooked. I can include the loop that specifies which page, but
it's just a simple loop that keeps increasing the page number until it sees
there is no "Next" link on the page.
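A loop of that kind might look roughly like the sketch below. This is not the actual loop from the program: `PageLoop`, `fetchAllPages`, `fetchPage` and `maxPages` are hypothetical names, and `fetchPage` stands in for whatever call retrieves the HTML for a given page number. The `maxPages` cap is an added safety limit so the loop cannot run forever if every response contains "Next".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch of the paging loop; none of these names come from the program.
class PageLoop {

    // Requests pages 1, 2, 3, ... and stops after the first page whose text
    // has no "Next" link, or after maxPages requests, whichever comes first.
    static List<String> fetchAllPages(IntFunction<String> fetchPage, int maxPages) {
        List<String> pages = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            String html = fetchPage.apply(page);
            pages.add(html);
            if (!html.contains("Next")) {
                break; // last page of results reached
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Fake fetcher for demonstration: pages 1 and 2 contain "Next", page 3 does not.
        IntFunction<String> fake = p -> p < 3 ? "page " + p + " <a>Next</a>" : "page " + p;
        System.out.println(fetchAllPages(fake, 100).size());
    }
}
```

Passing the fetcher in as a function keeps the loop testable without a network connection. If the same page comes back every time, the loop itself is probably fine and the request is the place to look.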
Thanks for any help on this!
Hal
-----------------
Code of URL Connection and Reading Method:
//webPage is a String defined elsewhere since the class keeps track
//of the data from the last page.
//formData is a HashMap subclass that works only with strings (created
//well before generics were introduced). It's just used to get key/value
//pairs to be used for parameters; formKeys holds its keys.
private String getURL(String pageURL, boolean postData) {
    int x;
    String sLine;
    String[] formKeys = formData.keySet();
    URL uPage;
    URLConnection ucPage = null;
    PrintWriter outPrint;
    BufferedReader inRead;
    StringBuffer sBuff;

    //Build the parameter string: key1=value1&key2=value2...
    sLine = "";
    for (x = 0; x < formKeys.length; x++) {
        sLine = sLine + formKeys[x] + "=" + formData.get(formKeys[x]);
        if (x < formKeys.length - 1) {
            sLine = sLine + "&";
        }
    }
    try {
        System.out.println("URL: " + pageURL);
        uPage = new URL(pageURL);
        ucPage = uPage.openConnection();
        ucPage.setDoOutput(postData);
        ucPage.setDoInput(true);
        if (postData) {
            //Send the parameters as the body of the request
            outPrint = new PrintWriter(ucPage.getOutputStream());
            System.out.println("\tOutgoing data: " + sLine);
            outPrint.print(sLine);
            //Flush before closing; flushing a closed writer does nothing
            outPrint.flush();
            outPrint.close();
        }
        //Read the response one line at a time
        inRead = new BufferedReader(new
                InputStreamReader(ucPage.getInputStream()));
        sBuff = new StringBuffer();
        webPage = "";
        while ((sLine = inRead.readLine()) != null) {
            sBuff.append(sLine + "\n");
        }
        inRead.close();
        webPage = sBuff.toString();
    } catch (Exception e) {
        webPage = "error: connection incomplete";
        e.printStackTrace();
    }
    return webPage;
}
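
The loop that builds the parameter string joins keys and values as they are, so a value such as collection:oldtimeradio goes out unencoded. Whether that has anything to do with the paging problem is not known. As a minimal sketch only, here is one way the joining step could be written with both sides URL-encoded. `FormEncoder` and `encode` are hypothetical names, a plain `Map` is assumed in place of the custom formData class, and `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original class.
class FormEncoder {

    // Joins the entries as key=value&key=value, URL-encoding each key and value.
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8));
            sb.append('=');
            sb.append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output order is predictable.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("page", "2");
        params.put("query", "collection:oldtimeradio");
        params.put("sort", "title");
        System.out.println(encode(params));
        // prints: page=2&query=collection%3Aoldtimeradio&sort=title
    }
}
```

The resulting string can serve as a POST body or be appended after a "?" for a GET request. Comparing the server's response to each would show which set of parameters it actually reads.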