ThePants
Hi. With the code below I've been able to grab pages for parsing, but
for one page template (containing a particular piece of HTML) the
response stream always ends right after that HTML. Just about any other
URL works fine, including URLs from the same site whose pages lack that
piece of HTML. For URLs whose pages contain it, the stream is returned
only up to that point.
Dim sURL As String

' Works (along with thousands of other sites/templates/servers):
sURL = "http://www.msnbc.msn.com/id/14191819"

' Doesn't work:
sURL = "http://www.time.com/time/business/article/0,8599,1226309,00.html"

Dim oSR As StreamReader = getPageContent(sURL)
' If you call oSR.ReadToEnd here, you'll see the page cut off at the
' wrong place.
Private Function getPageContent(ByVal URL As String) As StreamReader
    Dim oResponse As HttpWebResponse = Nothing
    Dim oSR As StreamReader = Nothing
    Dim oRequest As HttpWebRequest
    Try
        ' WebRequest.Create returns a WebRequest, so cast it explicitly.
        oRequest = CType(WebRequest.Create(URL), HttpWebRequest)
        oResponse = CType(oRequest.GetResponse(), HttpWebResponse)
        oSR = New StreamReader(oResponse.GetResponseStream())
    Catch ex As Exception
        ' Errors are swallowed here; the caller gets Nothing on failure.
    End Try
    Return oSR
End Function
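One way to narrow this down might be to bypass StreamReader and count the raw bytes the server actually delivers, then compare that total with the Content-Length header (if one is sent). This is only a diagnostic sketch, not part of the code above: the name CountResponseBytes and the 4 KB buffer size are made up for illustration, and it assumes .NET 2.0 or later for the Using statement.

```vbnet
Imports System
Imports System.IO
Imports System.Net

Module StreamDiagnostics
    ' Hypothetical helper: reads the whole response body as raw bytes
    ' and returns how many bytes arrived before the stream ended.
    Function CountResponseBytes(ByVal url As String) As Long
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            ' ContentLength is -1 when the server sends no Content-Length header.
            Console.WriteLine("Content-Length: " & response.ContentLength)
            Console.WriteLine("Transfer-Encoding: " & response.Headers("Transfer-Encoding"))

            Dim total As Long = 0
            Dim buffer(4095) As Byte   ' 4096-byte buffer (arbitrary size)
            Using body As Stream = response.GetResponseStream()
                ' Stream.Read returns 0 once the end of the stream is reached.
                Dim read As Integer = body.Read(buffer, 0, buffer.Length)
                While read > 0
                    total += read
                    read = body.Read(buffer, 0, buffer.Length)
                End While
            End Using
            Return total
        End Using
    End Function
End Module
```

If the byte count is already short of the Content-Length value, the truncation is happening before any text decoding takes place. If the count looks complete, the text reading step would be the next thing to look at.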
The stream for the time.com pages ends *every time* right after:
<strong>SUBSCRIBE TO TIME MAGAZINE FOR JUST $1.99</strong></a>
The number of characters read varies from story to story, but whenever
the "Subscribe" link is there, the response stream dies right after it.
If you view the source of those pages, you'll see a single blank
character after the link, and then an HTML comment
(<!--cm_searchtext end-->).
So I'm stuck. Is it possible that the single character between the </a>
and the comment is breaking the stream? Could the server have worked
out (correctly) that I'm parsing the page, and be choosing that
location each time to cut me off? I've played with several properties
of HttpWebRequest, including spoofing the UserAgent, setting KeepAlive
to True, SendChunked, and ProtocolVersion, but nothing I do changes the
outcome.
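For reference, the property changes described above look roughly like the sketch below. The UserAgent string and the choice of HTTP 1.0 are placeholder values picked for illustration, not the exact ones I tried, and BuildRequest is a made-up helper name. SendChunked is left out because it governs how a request body is sent, and a plain GET has none.

```vbnet
Imports System.Net

Module RequestTweaks
    ' Hypothetical helper: creates the request with the tweaks applied
    ' before GetResponse is called on it.
    Function BuildRequest(ByVal url As String) As HttpWebRequest
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        ' Placeholder browser-like UserAgent (assumed value).
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        request.KeepAlive = True
        ' Ask for HTTP 1.0 instead of the default 1.1 (assumed choice).
        request.ProtocolVersion = HttpVersion.Version10
        Return request
    End Function
End Module
```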
Any help would be appreciated.
Thanks!