Parsing an html/aspx file

Neil.Smith

I can't seem to find any references to this, but here goes:

Is there any way to parse an HTML/ASPX file within an ASP.NET
application to gather a collection of the controls in the file? For
instance, what I'm trying to do is upload an HTML file onto the web
server, convert it to an ASPX file, and then parse it for input
tags/controls, which in turn will become fields in a newly created
database table.

Clearly, when the ASPX file is requested the controls become available, but
I want to gather the collection from another page, basically by parsing
the ASPX file. I don't want to end up treating it like a text file
(reading it in and handling it char by char), but that may be my only
option. Does anyone have any suggestions?
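For illustration, here is roughly what "not char by char" can look like. The thread contains no code, so this is a sketch in Python using only the standard library: an event-based HTML parser reports each start tag with its attributes, and the matching controls are turned into column definitions. The names `ControlCollector`, `collect_controls` and `create_table_sql`, the choice of tags, the skipped button types, and the SQL Server-style `NVARCHAR(255)` columns are all hypothetical assumptions, not anything from ASP.NET.

```python
from html.parser import HTMLParser

# Tags treated as "controls" here; an assumption, extend as needed.
CONTROL_TAGS = {"input", "select", "textarea"}
# Input types assumed not to need a database column.
NON_DATA_TYPES = {"submit", "button", "reset", "image"}


class ControlCollector(HTMLParser):
    """Records (tag, name, type) for each form control, in document order."""

    def __init__(self):
        super().__init__()
        self.controls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag/attribute names and accepts unquoted values.
        if tag in CONTROL_TAGS:
            a = dict(attrs)
            name = a.get("name") or a.get("id")
            if name:
                self.controls.append((tag, name, a.get("type") or ""))
    # Self-closing tags such as <input ... /> reach handle_starttag via the
    # default handle_startendtag, so no extra override is needed.


def collect_controls(html_text):
    parser = ControlCollector()
    parser.feed(html_text)
    parser.close()
    return parser.controls


def quote_ident(name):
    # SQL Server-style bracket quoting; "]" is escaped by doubling it.
    return "[" + name.replace("]", "]]") + "]"


def create_table_sql(table, controls):
    """Hypothetical mapping: one NVARCHAR(255) column per distinct control name."""
    cols, seen = [], set()
    for _tag, name, typ in controls:
        if typ.lower() in NON_DATA_TYPES or name in seen:
            continue  # skip buttons and repeated names (e.g. radio groups)
        seen.add(name)
        cols.append(f"    {quote_ident(name)} NVARCHAR(255) NULL")
    if not cols:
        raise ValueError("no data-bearing controls found")
    return f"CREATE TABLE {quote_ident(table)} (\n" + ",\n".join(cols) + "\n)"


sample = """
<form>
  First name: <input type=text name=FirstName><br>
  <input type="text" name="Email" />
  <select name="Country"><option>UK</option></select>
  <textarea name="Comments"></textarea>
  <input type="submit" name="Send" value="Go">
</form>
"""
controls = collect_controls(sample)
print(controls)
print(create_table_sql("Responses", controls))
```

Because the parser works on tags rather than characters, unquoted attributes and unclosed `<input>`/`<br>` tags are not a problem here. Server-side `<asp:...>` controls would need their own handling, which this sketch does not attempt.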

Cheers

Michael Lang

Well, I don't know of anything offhand that does anything like this other
than the .NET Framework itself. Microsoft must have some CLR code somewhere
in one of the .NET Framework DLLs that does the opposite of what you're
describing, i.e. generates HTML from ASPX files; maybe there's some
underlying infrastructure there that you could reuse.

Alternatively, I would use the System.Xml namespace to parse the HTML files;
you should have no problems reading well-formed HTML as XML. If the HTML is
not well formed, that might be another matter.

Using the System.Xml namespace should be a lot easier than parsing it
manually as a text file.
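As a rough analogue of that approach (sketched with Python's `xml.etree.ElementTree` as a stand-in for System.Xml, since the thread shows no code), well-formed XHTML can be searched like any other XML document. The sample page and the `local_name` helper are illustrative assumptions; the helper strips the `{namespace}` prefix ElementTree puts on tags when the page declares the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Well-formed XHTML-style markup: every element closed, every attribute quoted.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <form>
      <input type="text" name="FirstName" />
      <br />
      <input type="text" name="Email" />
      <textarea name="Comments"></textarea>
    </form>
  </body>
</html>"""

root = ET.fromstring(page)


def local_name(tag):
    # ElementTree reports namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


names = [el.get("name") for el in root.iter()
         if local_name(el.tag) in ("input", "select", "textarea")]
print(names)  # ['FirstName', 'Email', 'Comments']
```

This only works because every tag is closed and every attribute is quoted; entity references such as `&nbsp;` would also make a plain XML parser fail.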

Michael
http://www.mblmsoftware.com/

bruce barker (sqlwork.com)

Very little HTML is well formed enough for an XML parser to read it. One
<br> instead of <br /> and you're toast. Also, attributes need to be quoted
correctly. .NET parses ASP.NET files by looking for well-formed ASP.NET tags
(though it is lax about quotes). Most of the HTML parsers I've seen are really
XML parsers and don't work well in the general case.
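To illustrate the point (again in Python's standard library rather than .NET, purely as a stand-in; `TagLister` is a hypothetical name), the same small piece of ordinary HTML is fed to a strict XML parser and to a tolerant HTML parser:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical hand-written HTML: unquoted attribute values, <input> and <br> never closed.
snippet = "<form><input type=text name=FirstName><br></form>"

# A strict XML parser gives up at the first well-formedness error.
try:
    ET.fromstring(snippet)
    xml_ok = True
except ET.ParseError as err:
    xml_ok = False
    print("XML parser failed:", err)


class TagLister(HTMLParser):
    """Records start-tag names; HTMLParser tolerates the same markup."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


lister = TagLister()
lister.feed(snippet)
lister.close()
print(lister.tags)  # ['form', 'input', 'br']
```

The XML parser stops at the unquoted `type=text`, while the HTML parser reports all three start tags.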

You could compile the page with the ASP.NET compiler, then load the DLL and
use reflection to walk the Controls collection.

-- bruce (sqlwork.com)

Michael Lang

Compiling the ASPX and using reflection: I guess that's one way to use the
underlying infrastructure. That sounds pretty tricky as well, though; you
may have to play around with some compiler settings.

In 1.0 I doubt it would work, because if my memory serves me correctly it
only compiles the code-behind and not the HTML within the ASPX.

In 2.0, at least, you have the option when publishing the site to compile
not just the code-behind but also the contents of the ASPX into the DLL. I'd
suggest having a play with Ildasm to see what's available in your assembly
before you try writing a lot of code using reflection on it.

I did say it would be another matter if the HTML is not well formed. But I
thought perhaps you could tweak the XmlTextReader to read HTML by playing
with the XmlReaderSettings or something... I've never tried this, so I didn't
know. A quick look at the API tells me that you're quite right: it's tricky,
but not completely hopeless.

So I did a search to see if anyone has attempted it and found this...

http://www.gotdotnet.com/Community/...mpleGuid=b90fddce-e60d-43f8-a5c4-c3bd760564bc

...which might be worth a look.
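Neil mentions the SgmlReader API further on. Without assuming anything about what the linked sample contains, the general idea of such a reader can be sketched in Python: a tolerant HTML parser rebuilds the markup as a tree, treating void elements such as `<br>` as empty and recovering from missing end tags, so that strict XML tools can be used afterwards. `LooseTreeBuilder`, `html_to_tree` and the synthetic `document` root are hypothetical names. This is a rough approximation, not SgmlReader's algorithm: it ignores HTML's implied-end-tag rules (an unclosed `<p>` is only closed when an enclosing element ends) and does not clean up tag or attribute names that are not valid XML names (for example Word's `<o:p>`).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class LooseTreeBuilder(HTMLParser):
    """Builds an ElementTree from loose HTML, closing tags the markup left open."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("document")  # synthetic root so fragments work
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get an empty string value.
        el = ET.SubElement(self.stack[-1], tag, {k: (v or "") for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_startendtag(self, tag, attrs):
        # <tag ... /> never has children, so it is never pushed onto the stack.
        ET.SubElement(self.stack[-1], tag, {k: (v or "") for k, v in attrs})

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Text goes in .text of the open element, or .tail of its last child.
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_tree(html_text):
    builder = LooseTreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


loose = "<form><input type=text name=FirstName><br><p>Email <input name=Email></form>"
root = html_to_tree(loose)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
reparsed = ET.fromstring(xml_text)  # a strict XML parser now accepts it
input_names = [el.get("name") for el in reparsed.iter("input")]
print(input_names)  # ['FirstName', 'Email']
```

The round trip through `ET.tostring` and `ET.fromstring` is only there to show that the output is now well formed; in practice the returned tree can be searched directly.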

Cheers

Michael

Neil.Smith

Thanks for the replies. I do find it interesting that there are no
obvious methods for doing something that the framework must do
every time it loads an ASPX page!

I will try parsing it as XML, and also try the SgmlReader API.