Counting lines/characters in an uploaded .DOC/.RTF file using ASP.NET

j

Hi,
I've been trying to get line and character counts for documents as they
are uploaded. As well as counting, I also have to remove certain sections
from the file.
So, first I was working with uploaded MS WORD .doc files, using code like
the following:

'sr is a StreamReader over the uploaded file
Dim sbFileContent As New StringBuilder()
Dim lc As Integer = 0
Dim strLine As String

strLine = sr.ReadLine()
While Not IsNothing(strLine) 'Not EOF
    If Trim(strLine) <> "" Then 'Not blank
        'increment counter & capture line text
        lc += 1
        sbFileContent.Append(strLine & vbCr) 'Put CR into string to mark line break
    End If
    strLine = sr.ReadLine()
End While
sr.Close()

By then counting the vbCr characters in the StringBuilder contents
(sbFileContent), I was hoping to get the number of "visible", non-blank
lines (and thus characters) in the file.
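Once the file content is available as properly decoded plain text, the counting step itself is simple. The following is a minimal VB.NET sketch, assuming the text has already been decoded correctly; it will not give meaningful results on the raw bytes of a binary .doc, which a StreamReader cannot read as lines of text. The module and function names are hypothetical. Characters are counted per non-blank line, spaces included and line breaks excluded.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LineCountSketch
    ' Returns the number of non-blank lines in already-decoded plain text and
    ' sets charCount to the characters on those lines (spaces included,
    ' line breaks excluded). CR, LF and CRLF are all treated as line breaks.
    Public Function CountNonBlankLines(ByVal text As String, ByRef charCount As Integer) As Integer
        Dim normalized As String = text.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)
        Dim lineCount As Integer = 0
        charCount = 0
        For Each line As String In normalized.Split(ControlChars.Lf)
            If line.Trim() <> "" Then
                lineCount += 1
                charCount += line.Length
            End If
        Next
        Return lineCount
    End Function
End Module
```

Applied to `sbFileContent.ToString()` from the loop above, the line count should match `lc`, since both skip the same blank lines; the difference is that this version also totals the characters.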

My first problem:
If you type in WORD WITHOUT using any line break characters (vbCr, vbLf,
vbCrLf, etc.), the text wraps naturally at the edge of the page. So on
screen a document might show 1 paragraph of 8 lines when what you actually
have is 1 continuous string with no line breaks. How can you count lines
in a WORD file the way its built-in line counter does, but without using
WORD on the server?
How does WORD do it anyway? Does it calculate the number of lines by
dividing the total number of characters in the file by the width of the
page in characters?
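Word does not store wrapped lines; it lays the text out on the page using the measured width of each glyph in the chosen font and size, together with the margins, and counts the lines that result. Without a layout engine, one rough substitute is a greedy word-wrap estimate at a fixed number of characters per line. This VB.NET sketch assumes that fixed-width approximation; `maxCharsPerLine` is a hypothetical parameter you would tune to your page setup, and the names are illustrative.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module WrapEstimateSketch
    ' Estimates the visual lines one paragraph occupies when greedily
    ' word-wrapped at maxCharsPerLine characters (fixed-width assumption).
    Public Function EstimateWrappedLines(ByVal paragraph As String, ByVal maxCharsPerLine As Integer) As Integer
        If maxCharsPerLine < 1 Then Throw New ArgumentOutOfRangeException("maxCharsPerLine")
        Dim words As String() = paragraph.Split(New Char() {" "c, ControlChars.Tab}, StringSplitOptions.RemoveEmptyEntries)
        If words.Length = 0 Then Return 0 ' blank paragraph: no visible line

        Dim lines As Integer = 1
        Dim lineLength As Integer = 0
        For Each word As String In words
            If lineLength = 0 Then
                lineLength = word.Length              ' first word on the line
            ElseIf lineLength + 1 + word.Length <= maxCharsPerLine Then
                lineLength += 1 + word.Length         ' fits after a space
            Else
                lines += 1                            ' wrap: word starts a new line
                lineLength = word.Length
            End If
            ' A word wider than a whole line is broken across extra lines.
            While lineLength > maxCharsPerLine
                lines += 1
                lineLength -= maxCharsPerLine
            End While
        Next
        Return lines
    End Function

    ' Sums the estimate over every paragraph (hard line break) in the text.
    Public Function EstimateDocumentLines(ByVal text As String, ByVal maxCharsPerLine As Integer) As Integer
        Dim total As Integer = 0
        Dim normalized As String = text.Replace(vbCrLf, vbLf).Replace(vbCr, vbLf)
        For Each paragraph As String In normalized.Split(ControlChars.Lf)
            total += EstimateWrappedLines(paragraph, maxCharsPerLine)
        Next
        Return total
    End Function
End Module
```

For example, `EstimateWrappedLines("the quick brown fox jumps", 10)` gives 3 (`the quick` / `brown fox` / `jumps`). Simply dividing the character count by the line width can under-count, because a wrapped line usually ends with some unused space. Either way the result is only an estimate of what WORD will show.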

My second problem:
I have to remove some sections from the file and re-save it, which is
problematic for an MS WORD .doc file because I don't have WORD on the
server. The file just gets corrupted, and when I open it later all I get
is gibberish.
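The corruption is to be expected: a legacy .doc file is a binary OLE compound document, not text. Reading it through a StreamReader decodes the bytes as characters (UTF-8 by default), and writing that string back does not reproduce the original bytes. RTF, by contrast, is plain 7-bit text. A minimal VB.NET sketch for telling the two apart from the first bytes of an upload; the enum, module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text

Public Enum UploadKind
    Unknown
    WordBinary   ' legacy .doc: OLE2 compound file
    Rtf          ' Rich Text Format: plain 7-bit text
End Enum

Module UploadFormatSketch
    ' First eight bytes of every OLE2 compound file (the legacy .doc container).
    Private ReadOnly OleSignature As Byte() = New Byte() {&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1}
    ' RTF documents begin with the text "{\rtf" (VB strings have no escapes).
    Private ReadOnly RtfSignature As Byte() = Encoding.ASCII.GetBytes("{\rtf")

    Public Function DetectFormat(ByVal header As Byte()) As UploadKind
        If HasPrefix(header, OleSignature) Then Return UploadKind.WordBinary
        If HasPrefix(header, RtfSignature) Then Return UploadKind.Rtf
        Return UploadKind.Unknown
    End Function

    Private Function HasPrefix(ByVal data As Byte(), ByVal prefix As Byte()) As Boolean
        If data Is Nothing OrElse data.Length < prefix.Length Then Return False
        For i As Integer = 0 To prefix.Length - 1
            If data(i) <> prefix(i) Then Return False
        Next
        Return True
    End Function
End Module
```

For example, you could pass the first eight bytes of the posted file's InputStream and only apply text-based search/replace when the result is `UploadKind.Rtf`.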

So, I thought about having users upload an RTF file saved from WORD
instead. The benefit of RTF is that I can definitely do the search & replace
and re-save the document WITHOUT corrupting it.
However, I have much the same line-counting problems as with WORD, except
that now I also have to deal with the RTF formatting markup, which sits in
the actual content of the file. So, how can I count the visible, non-blank
lines in an RTF file while ignoring the RTF markup? Again, I'll have the
same problem counting lines where word wrapping is what breaks a continuous
paragraph into several lines.
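Counting RTF text means separating the markup from the visible characters first. The following is a deliberately simplified VB.NET sketch, not a full RTF parser: it drops control words, skips a few non-visible destinations (font table, color table, style sheet, info, pictures and `{\* ...}` groups), turns `\par` and `\line` into line breaks, and maps `\'hh` escapes to the Latin-1 character with that code. That mapping is an assumption; real RTF uses the document's code page, normally Windows-1252. `\uN` Unicode escapes are not decoded (the ANSI fallback character that follows them is kept instead), and tables, headers and footers are not handled. The names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text
Imports Microsoft.VisualBasic

Module RtfTextSketch
    ' Destinations whose contents are never visible text (simplified list).
    Private ReadOnly HiddenDestinations As New HashSet(Of String)(New String() {"fonttbl", "colortbl", "stylesheet", "info", "pict"})

    Public Function RtfToPlainText(ByVal rtf As String) As String
        Dim sb As New StringBuilder()
        Dim skipStack As New Stack(Of Boolean)()   ' skip state saved at each "{"
        Dim skipping As Boolean = False
        Dim i As Integer = 0

        While i < rtf.Length
            Dim c As Char = rtf(i)
            If c = "{"c Then
                skipStack.Push(skipping)
                i += 1
            ElseIf c = "}"c Then
                If skipStack.Count > 0 Then skipping = skipStack.Pop()
                i += 1
            ElseIf c = "\"c AndAlso i + 1 < rtf.Length Then
                Dim nxt As Char = rtf(i + 1)
                If nxt = "\"c OrElse nxt = "{"c OrElse nxt = "}"c Then
                    If Not skipping Then sb.Append(nxt)   ' escaped literal character
                    i += 2
                ElseIf nxt = "'"c AndAlso i + 3 < rtf.Length Then
                    ' \'hh: character by hex code, mapped as Latin-1 (assumption)
                    Dim code As Integer
                    If Not skipping AndAlso Integer.TryParse(rtf.Substring(i + 2, 2), NumberStyles.HexNumber, CultureInfo.InvariantCulture, code) Then
                        sb.Append(ChrW(code))
                    End If
                    i += 4
                ElseIf nxt = "*"c Then
                    skipping = True                        ' {\* ...}: ignorable destination
                    i += 2
                ElseIf Char.IsLetter(nxt) Then
                    ' Control word: letters, optional signed number, optional space.
                    Dim j As Integer = i + 1
                    While j < rtf.Length AndAlso Char.IsLetter(rtf(j))
                        j += 1
                    End While
                    Dim word As String = rtf.Substring(i + 1, j - i - 1)
                    If j < rtf.Length AndAlso rtf(j) = "-"c Then j += 1
                    While j < rtf.Length AndAlso Char.IsDigit(rtf(j))
                        j += 1
                    End While
                    If j < rtf.Length AndAlso rtf(j) = " "c Then j += 1
                    If HiddenDestinations.Contains(word) Then
                        skipping = True
                    ElseIf Not skipping Then
                        If word = "par" OrElse word = "line" Then sb.Append(vbCrLf)
                        If word = "tab" Then sb.Append(vbTab)
                    End If
                    i = j
                Else
                    i += 2                                 ' other control symbol (e.g. \~): ignored
                End If
            ElseIf c = ControlChars.Cr OrElse c = ControlChars.Lf Then
                i += 1                                     ' raw newlines in RTF source are not text
            Else
                If Not skipping Then sb.Append(c)
                i += 1
            End If
        End While
        Return sb.ToString()
    End Function
End Module
```

For example, `{\rtf1\ansi{\fonttbl{\f0 Times New Roman;}}\f0 Hello \b world\b0\par\par Caf\'e9\par}` becomes `Hello world`, a blank line and `Café`. The resulting text can then be split on line breaks to count non-blank paragraphs. Lines produced by word wrapping inside a paragraph still have to be estimated separately, because RTF does not record wrap points either.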

So, I need a solution that will let me count the number of visible lines in
either a WORD or an RTF file, AND a suggestion for how to edit that file
(search/replace & save) after the counting process!
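For the editing step, an RTF file can be changed as text as long as every byte outside the edited span is written back unchanged. A minimal VB.NET sketch, assuming the section to remove is delimited by two marker strings (hypothetical, e.g. `[[DEL]]` and `[[/DEL]]`) that appear in the RTF source exactly as typed. It reads and writes with ISO-8859-1, which maps each byte to one character and back. If WORD splits a marker with formatting codes, the match will fail, and if the removed span contains unbalanced `{` or `}` the saved file may be invalid, so the result is worth checking. The names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module RtfEditSketch
    ' Removes every span from startMarker through endMarker (markers included).
    Public Function RemoveSection(ByVal rtf As String, ByVal startMarker As String, ByVal endMarker As String) As String
        If String.IsNullOrEmpty(startMarker) OrElse String.IsNullOrEmpty(endMarker) Then
            Throw New ArgumentException("Markers must not be empty.")
        End If
        Dim result As String = rtf
        Dim startPos As Integer = result.IndexOf(startMarker, StringComparison.Ordinal)
        While startPos >= 0
            Dim endPos As Integer = result.IndexOf(endMarker, startPos + startMarker.Length, StringComparison.Ordinal)
            If endPos < 0 Then Exit While                 ' unmatched start marker: leave it
            result = result.Remove(startPos, endPos + endMarker.Length - startPos)
            startPos = result.IndexOf(startMarker, startPos, StringComparison.Ordinal)
        End While
        Return result
    End Function

    ' Reads the RTF file, removes the marked sections and saves it in place.
    ' ISO-8859-1 maps every byte to one Char and back, so all other bytes
    ' are written out unchanged.
    Public Sub RemoveSectionInFile(ByVal path As String, ByVal startMarker As String, ByVal endMarker As String)
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim rtf As String = File.ReadAllText(path, latin1)
        File.WriteAllText(path, RemoveSection(rtf, startMarker, endMarker), latin1)
    End Sub
End Module
```

Doing the counting on the output of `RemoveSection` (after stripping the markup) keeps the counts consistent with what is actually saved.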

If anyone has any suggestions, bright ideas, hacks, references, code or
sleep they'd like to give me, I'd be very grateful!
Thanks for listening,
J
 
Kevin Spencer

I would suggest you post this question to the Word/Office newsgroups. This
is not an ASP.Net-related question.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
http://www.takempis.com
Big things are made up of
lots of little things.
 
