Problem with printing to a file

megaziomek

Hi,
I wrote a program whose main part is a loop with a large number
of iterations (the program takes about 1 day to perform the
calculations I want) that imitates the flow of time ("professionally
speaking": time evolution). After each iteration of the loop I send
some data to files. To make a long story short, you can simply imagine
a two-column file with time values (proportional to the loop
iterator) in the 1st column and some quantity (say, energy) in
the 2nd column.

So, at the very beginning of the code I open the file:

ofstream EnergyStream;
EnergyStream.open(OutpuStreamName(Results_Location,"EnergyValues.dat"), ios::app);

And I run the loop:

for(int t=0; t<=Nt; t++){
    /* Here I calculate a quantity t_Energy and print it to the file;
       dt and RealTimeUnit are some inputs: */
    EnergyStream<<setw(15)<<fixed<<setprecision(9)<<dt*t*RealTimeUnit
                <<"\t"<<fixed<<setprecision(9)<<t_Energy<<endl;
}

And close the file:
EnergyStream.close();

However, with this method I ran into a problem. When I
open my file EnergyValues.dat while the program is running (the same
thing happens when I open the file only after the program has finished),
what I see are the two columns (as I expected) for about 5000-6000 lines,
but then, out of nowhere, I see one very long line (it takes about 1/5 of
my emacs window when maximized to fullscreen) of "^@" signs, so I see
something like:
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
Then again several thousand lines of correct results, and
again this long line of "^@". I had a closer look at that, but there's
no periodicity in it at all. Different occurrences of "the" line
have different lengths, and the "bad lines" are separated by
different numbers of lines with the actual results.

I don't have to tell you that all plotting programs go crazy when
you load such a file as input.

If, on the contrary, I open, write and close the file separately for
every single iteration of the loop, this problem does not occur and
everything is as it should be! Unfortunately, I expect that this
approach is far more time-consuming (imagine opening, writing and
closing Nt=10^8 times). So I would definitely go for the first
solution (i.e. opening only once at the beginning, closing once at the
very end) if only I could avoid this "^@" stuff.

I would be very grateful for any suggestions that would help me
solve this problem.
Thank you in advance.
 
megaziomek

Well,

honestly, I'm not that much of a pro... However, I run my programs on a
Linux-based computer network that allows remote file access, so
I guess this sounds like NFS, doesn't it? What exactly do you mean by
switching to hard mounts? Unfortunately I cannot choose how I run
my programs; that is, I can run them only over the network, whose
settings I can't modify. Is there any software approach?
In the meantime I tried positioning in the file manually (using
tellp and seekp), but that doesn't work either (same problem as
described at first).

Thanks for the idea.
Still hoping for any suggestions! :)
 
Francesco S. Carta

Well,

honestly, I'm not that much of a pro... However, I run my programs on a
Linux-based computer network that allows remote file access, so
I guess this sounds like NFS, doesn't it? What exactly do you mean by
switching to hard mounts? Unfortunately I cannot choose how I run
my programs; that is, I can run them only over the network, whose
settings I can't modify. Is there any software approach?
In the meantime I tried positioning in the file manually (using
tellp and seekp), but that doesn't work either (same problem as
described at first).

Thanks for the idea.
Still hoping for any suggestions! :)

Note: You should include at least part of the message you're replying
to, as I am doing here, so that people can understand the context of
your reply without having to go back to previous posts.

[ I've just read Sam's good explanation about why you really have to
check your system configuration... my reply is somewhat about that too,
but whatever, the cut wouldn't be worth the cutting itself, take it as
it is ;-) ]

I believe that Sam is seeing it right, you should really try to do as he
suggested before hunting for a bug that maybe is not in your code.

Are you sure that you're not getting any error on EnergyStream?

The remote file service can very well misbehave and eat some data or
print nulls randomly, but maybe those errors are being reported and
you're silencing them somehow, perhaps by checking the output stream and
automatically clearing its error state.

If you can, cut away the simulation code (the biggest part, I suppose)
from your program, and post here your main() along with the function (if
any) which contains the principal loop so that we can see how you're
implementing all the code that handles the output and the errors.

In the meantime you can print out the loop index along with each row of
data, so that you can check if you're losing some lines on the way -
that would be an even stronger hint of a misbehaving remote service.
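
A minimal sketch of how such a check could be automated, assuming each good row starts with the integer loop index as its first field. The function name count_index_gaps is hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Assumes each good row starts with the integer loop index.
// Returns how many times the index sequence jumps, i.e. how many places
// a row went missing or was damaged. Rows whose first field is not an
// integer are skipped, so they show up as a jump between their neighbours.
std::size_t count_index_gaps(std::istream& in)
{
    std::size_t gaps = 0;
    bool have_prev = false;
    long prev = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        long index = 0;
        if (!(fields >> index)) continue;          // damaged or non-numeric row
        if (have_prev && index != prev + 1) ++gaps;
        prev = index;
        have_prev = true;
    }
    return gaps;
}
```

A result of zero on a file that still contains "^@" runs would suggest that only garbage was inserted; a non-zero result would mean real rows were lost or cut.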

If you realize you cannot change any setting in your environment, and
you see that you're not losing data, you can filter out the offending
lines postprocessing the output file.

Another thing you can do is compose separate blocks of output in memory
and dump them as separate files. In this way you can read the
files' sizes in order to spot the damaged ones - postprocessing only
them for the offending lines, if you really have a lot of data -
assuming you're not losing data on the way, once more [and if Sam is
right on your case, you're likely losing it].

Try to check it out, and check out C++ FAQ 5.8 too, once you're on the
checks.
 
megaziomek

The problem with that is twofold:

1) your code does not check for errors. With streams, it's actually a little
bit difficult. Given the underlying buffering in your std::ostream's
std::streambuf, "trying again" becomes somewhat difficult, since it's not
immediately apparent how much of the buffered std::ostream could not be
written, and has to be retried.

2) the kernel does its own buffering. Even after your application writes to
the file, the kernel may buffer it until some time later, and when it tries
to write it out to the NFS share, it gets an error, and your application
gets notified only when its subsequent write happens, and there's really no
way of telling whether your application's latest write failed, or some
previously buffered chunk.

Try to check it out, and check out C++ FAQ 5.8 too, once you're on the
checks.

Thank you very much for the explanations. You gave me some good ideas
about what is going on.
It's 2:30 a.m. in Europe now, so time to go home I guess. I will try
to implement your suggestions tomorrow and give some feedback here.
Really appreciated.
 
Francesco S. Carta

Guess so.


NFS shares can be mounted as "soft mounts" or "hard mounts". The
difference is what happens when there's a network error. With "soft
mounts" the application that's writing to an NFS share will get an
error, and the application is responsible for handling it and trying again.

The problem with that is twofold:

1) your code does not check for errors. With streams, it's actually a
little bit difficult. Given the underlying buffering in your
std::ostream's std::streambuf, "trying again" becomes somewhat
difficult, since it's not immediately apparent how much of the buffered
std::ostream could not be written, and has to be retried.

2) the kernel does its own buffering. Even after your application writes
to the file, the kernel may buffer it until some time later, and when it
tries to write it out to the NFS share, it gets an error, and your
application gets notified only when its subsequent write happens, and
there's really no way of telling whether your application's latest write
failed, or some previously buffered chunk.

With "hard mounts", the kernel itself traps the error, and keeps trying
to write to the NFS share until it goes back up. Note that until the
write succeeds, the process is effectively hung, and cannot even be kill
-9-ed.


No. This is a system configuration issue. Contact your system
administrator. Ask your admin to reconfigure your server to use hard NFS
mounts. Actually, ask your admin to check the log files and tell you why
you're getting NFS errors.


This is insufficient. When NFS enters the picture, reliably writing to
files and recovering from write errors becomes rather tricky.

If the OP realizes that the problem is exactly this and needs to work
around it (read: impossible to change the system settings) because some
data is actually getting lost, I have some ideas that could possibly fit.

These ideas involve dumping the file in chunks while keeping those
chunks in memory in order to read the output files again and check them
back, and this obviously involves some memory and performance overhead.

I suppose this is quite an obvious approach, and at the same time I'd like
to learn about other approaches, if the readers decide to point them out.
I know this kind of issue can arise in different contexts (I once had
problems like this with USB external hard disks connected to USB hubs
not really working fine under WinXP, for example).
 
Francesco S. Carta

Thank you very much for the explanations. You gave me some good ideas
about what is going on.
It's 2:30 a.m. in Europe now, so time to go home I guess. I will try
to implement your suggestions tomorrow and give some feedback here.
Really appreciated.

You're welcome. By the way, it's just as late here ;-)

Let us know about the outcome of your tests and of your inquiries to the
network admin.
 
megaziomek

Let us know about the outcome of your tests and of your inquiries to the
network admin.

So it is a file system problem.

I asked my admin (actually it would have been sufficient to use the
mount command, but anyway) and yes, we have NFS here. However, contrary
to your suggestion, we have "hard mounts". Still, it is definitely a
problem of communication between the file server and the hard disk of
the computer standing on my desk (where I "physically" write the data).
The situation is even more complicated because I run my programs in a
queuing system (Sun Grid Engine). I did a series of tests, but the most
convincing one, as far as I'm concerned, was the following:
I ran my program directly on the machine standing on my desk (i.e. NOT
using the queuing mechanism) and I sent all the data I mentioned
yesterday to my personal external hard disk (type vfat). Doing so I
eliminated possible "communication" issues: everything was happening
locally and the file server was not involved at all. Then the problem
did NOT occur.

I still don't know (and neither does the admin) what the reason for that
is, but since my intention is not to do a crash investigation but to
calculate something, I decided to implement a software workaround:
I told you that when I open, write and close the file in each single
iteration everything is OK, but I was worried about time
efficiency. So, to avoid the "^@" and to save some time, I created a
stringstream to which I write in every single iteration, and only
"from time to time", i.e. every N_Printout iterations, I open my file,
flush the stringstream into it and then close the file. With this
method I avoid the null characters and I save some time. Obviously, I'd
prefer not to do it in such a "complicated" way, but I guess there are
just some things that we cannot mend.
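
For readers who want to try the same workaround, here is a minimal sketch of how it might look. It is not the code actually used above: the class name BatchedWriter and its members are hypothetical, and rows_per_flush plays the role of N_Printout.

```cpp
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical wrapper around the approach described above: rows accumulate
// in a std::ostringstream and the file is opened, appended to and closed
// only once every `rows_per_flush` rows.
class BatchedWriter {
public:
    BatchedWriter(const std::string& path, long rows_per_flush)
        : path_(path), rows_per_flush_(rows_per_flush), pending_(0) {}

    // Formats one row into memory; touches the file only when the batch is full.
    bool add_row(double time, double energy)
    {
        buffer_ << std::setw(15) << std::fixed << std::setprecision(9) << time
                << '\t' << energy << '\n';
        if (++pending_ >= rows_per_flush_) return flush();
        return true;
    }

    // Appends the buffered rows to the file. Call once more after the loop.
    bool flush()
    {
        if (pending_ == 0) return true;
        std::ofstream out(path_.c_str(), std::ios::app);
        if (!out) return false;
        out << buffer_.str();
        out.close();
        if (!out) return false;   // rows stay in memory; part may already be on disk
        buffer_.str("");          // discard the rows that were written
        pending_ = 0;
        return true;
    }

private:
    std::string path_;
    long rows_per_flush_;
    long pending_;
    std::ostringstream buffer_;
};
```

In the main loop this would be used as writer.add_row(dt*t*RealTimeUnit, t_Energy), followed by one writer.flush() after the loop so the last partial batch is not lost. A larger rows_per_flush means fewer open/close cycles but more rows lost if the program dies between flushes.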

OK, I guess my question has been more or less answered. Thank you very
much for your suggestions. I found them very helpful.
 
megaziomek

Forgot to write:
yes, a tiny part of the actual data was missing. Say, I have
something like (first column is time, second column energy):

1.0011 12.227
1.0012 12.224
1.0013 12.219
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^^@^@^@014
12.221
1.0015 12.223

I hope you see what I mean; this is just an example. Sometimes the time
was printed properly but the energy value was cut off. As I wrote
yesterday, there is really no pattern to it.
 
Francesco S. Carta

Humor everyone, and show the contents of the files /etc/fstab and all
the /etc/auto.* files.

Heck... what if he doesn't post them here? Would I lose some good fun?

What you wrote makes me think that those folders should be full of
files... but, what kind? recovered chunks of data? error logs?

I'm just an average Windows user in these respects; as you might guess, I
have no real knowledge about the stuff you're talking about :)
 
Francesco S. Carta

No -- those files should be relatively small and specify what gets
mounted on the server, and how.

I just distrust the admin's claim that the NFS mounts are hard mounts.
The given description matches exactly, word for word, an issue I had a
number of years ago, with soft mounts.

Whoa, that admin wouldn't be all that clever or competent if s/he is
making a false claim so easily verifiable... now I'm really curious to
see the OP's response to that.
 
megaziomek

Humor everyone, and show the contents of the files /etc/fstab and all the
/etc/auto.* files.

OK, I wanted to ask my admin for permission first. Here you go:

@fstab:
***********************

# /etc/fstab: static file system information.
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc                         /proc         proc        defaults              0 0
/dev/sda5                    /             ext3        errors=remount-ro,acl 0 1
/dev/sda2                    /boot         ext3        defaults              0 2
/dev/mapper/system_vg-scr_lv /export/scr   ext3        defaults              0 2
/dev/mapper/system_vg-tmp_lv /tmp          ext3        defaults              0 2
/dev/mapper/system_vg-usr_lv /usr          ext3        defaults              0 2
/dev/mapper/system_vg-var_lv /var          ext3        defaults              0 2
/dev/sda1                    none          swap        sw                    0 0
/dev/scd0                    /media/cdrom0 udf,iso9660 user,noauto           0 0

@auto.local:
***********************

ap -ro,suid,nodev,hard,intr yoda:/export/local/&
var -rw,suid,nodev,hard,intr yoda:/export/local/&
crm114 -ro,nosuid,nodev,hard,intr yoda:/export/local/&
include -ro,nosuid,nodev,hard,intr yoda:/export/local/&
info -ro,nosuid,nodev,hard,intr yoda:/export/local/&
intel -ro,nosuid,nodev,hard,intr yoda:/export/local/&
man -ro,nosuid,nodev,hard,intr yoda:/export/local/&
mathematica -ro,nosuid,nodev,hard,intr yoda:/export/local/&
Numeric -ro,nosuid,nodev,hard,intr yoda:/export/local/&
share -ro,nosuid,nodev,hard,intr yoda:/export/local/&
TeX -ro,nosuid,nodev,hard,intr yoda:/export/local/&
bin -ro,suid,nodev,hard,intr yoda:/export/local/Linux/&
lib -ro,suid,nodev,hard,intr yoda:/export/local/Linux/&
libexec -ro,suid,nodev,hard,intr yoda:/export/local/Linux/&

@auto.misc:
***********************

#
# $Id: auto.misc,v 1.2 2003/09/29 08:22:35 raven Exp $
#
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# Details may be found in the autofs(5) manpage

cd -fstype=iso9660,ro,nosuid,nodev :/dev/cdrom

# the following entries are samples to pique your imagination
#linux -ro,soft,intr ftp.example.org:/pub/linux
#boot -fstype=ext2 :/dev/hda1
#floppy -fstype=auto :/dev/fd0
#floppy -fstype=ext2 :/dev/fd0
#e2floppy -fstype=ext2 :/dev/fd0
#jaz -fstype=ext2 :/dev/sdc1
#removable -fstype=ext2 :/dev/hdd


@auto.net:
***********************

#!/bin/bash

# $Id: auto.net,v 1.8 2005/04/05 13:02:09 raven Exp $

# This file must be executable to work! chmod 755!

# Look at what a host is exporting to determine what we can mount.
# This is very simple, but it appears to work surprisingly well

key="$1"

# add "nosymlink" here if you want to suppress symlinking local filesystems
# add "nonstrict" to make it OK for some filesystems to not mount
opts="-fstype=nfs,hard,intr,nodev,nosuid,nonstrict,async"

# Showmount comes in a number of names and varieties. "showmount" is
# typically an older version which accepts the '--no-headers' flag
# but ignores it. "kshowmount" is the newer version installed with knfsd,
# which both accepts and acts on the '--no-headers' flag.
#SHOWMOUNT="kshowmount --no-headers -e $key"
#SHOWMOUNT="showmount -e $key | tail -n +2"

for P in /bin /sbin /usr/bin /usr/sbin
do
for M in showmount kshowmount
do
if [ -x $P/$M ]
then
SMNT=$P/$M
break
fi
done
done

[ -x "$SMNT" ] || exit 1

# Newer distributions get this right
SHOWMOUNT="$SMNT --no-headers -e $key"

$SHOWMOUNT | LC_ALL=C sort -k 1 | \
awk -v key="$key" -v opts="$opts" -- '
BEGIN { ORS=""; first=1 }
{ if (first) { print opts; first=0 }; print " \\\n\t" $1, key ":" $1 }
END { if (!first) print "\n"; else exit 1 }
' | sed 's/#/\\#/g'


@auto.not.master:
***********************

#
# $Id: auto.master,v 1.4 2005/01/04 14:36:54 raven Exp $
#
# Sample auto.master file
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# For details of the format look at autofs(5).
#/misc /etc/auto.misc --timeout=60
#/smb /etc/auto.smb
#/misc /etc/auto.misc
#/net /etc/auto.net

@auto.scr:
***********************

mail -rw,nosuid,nodev,hard,intr mail:/var/spool/mail
news -rw,nosuid,nodev,hard,intr yoda:/var/news
WWW -fstype=nfs4,sec=sys,rw,nosuid,nodev,hard,intr www:/userdir
mathematica_maia -fstype=nfs4,sec=sys,ro,nosuid,nodev,hard,intr maia:/mathematica
aglein -fstype=nfs4,sec=sys,rw,nosuid,nodev,hard,intr mimas:/aglein
* -rw,nosuid,nodev,hard,intr &:/export/scr

@auto.smb:
***********************

#!/bin/bash

# $Id: auto.smb,v 1.3 2005/04/05 13:02:09 raven Exp $

# This file must be executable to work! chmod 755!

key="$1"
mountopts="-fstype=cifs"
smbopts=""
credfile="/etc/auto.smb.$key"

for P in /bin /sbin /usr/bin /usr/sbin
do
if [ -x $P/smbclient ]
then
SMBCLIENT=$P/smbclient
break
fi
done

[ -x $SMBCLIENT ] || exit 1

if [ -e $credfile ]; then
mountopts="$mountopts,credentials=$credfile"
smbopts="-A $credfile"
else
smbopts="-N"
fi

$SMBCLIENT $smbopts -gL $key 2>/dev/null| awk -v key="$key" -v opts="$mountopts" -F'|' -- '
BEGIN { ORS=""; first=1 }
/Disk/ { if (first) { print opts; first=0 }; sub(/ /, "\\ ", $2); print " \\\n\t /" $2, "://" key "/" $2 }
END { if (!first) print "\n"; else exit 1 }
'
 
Francesco S. Carta

OK, I wanted to ask my admin for permission first. Here you go:

@fstab:
***********************

<snip>

Ah, nearly missed this, thanks for posting it.

Now I just have to wait for someone to post a follow-up drawing the
relevant, and possibly intelligible, conclusions.
 
