How can you make idle processors pick up Java work?


qwertmonkey

From: "qwertmonkey" <qwertmonkey@1:261/38.remove-dpk-this>

From: (e-mail address removed)
Use multiple threads?
~
a) I need to actually scan large text files (10+ million lines).
b) On each line there is a NL sentence.
c) That processing should be run only once, but as fast as possible.
~
d) If you go:
d.1) int iPrx = Runtime.getRuntime().availableProcessors();
d.2) count all lines
d.3) split the file in (total lines)/iPrx
d.4) then run iPrx threads (or executable instances using a batch script)
the time you waste on d.2) and d.3) will make the whole strategy pointless
~
I have no way to influence how those large files are generated
~
e) because the files are so large, you can't even memory-map them in one go:
~
FileInputStream FIS = new FileInputStream(IFl);
FileChannel IFlChnl = FIS.getChannel();
// the cast truncates once the file exceeds Integer.MAX_VALUE bytes (about 2 GB)
int iChnlSz = (int)IFlChnl.size();
MappedByteBuffer MptBytBfr = IFlChnl.map(FileChannel.MapMode.READ_ONLY, 0, iChnlSz);
~
so, apparently, the only option I have is:
~
BufferedReader BfR = Files.newBufferedReader(DirPth, ChrStUTF8);
String aSx = BfR.readLine();
while(aSx != null){
  // process the sentence in aSx here
  aSx = BfR.readLine();
}
~
do you know of a faster way to go about this?
~
lbrtchx

-+- BBBS/Li6 v4.10 Dada-1
+ Origin: Prism bbs (1:261/38)
-+- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24

 

David Lamb

To: qwertmonkey
From: "David Lamb" <david.lamb@1:261/38.remove-dpk-this>


~
a) I need to actually scan large text files (10+ million lines).
b) On each line there is a NL sentence.
c) That processing should be run only once, but as fast as possible.
~
d) If you go:
d.1) int iPrx = Runtime.getRuntime().availableProcessors();
d.2) count all lines
d.3) split the file in (total lines)/iPrx
d.4) then run iPrx threads (or executable instances using a batch script)
the time you waste on d.2) and d.3) will make the whole strategy pointless

How slow is the NL processing? Does it make any sense to read lines in one
thread and pass each off to one of the iPrx-1 other threads that might run on
separate processors?
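
A minimal sketch of that one-reader, many-workers arrangement, assuming the per-line work is CPU-bound enough to be worth handing off. The class name LineFanOut, the queue capacity, and processLine (a token counter standing in for the real NL processing) are all hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class LineFanOut {
    // End-of-input marker; compared by identity, so it can never collide with a real line.
    private static final String EOF = new String("EOF");

    // Hypothetical stand-in for the NL processing: counts whitespace-separated tokens.
    static int processLine(String line) {
        String t = line.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    // Reads the file on the calling thread and hands each line to one of `workers` threads.
    static long run(Path file, int workers) throws IOException, InterruptedException {
        // Bounded queue: the reader blocks when the workers fall behind.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        AtomicLong total = new AtomicLong();
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    String line;
                    while ((line = queue.take()) != EOF) {
                        total.addAndGet(processLine(line));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            pool.add(t);
        }
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                queue.put(line);
            }
        } finally {
            // One marker per worker so every worker thread terminates.
            for (int i = 0; i < workers; i++) {
                queue.put(EOF);
            }
        }
        for (Thread t : pool) {
            t.join();
        }
        return total.get();
    }
}
```

Something like LineFanOut.run(path, Runtime.getRuntime().availableProcessors() - 1) would match the iPrx-1 split described above. Whether it beats a single thread depends on how expensive the per-line processing is compared with the cost of the queue hand-off, so it needs measuring.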

 

Patricia Shanahan

To: qwertmonkey
From: "Patricia Shanahan" <patricia.shanahan@1:261/38.remove-dpk-this>


~
a) I need to actually scan large text files (10+ million lines).
b) On each line there is a NL sentence.
c) That processing should be run only once, but as fast as possible.
~
d) If you go:
d.1) int iPrx = Runtime.getRuntime().availableProcessors();
d.2) count all lines
d.3) split the file in (total lines)/iPrx
d.4) then run iPrx threads (or executable instances using a batch script)
the time you waste on d.2) and d.3) will make the whole strategy pointless

Why worry about splitting by actual line count, rather than by byte position in
file?

Patricia
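
One way splitting by byte position might look, as a sketch rather than a tested recipe. It assumes UTF-8 (or another encoding in which the byte 0x0A only ever means a newline), so each rough offset can be pushed forward to the next line start without counting lines. ByteRangeSplit, Bounded, and processChunk (which just counts lines as a stand-in for the real processing) are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ByteRangeSplit {
    // Returns n+1 offsets; chunk i covers bytes [b[i], b[i+1]) and begins at a line start.
    static long[] boundaries(Path file, int n) throws IOException {
        long[] b = new long[n + 1];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            b[n] = size;
            for (int i = 1; i < n; i++) {
                raf.seek(size / n * i);          // rough, evenly spaced guess
                int c;
                do { c = raf.read(); } while (c != -1 && c != '\n');
                b[i] = raf.getFilePointer();     // just past the newline, or end of file
            }
        }
        return b;
    }

    // Stream wrapper that reports end-of-stream after `limit` bytes.
    private static final class Bounded extends FilterInputStream {
        private long left;
        Bounded(InputStream in, long limit) { super(in); left = limit; }
        @Override public int read() throws IOException {
            if (left <= 0) return -1;
            int c = super.read();
            if (c >= 0) left--;
            return c;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            if (left <= 0) return -1;
            int n = super.read(buf, off, (int) Math.min(len, left));
            if (n > 0) left -= n;
            return n;
        }
    }

    // Hypothetical per-chunk work: counts the lines in [start, end).
    static long processChunk(Path file, long start, long end) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ch.position(start);
            InputStream in = new Bounded(Channels.newInputStream(ch), end - start);
            BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            long lines = 0;
            while (r.readLine() != null) lines++;
            return lines;
        }
    }

    // Processes the n chunks on n threads and sums the per-chunk results.
    static long run(Path file, int n) throws IOException, InterruptedException, ExecutionException {
        long[] b = boundaries(file, n);
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final long start = b[i], end = b[i + 1];
                Callable<Long> task = () -> processChunk(file, start, end);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Finding the boundaries costs only a few seeks and short reads, so the full counting pass in d.2) disappears. The chunks are equal in bytes rather than in lines, which only matters if line lengths are very uneven. On a single spinning disk the concurrent readers may still compete for the drive, so measuring against a one-reader design is advisable.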

 

Joshua Cranmer

To: qwertmonkey
From: "Joshua Cranmer" <joshua.cranmer@1:261/38.remove-dpk-this>


[Gah, your newsreader is incapable of threading posts correctly. Please find a
non-broken one.]

~
a) I need to actually scan large text files (10+ million lines).
b) On each line there is a NL sentence.
c) That processing should be run only once, but as fast as possible.

Only 10M-line files?

The easiest way to do this is to just make a ThreadPoolExecutor and have your
main thread dispatch requests to the pool as fast as possible. Or you can do
the work pooling yourself, which may be faster since you're not continually
posting Runnables, but I'd need to see timing results to be convinced.
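
A rough sketch of the ThreadPoolExecutor route, under a few assumptions that are not from the post: lines are submitted in batches to cut the per-Runnable overhead, the work queue is bounded, and CallerRunsPolicy makes the reading thread do a batch itself when the pool is saturated. PooledLineScan and processLine (a token counter standing in for the real processing) are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class PooledLineScan {
    // Hypothetical stand-in for the NL processing: counts whitespace-separated tokens.
    static int processLine(String line) {
        String t = line.trim();
        return t.isEmpty() ? 0 : t.split("\\s+").length;
    }

    // The calling thread reads the file and dispatches batches of lines to the pool.
    static long run(Path file, int threads, int batchSize)
            throws IOException, InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(64),                 // bounded backlog of batches
                new ThreadPoolExecutor.CallerRunsPolicy());   // reader helps when the queue is full
        AtomicLong total = new AtomicLong();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = in.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    submit(pool, batch, total);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                submit(pool, batch, total);
            }
        } finally {
            pool.shutdown();   // already-submitted batches still run to completion
        }
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return total.get();
    }

    private static void submit(ThreadPoolExecutor pool, List<String> batch, AtomicLong total) {
        pool.execute(() -> {
            long sum = 0;
            for (String s : batch) {
                sum += processLine(s);
            }
            total.addAndGet(sum);
        });
    }
}
```

A batchSize of 1 gives the plain one-task-per-line dispatch; larger batches trade a little latency for fewer queue operations. Which setting wins is exactly the kind of thing timing results would have to settle.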

There are other options, but chances are your disk drive is going to saturate
first (in short, they involve reading non-consecutive pages of the file, which
is generally a recipe for disaster).

--
Beware of bugs in the above code; I have only proved it correct, not tried it.
-- Donald E. Knuth

 

John B. Matthews

To: qwertmonkey
From: "John B. Matthews" <john.b..matthews@1:261/38.remove-dpk-this>


d.2) count all lines

Maybe ask ProcessBuilder to `wc -l`, or similar?
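
For illustration, a small sketch of shelling out to `wc -l` through ProcessBuilder. It assumes a Unix-like system with wc on the PATH; the class and method names (WcLineCount, countLines) are made up for the example:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class WcLineCount {
    // Runs `wc -l` with the file on standard input, so the output is just the count.
    static long countLines(Path file) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("wc", "-l")
                .redirectInput(file.toFile())
                .redirectErrorStream(true)   // fold error text into the output we read
                .start();
        String out;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            out = r.readLine();
        }
        int exit = p.waitFor();
        if (exit != 0 || out == null) {
            throw new IOException("wc failed (exit " + exit + "): " + out);
        }
        // Some wc implementations pad the number with leading spaces.
        return Long.parseLong(out.trim());
    }
}
```

Note that wc counts newline characters, so a final line with no trailing newline is not counted. It also still reads the whole file once, which moves the cost of d.2) into native code rather than removing it.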

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>

 
