Juergen von Hagen wrote:
> many times commercial codes are not really better in the
> computational core, but rather in the GUI: that's what sells and
> that's what is seen.
>
> On a C240+ our FDTD code written by a guy here, a
> 79 x 94 x 184 cells makes about 1 s / iteration or
> 731 ns / iteration / cell (pretty close to Jos' value
> for psufdtd if it was also on a C240+).
My value of 250 ns/cell/iteration was for a PA8500/440, whatever that
may be. But I would actually think that it can be made substantially
faster by hand-optimized code for the internal FDTD loops. Counting 36
floating point operations for the two curl-equations in vacuum, that is
36 flops / 250 ns = 144 Mflops, only 8 percent of maximum machine
speed. My findings (all for large jobs):
NEC2D+Lapack:      1200 Mflops   71% of max. machine speed
plain NEC2D:        400 Mflops   24% of max. machine speed
psuFDTD (L&K):      144 Mflops    8% of max. machine speed
commercial FDTD:     35 Mflops    2% of max. machine speed
Max speed of 1700 Mflops can really be obtained for these machines by
certain test programs. My NEC2D+Lapack uses the optimized lapack-blas
veclib from Convex corp.
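The conversions behind these figures can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the 36 flops per cell per iteration and the 1700 Mflops peak are the values quoted in this message.

```python
PEAK_MFLOPS = 1700.0      # peak quoted above for the PA8500 machine
FLOPS_PER_CELL = 36       # flop count quoted above for the two curl equations


def ns_per_cell(seconds_per_iteration, nx, ny, nz):
    """Time per cell per iteration in ns, from the wall time of one iteration."""
    return seconds_per_iteration / (nx * ny * nz) * 1e9


def mflops(ns_per_cell_iteration, flops_per_cell=FLOPS_PER_CELL):
    """Sustained Mflops, given the time per cell per iteration in ns."""
    return flops_per_cell / ns_per_cell_iteration * 1e3


def percent_of_peak(rate_mflops, peak=PEAK_MFLOPS):
    """Sustained rate as a percentage of the peak rate."""
    return 100.0 * rate_mflops / peak


if __name__ == "__main__":
    # The quoted C240+ run: 79 x 94 x 184 cells at about 1 s per iteration.
    print("C240+ : %.0f ns/cell/iteration" % ns_per_cell(1.0, 79, 94, 184))
    # psuFDTD on the PA8500/440 at 250 ns/cell/iteration.
    rate = mflops(250.0)
    print("psuFDTD: %.0f Mflops, %.0f%% of peak" % (rate, percent_of_peak(rate)))
```

The percentage is only computed for the PA8500 figure, since the 1700 Mflops peak does not apply to the C240+.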
Since there are no BLAS routines that can actually do the internal
FDTD loops, one would have to design them by hand. It would be
interesting to see how far the speed could be improved. Has anyone
heard of such an approach?
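For reference, the internal loops in question look roughly as follows. This is a minimal sketch in plain Python, not taken from psuFDTD or any other code mentioned here: it assumes a uniform vacuum grid, equal-sized arrays for all six components, and untouched boundary cells, and the names `make_field`, `update_e` and `update_h` are hypothetical. With separate coefficients per direction, each component update costs 6 floating point operations, which is one way of arriving at 36 per cell per iteration; whether the count above was obtained this way is an assumption.

```python
FLOPS_PER_CELL = 6 * 6  # 6 field components, 6 flops each (see comments below)


def make_field(nx, ny, nz, value=0.0):
    """Allocate an nx x ny x nz nested-list field filled with value."""
    return [[[value] * nz for _ in range(ny)] for _ in range(nx)]


def update_e(e, h, c):
    """E += c * curl(H) on interior cells; c = (cx, cy, cz), e.g. dt/(eps0*dx)."""
    (ex, ey, ez), (hx, hy, hz), (cx, cy, cz) = e, h, c
    nx, ny, nz = len(ex), len(ex[0]), len(ex[0][0])
    for i in range(1, nx):
        for j in range(1, ny):
            for k in range(1, nz):
                # Each line: 3 subtractions, 2 multiplications, 1 addition.
                ex[i][j][k] += (cy * (hz[i][j][k] - hz[i][j - 1][k])
                                - cz * (hy[i][j][k] - hy[i][j][k - 1]))
                ey[i][j][k] += (cz * (hx[i][j][k] - hx[i][j][k - 1])
                                - cx * (hz[i][j][k] - hz[i - 1][j][k]))
                ez[i][j][k] += (cx * (hy[i][j][k] - hy[i - 1][j][k])
                                - cy * (hx[i][j][k] - hx[i][j - 1][k]))


def update_h(e, h, c):
    """H -= c * curl(E) on interior cells; c = (cx, cy, cz), e.g. dt/(mu0*dx)."""
    (ex, ey, ez), (hx, hy, hz), (cx, cy, cz) = e, h, c
    nx, ny, nz = len(hx), len(hx[0]), len(hx[0][0])
    for i in range(nx - 1):
        for j in range(ny - 1):
            for k in range(nz - 1):
                # Each line: 4 subtractions (including the -=), 2 multiplications.
                hx[i][j][k] -= (cy * (ez[i][j + 1][k] - ez[i][j][k])
                                - cz * (ey[i][j][k + 1] - ey[i][j][k]))
                hy[i][j][k] -= (cz * (ex[i][j][k + 1] - ex[i][j][k])
                                - cx * (ez[i + 1][j][k] - ez[i][j][k]))
                hz[i][j][k] -= (cx * (ey[i + 1][j][k] - ey[i][j][k])
                                - cy * (ex[i][j + 1][k] - ex[i][j][k]))
```

Every update reads neighbouring values of two other arrays and writes one, so the work per cell is small compared with the memory traffic, and the loops do not map onto a matrix-matrix or matrix-vector BLAS call.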
(Jos)
--
Dr. Jozef R. Bergervoet
Electromagnetism and EMC
Philips Research Laboratories, Eindhoven, The Netherlands
Building WS01
FAX: +31-40-2742224
E-mail: bergervo_at_natlab.research.philips.com
Phone: +31-40-2742403

Received on Thu Feb 17 2000 - 10:02:44 EST