In response to questions from Paul Elliot of AJK Technology and Keith
Lysiak of SWRI regarding my recent posting of the PC NEC4.1
Benchmarks:
1. Of the compiled NEC4.1 codes tested, the DEC Visual Fortran
Compiler produced the fastest executing NEC4.1 code, as can be
seen in Table-11 and Table-12. The DEC DVF NEC4.1 executable is a
true Windows 32-bit application, and thus it will NOT run under DOS.
The Lahey LF90 V3.5 produced NEC4.1 code can be executed as an
application under real DOS, from an NT Command Prompt or from NT's
"START/RUN" menu.
The Lahey LF77 V5.10 produced NEC4.1 code could only be executed
under real DOS (and presumably Windows-95). It could not be
executed under Windows-NT at all. It appears that the version of
the Phar Lap DOS Extender in that version of LF77 is incompatible
with NT.
Note that in Table-12 there is the annotation NT/CP to designate
those tests that were started from the NT (not DOS!) Command
Prompt, rather than from the Windows-NT "START/RUN." Everything
annotated "NT" was started from the NT "START/RUN" menu item. I
have seen significantly different execution times under NT for the
two different methods of starting a Lahey LF90 compiled executable
image. For example, look at Table-12 and compare the Lahey LF90
entries for TEST299.NEC, TEST300.NEC, etc., on the GA-686KX motherboard at
266-MHz. Although the matrix factor times are close, the matrix
fill times are quite a bit different. In the TEST600.NEC test the
fill time was 13.418 seconds for the START/RUN method and 19.059
seconds for the Command Prompt method.
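To put the TEST600.NEC fill times quoted above in relative terms, here is a quick sketch in Python. The function and variable names are made up for illustration; only the two timings come from Table-12.

```python
def slowdown(baseline_s, other_s):
    """Return how many times longer other_s is than baseline_s."""
    return other_s / baseline_s

# TEST600.NEC matrix fill times (seconds), Lahey LF90 on the GA-686KX at 266 MHz
start_run_fill_s = 13.418    # started from the NT START/RUN menu
cmd_prompt_fill_s = 19.059   # started from the NT Command Prompt

ratio = slowdown(start_run_fill_s, cmd_prompt_fill_s)
extra_pct = (ratio - 1.0) * 100.0

print(f"Command Prompt fill time is {ratio:.2f}x START/RUN "
      f"({extra_pct:.0f}% longer)")
```

The same helper can be applied to any pair of fill or factor times from the tables to compare launch methods or compilers.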
Note that in Lahey LF90 V3.5 they were still using the Phar Lap
DOS extender. The DEC DVF compiler produces true 32-bit code
without the DOS extender. I believe (but have not verified) that
Lahey also drops the DOS extender in their LF90 Version-4 compiler,
which I don't have. The DOS extender in LF90 V3.5 probably has
something to do with the squirrelly execution times under NT.
2. I did not know anything about the NEC4.1 Numerical Green's
Function problem under Windows-95/NT. Thanks for bringing this to
my attention. I wonder if caching affects this?
3. Sorry for the lack of a summary write-up of the test results.
This is a background project and I wanted to get the data out
sooner rather than later... Here are a few words:
PC performance on 80x86 chips has certainly exploded over the last
few years. I originally started using NEC2 and then NEC4.1 a few
years ago as a PC reliability, compatibility and performance test.
Also, radio engineering friends were frequently asking me what is
the best PC and/or motherboard to buy for their radio and antenna
engineering work.
Several years ago there was a greater difference in reliability,
compatibility and performance among commercially manufactured
name-brand PCs, manufactured clones and homebrewed clones using
buyer-selected motherboards. You will note from the evolution of my
testing over time that I'm biased towards Gigabyte motherboards,
though they have not always been trouble-free, particularly with
the Triton-II chipset in the Pentiums and the Klamath (KX) chipset
in the Pentium-IIs. So far I've not had any problems with the LX
and BX chipset based Gigabyte Pentium-II motherboards.
The problems with the Triton-II and KX chipset based motherboards
usually involved memory data errors, particularly under
Windows-NT, likely due to very tight (marginally reliable) memory
timing. Over time, BIOS upgrades (and a downgrade in the case of
the KX chipset) and selected memory resolved these problems.
Twelve years ago I bought a Convex C-1 vector mini-supercomputer
for use in a physics professor's research group. That C-1, if my
feeble memory is any good, had a theoretical performance of 40 MFLOPS
in single precision and 20 MFLOPS in double precision. It came
with a vectorizing Fortran compiler, 16-MB (2-MW of 64-bit words) ECC
RAM, 500-MByte hard disk, Ethernet interface, 9-track tape drive,
RS-232 mux, etc., in two cabinets for $400K! Now you can see from
Table-1 that a Pentium 200-MHz/MMX PC has about the same
double-precision MFLOPS performance, and yet the cost was only
about $2K a year or so ago! Wow!
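A quick check of the arithmetic behind that comparison, sketched in Python. The variable names are made up for illustration, and the prices are the approximate figures quoted above, not exact invoices.

```python
# Approximate prices quoted above (US dollars)
convex_c1_cost_usd = 400_000   # Convex C-1 system, about twelve years earlier
pentium_pc_cost_usd = 2_000    # Pentium 200-MHz/MMX PC, a year or so ago

# For roughly equal double-precision throughput, the price ratio is
# also the approximate price/performance improvement.
cost_ratio = convex_c1_cost_usd / pentium_pc_cost_usd

# Memory check: 16 MB expressed as 64-bit (8-byte) words
ram_bytes = 16 * 2**20
words_64bit = ram_bytes // 8
mega_words = words_64bit / 2**20

print(f"Price ratio: about {cost_ratio:.0f} to 1")
print(f"16 MB = {mega_words:.0f} MW of 64-bit words")
```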
In Table-6 you can see the effects of three different types of
memory and Intel support chips on Pentium-II performance. The
reference GA-686LX motherboard with the Intel LX chipset has the
10-nsec SDRAM memory. The older Intel KX chipset supporting FPM
type memory (e.g., 60-nsec) has much poorer performance. Thus the
10-nsec SDRAM memory and supporting LX chipset provide reasonable
performance with the 266 and 300-MHz Pentium-IIs shown. Note how
the Dell D300 at 300-MHz is not significantly better than a good
266-MHz Gigabyte GA-686LX motherboard! This is an important point
when making a PC purchasing decision. Just because the CPU clock
speed is higher doesn't necessarily mean that you are going to get
significantly higher performance. Intel keeps raising the CPU
clock speeds but the supporting chipset and memory often are not
fast enough to get the performance out of the higher CPU clock
speed. And the primary L1 CPU cache size often seems inadequate.
Now look at the 350-MHz BX chipset motherboard, Gigabyte GA-686BA
performance in Table-6. The PC100 memory bus with 8-nsec SDRAM
and BX chipset appear to make quite an improvement in NEC4.1
performance. Although I do not have 333-MHz LX chipset data in
Table-6 for a single CPU motherboard, the 300-MHz and DL2
motherboard data suggest that the LX chipset and memory at 333-MHz
would fall far short of the desired performance, and that it is
worth the jump to the BX chipset with the PC100 memory bus.
The differences between compilers are even more dramatic. From
Table-11 and Table-12 you can see quite a difference in compiled
code performance. I have older data on the Microsoft Power
Station Fortran that shows it to be rather poor. Microsoft sold
the compiler product line to DEC. DEC apparently improved it with
their own technology producing a superior product. The Lahey LF90
Version-4 compiler is claimed to have significantly faster
executing compiled code than its V3.5 predecessor, but I've no
experience with it nor have I received any user reports on it.
The DEC and Lahey products have comparable educational pricing.
Note that the matrix factor times are often close for the two
compilers, but the matrix fill times are often 1.5 to 3 times
longer for the Lahey LF90 under NT.
As usual, Caveat Emptor! Hopefully you will find the performance
data presented here useful in your purchasing decisions...
--Larry, W7JYJ
Received on Wed Oct 14 1998 - 18:13:03 EDT