Performance
First, remember that memory access is the bottleneck, not execution speed. Whether a task takes 5 or 10 clock cycles to execute is trivial compared to the cost of a single cache miss.
Also, it is important to remember that the Java™ VM performs extensive runtime analysis to optimise performance. In many cases this is superior to static compile-time analysis, because the code and memory that are most heavily exercised are optimized, not those that the developer or compiler thinks are heavily used. This runtime analysis affects the design of optimized code, some of which may appear counter-intuitive.
The LLVM optimization passes
Reduce size of functions
Most compilers optimize function execution by doing things that increase the size of functions: inlining, loop unrolling, and so on. These can be counterproductive when generating high-performance programs with Proteus, for the following reasons.
First, Java functions cannot contain more than 64KB of Java bytecode. Functions that are larger can be executed in interpreted mode by Proteus, but with roughly a tenfold performance penalty.
Java functions that are larger than 8KB are less likely to be compiled by the Java VM, leading to lower execution speeds.
Small commonly used functions are more likely to be in the code cache than larger functions, reducing cache misses.
Proteus attempts to break large functions up into smaller chunks, but this is less effective than not producing large functions in the first place.
The LLVM option -disable-inlining can be used when generating the .ll file to prevent the inlining of smaller functions into larger ones.
Avoid using memory (even the stack)
All memory accesses, whether to the stack or the heap, are equally costly. Therefore, if a function takes a pointer to an integer but only ever reads it, there is still the cost of that read. By converting the argument to pass-by-value, the read may be further optimized (i.e. a single memory read is reused). The LLVM -mem2reg
pass promotes such memory references to registers wherever possible.