Dotty Linker: Making your Scala applications smaller and faster
Dmitry Petrashko
About me:
- https://github.com/darkdimius/
- PhD at EPFL
- previously worked on ScalaBlitz
- for the last year working on Dotty. Implemented:
- mini-phases framework;
- pattern matcher;
- lazy vals, based on SIP-20 - Improved Lazy Vals Initialization;
- tailrec ;
- shared JVM8 backend based on GenBCode;
- supervising:
- Guillaume Martres: value classes and arrays of value classes
- Alex Sikiaridis: specialization
The language’s scalability is the result of a careful integration of
object-oriented and
functional language concepts.
But nothing comes for free.
Example: Boxing
13x slowdown due to boxing
But doesn't @specialize
solve this problem?
Problems with specialization and miniboxing
trait Function3[-T1, -T2, -T3 +R]
How many specializations are needed?
Specialization: 10000 classes
Miniboxing: 81 classes
Are they actually needed?
in your program?
Most likely no!
What is we could generate only specializations that your program needs?
How do we know which ones?
TASTY
- Tells how all methods are actually implemented
- Included for all classes emitted by dotc
- source code for dependencies
Call-graph analysis
Detect program entry points, mark them as reachable
For every reachable method:
- mark all methods called from it as reachable
- track how type params flow to other methods
- track which trait combinations were instantiated
Paper: Ali K., Rapoport M., Lhoták O., Dolby J., Tip F. Constructing Call Graphs of Scala Programs, ECOOP 2014
Call-graph analysis, example
Call-graph analysis, example
- marked
main
as entry point
map
is called from main
with tparams [String]
sum
is called from main
with tparams [Int]
foldLeft
is called from sum
with tparams [Int]
foldl
is called from foldLeft
with tparams [Int]
Function.apply
is called from foldl
with tparams [Int, Int, Int]
What can linker do?
- smart specialization
- smart dead code elimination. squeryl: 400 methods vs 1500 methods reachable by proguard
- convert classes to value classes
- infer more precise types
- eliminate virtual dispatch
- replace vars by vals
- remove duplicate vals
Smart dead code elimination
methods | original | linker | proguard |
squeryl | 3000 | 400 | 1500 |
kiama | 15000 | 7200 | 13000 |
dotty | 35000 | 18000 | 32000 |
Converting classes to value classes
Class needs to follow the restrictions:
- Does not reach a callsite that calls AnyRef methods
- Does not have side effects in constructor
- Is effectively final
Closing the world
But what if some other program has a class that inherits from class that was marked as a value class?
- No problem, it has our TASTY and will tune us for it's own use-case.
- Binary compatibility not guaranteed
- source- and TASTY-compatibility provided instead
Eliminating virtual dispatch: inlining
- does not handle recursive methods
- increases size of program if method is frequently used
- can hurt performance if methods become huge
- is already performed by JVM or JS backend
Eliminating virtual dispatch: clonning
- handles recursive methods
- small increase in size
- no huge methods
- is actually just specialization for singleton types
Measurements
| original | proguard | linker |
methods | 1036 | 823 | 196 |
bytecode size | 500KB | 330KB | 180KB |
JVM code cache size(1000 runs) | 1700KB | 1500KB | 1000KB |
objects allocated | 136K | 120K | 39K |
Future ideas
- De-lazify lazy vals
- use offheap memory(scala-offheap)
- deduplicate methods