If you have very small tasks that are guaranteed to run one after another, it is useful to set this property to -1 (meaning that a spawned JVM is reused an unlimited number of times). You then spawn only as many JVMs as there are task slots available to your job in the cluster, instead of one JVM per task.
For short tasks this is a huge performance improvement. In long-running jobs, the time spent setting up a new JVM is a very small fraction of the total runtime, so reuse doesn't give you a huge performance boost there.
Also, for long-running tasks it is good to recreate the task process, because issues like heap fragmentation can degrade performance.
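To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in plain Java. It is not Hadoop code: the class and method names are hypothetical, and the slot count, per-JVM startup time and per-task runtime are assumed inputs you would have to measure yourself. The model is deliberately simplified (it ignores scheduling details and treats all slots as interchangeable); it only shows how the number of spawned JVMs and the share of time lost to JVM startup change with the reuse setting.

```java
public class JvmReuseEstimate {

    /**
     * Rough number of JVMs spawned for a job (simplified model).
     * reuse == -1 means "reuse without limit"; reuse == n means each JVM runs at most n tasks.
     */
    static long jvmsSpawned(long tasks, long slots, int reuse) {
        if (reuse != -1 && reuse < 1) {
            throw new IllegalArgumentException("reuse must be -1 or a positive number");
        }
        if (tasks <= 0) {
            return 0;
        }
        // At least one JVM per slot that is actually used.
        long concurrent = Math.min(tasks, slots);
        if (reuse == -1) {
            return concurrent;
        }
        // Each JVM handles at most `reuse` tasks, so round up.
        long needed = (tasks + reuse - 1) / reuse;
        return Math.max(concurrent, needed);
    }

    /** Fraction of total compute time spent starting JVMs. */
    static double startupShare(long jvms, double startupSec, long tasks, double taskSec) {
        double startup = jvms * startupSec;
        return startup / (startup + tasks * taskSec);
    }

    public static void main(String[] args) {
        long tasks = 10_000;      // assumed: many tiny tasks
        long slots = 50;          // assumed: slots available to the job
        double startupSec = 1.0;  // assumed JVM startup cost
        double taskSec = 2.0;     // assumed runtime of one tiny task

        for (int reuse : new int[] {1, 10, -1}) {
            long jvms = jvmsSpawned(tasks, slots, reuse);
            double share = startupShare(jvms, startupSec, tasks, taskSec);
            System.out.printf("reuse=%d -> %d JVMs, startup share %.4f%n", reuse, jvms, share);
        }

        // Long-running tasks: startup is negligible even without reuse.
        long longTasks = 100;
        double longShare = startupShare(jvmsSpawned(longTasks, slots, 1), startupSec, longTasks, 600.0);
        System.out.printf("long tasks, reuse=1 -> startup share %.4f%n", longShare);
    }
}
```

Under these assumed numbers, the tiny-task job spends about a third of its compute time starting JVMs with a reuse value of 1 and almost none with -1, while the long-task job barely notices the difference.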
The default value is 1 for good reason: it's much safer. With JVM reuse, state left behind in a persisting JVM instance is more likely to cause problems for subsequent tasks that run in that instance.
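A minimal sketch of the kind of problem meant here, again in plain Java rather than real Hadoop code. `DedupTask` is a hypothetical task class that keeps a static set of records it has already seen. With one JVM per task this works as the author of the task probably intended; when two tasks run in the same JVM, the second task silently inherits the first task's state.

```java
import java.util.HashSet;
import java.util.Set;

public class ReusedJvmStateDemo {

    /** Hypothetical task that drops duplicate records within its own input. */
    static class DedupTask {
        // Static state lives as long as the JVM, not as long as the task.
        private static final Set<String> SEEN = new HashSet<>();

        /** Returns how many records were emitted (i.e. not seen before). */
        int run(String[] records) {
            int emitted = 0;
            for (String record : records) {
                if (SEEN.add(record)) {
                    emitted++;
                }
            }
            return emitted;
        }
    }

    /** Simulates two tasks of the same job running back to back in one reused JVM. */
    static int[] runTwoTasksInOneJvm() {
        DedupTask.SEEN.clear(); // simulate a freshly started JVM: static state starts empty
        int first = new DedupTask().run(new String[] {"a", "b"});
        // Second task in the same JVM: "a" is still in SEEN, so it is dropped.
        int second = new DedupTask().run(new String[] {"a", "c"});
        return new int[] {first, second};
    }

    public static void main(String[] args) {
        int[] emitted = runTwoTasksInOneJvm();
        System.out.println("first task emitted:  " + emitted[0]);
        System.out.println("second task emitted: " + emitted[1]);
    }
}
```

In a fresh JVM the second task would emit both of its records; in the reused JVM it emits only one. Static caches, singletons, leaked threads and unclosed resources can all cause this sort of cross-task interference.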
MR2 doesn't support JVM reuse at all.
I would not change this setting from 1 unless you have a very strong reason to, know exactly what you are doing, and every MR job on your cluster is flawless (not likely!).