Tuesday, January 10, 2017

Zopfli on Odroid XU4

I recently got myself an Odroid XU4 microcomputer based on the Exynos 5422 octa-core SoC (4x 2.0GHz + 4x 1.4GHz). Surprisingly, it can run all 8 cores simultaneously. Unfortunately it is a bit bigger than the Odroid U3, so if you built yourself a custom multi-Odroid chassis, you will need to rebuild it to fit an XU4 as well. I mainly bought it to run a headless OS with Zopfli KrzYmod doing 999,999 iterations per block. I can confirm that it is around 50% faster than the Odroid U3 in single-threaded Zopfli compression. In a small multi-threaded test it compressed a file 55% faster than the Odroid U3 overclocked to 1.92GHz. That test was limited to 6 threads and used blocks of varying sizes, so it could not show the XU4's full potential, which I believe could be up to 75% faster.



Unfortunately it reaches up to 95°C when using all 4 big cores plus 1-2 small cores; when the small cores are not used, the temperature peaks at 93°C. I'm now waiting for Thermal Grizzly Kryonaut thermal paste to arrive so I can apply it to all my RPis and Odroids (and later fit better cooling if that doesn't help much), to make sure they don't get throttled much, especially during summer here.
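To keep an eye on temperatures like these during a long run, a small script can poll the kernel's thermal zones. This is only a minimal sketch: it assumes the board exposes the generic Linux sysfs interface under /sys/class/thermal with values in millidegrees Celsius, and the function names and the 90°C limit are my own picks for illustration, not anything taken from the Odroid images.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone.

    Assumes the generic Linux layout: <base>/thermal_zone*/temp holding an
    integer in millidegrees Celsius. Unreadable zones are skipped.
    """
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            temps[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            continue
    return temps


def over_limit(temps, limit_c=90.0):
    """Names of zones at or above limit_c (the limit is an arbitrary example)."""
    return [name for name, value in sorted(temps.items()) if value >= limit_c]


if __name__ == "__main__":
    current = read_zone_temps()
    for name, value in current.items():
        print(f"{name}: {value:.1f} C")
    hot = over_limit(current)
    if hot:
        print("at or above limit:", ", ".join(hot))
```

Running it in a loop next to a compression job should show whether the paste or a better cooler actually lowers the peak.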

The same problem that occurs on the Odroid U3 also shows up here: zopfli crashes with SIGSEGV during long runs on certain blocks of data. The kernel on the Odroid XU4 seems newer, so I'm still confused by this error, as it does not occur with the x86/x64 builds. It might be GCC generating bad code, or one of the special switches I pass to it for faster builds.
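One way to narrow this down would be to rebuild with different switches and let a script report which builds crash. The sketch below only shows the detection part, assuming a POSIX system where a process killed by a signal reports a negative return code. The helper name is made up, and the demo child is a stand-in that deliberately kills itself with SIGSEGV; the real zopfli command line would go in its place.

```python
import signal
import subprocess
import sys


def died_from_sigsegv(cmd):
    """Run cmd and return True if it was terminated by SIGSEGV (POSIX only)."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # On POSIX, subprocess reports "killed by signal N" as returncode == -N.
    return result.returncode == -signal.SIGSEGV


# Stand-in for a crashing build: a child Python process that sends itself
# SIGSEGV. Replace this list with the actual compressor invocation.
CRASHING_CHILD = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

if __name__ == "__main__":
    print("segfaulted:", died_from_sigsegv(CRASHING_CHILD))
```

Since the crashes only appear after long runs on certain blocks, each candidate build would still need to be fed the same problematic input for the comparison to mean anything.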

Some tests:

                |       ORIGINAL -O2                   |  Mr_KrzYch00's Zopfli KrzYmod  |
       X        |-----------------------------------------------------------------------|
                | 2016.04.20 | 2016.05.19 | Makefile*  | v16.5.22 --t0 | v16.5.22 --t99 |
----------------------------------------------------------------------------------------|
           real | 15m35.201s | 14m6.836s  | 11m21.864s | 10m56.287s    | 7m11.551s      | - ARM Cortex-A9 - quad @ 1.92GHz
Odroid U3  user | 14m40.965s | 14m6.480s  | 11m21.580s | 10m55.970s    | 19m30.025s     | (quad-core)
NEON       sys  | 0m53.925s  | 0m0.125s   | 0m0.095s   | 0m0.135s      | 0m0.805s       |
----------------------------------------------------------------------------------------|
           real |            |            |            | 7m21.517s     | 4m40.129s      | - ARM Cortex-A7 - quad @ 1.4GHz
Odroid XU4 user | [missing]  | [missing]  | [missing]  | 7m21.345s     | 8m27.750s      | - ARM Cortex-A15 - quad @ 2.0GHz
NEON+VFPV4 sys  |            |            |            | 0m0.080s      | 0m0.180s       | (octa-core)
----------------------------------------------------------------------------------------|
           real | 7m41.104s  | 5m50.010s  | 4m31.952s  | 4m1.706s      | 2m35.052s      | - Core i7-3630QM @ 2.7GHz
Fedora x86 user | 7m20.247s  | 5m39.968s  | 4m24.474s  | 3m53.982s     | 6m27.671s      | (quad core)
rawhideAVX sys  | 0m9.306s   | 0m0.242s   | 0m0.071s   | 0m0.268s      | 0m2.843s       |
----------------------------------------------------------------------------------------|
           real | 5m59.811s  | 4m25.060s  | 3m38.561s  | 3m31.905s     | 2m17.897s      |
Fedora x64 user | 5m56.214s  | 4m24.860s  | 3m38.295s  | 3m31.776s     | 5m57.761s      |
rawhideAVX sys  | 0m3.543s   | 0m0.118s   | 0m0.208s   | 0m0.056s      | 0m1.636s       |


* - same original Zopfli 2016.05.19 with Zopfli KrzYmod's makefile + profile guided optimizations:
        https://github.com/MrKrzYch00/zopfli/blob/master/Makefile
  --t99 forces compression of all blocks at once (6 threads were used)
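The percentages quoted at the top can be rechecked from the "real" rows of the table. Below is a minimal sketch that converts the `time`-style stamps to seconds and compares the two boards. The helper names are mine, and "X% faster" is assumed to mean the ratio of wall-clock times minus one. For the v16.5.22 columns this works out to roughly 49% for --t0 and 54% for --t99.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '10m56.287s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def percent_faster(baseline, candidate):
    """Percentage by which candidate beats baseline (ratio of times minus 1)."""
    return (to_seconds(baseline) / to_seconds(candidate) - 1.0) * 100.0


# 'real' times copied from the table (v16.5.22 columns).
U3 = {"--t0": "10m56.287s", "--t99": "7m11.551s"}
XU4 = {"--t0": "7m21.517s", "--t99": "4m40.129s"}

if __name__ == "__main__":
    for mode in ("--t0", "--t99"):
        gain = percent_faster(U3[mode], XU4[mode])
        print(f"{mode}: XU4 is {gain:.1f}% faster than U3")
```

The same two helpers work for any other pair of cells, for example comparing the original -O2 build against the KrzYmod makefile build on one machine.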
