numactl --interleave=all ./testing_dgeev -RN -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.1  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
ndevices 3
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_dgeev [options] [-h|--help]

    N   CPU Time (sec)   GPU Time (sec)   |W_magma - W_lapack| / |W_lapack|
===========================================================================
  100     ---             0.0110
 1000     ---             0.8580
   10     ---             0.0003
   20     ---             0.0005
   30     ---             0.0011
   40     ---             0.0035
   50     ---             0.0038
   60     ---             0.0029
   70     ---             0.0049
   80     ---             0.0069
   90     ---             0.0076
  100     ---             0.0105
  200     ---             0.0511
  300     ---             0.0940
  400     ---             0.1513
  500     ---             0.2043
  600     ---             0.4025
  700     ---             0.4948
  800     ---             0.6284
  900     ---             0.7352
 1000     ---             0.8453
 2000     ---             2.7113
 3000     ---             8.3927
 4000     ---            13.1533
 5000     ---            19.8204
 6000     ---            35.3721
 7000     ---            47.4245
 8000     ---            60.2085
 9000     ---            72.0642
10000     ---            85.6638
12000     ---           127.5793
14000     ---           172.5071
16000     ---           233.0806
18000     ---           296.6084
20000     ---           377.1261

numactl --interleave=all ./testing_dgeev -RV -N 100 -N 1000 --range 10:90:10 --range 100:900:100 --range 1000:9000:1000 --range 10000:20000:2000
MAGMA 1.6.1  compiled for CUDA capability >= 3.5
CUDA runtime 7000, driver 7000. OpenMP threads 16. MKL 11.2.3, MKL threads 16. 
ndevices 3
device 0: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 1: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
device 2: Tesla K40c, 745.0 MHz clock, 11519.6 MB memory, capability 3.5
Usage: ./testing_dgeev [options] [-h|--help]

    N   CPU Time (sec)   GPU Time (sec)   |W_magma - W_lapack| / |W_lapack|
===========================================================================
  100     ---             0.0176
 1000     ---             1.0784
   10     ---             0.0020
   20     ---             0.0025
   30     ---             0.0030
   40     ---             0.0073
   50     ---             0.0083
   60     ---             0.0070
   70     ---             0.0108
   80     ---             0.0126
   90     ---             0.0139
  100     ---             0.0160
  200     ---             0.0743
  300     ---             0.1380
  400     ---             0.2666
  500     ---             0.3481
  600     ---             0.4816
  700     ---             0.6642
  800     ---             0.7633
  900     ---             0.9096
 1000     ---             1.1074
 2000     ---             3.7380
 3000     ---             9.7185
 4000     ---            16.8563
 5000     ---            25.8547
 6000     ---            44.7942
 7000     ---            58.4372
 8000     ---            77.0922
 9000     ---           100.3287
10000     ---           121.8376
12000     ---           189.8740
14000     ---           256.4798
16000     ---           366.9515
18000     ---           463.3130
20000     ---           611.5157
