AccFFT *execute* functions take a `double` timer array of size 5,
to which the timings for different parts of the algorithm are written:
1. timer[0]: Total global transpose time.
2. timer[4]: Local FFT execution time.
It is recommended that you perform 1-2 warmup runs by calling the
corresponding *execute* function before profiling, so that one-time
startup costs do not distort the measured timings.