IDL Speed Tips
The following table shows the results of speed comparisons between different ways of implementing the same operations.
It should be used as a guide when attempting to optimise IDL code. Results for scalar quantities are shown in
YELLOW and results for array quantities are shown in RED.
The third column gives the value assigned to the scalar quantity, or to every element of the array, in each test.
Note that the tests were performed on a computer with 4 CPUs, so results
may differ on other computers for IDL operations that employ the IDL thread pool.
Further general optimisation hints are listed after the table.
Variable Type | No. Elements | Value | Slower Expression(s) | Faster Expression | Factor |
| | | | | |
INTEGER SCALAR | 1 | 1 | x = a LE 1 | x = a LT 2 | 1.020 |
INTEGER VECTOR | 1000 | 1 | x = a LE 1 | x = a LT 2 | 1.015 |
INTEGER VECTOR | 1000000 | 1 | x = a LE 1 | x = a LT 2 | 1.012 |
| | | | | |
FLOAT SCALAR | 1 | 1.0 | x = a^2.0 | x = a^2 | 1.40 |
FLOAT SCALAR | 1 | 1.0 | x = a^3.0 | x = a^3 | 1.40 |
FLOAT SCALAR | 1 | 1.0 | x = a^4.0 | x = a^4 | 1.43 |
FLOAT SCALAR | 1 | 1.0 | x = a^5.0 | x = a^5 | 1.40 |
FLOAT VECTOR | 1000 | 1.0 | x = a^2.0 | x = a^2 | 7.34 |
FLOAT VECTOR | 1000 | 1.0 | x = a^3.0 | x = a^3 | 8.43 |
FLOAT VECTOR | 1000 | 1.0 | x = a^4.0 | x = a^4 | 6.17 |
FLOAT VECTOR | 1000 | 1.0 | x = a^5.0 | x = a^5 | 6.84 |
| | | | | |
FLOAT SCALAR | 1 | 1.0 | x = a^2 | x = a*a | 1.27 |
FLOAT SCALAR | 1 | 1.0 | x = a*a*a | x = a^3 | 1.15 |
FLOAT SCALAR | 1 | 1.0 | x = a*a*a*a | x = a^4 | 1.75 |
FLOAT SCALAR | 1 | 1.0 | x = a*a*a*a*a | x = a^5 | 2.16 |
FLOAT VECTOR | 1000 | 1.0 | x = a^2 | x = a*a | 4.73 |
FLOAT VECTOR | 1000 | 1.0 | x = a^3 | x = a*a*a | 2.23 |
FLOAT VECTOR | 1000 | 1.0 | x = a^4 | x = a*a*a*a | 2.24 |
FLOAT VECTOR | 1000 | 1.0 | x = a^5 | x = a*a*a*a*a | 1.57 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^2 | x = a*a | 1.85 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^3 | x = a*a*a | 1.34 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^4 | x = a*a*a*a | 1.51 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^5 | x = a*a*a*a*a | 1.21 |
| | | | | |
FLOAT SCALAR | 1 | 1.0 | x = a^0.5 | x = sqrt(a) | 1.94 |
FLOAT VECTOR | 1000 | 1.0 | x = a^0.5 | x = sqrt(a) | 35.33 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^0.5 | x = sqrt(a) | 16.76 |
| | | | | |
FLOAT SCALAR | 1 | 1.0 | x = a^(-1) | x = 1.0/a | 1.87 |
FLOAT VECTOR | 1000 | 1.0 | x = a^(-1) | x = 1.0/a | 1.80 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^(-1) | x = 1.0/a | 1.30 |
| | | | | |
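FLOAT SCALAR | 1 | 1.0 | x = a^(-2) | x = 1.0/(a^2) | 1.11 |
FLOAT SCALAR | 1 | 1.0 | x = 1.0/(a*a) | x = 1.0/(a^2) | 1.00 |
FLOAT SCALAR | 1 | 1.0 | x = a^(-3) | x = 1.0/(a^3) | 1.12 |
FLOAT SCALAR | 1 | 1.0 | x = 1.0/(a*a*a) | x = 1.0/(a^3) | 1.25 |
FLOAT SCALAR | 1 | 1.0 | x = a^(-4) | x = 1.0/(a^4) | 1.13 |
FLOAT SCALAR | 1 | 1.0 | x = 1.0/(a*a*a*a) | x = 1.0/(a^4) | 1.73 |
FLOAT VECTOR | 1000 | 1.0 | x = a^(-2) | x = 1.0/(a*a) | 1.48 |
FLOAT VECTOR | 1000 | 1.0 | x = 1.0/(a^2) | x = 1.0/(a*a) | 2.17 |
FLOAT VECTOR | 1000 | 1.0 | x = a^(-3) | x = 1.0/(a*a*a) | 1.09 |
FLOAT VECTOR | 1000 | 1.0 | x = 1.0/(a^3) | x = 1.0/(a*a*a) | 1.55 |
FLOAT VECTOR | 1000 | 1.0 | x = a^(-4) | x = 1.0/(a*a*a*a) | 1.32 |
FLOAT VECTOR | 1000 | 1.0 | x = 1.0/(a^4) | x = 1.0/(a*a*a*a) | 1.71 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^(-2) | x = 1.0/(a*a) | 1.17 |
FLOAT VECTOR | 1000000 | 1.0 | x = 1.0/(a^2) | x = 1.0/(a*a) | 1.32 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^(-3) | x = 1.0/(a*a*a) | 1.00 |
FLOAT VECTOR | 1000000 | 1.0 | x = 1.0/(a^3) | x = 1.0/(a*a*a) | 1.31 |
FLOAT VECTOR | 1000000 | 1.0 | x = a^(-4) | x = 1.0/(a*a*a*a) | 1.13 |
FLOAT VECTOR | 1000000 | 1.0 | x = 1.0/(a^4) | x = 1.0/(a*a*a*a) | 1.32 |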
| | | | | |
FLOAT SCALAR | 1 | 1.0 | x = a/2.0 | x = 0.5*a | 1.00 |
FLOAT VECTOR | 1000 | 1.0 | x = a/2.0 | x = 0.5*a | 2.05 |
FLOAT VECTOR | 1000000 | 1.0 | x = a/2.0 | x = 0.5*a | 1.33 |
FLOAT SCALAR | 1 | 1.0 | x = -1.0*a | x = -a | 1.53 |
FLOAT VECTOR | 1000 | 1.0 | x = -1.0*a | x = -a | 1.93 |
FLOAT VECTOR | 1000000 | 1.0 | x = -1.0*a | x = -a | 1.47 |
| | | | | |
FLOAT VECTOR | 1000 | 1.0 | a = a + 1.0 | a = temporary(a) + 1.0 | 1.06 |
FLOAT VECTOR | 1000000 | 1.0 | a = a + 1.0 | a = temporary(a) + 1.0 | 4.01 |
FLOAT VECTOR | 1000 | 1.0 | x = a | x = temporary(a) | 2.00 |
FLOAT VECTOR | 1000000 | 1.0 | x = a | x = temporary(a) | 1200.0 |
FLOAT VECTOR | 1000 | 1.0 | x = a + 1.0 | x = temporary(a) + 1.0 | 1.00 |
FLOAT VECTOR | 1000000 | 1.0 | x = a + 1.0 | x = temporary(a) + 1.0 | 2.89 |
| | | | | |
FLOAT SCALAR | 1 | 1.0 | x = min([a1,a2]) | x = a1 < a2 | 5.23 |
FLOAT SCALAR | 1 | 1.0 | x = max([a1,a2]) | x = a1 > a2 | 6.65 |
| | | | | |
DOUBLE VECTOR | 1000 | 1.0 | x = total(vec1*vec2, /DOUBLE) | x = calculate_dot_product_vector(vec1, vec2, status, /NO_PAR_CHECK) | 2.14 |
DOUBLE VECTOR | 1000000 | 1.0 | x = total(vec1*vec2, /DOUBLE) | x = calculate_dot_product_vector(vec1, vec2, status, /NO_PAR_CHECK) | 2.05 |
| | | | | |
FLOAT VECTOR | 1000 | 0.0 | x = fltarr(1000L) | x = fltarr(1000L, /NOZERO) | 1.06 |
FLOAT VECTOR | 1000000 | 0.0 | x = fltarr(1000000L) | x = fltarr(1000000L, /NOZERO) | 2.12 |
FLOAT VECTOR | 1000 | 1.0 | x = fltarr(1000L, /NOZERO) & x[*] = 1.0 | x = replicate(1.0, 1000L) | 2.22 |
FLOAT VECTOR | 1000000 | 1.0 | x = fltarr(1000000L, /NOZERO) & x[*] = 1.0 | x = replicate(1.0, 1000000L) | 3.57 |
FLOAT VECTOR | 1000 | 1.0 | x[*] = 0.0 | replicate_inplace, x, 0.0 | 2.39 |
FLOAT VECTOR | 1000000 | 1.0 | x[*] = 0.0 | replicate_inplace, x, 0.0 | 10.10 |
| | | | | |
FLOAT VECTOR | 1000 | 0.0 | a[*] = 1.0 | a[0] = replicate(1.0, 1000L) | 1.88 |
FLOAT VECTOR | 1000000 | 0.0 | a[*] = 1.0 | a[0] = replicate(1.0, 1000000L) | 2.40 |
FLOAT ARRAY | (1000,1000) | 0.0 | a[*,*] = 1.0 | a[0,0] = replicate(1.0, 1000L, 1000L) | 1.45 |
FLOAT ARRAY | (1000,1000) | 0.0 | a[0,0] = replicate(1.0, 1L, 1000L) | a[0,*] = 1.0 | 3.70 |
FLOAT ARRAY | (1000,1000) | 0.0 | a[*,0] = 1.0 | a[0,0] = replicate(1.0, 1000L) | 2.28 |
FLOAT ARRAY | (1000,1000) | 0.0 | a[0,0] = reform(findgen(1000L), 1L, 1000L) | a[0,*] = findgen(1000L) | 2.91 |
FLOAT ARRAY | (1000,1000) | 0.0 | a[*,0] = findgen(1000L) | a[0,0] = findgen(1000L) | 2.46 |
FLOAT ARRAY | (10,1000,1000) | 0.0 | a[0,0,0] = reform(findgen(1000L,1000L), 1L, 1000L, 1000L) | a[0,*,*] = findgen(1000L,1000L) | 3.35 |
| | | | | |
FLOAT VECTOR | 1000 | 1.0 | subs = where(a EQ 1.0, n) | n = long(total(a, /DOUBLE)) | 1.71 |
FLOAT VECTOR | 1000000 | 1.0 | subs = where(a EQ 1.0, n) | n = long(total(a, /DOUBLE)) | 2.87 |
FLOAT VECTOR | 1000 | 1.0 | n = long(total(1.0 - a, /DOUBLE)) | subs = where(a NE 1.0, n) | 1.32 |
FLOAT VECTOR | 1000000 | 1.0 | n = long(total(1.0 - a, /DOUBLE)) | subs = where(a NE 1.0, n) | 1.89 |
FLOAT VECTOR | 1000 | 1.0 | n = long(total(1.0 - a, /DOUBLE)) | subs = where(a EQ 0.0, n) | 1.32 |
FLOAT VECTOR | 1000000 | 1.0 | n = long(total(1.0 - a, /DOUBLE)) | subs = where(a EQ 0.0, n) | 1.89 |
| | | | | |
FLOAT SCALAR | 1 | 1.0 | x = exp(complex(0.0,a)) | x = complex(cos(a),sin(a)) | 1.03 |
FLOAT VECTOR | 1000 | 1.0 | x = exp(complex(0.0,a)) | x = complex(cos(a),sin(a)) | 3.39 |
FLOAT VECTOR | 1000000 | 1.0 | x = exp(complex(0.0,a)) | x = complex(cos(a),sin(a)) | 3.34 |
FLOAT SCALAR | 1 | 1.0 | x = a*exp(complex(0.0,a)) | x = a*complex(cos(a),sin(a)) | 1.01 |
FLOAT SCALAR | 1 | 1.0 | x = complex(a*cos(a),a*sin(a)) | x = a*complex(cos(a),sin(a)) | 1.16 |
FLOAT VECTOR | 1000 | 1.0 | x = a*exp(complex(0.0,a)) | x = complex(a*cos(a),a*sin(a)) | 3.28 |
FLOAT VECTOR | 1000 | 1.0 | x = a*complex(cos(a),sin(a)) | x = complex(a*cos(a),a*sin(a)) | 1.03 |
FLOAT VECTOR | 1000000 | 1.0 | x = a*exp(complex(0.0,a)) | x = complex(a*cos(a),a*sin(a)) | 3.21 |
FLOAT VECTOR | 1000000 | 1.0 | x = a*complex(cos(a),sin(a)) | x = complex(a*cos(a),a*sin(a)) | 1.06 |
| | | | | |
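The timing method behind the table is not described on this page. As a rough sketch of how one such comparison could be reproduced, assuming a simple repeat-and-time loop built on "systime(/SECONDS)" (the procedure name, the repeat count and the variable names are illustrative choices, not taken from the original tests):

```idl
; Hypothetical timing sketch: compare x = a^0.5 against x = sqrt(a)
; for a FLOAT VECTOR of 1000000 elements, all set to 1.0.
pro time_sqrt_sketch
  compile_opt idl2

  n    = 1000000L                 ; number of elements, as in the table
  nrep = 100L                     ; assumed repeat count to smooth out noise
  a    = replicate(1.0, n)

  ; Time the slower expression.
  t0 = systime(/SECONDS)
  for i = 0L, nrep - 1L do x = a^0.5
  t_slow = systime(/SECONDS) - t0

  ; Time the faster expression.
  t0 = systime(/SECONDS)
  for i = 0L, nrep - 1L do x = sqrt(a)
  t_fast = systime(/SECONDS) - t0

  ; Ratio of the two elapsed times.
  print, 'Factor (slower/faster): ', t_slow / t_fast
end
```

The ratio printed will vary with the machine, the IDL version and the thread pool settings, so it should not be expected to match the table exactly.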
The following general optimisation hints may also be useful:
- Needless to say, one should use array operations instead of "for" loops where possible, since IDL is optimised for such calculations.
- Avoid unnecessary variable type conversions by using variables of the same type where possible. This way IDL does not have to do the
conversion itself.
- Ensure that operations on scalars are performed BEFORE the result is applied to an array, otherwise IDL will perform more array operations
than is strictly necessary. For example:
SLOWER (2000000 additions): a = fltarr(1000,1000) & b = 5.0 & c = 3.1 & x = (a + b) + c
FASTER (1000001 additions): a = fltarr(1000,1000) & b = 5.0 & c = 3.1 & x = a + (b + c)
- If an array is being operated on and the result is overwriting the original array, then use the "temporary" function in the expression
on the right-hand side, which avoids a copy of the original array being made before the operation is performed, resulting in a
faster execution and less memory being used. Also use the "temporary" function on the right-hand side of an expression for a
variable that will not be used again throughout the rest of the code, which again results in a faster execution and immediately
frees up memory for the rest of the program. See the above table for some examples. Note that the "temporary" function should not
be used on scalar quantities.
- When initialising an array with IDL functions like "intarr", "fltarr", etc., use the "/NOZERO" keyword if possible to leave the
array elements undefined, otherwise the array will be initialised with all elements set to zero, which may be unnecessary. See
the above table for an example.
- When inserting a sub-array into a larger array with the same number of dimensions, it is only necessary to specify the lowest subscript in
each dimension at which the sub-array is to be inserted, rather than the full subscript range in each dimension. This
saves IDL from having to generate the subscripts for the relevant portion of the larger array before carrying out the insertion
operation. For example:
SLOWER: imdata[10:20,0:300] = imcutout
FASTER: imdata[10,0] = imcutout
- When extracting a sub-array from a larger array, use the IDL "reform" function to give the sub-array the required dimensions. This
avoids having to previously set up an array of the required dimensions to hold the sub-array, and then insert the sub-array into the
new array. For example:
SLOWER: vec = dblarr(n, /NOZERO) & vec[*] = arr[0,*]
FASTER: vec = reform(arr[0,*], n)
- When comparing a scalar with an array (e.g. using the greater-than comparison), be aware that if the scalar and the array are of
different number types, then IDL converts both to the more precise of the two types. If the scalar
is of the more precise type, then the conversion is applied to the whole array, which can introduce a large
processing overhead. This overhead can be avoided by converting (or writing) the scalar in the number type of the array,
provided that its value can be represented in that type. For example:
For an array "arr" of type BYTE and a scalar "val" of type INTEGER:
SLOWER: subs = where(arr EQ val, nsubs)
FASTER: subs = where(arr EQ byte(val), nsubs)
- Array concatenation such as:
arr = [arr, extra]
is very slow in IDL. It is faster to preallocate an array of the final size and insert the original array along with the
new information. However, this is not always possible, especially if the size of the new information is not known in advance.
- The IDL "array_equal" function is a fast way to compare data for equality in situations where the indices of the elements that differ
are not of interest. This operation is much faster than using "total(a NE b)", because it stops the comparison as soon as the first
inequality is found, an intermediate array is not created, and only one pass is made through the data. For best speed, ensure that
the operands are of the same data type.
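Several of the hints above can be seen side by side in a short sketch. The procedure name, array sizes and variable names below are illustrative assumptions only, and the chunk data is a stand-in for whatever a real program would compute:

```idl
; Hypothetical sketch collecting several of the hints in one procedure.
pro speed_hints_sketch
  compile_opt idl2

  ; Combine the scalars first, so that only one array addition is needed.
  a = fltarr(1000, 1000)
  b = 5.0
  c = 3.1
  x = a + (b + c)

  ; Overwrite an array without first copying it.
  a = temporary(a) + 1.0

  ; Preallocate the final array, then insert each chunk at its lowest
  ; subscript, instead of growing it with arr = [arr, extra] on every pass.
  nchunks  = 100L                      ; illustrative sizes
  chunklen = 50L
  arr = fltarr(nchunks * chunklen, /NOZERO)
  for i = 0L, nchunks - 1L do begin
    extra = findgen(chunklen) + i      ; stand-in for newly computed data
    arr[i * chunklen] = extra
  endfor

  ; Compare a BYTE array against a scalar converted to BYTE, so that the
  ; array itself does not need to be converted.
  bdata = bindgen(256)
  val   = 200
  subs  = where(bdata eq byte(val), nsubs)

  ; array_equal stops at the first mismatch and creates no intermediate array.
  same = array_equal(arr, arr[*])

  print, 'nsubs = ', nsubs, '   same = ', same
end
```

The preallocation step only applies when the final size is known in advance, as noted in the concatenation hint.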
This site is © Copyright Daniel Bramich 2021, All Rights Reserved.