Losing your Loops: Fast Numerical Computing with NumPy

By: PyCon 2015

Uploaded on 04/11/2015

Comments (7):

By anonymous    2017-09-20

Instead of substituting your for-loops with lambdas, try substituting them with ufuncs.

Losing Your Loops: Fast Numerical Computing with NumPy is an excellent talk by Jake VanderPlas on the subject. Using universal functions (ufuncs) and broadcasting instead of for-loops can dramatically improve the speed of your code.

Here is a basic example:

import numpy as np
from time import time

def timed(func):
    def inner(*args, **kwargs):
        t0 = time()
        result = func(*args, **kwargs)
        elapsed = time()-t0
        print(f'ran {func.__name__} in {elapsed} seconds')
        return result
    return inner
# without broadcasting:

@timed
def sums():
    sums = np.zeros([500, 500])
    for a in range(500):
        for b in range(500):
            sums[a, b] = a+b
    return sums

@timed
def sums_broadcasted(): 
    a = np.arange(500)
    b = np.reshape(np.arange(500), [500, 1])
    return a+b

INPUT:

a = sums()
b = sums_broadcasted()
assert (a==b).all()

OUTPUT:

ran sums in 0.030008554458618164 seconds
ran sums_broadcasted in 0.0005011558532714844 seconds

Note that by eliminating the loops we get roughly a 60x speedup on this run!
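
The same table can also be built by calling a ufunc method directly. The following is a minimal sketch, assuming `np.add.outer` is an acceptable stand-in for the reshape-and-add in sums_broadcasted above. The function names are illustrative, and the timings come from `timeit` and will differ from machine to machine, so none are quoted here.

```python
import numpy as np
from timeit import timeit

n = 500
idx = np.arange(n)

def sums_loop():
    # pure-Python double loop, same idea as sums() above
    out = np.zeros([n, n])
    for a in range(n):
        for b in range(n):
            out[a, b] = a + b
    return out

def sums_outer():
    # ufunc method: applies np.add to every pair (idx[i], idx[j])
    return np.add.outer(idx, idx)

# make sure both approaches agree before comparing speed
same = bool((sums_loop() == sums_outer()).all())

# average over a few runs to smooth out timer noise
runs = 3
t_loop = timeit(sums_loop, number=runs) / runs
t_outer = timeit(sums_outer, number=runs) / runs
print(f'loop: {t_loop:.6f} s, outer: {t_outer:.6f} s, equal: {same}')
```

Averaging several runs with `timeit` tends to give steadier numbers than a single `time()` difference, which matters when the fast version finishes in well under a millisecond.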

Original Thread

By anonymous    2017-09-20

You can do this with NumPy built-ins using broadcasting. Broadcasting lets you combine two arrays of different shapes without writing explicit loops or making unnecessary copies.

The best introduction to this topic I know of is the talk Losing Your Loops: Fast Numerical Computing with NumPy by Jake VanderPlas. It contains visual examples that I find essential for wrapping your head around broadcasting.

Here's a simple example:

IN:

import numpy as np
a = np.arange(3)
b = np.reshape(np.arange(3), [3, 1])
print('a =', a)
print('b =')
print(b)
print('a+b =')
print(a+b)

OUT:

a = [0 1 2]
b =
[[0]
 [1]
 [2]]
a+b =
[[0 1 2]
 [1 2 3]
 [2 3 4]]
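
The shape bookkeeping behind that result can be checked directly. This is a small sketch, using `np.newaxis` as an alternative to `np.reshape` and `np.broadcast` to ask NumPy what the combined shape will be. The names `result_shape` and `total` are illustrative only.

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(3)[:, np.newaxis]  # shape (3, 1), same as reshaping to [3, 1]

# np.broadcast reports the shape both operands are stretched to
result_shape = np.broadcast(a, b).shape
print(a.shape, b.shape, '->', result_shape)

# element [i, j] of the sum is b[i, 0] + a[j]
total = a + b
print(total[2, 1] == b[2, 0] + a[1])
```

Neither `a` nor `b` is physically tiled to 3x3 here; only the result array is allocated at the full size.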

We can solve your problem by creating two vectors, one holding the row sums and one holding the column sums, and multiplying them together. Broadcasting expands the product into an array of the right size and shape.

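import numpy as np

def gen_expected(array: np.ndarray) -> np.ndarray:
    col_sums = np.sum(array, axis=0)   # shape (n_cols,)
    row_sums = np.sum(array, axis=1)   # shape (n_rows,)
    # np.reshape returns a new array, so the result has to be assigned
    row_sums = np.reshape(row_sums, [len(row_sums), 1])   # shape (n_rows, 1)
    # (n_cols,) * (n_rows, 1) broadcasts to (n_rows, n_cols)
    return (col_sums * row_sums) / np.sum(array)
# NOTE: element [i, j] is row_sums[i] * col_sums[j] / total, so the result
# has the same orientation as the input. Check it on your own data!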
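
A quick way to check the orientation is to run the idea on a small non-square table. The sketch below assumes the goal is the table of expected counts (row total times column total, divided by the grand total). The helper name `expected_counts` and the sample numbers are made up for illustration, and `keepdims=True` is used in place of the explicit reshape.

```python
import numpy as np

def expected_counts(table: np.ndarray) -> np.ndarray:
    # hypothetical helper, same idea as gen_expected above
    col_sums = table.sum(axis=0)                 # shape (n_cols,)
    row_sums = table.sum(axis=1, keepdims=True)  # shape (n_rows, 1)
    # (n_rows, 1) * (n_cols,) broadcasts to (n_rows, n_cols)
    return row_sums * col_sums / table.sum()

# made-up 2x3 table: row sums 40 and 60, column sums 30, 30 and 40
observed = np.array([[10, 10, 20],
                     [20, 20, 20]])
expected = expected_counts(observed)
print(expected)
# [[12. 12. 16.]
#  [18. 18. 24.]]

# cross-check against an explicit outer product
reference = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
print(np.allclose(expected, reference))
```

Because the table is 2x3 rather than square, a transposed result would show up immediately as a 3x2 array.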

Original Thread

By anonymous    2017-11-27

I have some data that I want to "one-hot encode" and it is represented as a 1-dimensional vector of positions.

Is there any function in NumPy that can expand my x into my x_ohe?

I'm trying to avoid using for-loops in Python at all costs for operations like this after watching Jake VanderPlas's talk.

x = np.asarray([0,0,1,0,2])
x_ohe = np.zeros((len(x), 3), dtype=int)
for i, pos in enumerate(x):
    x_ohe[i,pos] = 1
x_ohe
# array([[1, 0, 0],
#        [1, 0, 0],
#        [0, 1, 0],
#        [1, 0, 0],
#        [0, 0, 1]])
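
For reference, here are two loop-free ways to build the same array. This is only a sketch, assuming the number of classes (3 here) is known up front as in the loop version. The names `n_classes`, `x_ohe_eye` and `x_ohe_idx` are illustrative.

```python
import numpy as np

x = np.asarray([0, 0, 1, 0, 2])
n_classes = 3  # assumed known, as in the loop version

# option 1: row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with x picks one row per element of x
x_ohe_eye = np.eye(n_classes, dtype=int)[x]

# option 2: integer-array indexing sets exactly one entry in each row
x_ohe_idx = np.zeros((len(x), n_classes), dtype=int)
x_ohe_idx[np.arange(len(x)), x] = 1

print(x_ohe_eye)
print((x_ohe_eye == x_ohe_idx).all())
```

Option 1 is shorter, while option 2 avoids building an n_classes by n_classes identity matrix, which may matter when the number of classes is large.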

Original Thread

By anonymous    2018-01-07

I already did! It's more of an abstract question really. I saw this vid and it blew my mind: https://www.youtube.com/watch?v=EEUXKG97YRw. I was just trying to apply it to my work. No worries though, I will have a look at the pivot table post you linked to. Thanks for your help.

Original Thread

By anonymous    2018-08-01

Check [this *outstanding* video](https://www.youtube.com/watch?v=EEUXKG97YRw) from Jake VanderPlas (one of the core developers of scikit-learn). Very simple language, very intuitive. I don't think it can be explained better.

Original Thread
