Removing outliers in Python

Posted by Francesco Gadaleta on August 7, 2015

I needed to remove outliers from a list of numbers without getting stuck on too much theory, so I would like to share a fast solution that works quite effectively (at least on the data at hand).

Here is the snippet that does the job, using the median filter from scipy.signal. Don’t forget to comment, like, share, etc.

import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sps

raw = np.zeros(100)
# create some outliers
raw[54:57] = 1
raw[23:24] = 1
raw[91] = -1
raw[76:80] = -1  # too big to be outlier
plt.plot(raw, 'bo', label='raw data')
# filter outliers
filtered = sps.medfilt(raw, kernel_size=7)
plt.plot(filtered, 'r', label='Filtered profile')
plt.axis([0, len(raw), -1.5, 1.5])
plt.legend(loc=3, ncol=2, borderaxespad=0.)
plt.show()
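To see why the short spikes disappear while the 4-sample run is left alone, it helps to spell out what a median filter does. The following is only a minimal pure-Python sketch, not SciPy's implementation: median_filter_1d is a hypothetical helper written for illustration, and it assumes zero padding at the edges, which is how scipy.signal.medfilt is documented to behave.

```python
from statistics import median


def median_filter_1d(values, kernel_size=7):
    """Hypothetical helper: sliding-window median with zero padding at the edges."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    # Pad with zeros so every sample has a full window around it.
    padded = [0.0] * half + list(values) + [0.0] * half
    # The window starting at padded[i] is centred on values[i].
    return [median(padded[i:i + kernel_size]) for i in range(len(values))]


# Same test signal as in the snippet above, built with plain lists.
raw = [0.0] * 100
raw[54:57] = [1.0] * 3   # 3-sample spike
raw[23:24] = [1.0]       # 1-sample spike
raw[91] = -1.0           # 1-sample spike
raw[76:80] = [-1.0] * 4  # 4-sample run

filtered = median_filter_1d(raw, kernel_size=7)

# Indices that were non-zero in the input but are zero after filtering.
removed = [i for i, (r, f) in enumerate(zip(raw, filtered)) if r != 0 and f == 0]
# Indices that are still non-zero after filtering.
kept = [i for i, f in enumerate(filtered) if f != 0]

print("removed:", removed)
print("kept:", kept)
```

A run of equal values survives only if it fills more than half of the window. With kernel_size=7 that means at least 4 samples, so runs of up to 3 samples are treated as outliers and the run at 76..79 is kept. A larger kernel_size removes longer runs, at the cost of smoothing away more genuine structure.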

You can also download it from my GitHub account.

Happy filtering!
