Python/pandas: Column value in list
I’ve been using Python’s pandas library while exploring some CSV files and although for the most part I’ve found it intuitive to use, I had trouble filtering a data frame based on checking whether a column value was in a list.
A subset of one of the CSV files I’ve been working with looks like this:
$ cat foo.csv 'Foo' 1 2 3 4 5 6 7 8 9 10
Loading it into a pandas data frame is reasonably simple:
import pandas as pd df = pd.read_csv('foo.csv', index_col=False, header=0) >>> df Foo 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
If we want to find the rows which have a value of 1 we’d write the following:
>>> df[df['Foo'] == 1] Foo 0 1
Finding the rows with a value less than 7 is as you’d expect too:
>>> df[df['Foo'] < 7] Foo 0 1 1 2 2 3 3 4 4 5 5 6
Next I wanted to filter out the rows containing odd numbers which I initially tried to do like this:
odds = [i for i in range(1,10) if i % 2 <> 0] >>> odds [1, 3, 5, 7, 9] >>> df[df['Foo'] in odds] Traceback (most recent call last): File '<stdin>', line 1, in <module> File '/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/pandas/core/generic.py', line 698, in __nonzero__ .format(self.__class__.__name__)) ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Unfortunately that doesn’t work and I couldn’t get any of the suggestions from the error message to work either. Luckily pandas has a special isin function for this use case which we can call like this:
>>> df[df['Foo'].isin(odds)] Foo 0 1 2 3 4 5 6 7 8 9
Much better!
Reference: | Python/pandas: Column value in list from our WCG partner Mark Needham at the Mark Needham Blog blog. |