A very simple option: since the previous filter guarantees that each row has exactly one tag column equal to 1 (apart from count, obviously), you can use idxmax (http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.idxmax.html) to obtain the name of the column containing that value:
df["Tag"] = df[['tag_html', 'tag_css', 'tag_javascript']].idxmax(axis = 1)
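As a minimal sketch of that idea, here it is on a small frame that I assume has already been filtered so that each row has exactly one tag column set to 1. The name `filtered` and the three sample rows are only illustrative, not your real data:

```python
import pandas as pd

# Hypothetical, already-filtered data: each row has exactly one tag column equal to 1.
filtered = pd.DataFrame({
    "tag_html": [1.0, 0.0, 0.0],
    "tag_css": [0.0, 0.0, 1.0],
    "tag_javascript": [0.0, 1.0, 0.0],
    "count": [141, 782, 96],
})

tag_columns = ["tag_html", "tag_css", "tag_javascript"]

# idxmax(axis=1) returns, for every row, the label of the column holding the maximum.
filtered["Tag"] = filtered[tag_columns].idxmax(axis=1)

print(filtered[["count", "Tag"]])
```

Note that this only gives a meaningful tag when the filter really holds; on a row with several 1s, idxmax would just report the first column that reaches the maximum.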
Starting from your original DataFrame, you can do something like this:
In [1]: import pandas as pd
        df = pd.DataFrame({"tag_html": [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
                           "tag_css": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
                           "tag_javascript": [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0],
                           "count": [8655, 141, 782, 107, 96, 20, 46, 153]
                          }, columns = ["tag_html", "tag_css",
                                        "tag_javascript", "count"]
                         )
In [2]: mask = df[['tag_html', 'tag_css', 'tag_javascript']].eq(1).sum(axis = 1) == 1
In [3]: df["tag"] = df[mask][['tag_html', 'tag_css', 'tag_javascript']].idxmax(axis = 1)
In [4]: df[mask][["count", "tag"]]
Out[4]:
   count             tag
1    141        tag_html
2    782  tag_javascript
4     96         tag_css
If you want the result as an independent DataFrame rather than a slice of df, simply use
ctags = df[mask][["count", "tag"]].copy()

Edit: here is a more detailed explanation of the two key lines of the previous code:

mask = df[['tag_html', 'tag_css', 'tag_javascript']].eq(1).sum(axis = 1) == 1

df[['tag_html', 'tag_css', 'tag_javascript']].eq(1) simply goes through each value of the selected columns and checks whether it is equal to 1; that is, we get:
   tag_html  tag_css  tag_javascript
0     False    False           False
1      True    False           False
2     False    False            True
3      True    False            True
4     False     True           False
5     False     True            True
6      True     True            True
7      True     True           False
If we are sure that the DataFrame contains only 1 or 0 in these columns, this step is not necessary; sum can be applied directly.
With .sum(axis = 1) == 1 we create a boolean mask that selects only the rows in which the number of True values is 1. Remember that False/True are in essence 0/1, so sum([True, False, True]) is 2. With this we get the following mask:
0    False
1     True
2     True
3    False
4     True
5    False
6    False
7    False
dtype: bool
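To make the two points above concrete, here is a small sketch that rebuilds the same DataFrame and checks that, for 0/1 data, the mask built with .eq(1) and the one built by summing the columns directly agree. The names `tag_columns`, `mask_eq` and `mask_direct` are just mine for the illustration:

```python
import pandas as pd

tag_columns = ["tag_html", "tag_css", "tag_javascript"]

df = pd.DataFrame({
    "tag_html": [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    "tag_css": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    "tag_javascript": [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0],
    "count": [8655, 141, 782, 107, 96, 20, 46, 153],
})

# Booleans behave like 0/1 when summed.
assert sum([True, False, True]) == 2

# Mask built through an explicit comparison with 1.
mask_eq = df[tag_columns].eq(1).sum(axis=1) == 1

# Mask built by summing the raw values; only valid if the columns hold just 0 and 1.
mask_direct = df[tag_columns].sum(axis=1) == 1

print(mask_eq.equals(mask_direct))
```

The .eq(1) version is the safer one if there is any chance that a column contains something other than 0 or 1, since a single 2 and a pair of 0s would not sum to 1 but other combinations of unexpected values could.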
This mask can be applied to df to get the rows that contain exactly one 1:
In [1]: df[mask]
Out[1]:
   tag_html  tag_css  tag_javascript  count
1       1.0      0.0             0.0    141
2       0.0      0.0             1.0    782
4       0.0      1.0             0.0     96
df["tag"] = df[mask][['tag_html', 'tag_css', 'tag_javascript']].idxmax(axis = 1)

We create a new column in the DataFrame (df["tag"]) with the values returned by idxmax when applied to the rows that have exactly one 1 (df[mask]) and to the three columns that interest us ([['tag_html', 'tag_css', 'tag_javascript']]). Passing axis = 1 makes it operate on each row. idxmax returns the index label (axis = 0) or the column label (axis = 1) of the maximum value. Since we know for sure that there is exactly one 1 among the values, we always get the column that holds the 1, as it is the maximum of the three. We finally get:
In [2]: df[mask][['tag_html', 'tag_css', 'tag_javascript']].idxmax(axis = 1)
Out[2]:
1          tag_html
2    tag_javascript
4           tag_css
dtype: object
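Putting everything together, this is roughly how the whole thing could look as a standalone script. It is a sketch, not the only way to do it: I use df.loc[mask, ...] instead of the chained df[mask][...] from above, which is my own substitution, and `tag_columns` and `result` are illustrative names. One thing worth knowing is that the assignment aligns on the index, so the rows outside the mask end up with NaN in the new column:

```python
import pandas as pd

tag_columns = ["tag_html", "tag_css", "tag_javascript"]

df = pd.DataFrame({
    "tag_html": [0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    "tag_css": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
    "tag_javascript": [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0],
    "count": [8655, 141, 782, 107, 96, 20, 46, 153],
})

# Rows in which exactly one of the tag columns equals 1.
mask = df[tag_columns].eq(1).sum(axis=1) == 1

# Only the masked rows get a value; the others are filled with NaN by index alignment.
df["tag"] = df.loc[mask, tag_columns].idxmax(axis=1)

# Independent copy holding just the rows and columns of interest.
result = df.loc[mask, ["count", "tag"]].copy()

print(result)
print(df["tag"].isna().sum(), "rows were left without a tag")
```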