
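Table of contents

I. Boolean indexing
II. between()
III. isin()
1. Filtering on a single column
2. Filtering on several columns
3. Passing several conditions as a dictionary
4. Dropping the rows that contain outliers
5. Implementing "is not in"
IV. loc and iloc (important)
0. Creating the DataFrame
1. Selecting a row
2. Selecting a column
3. Selecting several columns
4. Selecting specific rows and columns
5. Selecting all the data
6. Selecting rows by value
Join the fan book giveaway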

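When cleaning data for analysis, you often need to filter out or drop some of the rows of a DataFrame. This article introduces the commonly used filtering methods.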

I. Boolean indexing

Boolean indexing can be used both to test a condition and to filter rows by it.

>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'C'])
>>> print(df)
          A         B         C
0 -0.595510 -1.349175 -0.313918
1  1.130604 -2.094348 -0.449182
2  1.745407 -0.136642 -0.943479
>>>
>>> # Boolean test: which values in column A are greater than 1
>>> print(df['A'] > 1)
0    False
1     True
2     True
Name: A, dtype: bool
>>>
>>> # Boolean filter: the rows where column A is greater than 1
>>> print(df[df['A'] > 1])
          A         B         C
1  1.130604 -2.094348 -0.449182
2  1.745407 -0.136642 -0.943479
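
Because the frame above is random, its values change on every run. The sketch below uses fixed values (an assumption made only for reproducibility) and shows how two Boolean conditions might be combined into one mask before filtering.

```python
import pandas as pd

# Fixed values (an assumption for this sketch) so the result is reproducible.
df = pd.DataFrame({'A': [-0.5, 1.2, 1.7],
                   'B': [-1.3, -2.1, -0.1],
                   'C': [-0.3, -0.4, -0.9]})

# Each comparison needs its own parentheses because & binds tighter than >.
mask = (df['A'] > 1) & (df['B'] > -1)

# Keep only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

Only the last row satisfies both conditions here, so it is the only row kept.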

II. between()

between(left, right) selects the rows whose values fall within the given interval. Both endpoints are included by default.

>>> import pandas as pd
>>>
>>> data = {'name': ['小红', '小明', '小白', '小黑'], 'age': [10, 20, 30, 25]}
>>> df = pd.DataFrame(data)
>>> print(df)
  name  age
0   小红   10
1   小明   20
2   小白   30
3   小黑   25
>>>
>>> # Test whether age is between 20 and 30
>>> print(df['age'].between(20, 30))
0    False
1     True
2     True
3     True
Name: age, dtype: bool
>>> # Filter the rows whose age is between 20 and 30
>>> print(df[df['age'].between(20, 30)])
  name  age
1   小明   20
2   小白   30
3   小黑   25
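
The output above shows that 20 and 30 themselves count as inside the interval. As a minimal sketch, the block below contrasts that default with an open interval. It assumes pandas 1.3 or later, where the `inclusive` parameter takes a string; older versions took a Boolean instead.

```python
import pandas as pd

df = pd.DataFrame({'name': ['小红', '小明', '小白', '小黑'],
                   'age': [10, 20, 30, 25]})

# Default behaviour: both endpoints are included.
closed = df['age'].between(20, 30)

# Assumes pandas >= 1.3: exclude both endpoints.
open_ = df['age'].between(20, 30, inclusive='neither')

# The default case written out as two explicit comparisons.
manual = (df['age'] >= 20) & (df['age'] <= 30)

print(df[open_])
```

With the open interval only the row with age 25 remains, since 20 and 30 no longer qualify.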

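III. isin()

isin() takes a list and tests each value against several candidates at once: it returns True where the value equals any element of the list, and False otherwise.

Create the DataFrame: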

>>> import pandas as pd
>>> import numpy as np
>>>
>>> data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
...         ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
...         ['bar', 'two', 'large', 50]]
>>> df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
>>> print(df)
     A    B      C   D
0  foo  one  small   1
1  foo  one  large   5
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50

1. Filtering on a single column

df[df[column_name].isin([outlier_values])]

>>> # 1. One value: test whether the value in column A is foo
>>> df['A'].isin(['foo'])
0     True
1     True
2    False
3    False
4    False
Name: A, dtype: bool
>>>
>>> # 2. Several values: test whether the value in column A is foo or bar
>>> df['A'].isin(['foo','bar'])
0    True
1    True
2    True
3    True
4    True
Name: A, dtype: bool
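
The session above only prints the Boolean mask. A minimal sketch of using such a mask to select rows follows; treating 10 and 50 as the values of interest in column D is an arbitrary choice made for illustration.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# True where column D equals 10 or 50 (values picked only for this example).
mask = df['D'].isin([10, 50])

# Indexing with the mask keeps the rows where it is True.
selected = df[mask]
print(selected)
```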

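2. Filtering on several columns

Join the conditions with & when all of them must hold, and with | when any one of them is enough.

Rows in which every listed column holds an outlier: df[df[column_name].isin([outlier_values]) & df[column_name].isin([outlier_values])]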

>>> # Select the rows where column A equals bar and column B equals one
>>> df[df['A'].isin(['bar']) & df['B'].isin(['one'])]
     A    B      C   D
2  bar  one  small  10

Rows in which at least one column holds an outlier: df[df[column_name].isin([outlier_values]) | df[column_name].isin([outlier_values])]

>>> # Select the rows where column A equals bar or column B equals one
>>> df[df['A'].isin(['bar']) | df['B'].isin(['one'])]
     A    B      C   D
0  foo  one  small   1
1  foo  one  large   5
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50

3. Passing several conditions as a dictionary

{'column': [values], 'column': [values]}

# With this approach every position that does not match is shown as NaN
>>> df[df.isin({'A':['bar'],'C':['small']})]
     A    B      C   D
0  NaN  NaN  small NaN
1  NaN  NaN    NaN NaN
2  bar  NaN  small NaN
3  bar  NaN    NaN NaN
4  bar  NaN    NaN NaN
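
Indexing with the dictionary-based mask blanks out cells rather than selecting rows. One possible way to turn it into a row filter, sketched here as an illustration rather than as the article's method, is to reduce the mask over the columns named in the dictionary with all() or any().

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# Cell-by-cell mask; columns not named in the dictionary are False everywhere.
mask = df.isin({'A': ['bar'], 'C': ['small']})

# Rows where every named column matched.
both = df[mask[['A', 'C']].all(axis=1)]

# Rows where at least one named column matched.
either = df[mask[['A', 'C']].any(axis=1)]

print(both)
print(either)
```

Restricting the mask to columns A and C matters: columns B and D are all False in the mask, so calling all(axis=1) on the full mask would never select a row.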

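4. Dropping the rows that contain outliers

isin() returns a Boolean mask (a Series when called on a column, a DataFrame when called on a DataFrame) that is True where the value is in the list and False where it is not. To drop those rows we only need to invert the mask, here by XOR-ing it with True.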

# Drop the rows whose column A is foo
>>> df[True^df['A'].isin(['foo'])]
     A    B      C   D
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50

5. Implementing "is not in"

pandas has no isnotin() method. Put ~ in front of a Boolean mask to negate it.

# Drop the rows whose column A is foo
>>> df[~(df['A']=='foo')]
     A    B      C   D
2  bar  one  small  10
3  bar  two  samll  10
4  bar  two  large  50
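
The example above negates an equality test. As a minimal sketch, the same ~ operator can be applied to an isin() mask, which gives the "is not in" behaviour directly and matches the XOR form from the previous subsection.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])

# "Not in": negate the isin() mask with ~.
not_foo = ~df['A'].isin(['foo'])

# The XOR form produces the same mask.
xor_version = True ^ df['A'].isin(['foo'])

kept = df[not_foo]
print(kept)
```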

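IV. loc and iloc (important)

loc and iloc are indexers used with square brackets rather than functions that are called. They differ as follows:

loc selects data by index labels and column names

iloc selects data by the integer positions of the rows and columns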

0. Creating the DataFrame

>>> import pandas as pd
>>>
>>> data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
...         ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
...         ['bar', 'two', 'large', 50]]
>>> df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'], index=['a', 'b', 'c', 'd', 'e'])
>>> print(df)
     A    B      C   D
a  foo  one  small   1
b  foo  one  large   5
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50

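1. Selecting a row

>>> # loc: the row labelled a (the first row)
>>> df.loc['a']
A      foo
B      one
C    small
D        1
Name: a, dtype: object
>>>
>>> # iloc: the row at position 0, which is the row labelled a (the first row)
>>> df.iloc[0]
A      foo
B      one
C    small
D        1
Name: a, dtype: object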

2. Selecting a column

>>> # loc: all rows of column A
>>> df.loc[:, ['A']]
     A
a  foo
b  foo
c  bar
d  bar
e  bar
>>>
>>> # iloc: all rows of column A
>>> df.iloc[:,[0]]
     A
a  foo
b  foo
c  bar
d  bar
e  bar

3. Selecting several columns

(1) Contiguous columns:

>>> # loc: all rows of columns A, B and C
>>> df.loc[:, ['A', 'B', 'C']]
     A    B      C
a  foo  one  small
b  foo  one  large
c  bar  one  small
d  bar  two  samll
e  bar  two  large
>>>
>>> # iloc: all rows of columns A, B and C
>>> df.iloc[:, 0:3]
     A    B      C
a  foo  one  small
b  foo  one  large
c  bar  one  small
d  bar  two  samll
e  bar  two  large
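
loc also accepts a slice of labels for contiguous rows or columns. The sketch below illustrates one difference worth keeping in mind: a label slice in loc includes its end label, while a positional slice in iloc excludes its stop position.

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'],
                  index=['a', 'b', 'c', 'd', 'e'])

# Label slices include both ends: rows a..c and columns A..C.
by_label = df.loc['a':'c', 'A':'C']

# Positional slices exclude the stop: rows 0..2 and columns 0..2.
by_position = df.iloc[0:3, 0:3]

print(by_label)
print(by_label.equals(by_position))
```

Both selections cover the same three rows and three columns, even though the loc slice names 'c' and 'C' while the iloc slice stops at 3.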

(2) Non-contiguous columns:

>>> # loc: all rows of columns A and D
>>> df.loc[:, ['A', 'D']]
     A   D
a  foo   1
b  foo   5
c  bar  10
d  bar  10
e  bar  50
>>>
>>> # iloc: all rows of columns A and D
>>> df.iloc[:, [0,3]]
     A   D
a  foo   1
b  foo   5
c  bar  10
d  bar  10
e  bar  50

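4. Selecting specific rows and columns

>>> # loc: the rows labelled a and d, and the columns named A and D
>>> df.loc[['a', 'd'], ['A', 'D']]
     A   D
a  foo   1
d  bar  10
>>>
>>> # iloc: the same rows and columns, by position (rows 0 and 3, columns 0 and 3)
>>> df.iloc[[0, 3], [0, 3]]
     A   D
a  foo   1
d  bar  10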

5. Selecting all the data

>>> # loc: everything
>>> df.loc[:,:]
     A    B      C   D
a  foo  one  small   1
b  foo  one  large   5
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50
>>>
>>> # iloc: everything
>>> df.iloc[:,:]
     A    B      C   D
a  foo  one  small   1
b  foo  one  large   5
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50

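6. Selecting rows by value

loc can also filter rows by their values

>>> # loc: the rows where column A is foo
>>> df.loc[df['A'] == 'foo']
     A    B      C  D
a  foo  one  small  1
b  foo  one  large  5
>>>
>>> # loc: the rows where D is greater than or equal to 10
>>> df.loc[df['D'] >= 10]
     A    B      C   D
c  bar  one  small  10
d  bar  two  samll  10
e  bar  two  large  50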
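
A Boolean condition in loc can be paired with a column selection in the same call. A minimal sketch, with the choice of columns A and D made only for illustration:

```python
import pandas as pd

data = [['foo', 'one', 'small', 1], ['foo', 'one', 'large', 5],
        ['bar', 'one', 'small', 10], ['bar', 'two', 'samll', 10],
        ['bar', 'two', 'large', 50]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'],
                  index=['a', 'b', 'c', 'd', 'e'])

# First argument: Boolean row filter. Second argument: column labels to keep.
subset = df.loc[df['D'] >= 10, ['A', 'D']]
print(subset)
```

This filters rows and narrows columns in one step instead of chaining two selections.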

Join the fan book giveaway

Featured book: 《贝叶斯算法与机器学习》


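[About the book]

The book covers Bayesian probability, probability estimation, Bayesian classification, random fields, parameter estimation, machine learning, deep learning, Bayesian networks, dynamic Bayesian networks, Bayesian deep learning and more. Its application areas include machine learning, image processing, speech recognition and semantic analysis. The book moves from easy to hard and goes deeper step by step. It focuses on explaining how the algorithms work and how they are applied, and each section is backed by a worked case study.

If you would rather buy the book than rely on the draw, see the link below:

Dangdang self-operated store link: http://product.dangdang.com/29478966.html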


Mastering Python in 100 Days (Data Analysis), Day 69: Common pandas data filtering methods (between, isin, loc, iloc)

Posted 2023-01-12 08:06, category: 《随便一记》
