
torch.load() and load_state_dict() explained

Posted 2023-04-05 11:05  Category: 《随便一记》



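About the author: works on deep learning for images
Summary links:
             The link below collects write-ups from the author's own work. Each entry is a commonly used demo whose code can be copied and run directly. It covers:
                    1. Deep learning scripts commonly used at work
                    2. Detailed notes on common torch and numpy functions
                    3. OpenCV operations on images and video
                    4. Summaries of the author's own projects (practical content only)
              Link: https://blog.csdn.net/qq_28949847/article/details/128552785
Video walkthroughs: the notes above are also explained on video on Bilibili and other platforms; search for 'Python图像识别' to find them.
              Bilibili: Python图像识别
              Douyin: Python图像识别
              Xigua Video: Python图像识别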


1. torch.load()

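The signature is torch.load(f, map_location=None, pickle_module=pickle, **pickle_load_args); newer PyTorch releases add further keyword arguments such as weights_only. In practice, usually only the first two parameters are used.
map_location: this parameter redirects where the saved tensors are placed when they are loaded. For example, the model's parameters may have been saved from the CPU while we want them on cuda:0. On a multi-GPU machine, weights from a model trained on GPU 1 can likewise be loaded onto GPU 2, which can come up in data-parallel distributed training.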
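As a minimal sketch of the redirection described above, the snippet below saves a small state dict to an in-memory buffer and reloads it with three forms of map_location: a string, a torch.device, and a callable. The nn.Linear layer, the buffer, and the load_with helper are stand-ins introduced for illustration (they are not from the original article), chosen so the example runs on a CPU-only machine.

```python
import io

import torch
import torch.nn as nn

# Stand-in for a real checkpoint: a tiny layer saved to an in-memory buffer.
layer = nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(layer.state_dict(), buffer)


def load_with(map_location):
    """Rewind the buffer and reload it with the given map_location."""
    buffer.seek(0)
    return torch.load(buffer, map_location=map_location)


# 1) A device string.
sd_str = load_with("cpu")
# 2) A torch.device object.
sd_dev = load_with(torch.device("cpu"))
# 3) A callable that receives (storage, location_tag) and returns the storage to use.
sd_fn = load_with(lambda storage, location: storage)

for name, sd in [("str", sd_str), ("device", sd_dev), ("callable", sd_fn)]:
    print(name, sd["weight"].device)
```

On a machine with GPUs, replacing "cpu" with "cuda:0" in the first two calls would place the tensors on that GPU instead.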

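(1) map_location=None
When map_location is not given, each tensor is restored to the device it was saved from. If the model was trained and saved on cuda:0, load places the weights on cuda:0; if it was trained and saved on cuda:1, load places them on cuda:1.

model = HighResolutionNet(base_channel=32, num_joints=17)
weights_dict = torch.load("./pose_hrnet_w32_256x192.pth")
# Print the device the weights live on
print(weights_dict['conv1.weight'].device)
print('weights_dict.keys():', weights_dict.keys())

Output:

cuda:0

The result is cuda:0 because the checkpoint was saved from a model trained on cuda:0, so the weights are loaded back onto the same device.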

(2) map_location='cpu'

Load the model parameters onto the CPU.

model = HighResolutionNet(base_channel=32, num_joints=17)
print('model:', model)
weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location='cpu')
print('weights_dict:', weights_dict)
# Print the device the weights live on
print(weights_dict['conv1.weight'].device)
print('weights_dict.keys():', weights_dict.keys())

Output:

cpu

The weights have moved from cuda:0 to cpu.
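A common pattern that builds on map_location='cpu' is to load the weights on the CPU first and only then move the whole model to whichever device is available. The sketch below assumes a tiny nn.Sequential model and an in-memory buffer in place of the article's HRNet checkpoint; both are illustrative stand-ins.

```python
import io

import torch
import torch.nn as nn


def make_model():
    """Hypothetical small model used only for this illustration."""
    return nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())


# Save a "trained" model to an in-memory checkpoint.
model = make_model()
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Load onto the CPU regardless of where the checkpoint was saved from.
weights = torch.load(buffer, map_location="cpu")

# Copy the weights into a fresh model, then move it to the target device.
fresh = make_model()
fresh.load_state_dict(weights)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
fresh.to(device)
print(next(fresh.parameters()).device)
```

Loading on the CPU first avoids a failure when the device recorded in the checkpoint does not exist on the current machine.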

(3) map_location={xx: xx}

model = HighResolutionNet(base_channel=32, num_joints=17)
print('model:', model)
weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location={'cuda:0':'cuda:1'})
print('weights_dict:', weights_dict)
# Print the device the weights live on
print(weights_dict['conv1.weight'].device)
print('weights_dict.keys():', weights_dict.keys())

Output:

cuda:1

The weights have moved from cuda:0 to cuda:1.

Next, a mapping whose key does not match the device the checkpoint was saved from:

model = HighResolutionNet(base_channel=32, num_joints=17)
print('model:', model)
weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location={'cuda:1':'cuda:2'})
print('weights_dict:', weights_dict)
# Print the device the weights live on
print(weights_dict['conv1.weight'].device)
print('weights_dict.keys():', weights_dict.keys())

Output:

cuda:0

The weights are still on cuda:0; they were not moved to cuda:2. The mapping does not apply here: the checkpoint was saved from cuda:0, but the dict only maps cuda:1 to cuda:2. When the saved location has no entry in the dict, the tensors are restored to their original device, exactly as if map_location had not been given.
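The dict behaviour can be checked without any GPU. In this sketch a tensor is saved from the CPU, so its recorded location tag is 'cpu'; a dict that has no 'cpu' key leaves the tensor where it was. The tensor name "w" and the in-memory buffer are illustrative assumptions.

```python
import io

import torch

# Saved from the CPU, so the location tag stored in the checkpoint is 'cpu'.
buffer = io.BytesIO()
torch.save({"w": torch.ones(2, 3)}, buffer)

# The dict has no 'cpu' key, so the mapping is not applied and the tensor
# is restored to its original location.
buffer.seek(0)
unmatched = torch.load(buffer, map_location={"cuda:1": "cuda:2"})

# A key that matches the saved tag is applied ('cpu' -> 'cpu' here, so it
# still runs on a CPU-only machine).
buffer.seek(0)
matched = torch.load(buffer, map_location={"cpu": "cpu"})

print(unmatched["w"].device, matched["w"].device)
```

The keys of the dict are compared against the saved location tags, so a mapping only takes effect for checkpoints that were actually saved from the device named in the key.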


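2. load_state_dict()

After building a model in PyTorch, we usually need to load the pretrained weights returned by torch.load() into it. This is done with load_state_dict(), which is a method of nn.Module and is called as model.load_state_dict(...); there is no torch.load_state_dict function. It copies the pretrained parameters into the new model as follows:

# Initialize the model
model = HighResolutionNet(base_channel=32, num_joints=17)
# Read the official pretrained weights
weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location='cpu')
# Load the official weights into the model
model.load_state_dict(weights_dict, strict=False)

The parameter to focus on in load_state_dict is strict. With strict=True (the default), the keys in the pretrained weights must match the parameter names in the new model exactly. If the new model's layers have been modified, for example some layers were added or removed for fine-tuning, loading with strict=True raises an error reporting that the keys do not match.

Using strict=False solves this. Keys in the pretrained weights that match layers in the new network are loaded, and layers without a matching key keep their default initialization. Note that strict=False only tolerates missing or extra keys: a key that matches by name but has a different shape still raises an error, which is why the full test code below deletes the final_layer weights when their shape differs.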
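The effect of strict can be seen on a toy model. The sketch below uses a hypothetical Small module (not part of the article's HRNet code) whose checkpoint has one extra key and one missing key. It shows that strict=True raises, that strict=False returns the lists of missing and unexpected keys, and that a shape mismatch raises even with strict=False.

```python
import torch
import torch.nn as nn


class Small(nn.Module):
    """Hypothetical two-layer model used only for this illustration."""

    def __init__(self, out_features):
        super().__init__()
        self.backbone = nn.Linear(4, 8)
        self.head = nn.Linear(8, out_features)


# Build a "pretrained" state dict, then make its keys disagree with the model.
pretrained = Small(out_features=17).state_dict()
pretrained["extra.weight"] = torch.zeros(1)  # key the new model does not have
del pretrained["head.bias"]                  # key the new model expects

model = Small(out_features=17)

# strict=True: any key mismatch raises a RuntimeError.
try:
    model.load_state_dict(pretrained, strict=True)
    strict_error = None
except RuntimeError as err:
    strict_error = err

# strict=False: matching keys are loaded and the mismatches are reported.
result = model.load_state_dict(pretrained, strict=False)
print("missing:", result.missing_keys)
print("unexpected:", result.unexpected_keys)

# A shape mismatch on a matching key raises even with strict=False.
wrong_shape = Small(out_features=10)
try:
    wrong_shape.load_state_dict(pretrained, strict=False)
    shape_error = None
except RuntimeError as err:
    shape_error = err
print("shape mismatch raised:", shape_error is not None)
```

Checking the returned missing_keys and unexpected_keys after a strict=False load is a cheap way to confirm that only the intended layers were left uninitialized.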

Full test code:

import torch
import torch.nn as nn

BN_MOMENTUM = 0.1


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        # The second conv keeps the resolution; the stride is applied once, in conv1
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion,
                                  momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class StageModule(nn.Module):
    def __init__(self, input_branches, output_branches, c):
        """
        Build one stage, i.e. the module that fuses features across scales.
        :param input_branches: number of input branches, one per scale
        :param output_branches: number of output branches
        :param c: number of channels of the first input branch
        """
        super().__init__()
        self.input_branches = input_branches
        self.output_branches = output_branches

        self.branches = nn.ModuleList()
        for i in range(self.input_branches):  # each branch first goes through 4 BasicBlocks
            w = c * (2 ** i)  # number of channels of branch i
            branch = nn.Sequential(
                BasicBlock(w, w),
                BasicBlock(w, w),
                BasicBlock(w, w),
                BasicBlock(w, w)
            )
            self.branches.append(branch)

        self.fuse_layers = nn.ModuleList()  # fuses the outputs of all branches
        for i in range(self.output_branches):
            self.fuse_layers.append(nn.ModuleList())
            for j in range(self.input_branches):
                if i == j:
                    # Same input and output branch: nothing to do
                    self.fuse_layers[-1].append(nn.Identity())
                elif i < j:
                    # Input branch j is below output branch i (the input is downsampled more than the output),
                    # so adjust the channels of branch j and upsample it before the sum
                    self.fuse_layers[-1].append(
                        nn.Sequential(
                            nn.Conv2d(c * (2 ** j), c * (2 ** i), kernel_size=1, stride=1, bias=False),
                            nn.BatchNorm2d(c * (2 ** i), momentum=BN_MOMENTUM),
                            nn.Upsample(scale_factor=2.0 ** (j - i), mode='nearest')
                        )
                    )
                else:  # i > j
                    # Input branch j is above output branch i (the input is downsampled less than the output),
                    # so adjust the channels of branch j and downsample it before the sum.
                    # Note: each 2x downsampling uses one 3x3 conv layer, so 4x needs two and 8x needs three,
                    # i - j layers in total
                    ops = []
                    # The first i - j - 1 conv layers keep the channel count and only downsample
                    for k in range(i - j - 1):
                        ops.append(
                            nn.Sequential(
                                nn.Conv2d(c * (2 ** j), c * (2 ** j), kernel_size=3, stride=2, padding=1, bias=False),
                                nn.BatchNorm2d(c * (2 ** j), momentum=BN_MOMENTUM),
                                nn.ReLU(inplace=True)
                            )
                        )
                    # The last conv layer both adjusts the channels and downsamples
                    ops.append(
                        nn.Sequential(
                            nn.Conv2d(c * (2 ** j), c * (2 ** i), kernel_size=3, stride=2, padding=1, bias=False),
                            nn.BatchNorm2d(c * (2 ** i), momentum=BN_MOMENTUM)
                        )
                    )
                    self.fuse_layers[-1].append(nn.Sequential(*ops))

        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Pass each branch through its own blocks
        x = [branch(xi) for branch, xi in zip(self.branches, x)]

        # Then fuse the information from the different scales
        x_fused = []
        for i in range(len(self.fuse_layers)):
            x_fused.append(
                self.relu(
                    sum([self.fuse_layers[i][j](x[j]) for j in range(len(self.branches))])
                )
            )

        return x_fused


class HighResolutionNet(nn.Module):
    def __init__(self, base_channel: int = 32, num_joints: int = 17):
        super().__init__()
        # Stem
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)

        # Stage1
        downsample = nn.Sequential(
            nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(256, momentum=BN_MOMENTUM)
        )
        self.layer1 = nn.Sequential(
            Bottleneck(64, 64, downsample=downsample),
            Bottleneck(256, 64),
            Bottleneck(256, 64),
            Bottleneck(256, 64)
        )

        self.transition1 = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(256, base_channel, kernel_size=3, stride=1, padding=1, bias=False),
                nn.BatchNorm2d(base_channel, momentum=BN_MOMENTUM),
                nn.ReLU(inplace=True)
            ),
            nn.Sequential(
                nn.Sequential(  # the extra Sequential matches the key names in the original project's weights
                    nn.Conv2d(256, base_channel * 2, kernel_size=3, stride=2, padding=1, bias=False),
                    nn.BatchNorm2d(base_channel * 2, momentum=BN_MOMENTUM),
                    nn.ReLU(inplace=True)
                )
            )
        ])

        # Stage2
        self.stage2 = nn.Sequential(
            StageModule(input_branches=2, output_branches=2, c=base_channel)
        )

        # transition2
        self.transition2 = nn.ModuleList([
            nn.Identity(),  # None,  - Used in place of "None" because it is callable
            nn.Identity(),  # None,  - Used in place of "None" because it is callable
            nn.Sequential(
                nn.Sequential(
                    nn.Conv2d(base_channel * 2, base_channel * 4, kernel_size=3, stride=2, padding=1, bias=False),
                    nn.BatchNorm2d(base_channel * 4, momentum=BN_MOMENTUM),
                    nn.ReLU(inplace=True)
                )
            )
        ])

        # Stage3
        self.stage3 = nn.Sequential(
            StageModule(input_branches=3, output_branches=3, c=base_channel),
            StageModule(input_branches=3, output_branches=3, c=base_channel),
            StageModule(input_branches=3, output_branches=3, c=base_channel),
            StageModule(input_branches=3, output_branches=3, c=base_channel)
        )

        # transition3
        self.transition3 = nn.ModuleList([
            nn.Identity(),  # None,  - Used in place of "None" because it is callable
            nn.Identity(),  # None,  - Used in place of "None" because it is callable
            nn.Identity(),  # None,  - Used in place of "None" because it is callable
            nn.Sequential(
                nn.Sequential(
                    nn.Conv2d(base_channel * 4, base_channel * 8, kernel_size=3, stride=2, padding=1, bias=False),
                    nn.BatchNorm2d(base_channel * 8, momentum=BN_MOMENTUM),
                    nn.ReLU(inplace=True)
                )
            )
        ])

        # Stage4
        # Note: the last StageModule outputs only the highest-resolution feature map
        self.stage4 = nn.Sequential(
            StageModule(input_branches=4, output_branches=4, c=base_channel),
            StageModule(input_branches=4, output_branches=4, c=base_channel),
            StageModule(input_branches=4, output_branches=1, c=base_channel)
        )

        # Final layer
        self.final_layer = nn.Conv2d(base_channel, num_joints, kernel_size=1, stride=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)

        x = self.layer1(x)
        x = [trans(x) for trans in self.transition1]  # Since now, x is a list

        x = self.stage2(x)
        x = [
            self.transition2[0](x[0]),
            self.transition2[1](x[1]),
            self.transition2[2](x[-1])
        ]  # New branch derives from the "upper" branch only

        x = self.stage3(x)
        x = [
            self.transition3[0](x[0]),
            self.transition3[1](x[1]),
            self.transition3[2](x[2]),
            self.transition3[3](x[-1]),
        ]  # New branch derives from the "upper" branch only

        x = self.stage4(x)

        x = self.final_layer(x[0])

        return x


if __name__ == '__main__':
    # Initialize the model
    model = HighResolutionNet(base_channel=32, num_joints=17)
    print('model:', model)
    weights_dict = torch.load("./pose_hrnet_w32_256x192.pth", map_location='cpu')
    print('weights_dict:', weights_dict)
    # Print the device the weights live on
    print(weights_dict['conv1.weight'].device)
    print('weights_dict.keys():', weights_dict.keys())

    for k in list(weights_dict.keys()):
        # If ImageNet weights were loaded, delete the unused classifier weights
        if "head" in k or "fc" in k:
            del weights_dict[k]
        # If COCO weights were loaded, delete final_layer unless it has 17 output channels
        # (elif, so a key that was just deleted is never looked up again)
        elif "final_layer" in k:
            if weights_dict[k].shape[0] != 17:
                del weights_dict[k]

    missing_keys, unexpected_keys = model.load_state_dict(weights_dict, strict=False)
    if len(missing_keys) != 0:
        print("missing_keys: ", missing_keys)
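The key-deletion loop at the end of the test code can be generalized. As a hedged sketch, the hypothetical helper filter_compatible below (not part of the original code) keeps only checkpoint entries whose key exists in the model with the same shape, so nothing has to be matched by hand on names such as "head" or "final_layer". The two nn.Sequential models are illustrative stand-ins.

```python
import torch
import torch.nn as nn


def filter_compatible(model, weights):
    """Keep only entries whose key exists in the model with the same shape."""
    target = model.state_dict()
    kept, dropped = {}, []
    for key, value in weights.items():
        if key in target and target[key].shape == value.shape:
            kept[key] = value
        else:
            dropped.append(key)
    return kept, dropped


# Stand-ins: the "pretrained" model has 17 outputs, the new one has 10.
source_model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 17))
target_model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 10))

kept, dropped = filter_compatible(target_model, source_model.state_dict())
result = target_model.load_state_dict(kept, strict=False)

print("dropped:", dropped)
print("missing:", result.missing_keys)
```

The dropped entries show up as missing_keys after loading, which means those layers keep their default initialization and need to be trained.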



Article link: http://zhangshiyu.com/post/58424.html
