Debug TensorFlow: Memory Consumption Keeps Growing as Training Proceeds

Category: Other | Posted 2021-12-12 02:45:37

Environment

Ubuntu 18.04
Python 3.8
TensorFlow-gpu 2.3.1
CUDA 11.1
Tensorflow-yolov5

Problem

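A memory leak. During training, memory consumption keeps climbing until RAM is completely exhausted and the connection to the server is forcibly dropped. No error is reported. Normally, memory usage should stay roughly flat once training is under way instead of growing continuously.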

Solution

Monitor memory usage on Linux
Install memory-profiler (pip install memory-profiler)
Locate the problem
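
For the first step, a minimal sketch of tracking the training process's own memory from Python, assuming Linux (the environment above is Ubuntu 18.04). It reads the current resident set size (VmRSS) from /proc/self/status after each epoch, so a leak shows up as a steadily rising list. The names current_rss_mib and track_memory, and the train_one_epoch callable, are hypothetical placeholders, not part of the original code.

```python
import resource


def current_rss_mib() -> float:
    """Current resident set size of this process in MiB.

    Reads VmRSS from /proc/self/status (Linux). If that file is unavailable,
    falls back to the *peak* RSS from getrusage (reported in KiB on Linux).
    """
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # Line format: "VmRSS:\t  123456 kB"
                    return int(line.split()[1]) / 1024.0
    except OSError:
        pass
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def track_memory(train_one_epoch, epochs):
    """Call a (hypothetical) per-epoch training function and record RSS after each epoch."""
    history = []
    for epoch in range(epochs):
        train_one_epoch()
        rss = current_rss_mib()
        history.append(rss)
        print(f"epoch {epoch}: RSS = {rss:.1f} MiB")
    return history


if __name__ == "__main__":
    # Simulated leak: every "epoch" keeps another 20 MiB alive.
    leaked = []
    track_memory(lambda: leaked.append(b"x" * (20 * 1024 * 1024)), epochs=3)
```

In real training the same idea would wrap one epoch of the author's loop; a healthy run should level off after the first epoch or two, while a leak keeps climbing.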
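
Then, to locate the problem, decorate the training step with memory-profiler's @profile decorator; every call appends a line-by-line memory report to the log file:
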
import tensorflow as tf
from memory_profiler import profile

# Each call of the decorated function appends a line-by-line memory report here
fp = open('memory_profiler.log', 'w+')

# train_step2 is a method of the author's trainer class (class not shown)
@profile(stream=fp)
def train_step2(self, image_data, target):
    with tf.GradientTape() as tape:
        image = image_data
        # Forward pass; the report shows the memory increment of this line
        pred_result = self.model(image, training=True)
        # ... (the rest of the training step is not shown in the original)
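
As a dependency-free complement to memory-profiler, Python's built-in tracemalloc module can compare two snapshots and list the source lines that allocated the most new memory in between. This is a swapped-in technique, not what the author used, and it has a real limitation here: tracemalloc only sees allocations made through Python's own allocator, so buffers held inside TensorFlow's C++ runtime (tensors, tf.data buffers) will not appear. The function top_growth and the leaky_step stand-in are hypothetical.

```python
import tracemalloc


def top_growth(step, repeats=5, limit=3):
    """Run `step` several times and return the source lines whose allocations grew most.

    Only Python-level allocations are tracked; memory allocated inside
    native extensions (e.g. TensorFlow's C++ runtime) is not visible.
    """
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(repeats):
        step()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Differences grouped by (file, line), sorted by absolute size change, largest first
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


if __name__ == "__main__":
    cache = []

    def leaky_step():
        # Hypothetical stand-in for a training step that keeps references alive
        cache.append([0.0] * 100_000)

    for stat in top_growth(leaky_step):
        print(stat)
```

Because it misses native allocations, this is most useful for ruling Python-side leaks in or out (for example, lists of losses or tensors kept alive across steps).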


Code change
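My fix for this was rather odd: the problem went away once I removed the shuffle. More precisely, that eliminated most of it. Memory still seems to leak slightly afterwards, but only a little: roughly a 1% increase over several dozen epochs, which is acceptable.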

# Shuffle disabled: removing it stopped most of the memory growth
# if shuffle:
#     dataset = dataset.shuffle(buffer_size=1111, seed=1949)
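
Why shuffle can be so memory-hungry: tf.data's Dataset.shuffle fills a buffer with buffer_size elements and samples from it, so its footprint is roughly buffer_size times the size of one element. A back-of-the-envelope sketch, assuming (hypothetically, since the original does not say) that each buffered element is one decoded 416×416×3 float32 image; with buffer_size=1111 that comes to about 2.1 GiB for the images alone. Note this is a bounded cost: it explains a large footprint, not by itself unbounded growth over epochs.

```python
import math


def shuffle_buffer_bytes(buffer_size, element_shape, bytes_per_value=4):
    """Rough size of a shuffle buffer holding `buffer_size` dense elements.

    Counts only the tensor of the given shape; targets stored alongside it
    and per-element overhead would add to this figure.
    """
    return buffer_size * math.prod(element_shape) * bytes_per_value


if __name__ == "__main__":
    # Hypothetical element: one 416x416 RGB image stored as float32
    size = shuffle_buffer_bytes(1111, (416, 416, 3), bytes_per_value=4)
    print(f"{size / 2**30:.2f} GiB")
```

If shuffling is still wanted, a commonly suggested compromise is a smaller buffer_size, or shuffling a lightweight representation (such as file paths) before the images are decoded so each buffered element stays small; the author did not test whether either avoids the growth seen here.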

Other causes

Keras itself has a known memory-leak issue: https://github.com/tensorflow/tensorflow/issues/32052

Reposted from blog.csdn.net/weixin_38812492/article/details/112178960