Processing time data in Python

Post timestamps scraped from Twitter by parsing the page are sometimes in Chinese, as shown below:

[Figure: example of a Chinese-language timestamp scraped from Twitter]
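Time handling involves a lot of small details, so this post is a short summary. The first thing to settle is the target: times should use the 24-hour clock and have the following form: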

format = "%Y-%m-%d %H:%M:%S"

In other words, each string has to be converted into a datetime.datetime object.
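A minimal illustration of the target format, assuming Python 3: `strptime` parses a string in this format into a `datetime.datetime`, and `strftime` with the same format string turns it back into a string. The name `FMT` is used here instead of `format` only to avoid shadowing the built-in `format()`; the sample timestamp is made up.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # the 24-hour target format described above

dt = datetime.strptime("2017-04-27 15:05:00", FMT)  # str -> datetime
print(type(dt).__name__)  # datetime
print(dt.hour)            # 15
print(dt.strftime(FMT))   # 2017-04-27 15:05:00  (datetime -> str)
```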
The conversion code is as follows:

from datetime import datetime

format = "%Y-%m-%d %H:%M:%S"  # target format (24-hour clock)

def chineseTime2National(time_str):
    # Assumed input layout: '<上午|下午>H:MM - YYYY年M月D日', e.g. '下午3:05 - 2017年4月27日'
    if time_str.startswith("上午"):    # 上午 = a.m.
        is_pm = False
    elif time_str.startswith("下午"):  # 下午 = p.m.
        is_pm = True
    else:
        # the original fell through here and failed with an undefined variable
        raise ValueError("expected a 上午/下午 prefix: %r" % time_str)
    parts = time_str[2:].split()       # e.g. ['3:05', '-', '2017年4月27日']
    hour, minute = parts[0].split(":")
    hour = int(hour)
    if is_pm and hour != 12:
        hour += 12                     # 1-11 p.m. -> 13-23 (12 p.m. stays 12)
    elif not is_pm and hour == 12:
        hour = 0                       # 12 a.m. (midnight) -> 00
    # Replace 年 and 月 with '-' and drop 日: '2017年4月27日' -> '2017-4-27'
    date = parts[2].replace("年", "-").replace("月", "-").replace("日", "")
    # strptime turns the string into a datetime.datetime, e.g. 2017-04-27 15:05:00
    return datetime.strptime("%s %d:%s" % (date, hour, minute), "%Y-%m-%d %H:%M")
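
As an alternative sketch (not the method used above), `strptime` can do the 12-hour to 24-hour conversion itself through the `%I` (12-hour clock hour) and `%p` (AM/PM) directives. This assumes the same input layout the function above relies on, and that the process's LC_TIME locale is the default "C" locale, where `%p` matches `AM`/`PM`. `parse_chinese_twitter_time` and `AMPM` are hypothetical names.

```python
from datetime import datetime

# Hypothetical mapping from the Chinese a.m./p.m. markers to the English ones
# that %p understands in the default "C" locale.
AMPM = {"上午": "AM", "下午": "PM"}

def parse_chinese_twitter_time(s):
    for zh, en in AMPM.items():
        if s.startswith(zh):
            s = en + s[len(zh):]
            break
    else:
        raise ValueError("expected a 上午/下午 prefix: %r" % s)
    # strptime maps 12 AM -> hour 0 and 1-11 PM -> 13-23 on its own;
    # 年/月/日 are matched as literal characters, %m/%d accept single digits.
    return datetime.strptime(s, "%p%I:%M - %Y年%m月%d日")

print(parse_chinese_twitter_time("下午3:05 - 2017年4月27日"))  # 2017-04-27 15:05:00
```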

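Once the times are datetime objects, we need to count posts by hour of day and by day of week, which takes only a few simple datetime methods. The code is as follows: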

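import json
import numpy as np

# an_traces_df: pandas DataFrame of scraped posts (built earlier) with the
# columns "screen_name" and "created_at" (the Chinese time strings)
with open('time_feature_of_user.json', 'w') as f:
    # group by the column name, not ['screen_name']: in recent pandas a
    # length-1 list makes each group key a 1-tuple instead of a string
    for name, group in an_traces_df.groupby('screen_name'):
        times = group["created_at"].values
        hours = np.zeros(24)    # post counts per hour 0-23
        weekdays = np.zeros(7)  # post counts per weekday, 0 = Monday
        for t in times:
            t = chineseTime2National(t)  # convert to datetime
            day = t.date()               # date() returns the date part of a datetime
            weekday = day.weekday()      # weekday() of that date: 0 = Monday ... 6 = Sunday
            hour = t.time().hour         # hour of time() is already 0-23; the original "- 1" shifted 0:00 into index 23
            hours[hour] += 1
            weekdays[weekday] += 1
        dic = {
            "screen_name": name,
            "hour_feature": (hours / len(times)).tolist(),        # fraction of posts per hour
            "weekday_feature": (weekdays / len(times)).tolist(),  # fraction of posts per weekday
        }
        f.write(json.dumps(dic) + '\n')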
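
The counting logic above can also be sketched without pandas or NumPy. The hypothetical helper `time_features` below takes a list of `datetime` objects for one user and returns the same two normalized vectors (fraction of posts per hour 0 to 23, and per weekday with 0 = Monday). It assumes Python 3 true division, and the sample posts are made up.

```python
from collections import Counter
from datetime import datetime

def time_features(times):
    """Return (hour_feature, weekday_feature) for a list of datetime objects."""
    if not times:
        return [0.0] * 24, [0.0] * 7
    n = len(times)
    hour_counts = Counter(t.hour for t in times)          # hour is 0-23, usable as an index directly
    weekday_counts = Counter(t.weekday() for t in times)  # 0 = Monday ... 6 = Sunday
    hour_feature = [hour_counts[h] / n for h in range(24)]
    weekday_feature = [weekday_counts[d] / n for d in range(7)]
    return hour_feature, weekday_feature

posts = [
    datetime(2017, 4, 27, 0, 30),   # Thursday, 00:30
    datetime(2017, 4, 27, 15, 5),   # Thursday, 15:05
    datetime(2017, 4, 30, 15, 40),  # Sunday, 15:40
]
hours, weekdays = time_features(posts)
print(hours[15], weekdays[3])  # 0.6666666666666666 0.6666666666666666
```

A `Counter` returns 0 for keys it has never seen, so hours with no posts come out as 0.0 without pre-sizing an array.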

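Notes on the datetime API used here: strptime(); the date() method and the weekday() method of the date object it returns; the time() method and the hour attribute of the time object it returns.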

strptime

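A small `strptime` sketch, assuming Python 3. Literal characters in the format string, including non-ASCII ones such as 年, 月 and 日, must appear verbatim in the input, and `%m`/`%d` also accept single-digit values such as "4". Input that does not match the format raises `ValueError`. The date is only an example.

```python
from datetime import datetime

# 年/月/日 are matched literally; "4" is accepted by %m
d = datetime.strptime("2017年4月27日", "%Y年%m月%d日")
print(d)  # 2017-04-27 00:00:00

# A string that does not match the format raises ValueError
try:
    datetime.strptime("2017/4/27", "%Y年%m月%d日")
except ValueError as e:
    print("no match:", e)
```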

date() time()

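A short sketch of `date()` and `time()` on a made-up `datetime`: `date()` keeps only the calendar part, `time()` keeps only the clock part, and `hour` is available on both the `time` object and the `datetime` itself. `datetime.combine` reassembles the two parts.

```python
from datetime import datetime, date, time

dt = datetime(2017, 4, 27, 15, 5, 0)
d = dt.date()  # datetime.date(2017, 4, 27): calendar part only
t = dt.time()  # datetime.time(15, 5): clock part only
print(d, t)             # 2017-04-27 15:05:00
print(t.hour, dt.hour)  # 15 15

# The two parts can be put back together
print(datetime.combine(d, t) == dt)  # True
```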

weekday()

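`weekday()` counts from Monday = 0, which is why its result can index the 7-element weekday array directly. `isoweekday()` is the 1-based ISO variant. The date below is only an example; the day name printed by `strftime("%A")` assumes the default "C" locale.

```python
from datetime import date

d = date(2017, 4, 27)
print(d.weekday())       # 3 (Monday = 0 ... Sunday = 6)
print(d.isoweekday())    # 4 (Monday = 1 ... Sunday = 7)
print(d.strftime("%A"))  # Thursday
```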


Reposted from blog.csdn.net/u014449866/article/details/80217360