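Bayesian Classification: Predicting Crime Categories in San Francisco
This example comes from a project in the naive Bayes classifier module of the July Algorithm (七月算法) course. It is also a Kaggle competition: train a model that predicts the category of a crime.
Environment: Windows 7 64-bit, Python 3.5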
1. Loading the data
The data covers roughly 12 years of San Francisco crime records. It is stored as a CSV file, so it can be loaded with pandas. An excerpt:
Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y
2015-05-13 23:53:00,WARRANTS,WARRANT ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425891675136,37.7745985956747
2015-05-13 23:53:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425891675136,37.7745985956747
2015-05-13 23:33:00,OTHER OFFENSES,TRAFFIC VIOLATION ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",VANNESS AV / GREENWICH ST,-122.42436302145,37.8004143219856
2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,NORTHERN,NONE,1500 Block of LOMBARD ST,-122.42699532676599,37.80087263276921
2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,PARK,NONE,100 Block of BRODERICK ST,-122.438737622757,37.771541172057795
2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM UNLOCKED AUTO,Wednesday,INGLESIDE,NONE,0 Block of TEDDY AV,-122.40325236121201,37.713430704116
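The excerpt shows the following fields:
Dates: date and time of the crime
Category: crime category
Descript: description of the crime
DayOfWeek: day of the week
PdDistrict: police district
Resolution: how the incident was resolved
Address: street block where the crime occurred
X and Y: GPS coordinates (longitude and latitude)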
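Before loading the full Kaggle files, the parsing step can be checked on a tiny in-memory sample. The sketch below is an illustration, not part of the original tutorial; `csv_text` and `sample` are made-up names. It feeds two rows of the excerpt to `pd.read_csv` through `io.StringIO` and shows that the comma inside the quoted Resolution value does not split the field, and that `parse_dates` makes the hour available through `.dt.hour`. The quoting only works with straight ASCII double quotes; typographic quotes such as ” are not recognised as CSV quoting.

```python
import io

import pandas as pd

# Two rows copied from the excerpt; Resolution is quoted because it contains a comma
csv_text = """Dates,Category,Descript,DayOfWeek,PdDistrict,Resolution,Address,X,Y
2015-05-13 23:53:00,WARRANTS,WARRANT ARREST,Wednesday,NORTHERN,"ARREST, BOOKED",OAK ST / LAGUNA ST,-122.425891675136,37.7745985956747
2015-05-13 23:30:00,LARCENY/THEFT,GRAND THEFT FROM LOCKED AUTO,Wednesday,PARK,NONE,100 Block of BRODERICK ST,-122.438737622757,37.771541172057795
"""

# parse_dates converts Dates to datetime64, which enables the .dt accessor
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Dates"])

print(sample.shape)                      # 2 rows, 9 columns
print(sample.loc[0, "Resolution"])       # the quoted comma stays inside one field
print(sample["Dates"].dt.hour.tolist())  # hour of day for each record
```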
import pandas as pd
import numpy as np
# parse_dates converts the Dates column to datetime64 so that .dt.hour can be used later
train = pd.read_csv("C:\\data\\SanFrancisco\\train.csv",parse_dates=['Dates'])
test = pd.read_csv("C:\\data\\SanFrancisco\\test.csv",parse_dates=['Dates'])
train[0:6]
Dates Category Descript \
0 2015-05-13 23:53:00 WARRANTS WARRANT ARREST
1 2015-05-13 23:53:00 OTHER OFFENSES TRAFFIC VIOLATION ARREST
2 2015-05-13 23:33:00 OTHER OFFENSES TRAFFIC VIOLATION ARREST
3 2015-05-13 23:30:00 LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO
4 2015-05-13 23:30:00 LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO
5 2015-05-13 23:30:00 LARCENY/THEFT GRAND THEFT FROM UNLOCKED AUTO
DayOfWeek PdDistrict Resolution Address \
0 Wednesday NORTHERN ARREST, BOOKED OAK ST / LAGUNA ST
1 Wednesday NORTHERN ARREST, BOOKED OAK ST / LAGUNA ST
2 Wednesday NORTHERN ARREST, BOOKED VANNESS AV / GREENWICH ST
3 Wednesday NORTHERN NONE 1500 Block of LOMBARD ST
4 Wednesday PARK NONE 100 Block of BRODERICK ST
5 Wednesday INGLESIDE NONE 0 Block of TEDDY AV
X Y
0 -122.425892 37.774599
1 -122.425892 37.774599
2 -122.424363 37.800414
3 -122.426995 37.800873
4 -122.438738 37.771541
5 -122.403252 37.713431
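2. Feature preprocessing
The data contains many categorical and text fields, so they have to be turned into numeric features. Since the goal is to predict the crime category, the factors related to a crime should be quantified as far as possible.
Dates: in the first few records every crime happens after 23:00, which is plausible (although a handful of rows proves nothing). The hour of day is therefore extracted from Dates as a feature.
Category: this is the target, and it has to be encoded as numbers.
Descript: a description recorded after the crime has happened, so it is not useful as a predictor.
DayOfWeek: closely tied to Dates. On weekends and holidays more people are out and about, which also makes theft more likely.
PdDistrict and Resolution: these have little to do with why a crime happens. Resolution is only known after the fact; PdDistrict, however, is still used below as a coarse location feature in place of Address.
Address: anyone somewhat familiar with American cities knows that some neighborhoods, for example those where low-income residents or illegal immigrants are concentrated, have poorer public safety and relatively higher crime rates.
Next, the hour (from Dates), the crime category, the day of week, and the district are preprocessed.
pandas' get_dummies() turns a categorical column directly into binary 0/1 indicator columns (one-hot encoding); pandas 2.x returns True/False columns unless dtype=int is passed.
scikit-learn's LabelEncoder (in sklearn.preprocessing) assigns an integer code to each category.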
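To see what these two encoders produce before running them on 878,049 rows, here is a minimal sketch on a three-row toy frame. The frame `toy` and its values are invented for illustration; the calls (`LabelEncoder.fit_transform`, `inverse_transform`, `pd.get_dummies`, `.dt.hour`) are the same ones used below. `dtype=int` is passed only so the result prints as 0/1 on every pandas version.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical stand-in for train.csv: three records, one target and two feature columns
toy = pd.DataFrame({
    "Dates": pd.to_datetime(["2015-05-13 23:53:00",
                             "2015-05-13 08:10:00",
                             "2015-05-16 23:30:00"]),
    "Category": ["WARRANTS", "LARCENY/THEFT", "WARRANTS"],
    "DayOfWeek": ["Wednesday", "Wednesday", "Saturday"],
})

# LabelEncoder maps each distinct category to an integer; classes_ is sorted
le = preprocessing.LabelEncoder()
codes = le.fit_transform(toy["Category"])
print(list(le.classes_), codes.tolist())

# get_dummies creates one 0/1 column per distinct value (columns sorted)
days = pd.get_dummies(toy["DayOfWeek"], dtype=int)
hours = pd.get_dummies(toy["Dates"].dt.hour, dtype=int)
print(days.columns.tolist(), hours.columns.tolist())

# Integer predictions can be decoded back to category names
print(le.inverse_transform([1, 0]).tolist())
```

Note that the hour columns are named with integers, which is why the hour features later appear as columns 0 to 23.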
import pandas as pd
import numpy as np
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
# pd.set_option('display.notebook_repr_html',False)
# pd.set_option('display.max_columns',None)
# pd.set_option('display.max_rows',150)
# pd.set_option('display.max_seq_items',None)
# Encode the crime categories as integers with LabelEncoder
leCrime = preprocessing.LabelEncoder()
crime = leCrime.fit_transform(train.Category)
# One-hot encode the day of week, district and hour features
days = pd.get_dummies(train.DayOfWeek)
district = pd.get_dummies(train.PdDistrict)
hour = train.Dates.dt.hour
hour = pd.get_dummies(hour)
# Combine the features and attach the target
trainData = pd.concat([hour, days, district], axis=1)
trainData['crime']=crime
# Apply the same processing to the test data
days = pd.get_dummies(test.DayOfWeek)
district = pd.get_dummies(test.PdDistrict)
hour = test.Dates.dt.hour
hour = pd.get_dummies(hour)
testData = pd.concat([hour, days, district], axis=1)
trainData
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 \
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
42 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
45 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
49 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
52 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
53 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
54 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
55 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
60 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
61 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
62 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
63 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
66 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
71 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
72 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
... .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
877974 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877975 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877976 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877977 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877978 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877979 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877980 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877981 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877982 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877983 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877984 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877985 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877986 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877987 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877988 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877989 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877990 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877991 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877992 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877993 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877994 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877995 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877996 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877997 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877998 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
877999 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878000 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878001 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878002 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878003 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878004 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878005 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878006 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878007 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878008 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878009 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878010 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878011 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878012 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878013 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878014 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878015 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878016 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878017 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878018 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878019 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878020 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878021 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878022 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878023 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878024 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878025 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878026 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878027 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878028 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878029 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878030 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878031 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878032 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878033 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878034 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878035 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878036 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878037 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878038 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878039 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878040 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878041 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878042 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878043 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878044 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878045 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878046 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878047 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878048 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 21 22 23 Friday Monday Saturday Sunday Thursday Tuesday \
0 0 0 0 1 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 1 0 0 0 0 0 0
6 0 0 0 1 0 0 0 0 0 0
7 0 0 0 1 0 0 0 0 0 0
8 0 0 0 1 0 0 0 0 0 0
9 0 0 0 1 0 0 0 0 0 0
10 0 0 1 0 0 0 0 0 0 0
11 0 0 1 0 0 0 0 0 0 0
12 0 0 1 0 0 0 0 0 0 0
13 0 0 1 0 0 0 0 0 0 0
14 0 0 1 0 0 0 0 0 0 0
15 0 0 1 0 0 0 0 0 0 0
16 0 0 1 0 0 0 0 0 0 0
17 0 1 0 0 0 0 0 0 0 0
18 0 1 0 0 0 0 0 0 0 0
19 0 1 0 0 0 0 0 0 0 0
20 0 1 0 0 0 0 0 0 0 0
21 0 1 0 0 0 0 0 0 0 0
22 0 1 0 0 0 0 0 0 0 0
23 0 1 0 0 0 0 0 0 0 0
24 0 1 0 0 0 0 0 0 0 0
25 0 1 0 0 0 0 0 0 0 0
26 0 1 0 0 0 0 0 0 0 0
27 0 1 0 0 0 0 0 0 0 0
28 0 1 0 0 0 0 0 0 0 0
29 1 0 0 0 0 0 0 0 0 0
30 1 0 0 0 0 0 0 0 0 0
31 1 0 0 0 0 0 0 0 0 0
32 1 0 0 0 0 0 0 0 0 0
33 1 0 0 0 0 0 0 0 0 0
34 1 0 0 0 0 0 0 0 0 0
35 1 0 0 0 0 0 0 0 0 0
36 1 0 0 0 0 0 0 0 0 0
37 1 0 0 0 0 0 0 0 0 0
38 1 0 0 0 0 0 0 0 0 0
39 1 0 0 0 0 0 0 0 0 0
40 1 0 0 0 0 0 0 0 0 0
41 1 0 0 0 0 0 0 0 0 0
42 1 0 0 0 0 0 0 0 0 0
43 1 0 0 0 0 0 0 0 0 0
44 1 0 0 0 0 0 0 0 0 0
45 1 0 0 0 0 0 0 0 0 0
46 1 0 0 0 0 0 0 0 0 0
47 1 0 0 0 0 0 0 0 0 0
48 0 0 0 0 0 0 0 0 0 0
49 0 0 0 0 0 0 0 0 0 0
50 0 0 0 0 0 0 0 0 0 0
51 0 0 0 0 0 0 0 0 0 0
52 0 0 0 0 0 0 0 0 0 0
53 0 0 0 0 0 0 0 0 0 0
54 0 0 0 0 0 0 0 0 0 0
55 0 0 0 0 0 0 0 0 0 0
56 0 0 0 0 0 0 0 0 0 0
57 0 0 0 0 0 0 0 0 0 0
58 0 0 0 0 0 0 0 0 0 0
59 0 0 0 0 0 0 0 0 0 0
60 0 0 0 0 0 0 0 0 0 0
61 0 0 0 0 0 0 0 0 0 0
62 0 0 0 0 0 0 0 0 0 0
63 0 0 0 0 0 0 0 0 0 0
64 0 0 0 0 0 0 0 0 0 0
65 0 0 0 0 0 0 0 0 0 0
66 0 0 0 0 0 0 0 0 0 0
67 0 0 0 0 0 0 0 0 0 0
68 0 0 0 0 0 0 0 0 0 0
69 0 0 0 0 0 0 0 0 0 0
70 0 0 0 0 0 0 0 0 0 0
71 0 0 0 0 0 0 0 0 0 0
72 0 0 0 0 0 0 0 0 0 0
73 0 0 0 0 0 0 0 0 0 0
74 0 0 0 0 0 0 0 0 0 0
... .. .. .. .. ... ... ... ... ... ...
877974 0 0 0 0 0 1 0 0 0 0
877975 0 0 0 0 0 1 0 0 0 0
877976 0 0 0 0 0 1 0 0 0 0
877977 0 0 0 0 0 1 0 0 0 0
877978 0 0 0 0 0 1 0 0 0 0
877979 0 0 0 0 0 1 0 0 0 0
877980 0 0 0 0 0 1 0 0 0 0
877981 0 0 0 0 0 1 0 0 0 0
877982 0 0 0 0 0 1 0 0 0 0
877983 0 0 0 0 0 1 0 0 0 0
877984 0 0 0 0 0 1 0 0 0 0
877985 0 0 0 0 0 1 0 0 0 0
877986 0 0 0 0 0 1 0 0 0 0
877987 0 0 0 0 0 1 0 0 0 0
877988 0 0 0 0 0 1 0 0 0 0
877989 0 0 0 0 0 1 0 0 0 0
877990 0 0 0 0 0 1 0 0 0 0
877991 0 0 0 0 0 1 0 0 0 0
877992 0 0 0 0 0 1 0 0 0 0
877993 0 0 0 0 0 1 0 0 0 0
877994 0 0 0 0 0 1 0 0 0 0
877995 0 0 0 0 0 1 0 0 0 0
877996 0 0 0 0 0 1 0 0 0 0
877997 0 0 0 0 0 1 0 0 0 0
877998 0 0 0 0 0 1 0 0 0 0
877999 0 0 0 0 0 1 0 0 0 0
878000 0 0 0 0 0 1 0 0 0 0
878001 0 0 0 0 0 1 0 0 0 0
878002 0 0 0 0 0 1 0 0 0 0
878003 0 0 0 0 0 1 0 0 0 0
878004 0 0 0 0 0 1 0 0 0 0
878005 0 0 0 0 0 1 0 0 0 0
878006 0 0 0 0 0 1 0 0 0 0
878007 0 0 0 0 0 1 0 0 0 0
878008 0 0 0 0 0 1 0 0 0 0
878009 0 0 0 0 0 1 0 0 0 0
878010 0 0 0 0 0 1 0 0 0 0
878011 0 0 0 0 0 1 0 0 0 0
878012 0 0 0 0 0 1 0 0 0 0
878013 0 0 0 0 0 1 0 0 0 0
878014 0 0 0 0 0 1 0 0 0 0
878015 0 0 0 0 0 1 0 0 0 0
878016 0 0 0 0 0 1 0 0 0 0
878017 0 0 0 0 0 1 0 0 0 0
878018 0 0 0 0 0 1 0 0 0 0
878019 0 0 0 0 0 1 0 0 0 0
878020 0 0 0 0 0 1 0 0 0 0
878021 0 0 0 0 0 1 0 0 0 0
878022 0 0 0 0 0 1 0 0 0 0
878023 0 0 0 0 0 1 0 0 0 0
878024 0 0 0 0 0 1 0 0 0 0
878025 0 0 0 0 0 1 0 0 0 0
878026 0 0 0 0 0 1 0 0 0 0
878027 0 0 0 0 0 1 0 0 0 0
878028 0 0 0 0 0 1 0 0 0 0
878029 0 0 0 0 0 1 0 0 0 0
878030 0 0 0 0 0 1 0 0 0 0
878031 0 0 0 0 0 1 0 0 0 0
878032 0 0 0 0 0 1 0 0 0 0
878033 0 0 0 0 0 1 0 0 0 0
878034 0 0 0 0 0 1 0 0 0 0
878035 0 0 0 0 0 1 0 0 0 0
878036 0 0 0 0 0 1 0 0 0 0
878037 0 0 0 0 0 1 0 0 0 0
878038 0 0 0 0 0 1 0 0 0 0
878039 0 0 0 0 0 1 0 0 0 0
878040 0 0 0 0 0 1 0 0 0 0
878041 0 0 0 0 0 1 0 0 0 0
878042 0 0 0 0 0 1 0 0 0 0
878043 0 0 0 0 0 1 0 0 0 0
878044 0 0 0 0 0 1 0 0 0 0
878045 0 0 0 0 0 1 0 0 0 0
878046 0 0 0 0 0 1 0 0 0 0
878047 0 0 0 0 0 1 0 0 0 0
878048 0 0 0 0 0 1 0 0 0 0
Wednesday BAYVIEW CENTRAL INGLESIDE MISSION NORTHERN PARK \
0 1 0 0 0 0 1 0
1 1 0 0 0 0 1 0
2 1 0 0 0 0 1 0
3 1 0 0 0 0 1 0
4 1 0 0 0 0 0 1
5 1 0 0 1 0 0 0
6 1 0 0 1 0 0 0
7 1 1 0 0 0 0 0
8 1 0 0 0 0 0 0
9 1 0 1 0 0 0 0
10 1 0 1 0 0 0 0
11 1 0 0 0 0 0 0
12 1 0 0 0 0 0 0
13 1 0 0 0 0 1 0
14 1 1 0 0 0 0 0
15 1 1 0 0 0 0 0
16 1 0 0 0 0 0 0
17 1 0 0 1 0 0 0
18 1 1 0 0 0 0 0
19 1 0 0 0 0 0 0
20 1 0 0 1 0 0 0
21 1 0 0 1 0 0 0
22 1 0 0 0 0 0 0
23 1 0 0 0 0 0 0
24 1 0 0 0 0 1 0
25 1 0 0 0 0 0 0
26 1 0 0 0 0 1 0
27 1 0 0 1 0 0 0
28 1 0 0 0 0 0 0
29 1 0 0 0 0 0 0
30 1 0 0 0 0 1 0
31 1 0 0 0 1 0 0
32 1 0 0 0 0 1 0
33 1 0 0 0 0 1 0
34 1 0 0 0 0 1 0
35 1 0 0 0 0 0 0
36 1 0 0 0 0 1 0
37 1 0 0 0 0 1 0
38 1 0 0 0 0 0 0
39 1 0 0 1 0 0 0
40 1 0 0 0 0 0 0
41 1 0 0 0 0 0 0
42 1 0 0 0 0 0 0
43 1 1 0 0 0 0 0
44 1 1 0 0 0 0 0
45 1 0 1 0 0 0 0
46 1 0 0 1 0 0 0
47 1 0 0 0 0 0 0
48 1 0 1 0 0 0 0
49 1 0 0 0 0 0 1
50 1 1 0 0 0 0 0
51 1 1 0 0 0 0 0
52 1 0 0 0 0 0 0
53 1 0 0 0 0 0 0
54 1 0 0 0 0 0 0
55 1 0 0 0 0 0 0
56 1 0 0 0 0 1 0
57 1 0 0 0 0 0 0
58 1 0 0 0 0 1 0
59 1 0 1 0 0 0 0
60 1 0 1 0 0 0 0
61 1 0 1 0 0 0 0
62 1 0 1 0 0 0 0
63 1 0 0 0 0 0 0
64 1 0 0 0 0 0 0
65 1 0 0 0 0 0 0
66 1 0 0 0 0 0 0
67 1 0 0 0 0 0 0
68 1 0 0 0 0 0 0
69 1 0 0 0 0 0 0
70 1 0 0 0 0 0 0
71 1 0 0 0 0 1 0
72 1 1 0 0 0 0 0
73 1 0 0 0 1 0 0
74 1 0 1 0 0 0 0
... ... ... ... ... ... ... ...
877974 0 0 0 0 0 0 1
877975 0 0 0 0 0 0 1
877976 0 0 1 0 0 0 0
877977 0 0 0 0 0 0 0
877978 0 0 0 0 0 0 0
877979 0 0 0 0 0 0 0
877980 0 0 0 0 0 0 0
877981 0 0 0 0 0 1 0
877982 0 0 0 0 0 0 0
877983 0 0 0 0 1 0 0
877984 0 0 1 0 0 0 0
877985 0 0 0 0 0 0 0
877986 0 1 0 0 0 0 0
877987 0 0 0 1 0 0 0
877988 0 0 0 0 0 0 0
877989 0 1 0 0 0 0 0
877990 0 0 0 0 0 1 0
877991 0 0 0 0 0 0 0
877992 0 0 0 0 0 0 1
877993 0 0 0 0 0 0 0
877994 0 0 0 1 0 0 0
877995 0 0 0 0 1 0 0
877996 0 0 0 0 1 0 0
877997 0 1 0 0 0 0 0
877998 0 0 0 0 0 0 1
877999 0 1 0 0 0 0 0
878000 0 1 0 0 0 0 0
878001 0 0 0 0 0 0 0
878002 0 0 0 0 0 0 0
878003 0 0 1 0 0 0 0
878004 0 0 0 0 0 1 0
878005 0 0 0 0 0 0 0
878006 0 0 0 0 0 0 0
878007 0 0 0 0 0 0 0
878008 0 0 0 1 0 0 0
878009 0 0 0 1 0 0 0
878010 0 0 0 0 0 0 0
878011 0 0 0 0 0 1 0
878012 0 0 0 0 0 0 0
878013 0 0 0 0 0 0 0
878014 0 0 0 0 0 1 0
878015 0 0 0 0 0 1 0
878016 0 1 0 0 0 0 0
878017 0 0 1 0 0 0 0
878018 0 0 1 0 0 0 0
878019 0 0 0 0 0 0 0
878020 0 0 0 0 0 1 0
878021 0 0 0 0 0 1 0
878022 0 0 0 0 1 0 0
878023 0 0 0 0 0 0 0
878024 0 0 0 0 0 0 1
878025 0 1 0 0 0 0 0
878026 0 1 0 0 0 0 0
878027 0 0 0 0 0 0 0
878028 0 0 0 0 0 0 0
878029 0 0 0 0 0 0 0
878030 0 0 0 0 0 0 0
878031 0 1 0 0 0 0 0
878032 0 0 0 0 0 1 0
878033 0 0 0 0 0 0 0
878034 0 0 0 0 0 0 0
878035 0 0 0 0 0 1 0
878036 0 0 0 0 0 1 0
878037 0 0 0 0 0 1 0
878038 0 0 0 0 0 0 0
878039 0 0 0 0 0 1 0
878040 0 0 0 0 1 0 0
878041 0 0 0 0 0 0 0
878042 0 1 0 0 0 0 0
878043 0 1 0 0 0 0 0
878044 0 0 0 0 0 0 0
878045 0 0 0 1 0 0 0
878046 0 0 0 0 0 0 0
878047 0 0 0 0 0 0 0
878048 0 1 0 0 0 0 0
RICHMOND SOUTHERN TARAVAL TENDERLOIN crime
0 0 0 0 0 37
1 0 0 0 0 21
2 0 0 0 0 21
3 0 0 0 0 16
4 0 0 0 0 16
5 0 0 0 0 16
6 0 0 0 0 36
7 0 0 0 0 36
8 1 0 0 0 16
9 0 0 0 0 16
10 0 0 0 0 16
11 0 0 1 0 21
12 0 0 0 1 35
13 0 0 0 0 16
14 0 0 0 0 20
15 0 0 0 0 20
16 0 0 0 1 25
17 0 0 0 0 1
18 0 0 0 0 21
19 0 0 0 1 20
20 0 0 0 0 16
21 0 0 0 0 25
22 0 0 0 1 37
23 0 0 0 1 20
24 0 0 0 0 16
25 0 0 0 1 20
26 0 0 0 0 16
27 0 0 0 0 16
28 0 0 1 0 16
29 0 0 1 0 21
30 0 0 0 0 16
31 0 0 0 0 20
32 0 0 0 0 35
33 0 0 0 0 16
34 0 0 0 0 35
35 0 1 0 0 16
36 0 0 0 0 16
37 0 0 0 0 16
38 0 0 1 0 38
39 0 0 0 0 35
40 0 1 0 0 20
41 0 1 0 0 16
42 0 0 0 1 16
43 0 0 0 0 21
44 0 0 0 0 21
45 0 0 0 0 21
46 0 0 0 0 36
47 0 0 1 0 16
48 0 0 0 0 20
49 0 0 0 0 4
50 0 0 0 0 25
51 0 0 0 0 1
52 0 1 0 0 16
53 0 1 0 0 16
54 0 1 0 0 32
55 0 1 0 0 16
56 0 0 0 0 16
57 0 1 0 0 16
58 0 0 0 0 16
59 0 0 0 0 36
60 0 0 0 0 36
61 0 0 0 0 8
62 0 0 0 0 32
63 0 1 0 0 20
64 0 1 0 0 16
65 0 0 1 0 16
66 0 0 0 1 37
67 0 0 0 1 37
68 0 0 0 1 21
69 0 1 0 0 16
70 0 1 0 0 16
71 0 0 0 0 16
72 0 0 0 0 16
73 0 0 0 0 36
74 0 0 0 0 16
... ... ... ... ... ...
877974 0 0 0 0 36
877975 0 0 0 0 36
877976 0 0 0 0 20
877977 0 1 0 0 21
877978 0 0 1 0 21
877979 0 0 1 0 36
877980 0 0 1 0 36
877981 0 0 0 0 32
877982 0 1 0 0 21
877983 0 0 0 0 21
877984 0 0 0 0 16
877985 0 1 0 0 21
877986 0 0 0 0 21
877987 0 0 0 0 4
877988 0 1 0 0 34
877989 0 0 0 0 21
877990 0 0 0 0 20
877991 0 1 0 0 21
877992 0 0 0 0 16
877993 0 1 0 0 21
877994 0 0 0 0 36
877995 0 0 0 0 37
877996 0 0 0 0 21
877997 0 0 0 0 21
877998 0 0 0 0 19
877999 0 0 0 0 36
878000 0 0 0 0 36
878001 0 1 0 0 21
878002 0 1 0 0 16
878003 0 0 0 0 1
878004 0 0 0 0 1
878005 0 1 0 0 21
878006 0 1 0 0 35
878007 0 1 0 0 34
878008 0 0 0 0 30
878009 0 0 0 0 21
878010 1 0 0 0 4
878011 0 0 0 0 35
878012 1 0 0 0 13
878013 0 1 0 0 4
878014 0 0 0 0 21
878015 0 0 0 0 30
878016 0 0 0 0 35
878017 0 0 0 0 25
878018 0 0 0 0 21
878019 0 1 0 0 21
878020 0 0 0 0 21
878021 0 0 0 0 35
878022 0 0 0 0 36
878023 0 0 0 1 16
878024 0 0 0 0 21
878025 0 0 0 0 21
878026 0 0 0 0 37
878027 0 1 0 0 37
878028 0 1 0 0 1
878029 0 0 0 1 21
878030 0 0 0 1 28
878031 0 0 0 0 1
878032 0 0 0 0 21
878033 1 0 0 0 35
878034 1 0 0 0 34
878035 0 0 0 0 1
878036 0 0 0 0 16
878037 0 0 0 0 35
878038 0 0 0 1 37
878039 0 0 0 0 21
878040 0 0 0 0 1
878041 1 0 0 0 21
878042 0 0 0 0 1
878043 0 0 0 0 21
878044 0 0 1 0 25
878045 0 0 0 0 16
878046 0 1 0 0 16
878047 0 1 0 0 35
878048 0 0 0 0 12
[878049 rows x 42 columns]
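Because trainData and testData are built by calling pd.get_dummies separately, their columns only line up if every hour, weekday and district value occurs in both files; otherwise the test matrix would end up with different columns from the one the model was trained on. The original code relies on this without checking it. A defensive sketch, assuming nothing beyond standard pandas, reindexes the test dummies to the training columns with `DataFrame.reindex`; the two small Series are hypothetical examples.

```python
import pandas as pd

# Hypothetical example: a district seen in training is missing from the test data
train_district = pd.Series(["BAYVIEW", "MISSION", "PARK"])
test_district = pd.Series(["MISSION", "MISSION"])

train_dummies = pd.get_dummies(train_district, dtype=int)
test_dummies = pd.get_dummies(test_district, dtype=int)
print(test_dummies.columns.tolist())  # only the values that occur in the test data

# Force the training column layout; categories absent from the test data become 0
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(test_aligned.columns.tolist())
print(test_aligned.values.tolist())
```

Applied to this tutorial, the same idea would be `testData.reindex(columns=trainData.columns.drop('crime'), fill_value=0)`.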
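We can quickly pick out a few important features, build a baseline system, and then improve it step by step. To keep things simple here, only the day of week and the district are used as input features. scikit-learn's train_test_split
function splits the data into a training set and a validation set; we then fit both a naive Bayes model and a logistic regression model and compare their performance: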
import time

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
# Use only the day of week and the district as input features
features = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']
# Split into a training set (3/5) and a validation set (2/5)
training, validation = train_test_split(trainData, train_size=.60)
# Naive Bayes: fit, then compute the log loss on the validation set
model = BernoulliNB()
nbStart = time.time()
model.fit(training[features], training['crime'])
nbCostTime = time.time() - nbStart
predicted = np.array(model.predict_proba(validation[features]))
print("Naive Bayes training time: %f s" % nbCostTime)
# labels= keeps log_loss working even if a rare class is absent from the validation split
print("Naive Bayes log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
# Logistic regression: fit, then compute the log loss on the validation set
model = LogisticRegression(C=.01)
lrStart = time.time()
model.fit(training[features], training['crime'])
lrCostTime = time.time() - lrStart
predicted = np.array(model.predict_proba(validation[features]))
print("Logistic regression training time: %f s" % lrCostTime)
print("Logistic regression log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
Naive Bayes training time: 0.477027 s
Naive Bayes log loss: 2.614108
Logistic regression training time: 58.954372 s
Logistic regression log loss: 2.621150
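With the current features and parameters, naive Bayes reaches a slightly lower log loss than logistic regression (2.614 vs. 2.621), and it trains far faster:
about 0.48 s versus about 59 s, more than 100 times quicker.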
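The log loss used for the comparison is the multi-class cross-entropy, the mean of -log(probability the model gave to the true class), so lower is better and a confident wrong answer is punished heavily. A minimal sketch on an invented 3-sample, 3-class probability matrix computes it by hand and compares the result with `sklearn.metrics.log_loss`. As a reference point, guessing uniformly over K classes gives exactly log(K); assuming the 39 categories of the Kaggle training set (the encoded crime values in the output above reach 38), that is about 3.66, so both models' 2.61 is clearly better than uniform guessing.

```python
import numpy as np
from sklearn.metrics import log_loss

# Invented example: three validation samples, three classes, each row sums to 1
y_true = np.array([0, 2, 1])
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
])

# Pick the probability assigned to each sample's true class, then average -log
manual = -np.mean(np.log(proba[np.arange(len(y_true)), y_true]))
library = log_loss(y_true, proba)
print(round(manual, 4), round(library, 4))

# Uniform guessing over K classes scores log(K); K = 39 is an assumption here
K = 39
uniform = log_loss([0, 1], np.full((2, K), 1.0 / K), labels=np.arange(K))
print(round(uniform, 4), round(np.log(K), 4))
```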
import time

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
# Day of week and district, as before
features = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']
# Add the 24 one-hot hour columns; get_dummies named them with the integers 0-23
hourFea = [x for x in range(0,24)]
features = features + hourFea
# Split into a training set (3/5) and a validation set (2/5)
training, validation = train_test_split(trainData, train_size=.60)
# Naive Bayes: fit, then compute the log loss on the validation set
model = BernoulliNB()
nbStart = time.time()
model.fit(training[features], training['crime'])
nbCostTime = time.time() - nbStart
predicted = np.array(model.predict_proba(validation[features]))
print("Naive Bayes training time: %f s" % nbCostTime)
print("Naive Bayes log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
# Logistic regression: fit, then compute the log loss on the validation set
model = LogisticRegression(C=.01)
lrStart = time.time()
model.fit(training[features], training['crime'])
lrCostTime = time.time() - lrStart
predicted = np.array(model.predict_proba(validation[features]))
print("Logistic regression training time: %f s" % lrCostTime)
print("Logistic regression log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
Naive Bayes training time: 0.478027 s
Naive Bayes log loss: 2.613777
Logistic regression training time: 58.734359 s
Logistic regression log loss: 2.621033
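With these three groups of categorical features (day of week, district, hour), naive Bayes still has a slight edge over logistic regression (lower log loss) while training in under half a second, so this simple model holds up well here.
One caution: this log loss (2.613777) is almost identical to the first run's (2.614108), which suggests the hour columns may not actually have been part of the fitted feature set; it is worth checking that features contains the columns 0 to 23 at the moment fit is called.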
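Part of the reason naive Bayes trains so quickly is that fitting BernoulliNB amounts to counting: class frequencies plus, for each class, how often each 0/1 feature is on. Logistic regression, by contrast, is fitted by iterative numerical optimisation. The sketch below is an illustration on synthetic random data, not the Kaggle set. It reproduces BernoulliNB by hand, assuming scikit-learn's default Laplace smoothing (alpha=1) and class priors learned from the data, and checks that its posteriors match `predict_proba`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
# Synthetic 0/1 features (think one-hot weekday/district flags) and 3 classes
X = rng.integers(0, 2, size=(200, 5))
y = rng.integers(0, 3, size=200)

alpha = 1.0  # Laplace smoothing, BernoulliNB's default
classes = np.unique(y)

# "Training" is a single counting pass over the data
class_count = np.array([(y == c).sum() for c in classes])
feature_count = np.array([X[y == c].sum(axis=0) for c in classes])
log_prior = np.log(class_count / class_count.sum())
p = (feature_count + alpha) / (class_count[:, None] + 2 * alpha)  # P(x_j = 1 | c)

# Log joint: log P(c) + sum_j [x_j log p_cj + (1 - x_j) log(1 - p_cj)]
joint = log_prior + X @ np.log(p).T + (1 - X) @ np.log(1 - p).T
# Normalise to posteriors with a numerically stable softmax
joint -= joint.max(axis=1, keepdims=True)
manual_proba = np.exp(joint)
manual_proba /= manual_proba.sum(axis=1, keepdims=True)

sk_proba = BernoulliNB(alpha=alpha).fit(X, y).predict_proba(X)
print(np.allclose(manual_proba, sk_proba))
```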
References:
http://blog.csdn.net/han_xiaoyang/article/details/50629608
</div>
58 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
60 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
61 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
62 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
63 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
66 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
71 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
72 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
... .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
877974 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877975 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877976 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877977 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877978 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877979 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877980 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877981 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877982 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877983 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877984 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877985 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877986 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877987 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
877988 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877989 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877990 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877991 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877992 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877993 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877994 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877995 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877996 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877997 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
877998 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
877999 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878000 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878001 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878002 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
878003 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878004 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878005 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878006 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878007 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878008 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878009 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878010 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878011 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878012 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878013 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878014 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878015 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878016 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878017 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878018 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878019 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878020 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878021 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878022 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878023 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878024 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878025 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878026 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878027 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878028 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878029 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878030 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878031 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878032 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878033 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878034 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878035 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878036 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878037 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878038 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878039 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878040 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878041 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878042 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878043 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878044 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878045 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878046 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878047 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
878048 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 21 22 23 Friday Monday Saturday Sunday Thursday Tuesday \
0 0 0 0 1 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0 0
3 0 0 0 1 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 1 0 0 0 0 0 0
6 0 0 0 1 0 0 0 0 0 0
7 0 0 0 1 0 0 0 0 0 0
8 0 0 0 1 0 0 0 0 0 0
9 0 0 0 1 0 0 0 0 0 0
10 0 0 1 0 0 0 0 0 0 0
11 0 0 1 0 0 0 0 0 0 0
12 0 0 1 0 0 0 0 0 0 0
13 0 0 1 0 0 0 0 0 0 0
14 0 0 1 0 0 0 0 0 0 0
15 0 0 1 0 0 0 0 0 0 0
16 0 0 1 0 0 0 0 0 0 0
17 0 1 0 0 0 0 0 0 0 0
18 0 1 0 0 0 0 0 0 0 0
19 0 1 0 0 0 0 0 0 0 0
20 0 1 0 0 0 0 0 0 0 0
21 0 1 0 0 0 0 0 0 0 0
22 0 1 0 0 0 0 0 0 0 0
23 0 1 0 0 0 0 0 0 0 0
24 0 1 0 0 0 0 0 0 0 0
25 0 1 0 0 0 0 0 0 0 0
26 0 1 0 0 0 0 0 0 0 0
27 0 1 0 0 0 0 0 0 0 0
28 0 1 0 0 0 0 0 0 0 0
29 1 0 0 0 0 0 0 0 0 0
30 1 0 0 0 0 0 0 0 0 0
31 1 0 0 0 0 0 0 0 0 0
32 1 0 0 0 0 0 0 0 0 0
33 1 0 0 0 0 0 0 0 0 0
34 1 0 0 0 0 0 0 0 0 0
35 1 0 0 0 0 0 0 0 0 0
36 1 0 0 0 0 0 0 0 0 0
37 1 0 0 0 0 0 0 0 0 0
38 1 0 0 0 0 0 0 0 0 0
39 1 0 0 0 0 0 0 0 0 0
40 1 0 0 0 0 0 0 0 0 0
41 1 0 0 0 0 0 0 0 0 0
42 1 0 0 0 0 0 0 0 0 0
43 1 0 0 0 0 0 0 0 0 0
44 1 0 0 0 0 0 0 0 0 0
45 1 0 0 0 0 0 0 0 0 0
46 1 0 0 0 0 0 0 0 0 0
47 1 0 0 0 0 0 0 0 0 0
48 0 0 0 0 0 0 0 0 0 0
49 0 0 0 0 0 0 0 0 0 0
50 0 0 0 0 0 0 0 0 0 0
51 0 0 0 0 0 0 0 0 0 0
52 0 0 0 0 0 0 0 0 0 0
53 0 0 0 0 0 0 0 0 0 0
54 0 0 0 0 0 0 0 0 0 0
55 0 0 0 0 0 0 0 0 0 0
56 0 0 0 0 0 0 0 0 0 0
57 0 0 0 0 0 0 0 0 0 0
58 0 0 0 0 0 0 0 0 0 0
59 0 0 0 0 0 0 0 0 0 0
60 0 0 0 0 0 0 0 0 0 0
61 0 0 0 0 0 0 0 0 0 0
62 0 0 0 0 0 0 0 0 0 0
63 0 0 0 0 0 0 0 0 0 0
64 0 0 0 0 0 0 0 0 0 0
65 0 0 0 0 0 0 0 0 0 0
66 0 0 0 0 0 0 0 0 0 0
67 0 0 0 0 0 0 0 0 0 0
68 0 0 0 0 0 0 0 0 0 0
69 0 0 0 0 0 0 0 0 0 0
70 0 0 0 0 0 0 0 0 0 0
71 0 0 0 0 0 0 0 0 0 0
72 0 0 0 0 0 0 0 0 0 0
73 0 0 0 0 0 0 0 0 0 0
74 0 0 0 0 0 0 0 0 0 0
... .. .. .. .. ... ... ... ... ... ...
877974 0 0 0 0 0 1 0 0 0 0
877975 0 0 0 0 0 1 0 0 0 0
877976 0 0 0 0 0 1 0 0 0 0
877977 0 0 0 0 0 1 0 0 0 0
877978 0 0 0 0 0 1 0 0 0 0
877979 0 0 0 0 0 1 0 0 0 0
877980 0 0 0 0 0 1 0 0 0 0
877981 0 0 0 0 0 1 0 0 0 0
877982 0 0 0 0 0 1 0 0 0 0
877983 0 0 0 0 0 1 0 0 0 0
877984 0 0 0 0 0 1 0 0 0 0
877985 0 0 0 0 0 1 0 0 0 0
877986 0 0 0 0 0 1 0 0 0 0
877987 0 0 0 0 0 1 0 0 0 0
877988 0 0 0 0 0 1 0 0 0 0
877989 0 0 0 0 0 1 0 0 0 0
877990 0 0 0 0 0 1 0 0 0 0
877991 0 0 0 0 0 1 0 0 0 0
877992 0 0 0 0 0 1 0 0 0 0
877993 0 0 0 0 0 1 0 0 0 0
877994 0 0 0 0 0 1 0 0 0 0
877995 0 0 0 0 0 1 0 0 0 0
877996 0 0 0 0 0 1 0 0 0 0
877997 0 0 0 0 0 1 0 0 0 0
877998 0 0 0 0 0 1 0 0 0 0
877999 0 0 0 0 0 1 0 0 0 0
878000 0 0 0 0 0 1 0 0 0 0
878001 0 0 0 0 0 1 0 0 0 0
878002 0 0 0 0 0 1 0 0 0 0
878003 0 0 0 0 0 1 0 0 0 0
878004 0 0 0 0 0 1 0 0 0 0
878005 0 0 0 0 0 1 0 0 0 0
878006 0 0 0 0 0 1 0 0 0 0
878007 0 0 0 0 0 1 0 0 0 0
878008 0 0 0 0 0 1 0 0 0 0
878009 0 0 0 0 0 1 0 0 0 0
878010 0 0 0 0 0 1 0 0 0 0
878011 0 0 0 0 0 1 0 0 0 0
878012 0 0 0 0 0 1 0 0 0 0
878013 0 0 0 0 0 1 0 0 0 0
878014 0 0 0 0 0 1 0 0 0 0
878015 0 0 0 0 0 1 0 0 0 0
878016 0 0 0 0 0 1 0 0 0 0
878017 0 0 0 0 0 1 0 0 0 0
878018 0 0 0 0 0 1 0 0 0 0
878019 0 0 0 0 0 1 0 0 0 0
878020 0 0 0 0 0 1 0 0 0 0
878021 0 0 0 0 0 1 0 0 0 0
878022 0 0 0 0 0 1 0 0 0 0
878023 0 0 0 0 0 1 0 0 0 0
878024 0 0 0 0 0 1 0 0 0 0
878025 0 0 0 0 0 1 0 0 0 0
878026 0 0 0 0 0 1 0 0 0 0
878027 0 0 0 0 0 1 0 0 0 0
878028 0 0 0 0 0 1 0 0 0 0
878029 0 0 0 0 0 1 0 0 0 0
878030 0 0 0 0 0 1 0 0 0 0
878031 0 0 0 0 0 1 0 0 0 0
878032 0 0 0 0 0 1 0 0 0 0
878033 0 0 0 0 0 1 0 0 0 0
878034 0 0 0 0 0 1 0 0 0 0
878035 0 0 0 0 0 1 0 0 0 0
878036 0 0 0 0 0 1 0 0 0 0
878037 0 0 0 0 0 1 0 0 0 0
878038 0 0 0 0 0 1 0 0 0 0
878039 0 0 0 0 0 1 0 0 0 0
878040 0 0 0 0 0 1 0 0 0 0
878041 0 0 0 0 0 1 0 0 0 0
878042 0 0 0 0 0 1 0 0 0 0
878043 0 0 0 0 0 1 0 0 0 0
878044 0 0 0 0 0 1 0 0 0 0
878045 0 0 0 0 0 1 0 0 0 0
878046 0 0 0 0 0 1 0 0 0 0
878047 0 0 0 0 0 1 0 0 0 0
878048 0 0 0 0 0 1 0 0 0 0
Wednesday BAYVIEW CENTRAL INGLESIDE MISSION NORTHERN PARK \
0 1 0 0 0 0 1 0
1 1 0 0 0 0 1 0
2 1 0 0 0 0 1 0
3 1 0 0 0 0 1 0
4 1 0 0 0 0 0 1
5 1 0 0 1 0 0 0
6 1 0 0 1 0 0 0
7 1 1 0 0 0 0 0
8 1 0 0 0 0 0 0
9 1 0 1 0 0 0 0
10 1 0 1 0 0 0 0
11 1 0 0 0 0 0 0
12 1 0 0 0 0 0 0
13 1 0 0 0 0 1 0
14 1 1 0 0 0 0 0
15 1 1 0 0 0 0 0
16 1 0 0 0 0 0 0
17 1 0 0 1 0 0 0
18 1 1 0 0 0 0 0
19 1 0 0 0 0 0 0
20 1 0 0 1 0 0 0
21 1 0 0 1 0 0 0
22 1 0 0 0 0 0 0
23 1 0 0 0 0 0 0
24 1 0 0 0 0 1 0
25 1 0 0 0 0 0 0
26 1 0 0 0 0 1 0
27 1 0 0 1 0 0 0
28 1 0 0 0 0 0 0
29 1 0 0 0 0 0 0
30 1 0 0 0 0 1 0
31 1 0 0 0 1 0 0
32 1 0 0 0 0 1 0
33 1 0 0 0 0 1 0
34 1 0 0 0 0 1 0
35 1 0 0 0 0 0 0
36 1 0 0 0 0 1 0
37 1 0 0 0 0 1 0
38 1 0 0 0 0 0 0
39 1 0 0 1 0 0 0
40 1 0 0 0 0 0 0
41 1 0 0 0 0 0 0
42 1 0 0 0 0 0 0
43 1 1 0 0 0 0 0
44 1 1 0 0 0 0 0
45 1 0 1 0 0 0 0
46 1 0 0 1 0 0 0
47 1 0 0 0 0 0 0
48 1 0 1 0 0 0 0
49 1 0 0 0 0 0 1
50 1 1 0 0 0 0 0
51 1 1 0 0 0 0 0
52 1 0 0 0 0 0 0
53 1 0 0 0 0 0 0
54 1 0 0 0 0 0 0
55 1 0 0 0 0 0 0
56 1 0 0 0 0 1 0
57 1 0 0 0 0 0 0
58 1 0 0 0 0 1 0
59 1 0 1 0 0 0 0
60 1 0 1 0 0 0 0
61 1 0 1 0 0 0 0
62 1 0 1 0 0 0 0
63 1 0 0 0 0 0 0
64 1 0 0 0 0 0 0
65 1 0 0 0 0 0 0
66 1 0 0 0 0 0 0
67 1 0 0 0 0 0 0
68 1 0 0 0 0 0 0
69 1 0 0 0 0 0 0
70 1 0 0 0 0 0 0
71 1 0 0 0 0 1 0
72 1 1 0 0 0 0 0
73 1 0 0 0 1 0 0
74 1 0 1 0 0 0 0
... ... ... ... ... ... ... ...
877974 0 0 0 0 0 0 1
877975 0 0 0 0 0 0 1
877976 0 0 1 0 0 0 0
877977 0 0 0 0 0 0 0
877978 0 0 0 0 0 0 0
877979 0 0 0 0 0 0 0
877980 0 0 0 0 0 0 0
877981 0 0 0 0 0 1 0
877982 0 0 0 0 0 0 0
877983 0 0 0 0 1 0 0
877984 0 0 1 0 0 0 0
877985 0 0 0 0 0 0 0
877986 0 1 0 0 0 0 0
877987 0 0 0 1 0 0 0
877988 0 0 0 0 0 0 0
877989 0 1 0 0 0 0 0
877990 0 0 0 0 0 1 0
877991 0 0 0 0 0 0 0
877992 0 0 0 0 0 0 1
877993 0 0 0 0 0 0 0
877994 0 0 0 1 0 0 0
877995 0 0 0 0 1 0 0
877996 0 0 0 0 1 0 0
877997 0 1 0 0 0 0 0
877998 0 0 0 0 0 0 1
877999 0 1 0 0 0 0 0
878000 0 1 0 0 0 0 0
878001 0 0 0 0 0 0 0
878002 0 0 0 0 0 0 0
878003 0 0 1 0 0 0 0
878004 0 0 0 0 0 1 0
878005 0 0 0 0 0 0 0
878006 0 0 0 0 0 0 0
878007 0 0 0 0 0 0 0
878008 0 0 0 1 0 0 0
878009 0 0 0 1 0 0 0
878010 0 0 0 0 0 0 0
878011 0 0 0 0 0 1 0
878012 0 0 0 0 0 0 0
878013 0 0 0 0 0 0 0
878014 0 0 0 0 0 1 0
878015 0 0 0 0 0 1 0
878016 0 1 0 0 0 0 0
878017 0 0 1 0 0 0 0
878018 0 0 1 0 0 0 0
878019 0 0 0 0 0 0 0
878020 0 0 0 0 0 1 0
878021 0 0 0 0 0 1 0
878022 0 0 0 0 1 0 0
878023 0 0 0 0 0 0 0
878024 0 0 0 0 0 0 1
878025 0 1 0 0 0 0 0
878026 0 1 0 0 0 0 0
878027 0 0 0 0 0 0 0
878028 0 0 0 0 0 0 0
878029 0 0 0 0 0 0 0
878030 0 0 0 0 0 0 0
878031 0 1 0 0 0 0 0
878032 0 0 0 0 0 1 0
878033 0 0 0 0 0 0 0
878034 0 0 0 0 0 0 0
878035 0 0 0 0 0 1 0
878036 0 0 0 0 0 1 0
878037 0 0 0 0 0 1 0
878038 0 0 0 0 0 0 0
878039 0 0 0 0 0 1 0
878040 0 0 0 0 1 0 0
878041 0 0 0 0 0 0 0
878042 0 1 0 0 0 0 0
878043 0 1 0 0 0 0 0
878044 0 0 0 0 0 0 0
878045 0 0 0 1 0 0 0
878046 0 0 0 0 0 0 0
878047 0 0 0 0 0 0 0
878048 0 1 0 0 0 0 0
RICHMOND SOUTHERN TARAVAL TENDERLOIN crime
0 0 0 0 0 37
1 0 0 0 0 21
2 0 0 0 0 21
3 0 0 0 0 16
4 0 0 0 0 16
5 0 0 0 0 16
6 0 0 0 0 36
7 0 0 0 0 36
8 1 0 0 0 16
9 0 0 0 0 16
10 0 0 0 0 16
11 0 0 1 0 21
12 0 0 0 1 35
13 0 0 0 0 16
14 0 0 0 0 20
15 0 0 0 0 20
16 0 0 0 1 25
17 0 0 0 0 1
18 0 0 0 0 21
19 0 0 0 1 20
20 0 0 0 0 16
21 0 0 0 0 25
22 0 0 0 1 37
23 0 0 0 1 20
24 0 0 0 0 16
25 0 0 0 1 20
26 0 0 0 0 16
27 0 0 0 0 16
28 0 0 1 0 16
29 0 0 1 0 21
30 0 0 0 0 16
31 0 0 0 0 20
32 0 0 0 0 35
33 0 0 0 0 16
34 0 0 0 0 35
35 0 1 0 0 16
36 0 0 0 0 16
37 0 0 0 0 16
38 0 0 1 0 38
39 0 0 0 0 35
40 0 1 0 0 20
41 0 1 0 0 16
42 0 0 0 1 16
43 0 0 0 0 21
44 0 0 0 0 21
45 0 0 0 0 21
46 0 0 0 0 36
47 0 0 1 0 16
48 0 0 0 0 20
49 0 0 0 0 4
50 0 0 0 0 25
51 0 0 0 0 1
52 0 1 0 0 16
53 0 1 0 0 16
54 0 1 0 0 32
55 0 1 0 0 16
56 0 0 0 0 16
57 0 1 0 0 16
58 0 0 0 0 16
59 0 0 0 0 36
60 0 0 0 0 36
61 0 0 0 0 8
62 0 0 0 0 32
63 0 1 0 0 20
64 0 1 0 0 16
65 0 0 1 0 16
66 0 0 0 1 37
67 0 0 0 1 37
68 0 0 0 1 21
69 0 1 0 0 16
70 0 1 0 0 16
71 0 0 0 0 16
72 0 0 0 0 16
73 0 0 0 0 36
74 0 0 0 0 16
... ... ... ... ... ...
877974 0 0 0 0 36
877975 0 0 0 0 36
877976 0 0 0 0 20
877977 0 1 0 0 21
877978 0 0 1 0 21
877979 0 0 1 0 36
877980 0 0 1 0 36
877981 0 0 0 0 32
877982 0 1 0 0 21
877983 0 0 0 0 21
877984 0 0 0 0 16
877985 0 1 0 0 21
877986 0 0 0 0 21
877987 0 0 0 0 4
877988 0 1 0 0 34
877989 0 0 0 0 21
877990 0 0 0 0 20
877991 0 1 0 0 21
877992 0 0 0 0 16
877993 0 1 0 0 21
877994 0 0 0 0 36
877995 0 0 0 0 37
877996 0 0 0 0 21
877997 0 0 0 0 21
877998 0 0 0 0 19
877999 0 0 0 0 36
878000 0 0 0 0 36
878001 0 1 0 0 21
878002 0 1 0 0 16
878003 0 0 0 0 1
878004 0 0 0 0 1
878005 0 1 0 0 21
878006 0 1 0 0 35
878007 0 1 0 0 34
878008 0 0 0 0 30
878009 0 0 0 0 21
878010 1 0 0 0 4
878011 0 0 0 0 35
878012 1 0 0 0 13
878013 0 1 0 0 4
878014 0 0 0 0 21
878015 0 0 0 0 30
878016 0 0 0 0 35
878017 0 0 0 0 25
878018 0 0 0 0 21
878019 0 1 0 0 21
878020 0 0 0 0 21
878021 0 0 0 0 35
878022 0 0 0 0 36
878023 0 0 0 1 16
878024 0 0 0 0 21
878025 0 0 0 0 21
878026 0 0 0 0 37
878027 0 1 0 0 37
878028 0 1 0 0 1
878029 0 0 0 1 21
878030 0 0 0 1 28
878031 0 0 0 0 1
878032 0 0 0 0 21
878033 1 0 0 0 35
878034 1 0 0 0 34
878035 0 0 0 0 1
878036 0 0 0 0 16
878037 0 0 0 0 35
878038 0 0 0 1 37
878039 0 0 0 0 21
878040 0 0 0 0 1
878041 1 0 0 0 21
878042 0 0 0 0 1
878043 0 0 0 0 21
878044 0 0 1 0 25
878045 0 0 0 0 16
878046 0 1 0 0 16
878047 0 1 0 0 35
878048 0 0 0 0 12
[878049 rows x 42 columns]
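The 42 columns consist of 24 hour indicators (named 0 to 23), 7 weekday indicators, 10 district indicators and the encoded `crime` label. `get_dummies` is applied to the train and test files separately. As a result, the two frames share the same columns only if every hour, weekday and district value occurs in both files. A defensive sketch is to `reindex` the test indicators onto the training layout. This assumes you want the test frame forced into the same column order. The toy frames below are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames: the test part happens to contain no 'PARK' rows
train_part = pd.DataFrame({"PdDistrict": ["NORTHERN", "PARK", "MISSION"]})
test_part = pd.DataFrame({"PdDistrict": ["MISSION", "NORTHERN"]})

train_dist = pd.get_dummies(train_part.PdDistrict)
test_dist = pd.get_dummies(test_part.PdDistrict)  # only 2 columns: the layouts differ

# Put the test indicators in the training column order; categories that never
# occur in the test data become all-zero columns instead of going missing
test_aligned = test_dist.reindex(columns=train_dist.columns, fill_value=0).astype(int)

print(train_dist.columns.tolist())    # ['MISSION', 'NORTHERN', 'PARK']
print(test_dist.columns.tolist())     # ['MISSION', 'NORTHERN']
print(test_aligned.values.tolist())   # [[1, 0, 0], [0, 1, 0]]
```

Applied to the real frames, the same idea would be `testData = testData.reindex(columns=trainData.columns.drop('crime'), fill_value=0)`.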
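We can quickly pick out a few important features, build a baseline system, and then improve it step by step. To keep things simple, we use only the day of week and the police district as classifier inputs. We split the data into a training set and a validation set with scikit-learn's train_test_split. Then we fit both a naive Bayes and a logistic regression model and compare their performance: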
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
import time
# Use only day of week and police district as classifier inputs
features = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
            'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']
# Split into a training set (3/5) and a validation set (2/5)
training, validation = train_test_split(trainData, train_size=.60)
# Naive Bayes: fit, time it, and compute log_loss on the validation set
model = BernoulliNB()
nbStart = time.time()
model.fit(training[features], training['crime'])
nbCostTime = time.time() - nbStart
predicted = model.predict_proba(validation[features])
print("Naive Bayes training time: %f s" % nbCostTime)
# labels= keeps log_loss aligned with the model's probability columns even if
# a rare category happens to be missing from the validation split
print("Naive Bayes log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
# Logistic regression: same procedure
# (its default solver changed from liblinear to lbfgs in scikit-learn 0.22, so timings and scores vary by version)
model = LogisticRegression(C=.01)
lrStart = time.time()
model.fit(training[features], training['crime'])
lrCostTime = time.time() - lrStart
predicted = model.predict_proba(validation[features])
print("Logistic regression training time: %f s" % lrCostTime)
print("Logistic regression log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
Naive Bayes training time: 0.477027 s
Naive Bayes log loss: 2.614108
Logistic regression training time: 58.954372 s
Logistic regression log loss: 2.621150
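With the current features and parameter settings, naive Bayes has a slightly lower log loss than logistic regression (2.614 vs 2.621). It also trains far faster (about 0.48 s vs about 59 s).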
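The metric behind these numbers is multi-class log loss, the same quantity Kaggle uses to score this competition: logloss = -(1/N) * sum_i log p(i, y_i), where p(i, y_i) is the probability the model gives sample i's true class. The NumPy-only sketch below computes it by hand. The helper name `multiclass_log_loss` and the clipping constant `eps` are assumptions of this example, not part of the original code. The sketch also gives a reference point: predicting 1/39 for each of the 39 categories scores ln 39, about 3.664. The values around 2.61 above therefore do carry information, although they are far from a confident classifier.

```python
import numpy as np

def multiclass_log_loss(y_true, proba, eps=1e-15):
    """Mean of -log(probability assigned to the true class).

    y_true: integer class labels 0..K-1, shape (N,)
    proba:  predicted class probabilities, shape (N, K)
    """
    y_true = np.asarray(y_true)
    # Clip so that a zero probability cannot produce log(0) = -inf, then renormalize rows
    proba = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)
    proba = proba / proba.sum(axis=1, keepdims=True)
    true_class_proba = proba[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(true_class_proba))

K = 39                                # number of crime categories
y = np.array([16, 21, 37, 16])        # encoded labels: LARCENY/THEFT, OTHER OFFENSES, WARRANTS, LARCENY/THEFT
uniform = np.full((len(y), K), 1.0 / K)
print(multiclass_log_loss(y, uniform))      # equals ln(39), about 3.664

confident = np.full((len(y), K), 0.01 / (K - 1))
confident[np.arange(len(y)), y] = 0.99
print(multiclass_log_loss(y, confident))    # equals -ln(0.99), about 0.01
```

For the same inputs, `sklearn.metrics.log_loss` should return essentially the same value.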
Next, add the hour of day (the one-hot columns 0 to 23) as a third feature and rerun both models:
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
import time
# Day of week and police district, plus the hour-of-day indicator columns
features = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday', 'BAYVIEW', 'CENTRAL', 'INGLESIDE', 'MISSION',
            'NORTHERN', 'PARK', 'RICHMOND', 'SOUTHERN', 'TARAVAL', 'TENDERLOIN']
hourFea = [x for x in range(0, 24)]
features = features + hourFea
# Do not reassign `features` after this point, or the hour columns are silently dropped
# Split into a training set (3/5) and a validation set (2/5)
training, validation = train_test_split(trainData, train_size=.60)
# Naive Bayes, evaluated with log_loss.
# .values passes a plain array: recent scikit-learn rejects DataFrames whose
# column names mix int (hours) and str (days, districts)
model = BernoulliNB()
nbStart = time.time()
model.fit(training[features].values, training['crime'])
nbCostTime = time.time() - nbStart
predicted = model.predict_proba(validation[features].values)
print("Naive Bayes training time: %f s" % nbCostTime)
print("Naive Bayes log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
# Logistic regression, evaluated with log_loss
model = LogisticRegression(C=.01)
lrStart = time.time()
model.fit(training[features].values, training['crime'])
lrCostTime = time.time() - lrStart
predicted = model.predict_proba(validation[features].values)
print("Logistic regression training time: %f s" % lrCostTime)
print("Logistic regression log loss: %f" % log_loss(validation['crime'], predicted, labels=model.classes_))
Naive Bayes training time: 0.478027 s
Naive Bayes log loss: 2.613777
Logistic regression training time: 58.734359 s
Logistic regression log loss: 2.621033
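With these three categorical features, naive Bayes still holds a slight edge over logistic regression (log loss 2.6138 vs 2.6210). It also trains roughly 120 times faster (about 0.48 s vs 59 s), which suggests that a simple model can still be quite effective here. One caveat applies: these numbers are nearly identical to the two-feature baseline above, including the naive Bayes training time. Before crediting the hour features, it is worth checking that the 24 hour columns actually made it into `features` at fit time.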
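Part of the reason naive Bayes trains so quickly is that fitting it amounts to counting. It needs the class priors and, for every class, how often each binary feature equals 1. The NumPy sketch below is a simplified Bernoulli naive Bayes with add-one (Laplace) smoothing, written only to make that counting step visible. The function names are hypothetical. The smoothing formula is meant to mirror `BernoulliNB(alpha=1.0)`, but treat this as an illustration rather than scikit-learn's implementation.

```python
import numpy as np

def fit_bernoulli_nb(X, y, n_classes, alpha=1.0):
    """Fit Bernoulli naive Bayes by counting (simplified sketch, not scikit-learn's code).

    X: 0/1 feature matrix, shape (N, D); y: integer labels 0..n_classes-1.
    Returns log class priors, shape (K,), and P(feature = 1 | class), shape (K, D).
    """
    X = np.asarray(X, dtype=float)
    Y = np.eye(n_classes)[np.asarray(y)]          # one-hot labels, shape (N, K)
    class_count = Y.sum(axis=0)                   # number of samples per class
    feature_count = Y.T @ X                       # per class: how often each feature is 1
    log_prior = np.log(class_count / class_count.sum())
    # Laplace (add-alpha) smoothing keeps every probability strictly between 0 and 1
    feature_prob = (feature_count + alpha) / (class_count[:, None] + 2 * alpha)
    return log_prior, feature_prob

def predict_proba_bernoulli_nb(X, log_prior, feature_prob):
    """Posterior P(class | x) assuming features are independent given the class."""
    X = np.asarray(X, dtype=float)
    # log P(x | k) = sum_d [ x_d * log p_kd + (1 - x_d) * log(1 - p_kd) ]
    log_lik = X @ np.log(feature_prob).T + (1 - X) @ np.log(1 - feature_prob).T
    joint = log_lik + log_prior
    joint -= joint.max(axis=1, keepdims=True)     # stabilize before exponentiating
    proba = np.exp(joint)
    return proba / proba.sum(axis=1, keepdims=True)

# Toy data: two binary features, two classes
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]])
y = np.array([0, 0, 1, 1, 1])
log_prior, feature_prob = fit_bernoulli_nb(X, y, n_classes=2)
print(feature_prob)   # class 0: [0.75, 0.25], class 1: [0.4, 0.8]
proba = predict_proba_bernoulli_nb([[1, 0]], log_prior, feature_prob)
print(proba)          # P(class 0 | x) = 0.225 / (0.225 + 0.048), about 0.824
```

Fitting needs a single pass over the data (two matrix products here), which is consistent with the sub-second fit times reported above. Logistic regression, by contrast, has to run an iterative optimization to fit its weights.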
References:
http://blog.csdn.net/han_xiaoyang/article/details/50629608
</div>