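I rewrote the code from "Machine Learning in Action" myself. My one complaint: the sample code in the book reads like garbage, even though the book itself is a good one.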
ch2
2-1
from ... import vs. import https://www.zhihu.com/question/38857862
tile https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html similar to broadcasting
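To make the "similar to broadcasting" remark concrete, here is a minimal sketch that computes the distance from one point to every row of a small dataset twice, once with tile and once with plain broadcasting. The values in dataset and point are made up for illustration.

```python
import numpy as np

# Hypothetical data: four 2-D samples and one query point.
dataset = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
point = np.array([0.0, 0.0])

# tile repeats `point` once per row so the two shapes match explicitly.
tiled = np.tile(point, (dataset.shape[0], 1))
dist_tile = np.sqrt(((tiled - dataset) ** 2).sum(axis=1))

# Broadcasting stretches `point` across the rows implicitly.
dist_broadcast = np.sqrt(((point - dataset) ** 2).sum(axis=1))

print(np.allclose(dist_tile, dist_broadcast))
```

Both versions should give the same distances; the broadcasting form simply skips building the repeated array.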
.sum(axis=1) sums each row of a 2-D array; axis=0 sums each column
.get(key, default) returns the dict value for key, or default if key is missing
sorted https://docs.python.org/3/howto/sorting.html learn how to sort tuples, objects, and dicts
.items() https://docs.python.org/3/library/stdtypes.html#mapping-types-dict iterating over a dict yields its keys by default. To iterate over values, use for value in d.values(); to iterate over keys and values together, use for k, v in d.items()
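The .get, .items() and sorted notes fit together in a vote-counting step like the one sketched below. The labels list is a made-up stand-in for the labels of the k nearest neighbours.

```python
# Hypothetical labels of the k nearest neighbours.
labels = ['A', 'B', 'B', 'A', 'B']

class_count = {}
for label in labels:
    # .get returns 0 the first time a label is seen.
    class_count[label] = class_count.get(label, 0) + 1

# Sort the (label, count) pairs by count, largest first.
ranked = sorted(class_count.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0][0])
```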
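Python imports each module only once per session (the loaded module is cached), which also avoids trouble when two modules import each other. To re-import a module,
Python 2.7 can call reload() directly; Python 3 can use one of the following methods:
Method 1: the basic method
from importlib import reload  # the older imp module is deprecated and was removed in Python 3.12
reload(module)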
2-2
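open returns a file object; the default mode is read-only. Remember to close it when done https://docs.python.org/3/tutorial/inputoutput.html
.readlines() reads all lines (up to EOF) and returns them as a list, which can be processed with Python's for ... in ... construct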
zeros https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.zeros.html
.strip https://www.tutorialspoint.com/python/string_strip.htm
.split http://python-reference.readthedocs.io/en/latest/docs/str/split.html
map http://www.runoob.com/python/python-func-map.html
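A small sketch of how open, .readlines(), .strip, .split and map combine when parsing a tab-separated file. The file name and its contents are invented so that the example runs on its own.

```python
import os
import tempfile

# Write a small hypothetical data file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('40920\t8.3\t0.95\n14488\t7.1\t1.67\n')

# `with` closes the file automatically, even if an error occurs.
with open(path) as f:
    lines = f.readlines()

# strip removes the trailing newline, split cuts the line at each tab.
rows = [line.strip().split('\t') for line in lines]

# In Python 3 map returns an iterator, so wrap it in list().
values = [list(map(float, row)) for row in rows]
print(values)
```

Using with instead of a bare open means there is no close() call to forget.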
.add_subplot http://blog.csdn.net/You_are_my_dream/article/details/53439518 the object-oriented style is recommended
.scatter https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html
2-3
.min() over the whole array, .min(0) per column, .min(1) per row
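The three forms of .min can be seen side by side in a min-max scaling sketch like the following. The data array is made up, and the scaling shown is one common way to normalise features, not necessarily the exact code from the book.

```python
import numpy as np

# Hypothetical feature matrix: three samples, two features.
data = np.array([[1.0, 200.0], [3.0, 400.0], [2.0, 300.0]])

overall_min = data.min()   # smallest value in the whole array
col_min = data.min(0)      # one minimum per column
row_min = data.min(1)      # one minimum per row

# Min-max scaling per column; broadcasting applies col_min to every row.
ranges = data.max(0) - col_min
normed = (data - col_min) / ranges
print(normed)
```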
2-4
\ at the end of a line continues the statement on the next line
print('%d and %f' % (a, b)) formatted printing
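Both notes in one short sketch; the values of a and b are arbitrary.

```python
a, b = 3, 0.25

# %d formats an integer, %f a float (six decimal places by default).
old_style = '%d and %f' % (a, b)

# A trailing backslash lets one statement continue on the next line.
total = a + \
        b

print(old_style)
print(total)
```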
2-5
input https://www.ibm.com/developerworks/cn/linux/l-python3-1/index.html
.readline http://www.runoob.com/python/file-readline.html
2-6
listdir http://www.runoob.com/python/os-listdir.html
ch3
3-1
.keys http://www.runoob.com/python/att-dictionary-keys.html
if currentLabel not in labelCounts.keys():
log https://docs.python.org/3.6/library/math.html
3-2
.extend http://www.runoob.com/python/att-list-extend.html
.append http://www.runoob.com/python/att-list-append.html
3-4
.count http://www.runoob.com/python/att-list-count.html
del (note the difference from .remove) http://www.jb51.net/article/35012.htm
subLabels = labels[:] copies the list (shallow copy)
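A quick sketch of the difference between del and .remove, and of why the [:] copy matters. The label strings are sample values only.

```python
labels = ['no surfacing', 'flippers', 'tail']  # sample values

sub_labels = labels[:]       # new list object holding the same elements
del sub_labels[0]            # del removes by position
sub_labels.remove('tail')    # .remove deletes the first matching value

print(labels)       # the original list is untouched
print(sub_labels)
```

Without the [:] copy, sub_labels would be another name for the same list, and the deletions would show up in labels as well.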
3-5
dict http://www.runoob.com/python/python-func-dict.html
.figure https://matplotlib.org/api/_as_gen/matplotlib.pyplot.figure.html
.clf https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure
.subplot https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplot.html note the difference from .add_subplot; the latter is recommended
.annotate https://matplotlib.org/users/annotations.html#using-connectorpatch
3-6
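.keys() http://www.runoob.com/python/att-dictionary-keys.html in Python 3 this returns a view object rather than a list, and it does not support indexing (convert it to a list first, then index)
.values() http://www.runoob.com/python/att-dictionary-values.html in Python 3 this returns a view object rather than a list, and it does not support indexing (convert it to a list first, then index)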
type http://www.runoob.com/python/python-func-type.html
.__name__ built-in class attribute (the class's name as a string)
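A minimal sketch of indexing into dict keys and checking a value's type by name. The nested dict tree is a hypothetical example in the style of a decision tree.

```python
# Hypothetical nested-dict tree.
tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

# tree.keys()[0] fails in Python 3, so convert the keys to a list first.
first_key = list(tree.keys())[0]
second = tree[first_key]

# type(x).__name__ gives the class name as a string.
kinds = {k: type(v).__name__ for k, v in second.items()}
print(first_key, kinds)
```

isinstance(value, dict) is the more common way to write this check today; the __name__ comparison is shown because that is what the note refers to.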
Spyder shortcuts
Ctrl + 1: comment/uncomment
Ctrl + 4/5: block comment/block uncomment
Ctrl + s: save
3-7
.text https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.text.html#matplotlib.axes.Axes.text
3-8
.index http://www.runoob.com/python/att-list-index.html
3-9
pickle.dump & pickle.load https://blog.oldj.net/2010/05/26/python-pickle/ (the code is somewhat dated) https://docs.python.org/3/library/pickle.html (current documentation)
open http://www.runoob.com/python/python-func-open.html
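A short round trip with pickle.dump and pickle.load. The tree object and the file name are made up; the point is that in Python 3 the file has to be opened in binary mode.

```python
import os
import pickle
import tempfile

tree = {'no surfacing': {0: 'no', 1: 'yes'}}  # hypothetical object to store
path = os.path.join(tempfile.mkdtemp(), 'tree.pkl')

# pickle writes bytes, so open the file with 'wb' / 'rb'.
with open(path, 'wb') as f:
    pickle.dump(tree, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == tree)
```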
ch4
4-1
set http://blog.csdn.net/business122/article/details/7541486
list*5 repeats the list's contents 5 times
4-2
list.append([1,2,3]) list.extend([1,2,3])
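The difference between the two calls, plus list repetition, in a few lines:

```python
a = [0]
a.append([1, 2, 3])   # adds the list itself as one element
b = [0]
b.extend([1, 2, 3])   # adds each element of the list
c = [0] * 5           # repeats the contents five times

print(a, b, c)
```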
4-5
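An r prefix on a Python string literal disables escape processing; it is common in regular expressions. s = r'test\tddd' print(s) prints: test\tddd
Python regular expressions (greedy matching) http://www.runoob.com/python/python-reg-expressions.html
.lower() converts a string to lowercase, .upper() to uppercase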
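A sketch that puts raw strings, greedy matching and .lower() together. The sample text and the length filter are illustrative choices, not taken from the book.

```python
import re

text = 'This book is the best book on Python!'  # hypothetical sample text

# The raw string passes \W+ (runs of non-word characters) to re unchanged.
tokens = [t.lower() for t in re.split(r'\W+', text) if len(t) > 2]

# Greedy .* takes as much as it can; .*? takes as little as it can.
greedy = re.search(r'<.*>', '<a><b>').group()
lazy = re.search(r'<.*?>', '<a><b>').group()

print(tokens, greedy, lazy)
```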
.read() returns a single string containing the entire contents of the file
range(1,26) covers 1 through 25
random.shuffle() http://www.runoob.com/python/func-number-shuffle.html
Python string encoding and decoding https://www.cnblogs.com/evening/archive/2012/04/19/2457440.html
# -*- coding: UTF-8 -*- http://www.runoob.com/python/python-chinese-encoding.html
ANSI encoding on Windows https://mozillazg.com/2013/09/python-windows-ansi.html
Python debugging (with the pdb module, very powerful) https://www.ibm.com/developerworks/cn/linux/l-cn-pythondebugger/
Python debugging (with the Spyder IDE) http://blog.csdn.net/qq_33039859/article/details/54645465
4-6
How to get Craigslist RSS feeds http://brittanyherself.com/cgg/tutorial-how-to-subscribe-to-craigslists-rss-feeds/
feedparser (the main thing is to understand the feed and entries fields) http://pythonhosted.org/feedparser/common-rss-elements.html http://blog.topspeedsnail.com/archives/8156
The author of Machine Learning in Action uses the RSS feeds of these pages: https://newyork.craigslist.org/search/stp https://sfbay.craigslist.org/search/stp (can be found via ny.feed.link)
ny.feed.title ny.feed.title_detail (RSS address) ny.feed.link (source site address) ny.entries (a list, so index it) ny.entries[i].title ny.entries[i].summary ny.entries[i].link ny.entries[i].published
min http://www.runoob.com/python/func-number-min.html
.remove (note the difference from del) http://www.runoob.com/python/att-list-remove.html
Rounding in Python https://www.cnblogs.com/lipijin/p/3714312.html int() also converts to an integer (it truncates toward zero)
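The common ways of getting an integer out of a float behave differently on negative numbers, as this sketch shows. The value of x is arbitrary.

```python
import math

x = -2.7

truncated = int(x)       # drops the fractional part (toward zero)
floored = math.floor(x)  # largest integer <= x
ceiled = math.ceil(x)    # smallest integer >= x
nearest = round(x)       # nearest integer

print(truncated, floored, ceiled, nearest)
```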
A newer stopwords list https://www.ranks.nl/stopwords
ch5
Indicator function http://www.cnblogs.com/xiaoxuesheng993/p/7977629.html
5-1
int() and float() convert a string directly to a number http://www.runoob.com/python/python-func-int.html
math.exp() http://www.runoob.com/python/func-number-exp.html
5-2
ax.scatter https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html
arange https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html
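Pyplot tutorial (introduction to pyplot) and Artist tutorial: for the former's function-call style, get clear on the concepts of current figure and current axes; for the latter's Artist style (recommended), get clear on figure container and axes container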
https://matplotlib.org/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py
https://matplotlib.org/tutorials/intermediate/artists.html#sphx-glr-tutorials-intermediate-artists-py
numpy provides counterparts of many math functions that accept an array or matrix as input and return an array or matrix of the same size, e.g. numpy.exp numpy.cos
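A minimal sketch of why the numpy versions matter: a sigmoid written with np.exp works on a whole array at once. The function name sigmoid and the input values are chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    # np.exp works element-wise, so z may be a scalar or an array.
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-1.0, 0.0, 1.0])
out = sigmoid(z)
print(out.shape)

# math.exp expects a single number, so math.exp(z) would fail
# for this 3-element array.
```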
ch6
6-1
random.randrange https://docs.python.org/3/library/random.html
6-2
multiply https://docs.scipy.org/doc/numpy/reference/generated/numpy.multiply.html
Python logical operators: not, or, and http://www.runoob.com/python/python-operators.html
Python plain assignment, shallow copy, deep copy http://www.runoob.com/w3cnote/python-understanding-dict-copy-shallow-or-deep.html
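The three cases side by side, using a small made-up dict that holds a list:

```python
import copy

original = {'ids': [1, 2]}
alias = original                # plain assignment: same object
shallow = original.copy()       # new dict, but the inner list is shared
deep = copy.deepcopy(original)  # fully independent copy

original['ids'].append(3)
print(alias, shallow, deep)
```

Only the deep copy keeps the old inner list after the append.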
When slicing out a column, a matrix returns a column vector, whereas a 2-D array returns a 1-D array
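The shape difference is easy to check directly. The sketch uses np.asmatrix to build the matrix, and the numbers are arbitrary.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])
mat = np.asmatrix(arr)

col_arr = arr[:, 0]   # 1-D array
col_mat = mat[:, 0]   # still 2-D: a column vector

print(col_arr.shape, col_mat.shape)
```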
In Python, a later import overrides an earlier import of the same name. For example, with from numpy import *; import random, Python's built-in random module overrides numpy.random
indexing with Boolean Arrays (array filtering) https://docs.scipy.org/doc/numpy-dev/user/quickstart.html#indexing-with-boolean-arrays
6-3
matrix.A https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.matrix.A.html
nonzero https://docs.scipy.org/doc/numpy/reference/generated/numpy.nonzero.html works wonders when combined with indexing with boolean arrays
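A small sketch of nonzero used together with a boolean mask. The alphas values are invented for the example.

```python
import numpy as np

alphas = np.array([0.0, 0.3, 0.0, 0.7])  # hypothetical values

mask = alphas > 0               # boolean array
selected = alphas[mask]         # boolean indexing keeps the True positions
indices = np.nonzero(mask)[0]   # nonzero returns a tuple, one array per axis

print(selected, indices)
```

The mask gives the values, and nonzero gives the positions they came from.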
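shape is the better fit for a matrix (it reports every dimension); len is the better fit for a 1-D array (it reports only the first dimension)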
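A quick check of what each one reports, using arbitrary sizes:

```python
import numpy as np

m = np.asmatrix(np.zeros((3, 4)))
v = np.zeros(5)

m_shape = m.shape   # both dimensions of the matrix
m_len = len(m)      # only the first dimension (number of rows)
v_len = len(v)      # for a 1-D array this is the element count

print(m_shape, m_len, v_len)
```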