Lab 6-1: Lagrange Interpolation
Problem: use Lagrange interpolation to fill in the null values of the table in missing_data.xls.
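For reference, the standard textbook form of the Lagrange interpolating polynomial through $m$ sample points $(x_j, y_j)$ with distinct $x_j$ is given below. The symbols $m$, $L$ and $\ell_j$ are notation chosen here, not names taken from the code.

$$
L(x) = \sum_{j=1}^{m} y_j \, \ell_j(x), \qquad
\ell_j(x) = \prod_{\substack{i=1 \\ i \neq j}}^{m} \frac{x - x_i}{x_j - x_i}
$$

Each basis polynomial $\ell_j$ equals 1 at $x_j$ and 0 at every other sample point, so $L(x_j) = y_j$. A missing value at row $n$ is then estimated as $L(n)$, using the neighbouring rows as sample points.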
# p1, lab6
# Fill all of the null values with Lagrange interpolation
# Data file name is "missing_data.xls"
import pandas as pd
from scipy.interpolate import lagrange

data_dir = 'F:/Data Mining/codes/ch6/lab6_1'  # named data_dir so that the built-in dir() is not shadowed
data = pd.read_excel(data_dir + '/data/missing_data.xls', header=None)  # header=None: the table has no header row

def lagrange_interpolate(s, n, k=5):
    # take up to k points on each side of position n;
    # reindex() returns NaN for labels outside the index instead of raising KeyError
    y = s.reindex(list(range(n - k, n)) + list(range(n + 1, n + 1 + k)))
    y = y[y.notnull()]  # y.notnull() returns a boolean Series; keep only the non-null points
    # lagrange(x, w) in module scipy.interpolate:
    # param x is an array-like object holding the x-coordinates of a set of points
    # param w is an array-like object holding the y-coordinates of a set of points
    # returns a numpy poly1d object (polynomial type) representing the Lagrange interpolating polynomial
    # WARNING: this implementation is unstable, do not expect to be able to use more than 20 points
    # (poly1d)(n) evaluates the polynomial at x=n
    return lagrange(y.index, list(y))(n)

for col in data.columns:
    for i in range(len(data)):
        if data[col].isnull()[i]:  # Series.isnull() returns a boolean Series
            # .loc[index, column] locates a single element; unlike chained data[col][i] = ..., it never writes to a copy
            data.loc[i, col] = lagrange_interpolate(data[col], i)
# mistake I once made: leaving out [col] in the condition, so that isnull() returned a DataFrame rather than a Series
# note: recent pandas releases can no longer write .xls files; with those, save as .xlsx instead
data.to_excel(data_dir + '/data/result.xls', header=None, index=False)  # the last two params write a table without header and index
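To see what `lagrange()` does without the Excel file, here is a minimal sketch on made-up data. The sample points are an assumption for illustration: they lie on y = x², and x = 2 plays the role of the missing row.

```python
from scipy.interpolate import lagrange

# Hypothetical sample points on y = x**2, with x = 2 treated as the missing position.
x = [0, 1, 3, 4]
y = [0, 1, 9, 16]

poly = lagrange(x, y)   # returns a numpy poly1d object
estimate = poly(2)      # evaluate the interpolating polynomial at the missing position

print(estimate)         # close to 4, up to floating-point error
```

Because the four points lie on a quadratic, the cubic interpolant reduces to that quadratic, so the estimate matches the true value. Real data gives no such guarantee.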
missing_data.xls
result.xls
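What did I learn?
- Lagrange interpolation https://blog.csdn.net/xidiancoder/article/details/71244316
- Controlling the header and index when importing and exporting Excel tables
pd.read_excel(header=None) states that the table being read has no header; otherwise the first row of missing_data.xls would be treated as the header
DataFrame.to_excel(header=None, index=False) exports the table without header and index; otherwise result.xls would contain a header row and show the index in the leftmost column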
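- What isnull() and notnull() return
Both are methods of DataFrame and Series that test for null values, and both return a boolean object of the same type and shape as the caller. isnull() marks null positions as True and all others as False; notnull() marks null positions as False and all others as True
- Locating an element in a DataFrame
data[column][index] locates the element in the column labelled column at the row labelled index. For assignment, data.loc[index, column] is the safer form, because chained indexing may write to a temporary copy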
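- Out-of-range labels when extracting data
In lagrange_interpolate() above, the first line extracts the sample points, and both (n-k) and (n+k) can clearly fall outside the index. Debugging showed that when this happens, the out-of-range labels come back as null values, and the next statement, which drops null values, then removes them. Newer pandas versions instead raise a KeyError when a list of labels contains missing ones; Series.reindex() gives the null-filling behavior explicitly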
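The out-of-range handling described above can be checked on a small example. This is a sketch on a made-up Series (the values, `n` and `k` are assumptions for illustration), using `Series.reindex()` so that missing labels become NaN regardless of the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value near the start.
s = pd.Series([1.0, np.nan, 3.0, 4.0, 5.0])
n, k = 1, 2

# Labels n-k .. n+k without n itself: [-1, 0, 2, 3]; label -1 is not in the index.
labels = list(range(n - k, n)) + list(range(n + 1, n + 1 + k))

window = s.reindex(labels)         # the missing label -1 becomes NaN instead of raising
window = window[window.notnull()]  # the boolean mask keeps only the real sample points

print(list(window.index))          # [0, 2, 3]
```

Only the labels that exist and hold a value survive, which is exactly the set of points handed to `lagrange()`.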