0. zipline commands for working with bundles
zipline bundles lists the data bundles that are currently available.
zipline bundles
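zipline clean removes old data that is no longer needed; replace bundle and int in the command below:
zipline clean [-b bundle] --keep-last int
zipline ingest fetches data from the specified source; replace yourkey and bundle in the command below: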
QUANDL_API_KEY=yourkey zipline ingest [-b bundle]
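When these commands are driven from a script, it can help to build the argument lists in one place. The following is a minimal sketch and not part of zipline: the helper names clean_command and ingest_command are hypothetical, and the only things assumed about the CLI are the -b and --keep-last flags and the QUANDL_API_KEY variable shown above.

```python
import os


def clean_command(keep_last, bundle=None):
    """Build the argv list for `zipline clean` (hypothetical helper)."""
    argv = ["zipline", "clean"]
    if bundle is not None:
        argv += ["-b", bundle]
    argv += ["--keep-last", str(keep_last)]
    return argv


def ingest_command(api_key, bundle=None, base_env=None):
    """Build argv and environment for `zipline ingest` (hypothetical helper)."""
    argv = ["zipline", "ingest"]
    if bundle is not None:
        argv += ["-b", bundle]
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env["QUANDL_API_KEY"] = api_key
    return argv, env
```

On a machine where zipline is installed, the results could be handed to subprocess.run(argv, env=env, check=True).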
1. How zipline uses this data
The _run function receives the bundle and bundle_timestamp parameters, which together identify one dataset. Inside _run we find this code:
if bundle is not None:
    bundle_data = bundles.load(
        bundle,
        environ,
        bundle_timestamp,
    )

    prefix, connstr = re.split(
        r'sqlite:///',
        str(bundle_data.asset_finder.engine.url),
        maxsplit=1,
    )
    if prefix:
        raise ValueError(
            "invalid url %r, must begin with 'sqlite:///'" %
            str(bundle_data.asset_finder.engine.url),
        )
    env = TradingEnvironment(asset_db_path=connstr, environ=environ)
    first_trading_day =\
        bundle_data.equity_minute_bar_reader.first_trading_day
    data = DataPortal(
        env.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        equity_daily_reader=bundle_data.equity_daily_bar_reader,
        adjustment_reader=bundle_data.adjustment_reader,
    )
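The prefix check in the middle of this snippet is easier to follow in isolation. Below is a minimal sketch using only the standard library; the function name strip_sqlite_prefix and the example paths are hypothetical. Unlike the snippet above, it also checks the length of the split result, because a URL that does not contain 'sqlite:///' at all splits into a single element and could not be unpacked into two names.

```python
import re


def strip_sqlite_prefix(url):
    """Return the path part of a 'sqlite:///' URL, or raise ValueError."""
    parts = re.split(r'sqlite:///', url, maxsplit=1)
    # A URL that starts with the scheme splits into ['', path].
    # Anything else yields a non-empty first element, or only one
    # element when the scheme does not appear at all.
    if len(parts) != 2 or parts[0]:
        raise ValueError(
            "invalid url %r, must begin with 'sqlite:///'" % url
        )
    return parts[1]
```

For example, strip_sqlite_prefix('sqlite:////tmp/assets.sqlite') gives '/tmp/assets.sqlite', which is the kind of value passed on as asset_db_path.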
This code first loads the bundle data, then builds a TradingEnvironment, and finally builds a DataPortal. Let's look at the load function first:
def load(name, environ=os.environ, timestamp=None):
    """Loads a previously ingested bundle.

    Parameters
    ----------
    name : str
        The name of the bundle.
    environ : mapping, optional
        The environment variables. Defaults of os.environ.
    timestamp : datetime, optional
        The timestamp of the data to lookup.
        Defaults to the current time.

    Returns
    -------
    bundle_data : BundleData
        The raw data readers for this bundle.
    """
    if timestamp is None:
        timestamp = pd.Timestamp.utcnow()
    timestr = most_recent_data(name, timestamp, environ=environ)
    return BundleData(
        asset_finder=AssetFinder(
            asset_db_path(name, timestr, environ=environ),
        ),
        equity_minute_bar_reader=BcolzMinuteBarReader(
            minute_equity_path(name, timestr, environ=environ),
        ),
        equity_daily_bar_reader=BcolzDailyBarReader(
            daily_equity_path(name, timestr, environ=environ),
        ),
        adjustment_reader=SQLiteAdjustmentReader(
            adjustment_db_path(name, timestr, environ=environ),
        ),
    )
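The docstring says the timestamp selects which ingestion to look up and defaults to the current time. One plausible reading of that lookup is "take the newest ingestion that is not later than the timestamp". The sketch below illustrates only that idea with plain datetimes; pick_ingestion is a hypothetical name, and zipline's own most_recent_data may differ in detail, for example in how it reads the ingestion directories.

```python
from datetime import datetime, timezone


def pick_ingestion(ingestions, timestamp=None):
    """Return the newest ingestion time that is not after `timestamp`."""
    if timestamp is None:
        # Same default as the docstring describes: the current time (UTC).
        timestamp = datetime.now(timezone.utc)
    candidates = [t for t in ingestions if t <= timestamp]
    if not candidates:
        raise ValueError("no ingestion at or before %s" % timestamp)
    return max(candidates)
```

With ingestions made on three different days, passing a timestamp between the second and the third would select the second one, which is how an older dataset could be pinned through bundle_timestamp under this reading.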
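Four datasets are loaded: two are stored in SQLite and the other two in bcolz. Why they are stored this way is a topic for another time. For now it is enough to know that load reads the asset data, the adjustment data, the equity_minute_bar data, and the equity_daily_bar data. Once loading finishes, a TradingEnvironment is built, which also keeps the asset-related information. The code is as follows: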
class TradingEnvironment(object):
    ...
    def __init__(
        self,
        load=None,
        bm_symbol='SPY',
        exchange_tz="US/Eastern",
        trading_calendar=None,
        asset_db_path=':memory:',
        future_chain_predicates=CHAIN_PREDICATES,
        environ=None,
    ):
        ...
        if isinstance(asset_db_path, string_types):
            asset_db_path = 'sqlite:///' + asset_db_path
            self.engine = engine = create_engine(asset_db_path)
        else:
            self.engine = engine = asset_db_path

        if engine is not None:
            AssetDBWriter(engine).init_db()
            self.asset_finder = AssetFinder(
                engine,
                future_chain_predicates=future_chain_predicates)
        else:
            self.asset_finder = None
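Note the round trip: _run strips 'sqlite:///' from the engine URL, and the string branch here puts it back before creating the engine. A minimal sketch of that branch is shown below, assuming string_types corresponds to str on Python 3; to_engine_url is a hypothetical name, and the real constructor goes on to call create_engine, which is left out here.

```python
def to_engine_url(asset_db_path):
    """Mirror the string branch: turn a database path into a SQLite URL."""
    if isinstance(asset_db_path, str):
        return 'sqlite:///' + asset_db_path
    # Non-string values are assumed to be engine objects (or None)
    # and are passed through unchanged.
    return asset_db_path
```

With the default ':memory:' this yields 'sqlite:///:memory:', and with a hypothetical path such as '/tmp/assets.sqlite' it yields 'sqlite:////tmp/assets.sqlite'.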
Finally, the part to focus on is DataPortal: all the data zipline uses goes through it. We will cover it in detail in the next section.