How zipline loads data

0. zipline's bundle commands

zipline bundles lists the data bundles that are currently available locally.

zipline bundles

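zipline clean removes old data that is no longer needed. In the command below, replace bundle with the bundle name and int with the number of most recent ingestions to keep:

zipline clean [-b bundle] --keep-last int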

zipline ingest fetches data from the specified source. In the command below, replace yourkey and bundle:

QUANDL_API_KEY=yourkey zipline ingest [-b bundle]

1. How zipline uses this data

The _run function receives the bundle and bundle_timestamp parameters, which together identify one dataset. Inside _run we find this code:

if bundle is not None:
    bundle_data = bundles.load(
        bundle,
        environ,
        bundle_timestamp,
    )

    prefix, connstr = re.split(
        r'sqlite:///',
        str(bundle_data.asset_finder.engine.url),
        maxsplit=1,
    )
    if prefix:
        raise ValueError(
            "invalid url %r, must begin with 'sqlite:///'" %
            str(bundle_data.asset_finder.engine.url),
        )
    env = TradingEnvironment(asset_db_path=connstr, environ=environ)
    first_trading_day =\
        bundle_data.equity_minute_bar_reader.first_trading_day
    data = DataPortal(
        env.asset_finder,
        trading_calendar=trading_calendar,
        first_trading_day=first_trading_day,
        equity_minute_reader=bundle_data.equity_minute_bar_reader,
        equity_daily_reader=bundle_data.equity_daily_bar_reader,
        adjustment_reader=bundle_data.adjustment_reader,
    )
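
The URL check in the middle of this snippet is easy to misread, so here is a minimal standalone sketch of it using only the standard library. The helper name split_sqlite_url and the sample URLs are made up for illustration; only the re.split call and the error message come from the code above.

```python
import re


def split_sqlite_url(url):
    """Return the path part of a 'sqlite:///' URL (hypothetical helper).

    Mirrors the check in _run: splitting on the scheme must leave an
    empty prefix, otherwise the URL does not start with 'sqlite:///'.
    """
    # If the scheme is absent entirely, re.split returns a single item
    # and the unpacking itself raises ValueError.
    prefix, connstr = re.split(r'sqlite:///', url, maxsplit=1)
    if prefix:
        raise ValueError(
            "invalid url %r, must begin with 'sqlite:///'" % url
        )
    return connstr


# An absolute path keeps its leading slash, giving four slashes in the URL.
path = split_sqlite_url('sqlite:////tmp/assets.sqlite')
memory = split_sqlite_url('sqlite:///:memory:')
```

The returned connstr is a plain file path, which is why _run can pass it to TradingEnvironment as asset_db_path.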

The code first loads the bundle data, then builds a TradingEnvironment, and finally builds a DataPortal. Let's look at the load function first:

def load(name, environ=os.environ, timestamp=None):
    """Loads a previously ingested bundle.

    Parameters
    ----------
    name : str
        The name of the bundle.
    environ : mapping, optional
        The environment variables. Defaults of os.environ.
    timestamp : datetime, optional
        The timestamp of the data to lookup.
        Defaults to the current time.

    Returns
    -------
    bundle_data : BundleData
        The raw data readers for this bundle.
    """
    if timestamp is None:
        timestamp = pd.Timestamp.utcnow()
    timestr = most_recent_data(name, timestamp, environ=environ)
    return BundleData(
        asset_finder=AssetFinder(
            asset_db_path(name, timestr, environ=environ),
        ),
        equity_minute_bar_reader=BcolzMinuteBarReader(
            minute_equity_path(name, timestr, environ=environ),
        ),
        equity_daily_bar_reader=BcolzDailyBarReader(
            daily_equity_path(name, timestr, environ=environ),
        ),
        adjustment_reader=SQLiteAdjustmentReader(
            adjustment_db_path(name, timestr, environ=environ),
        ),
    )
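
The timestamp argument decides which ingestion of a bundle gets loaded. The sketch below illustrates one plausible reading of that lookup: pick the newest ingestion that is not later than the requested timestamp. It is not zipline's most_recent_data; the function name most_recent_dir, the ISO-formatted directory names, and the selection rule are all assumptions made for illustration.

```python
from datetime import datetime


def most_recent_dir(dirnames, timestamp):
    """Pick the newest ingestion directory at or before `timestamp`.

    Hypothetical stand-in: assumes each directory name is an ISO 8601
    timestamp of the ingestion time.
    """
    candidates = [
        name for name in dirnames
        if datetime.fromisoformat(name) <= timestamp
    ]
    if not candidates:
        raise ValueError('no ingestion at or before %s' % timestamp)
    # Compare by parsed time rather than by raw string.
    return max(candidates, key=datetime.fromisoformat)


ingestions = [
    '2018-04-01T00:00:00',
    '2018-04-15T00:00:00',
    '2018-04-20T00:00:00',
]
chosen = most_recent_dir(ingestions, datetime(2018, 4, 18))
```

Under this assumption, leaving timestamp as None (the current time) simply selects the latest ingestion, while passing an older bundle_timestamp pins a backtest to an earlier snapshot of the data.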

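load builds four readers: two are backed by SQLite and the other two by bcolz. Why the data is stored this way is a topic for another time. For now it is enough to know that it loads the asset data, the adjustment data, the equity_minute_bar data, and the equity_daily_bar data. After loading, _run builds a TradingEnvironment, which also holds information about the asset data. The code is as follows: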

class TradingEnvironment(object):
    ...

    def __init__(
        self,
        load=None,
        bm_symbol='SPY',
        exchange_tz="US/Eastern",
        trading_calendar=None,
        asset_db_path=':memory:',
        future_chain_predicates=CHAIN_PREDICATES,
        environ=None,
    ):

        ...

        if isinstance(asset_db_path, string_types):
            asset_db_path = 'sqlite:///' + asset_db_path
            self.engine = engine = create_engine(asset_db_path)
        else:
            self.engine = engine = asset_db_path

        if engine is not None:
            AssetDBWriter(engine).init_db()
            self.asset_finder = AssetFinder(
                engine,
                future_chain_predicates=future_chain_predicates)
        else:
            self.asset_finder = None
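
The asset_db_path handling has two branches: a string is turned into a SQLite URL, and anything else is used as the engine directly. Here is a minimal sketch of that branching with no zipline or SQLAlchemy dependency. The function name to_sqlite_url and the FakeEngine class are hypothetical, and str stands in for the string_types check used in the original code.

```python
def to_sqlite_url(asset_db_path):
    """Mimic the string branch of TradingEnvironment.__init__ (sketch).

    A plain path gets the 'sqlite:///' scheme prepended; any other
    object is assumed to be an engine already and is returned as-is.
    """
    if isinstance(asset_db_path, str):
        return 'sqlite:///' + asset_db_path
    return asset_db_path


class FakeEngine(object):
    """Placeholder for an engine object; hypothetical, for illustration."""


default_url = to_sqlite_url(':memory:')
file_url = to_sqlite_url('/tmp/assets.sqlite')
engine = FakeEngine()
passed_through = to_sqlite_url(engine)
```

This is the inverse of the split performed in _run: _run strips 'sqlite:///' from the asset finder's URL, and TradingEnvironment puts it back before creating its own engine.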

Finally, DataPortal deserves particular attention, since all the data zipline uses is tied to it. We will cover it in detail in the next section.

Reposted from blog.csdn.net/leel0330/article/details/80022229