Spring XD Reference Guide

http://docs.spring.io/spring-xd/docs/1.0.0.M2/reference/html/
Introduction
Overview

Spring XD is a unified, distributed, and extensible service for data ingestion, real time analytics, batch processing, and data export.

The Spring XD project is an open source project, licensed under the Apache 2 License, whose goal is to tackle big data complexity by simplifying the development of big data applications.

Much of the complexity in building real-world big data applications comes from integrating many disparate systems into one cohesive solution across a range of use cases.
Common use cases encountered in creating a comprehensive big data solution are:
  High throughput distributed data collection from a variety of input sources into big data stores such as HDFS or Splunk.
  Real-time analytics at ingestion time, e.g. gathering metrics and counting values.
  Workflow management via batch jobs. The jobs combine interactions with standard enterprise systems (e.g. RDBMS) and Hadoop operations (e.g. MapReduce, HDFS, Pig, Hive or Cascading).
  High throughput data export, e.g. from HDFS to an RDBMS or NoSQL database.

The Spring XD project aims to provide a one stop shop solution for these use-cases.

Getting Started
Requirements

To get started, make sure your system has as a minimum Java JDK 6 or newer installed. Java JDK 7 is recommended.

Download Spring XD

http://repo.spring.io/simple/libs-milestone-local/org/springframework/xd/spring-xd/1.0.0.M4/spring-xd-1.0.0.M4-dist.zip
Unzip the distribution. This yields the installation directory spring-xd-1.0.0.M4.

All the commands below are executed from this directory, so change into it before proceeding.

cp spring-xd-1.0.0.M4-dist.zip /opt/
cd /opt/
unzip spring-xd-1.0.0.M4-dist.zip

drwxr-xr-x  7 root   root        4096 Nov 12 13:39 spring-xd-1.0.0.M4/

$ cd spring-xd-1.0.0.M4
Set the environment variable
Set the environment variable XD_HOME to the installation directory <root-install-dir>\spring-xd\xd

vi /etc/profile
export XD_HOME=/opt/spring-xd-1.0.0.M4/xd

source /etc/profile
root@Master:/etc# echo $XD_HOME
/opt/spring-xd-1.0.0.M4/xd
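
Before starting the server it can help to confirm that XD_HOME is set and points at a real directory. The following is a minimal sketch only; the function name check_xd_home is made up for this note and is not part of the Spring XD distribution.

```shell
# Hypothetical helper (not shipped with Spring XD): verify that XD_HOME
# is set and refers to an existing directory.
check_xd_home() {
    if [ -z "${XD_HOME:-}" ]; then
        echo "XD_HOME is not set"
        return 1
    fi
    if [ ! -d "$XD_HOME" ]; then
        echo "XD_HOME does not point to a directory: $XD_HOME"
        return 1
    fi
    echo "XD_HOME looks usable: $XD_HOME"
}

# Report a hint instead of aborting when the check fails.
check_xd_home || echo "Fix XD_HOME (e.g. in /etc/profile) before starting Spring XD"
```

The check only looks at the variable and the directory; it does not verify that the directory contains a valid Spring XD installation.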

Install Spring XD


Spring XD can be run in two different modes. There is a single-node runtime option for testing and development, and a distributed runtime that supports distribution of processing tasks across multiple nodes.

This document will get you up and running quickly with a single-node runtime.

See Running Distributed Mode for details on setting up a distributed runtime.


Start the Runtime and the XD Shell

The single node option is the easiest to get started with.

It runs everything you need in a single process. To start it, cd to the xd directory and run the following command.
Start commands:
chmod -R 777 spring-xd-1.0.0.M4

xd/bin>$ ./xd-singlenode

After startup you will see:
INFO: Starting Servlet Engine: Apache Tomcat/7.0.35
XD Configuration:
        XD_HOME=/opt/spring-xd-1.0.0.M4/xd
        XD_TRANSPORT=local
        XD_STORE=memory
        XD_ANALYTICS=memory
        XD_HADOOP_DISTRO=hadoop12

In a separate terminal, cd into the shell directory and start the XD shell, which you can use to issue commands.

cd /opt/spring-xd-1.0.0.M4/shell/bin
shell/bin>$ ./xd-shell


The shell is a more user-friendly front end to the REST API which Spring XD exposes to clients. The URL of the currently targeted Spring XD server is shown at startup.

You should now be able to start using Spring XD.

Create a Stream
In Spring XD, a basic stream defines the ingestion of event-driven data from a source to a sink, passing through any number of processors.
You can create a new stream by issuing a stream create command from the XD shell. Stream definitions are built from a simple DSL. For example, execute:

xd:>stream create --definition "time | log" --name ticktock
Created new stream 'ticktock'

This defines a stream named ticktock based on the DSL expression time | log. The DSL uses the "pipe" symbol | to connect a source to a sink.

The XD server window then shows:
01:47:30,823  WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:30
01:47:31,825  WARN task-scheduler-9 logger.ticktock:145 - 2013-12-27 01:47:31
01:47:32,827  WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:32
01:47:33,830  WARN task-scheduler-9 logger.ticktock:145 - 2013-12-27 01:47:33
01:47:34,845  WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:34
01:47:35,849  WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:35
01:47:36,852  WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:36
01:47:37,854  WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:37
01:47:38,856  WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:38
01:47:39,858  WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:39
01:47:40,881  WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:40

time | log

In this simple example, the time source simply sends the current time as a message each second, and the log sink outputs it using the logging framework at the WARN logging level.
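
The "source | sink" shape of a definition can be illustrated with a small helper that joins module names with the pipe symbol. This is only a sketch of how the definition string is put together; build_definition is a hypothetical name invented for this note, and the result still has to be passed to stream create in the XD shell.

```shell
# Hypothetical helper: join module names with " | " to form a
# stream definition string such as "time | log".
build_definition() {
    definition=$1
    shift
    for module in "$@"; do
        definition="$definition | $module"
    done
    echo "$definition"
}

# Print the XD shell command that would create the ticktock stream.
echo "stream create --definition \"$(build_definition time log)\" --name ticktock"
```

The helper does no validation of module names; it only shows that a definition is an ordered list of modules separated by pipes.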


To stop the stream, and remove the definition completely, you can use the stream destroy command:

xd:>stream destroy --name ticktock
Destroyed stream 'ticktock'

It is also possible to stop and restart the stream instead, using the undeploy and deploy commands. The shell supports command completion, so you can hit the TAB key to see which commands and options are available.

Command 'tab' not found (for assistance press TAB)
xd:>

!                    //                   admin               
aggregatecounter     cls                  counter             
date                 exit                 fieldvaluecounter   
gauge                hadoop               help                
http                 job                  module              
richgauge            runtime              script              
stream               system               version             

xd:>


Explore Spring XD

Learn about the modules available in Spring XD in the Sources, Processors, and Sinks sections of the documentation.



Running in Distributed Mode
Introduction

The Spring XD distributed runtime (DIRT) supports distribution of processing tasks across multiple nodes.

Spring XD can use several middleware options when running in distributed mode.
At the time of writing, Redis and RabbitMQ are the available options.

curl -d "multihttp --port=9001 --rulepath=passport | file --dir=/home/focusstat/log/passport --name=passport.log" http://127.0.0.1:8080/streams/passport



http://www.open-open.com/news/view/154055d

root@Master:/opt/spring-xd-1.0.0.M4/shell/bin# netstat -antup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      673/sshd       
tcp        0     52 10.1.78.49:22           10.1.77.40:57969        ESTABLISHED 1586/1         
tcp        0      0 10.1.78.49:22           10.1.77.40:56054        ESTABLISHED 1025/0         
tcp6       0      0 :::22                   :::*                    LISTEN      673/sshd       
tcp6       0      0 :::9101                 :::*                    LISTEN      1677/java      
tcp6       0      0 :::9393                 :::*                    LISTEN      1677/java      
tcp6       0      0 127.0.0.1:9101          127.0.0.1:45686         ESTABLISHED 1677/java      
tcp6       0      0 127.0.0.1:9101          127.0.0.1:45687         ESTABLISHED 1677/java      
tcp6       0      0 127.0.0.1:45686         127.0.0.1:9101          ESTABLISHED 1677/java      
tcp6       0      0 127.0.0.1:45687         127.0.0.1:9101          ESTABLISHED 1677/java      
udp        0      0 0.0.0.0:68              0.0.0.0:*                           640/dhclient3

xd:>stream create --name httptest --definition "http | file"
Created new stream 'httptest'
xd:> http post --target http://localhost:9000 --data "hello world"
> POST (text/plain;Charset=UTF-8) http://localhost:9000 hello world
> 200 OK

root@Master:/tmp/xd/output# tail -f httptest.out
hello world

root@Master:/tmp/xd/output# curl -d "test" http://localhost:9000
root@Master:/tmp/xd/output# tail -f httptest.out


Architecture
Introduction

Spring XD is a unified, distributed, and extensible service for data ingestion, real time analytics, batch processing, and data export.
The foundations of the XD architecture are based on the more than 100 man-years of work that have gone into the Spring Batch, Integration and Data projects.

Building upon these projects, Spring XD provides servers and a configuration DSL that you can immediately use to start processing data.
You do not need to build an application yourself from a collection of jars to start using Spring XD.

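Spring XD has two modes of operation: single-node and multi-node. The first is a single process that is responsible for all processing and administration. This mode makes it easy to get started and simplifies the development and testing of your application.
The second is a distributed mode, where processing tasks can be spread across a cluster of machines and an administrative server sends commands to control the processing tasks running on the cluster.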

Runtime Architecture

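The key components in Spring XD are the XD Admin and XD Container servers. Using a high-level DSL, you post the description of the required processing tasks to the Admin server over HTTP. The Admin server then maps the processing tasks into processing modules. A module is a unit of execution and is implemented as a Spring ApplicationContext.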

A simple distributed runtime is provided that will assign modules to execute across multiple XD Container servers. A single XD Container server can run multiple modules.
When using the single node runtime, all modules are run in a single XD Container and the XD Admin server is run in the same process.

DIRT (distributed runtime)

The XD Admin server breaks up a processing task into individual module definitions and publishes them to a shared queue (backed by Redis or RabbitMQ, depending on the provided transport option).
Each container picks up a module definition from the queue, in a round-robin like manner, and creates a Spring ApplicationContext to run that module.
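
The round-robin like hand-out of module definitions can be simulated with a few lines of shell. This is only an illustration of the resulting assignment under the assumption of a strict round-robin order; it is not how Spring XD is implemented, and the function and container names are hypothetical.

```shell
# Hypothetical simulation: assign modules to containers in strict
# round-robin order. Both arguments are space-separated name lists.
assign_round_robin() {
    containers=$1
    modules=$2
    # Load the container names into the positional parameters.
    set -- $containers
    count=$#
    [ "$count" -gt 0 ] || return 1
    i=0
    for module in $modules; do
        # Select positional parameter number (i mod count) + 1.
        idx=$(( i % count + 1 ))
        eval "target=\${$idx}"
        echo "$module -> $target"
        i=$(( i + 1 ))
    done
}

assign_round_robin "container-1 container-2" "time filter log"
```

With two containers and three modules, the first and third module land on the same container, which matches the statement that a single XD Container can run multiple modules.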

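To reduce the number of hops across the messaging middleware, multiple modules can be composed into larger deployment units that act as a single module.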

Single Node Runtime
For testing and development purposes, a single node runtime is provided that runs the Admin and Container servers in the same process. The communication to the XD Admin server is over HTTP and the XD Admin server communicates to an in-process XD Container using an in-memory queue.

Admin Server Architecture
The Admin Server uses an embedded servlet container and exposes two endpoints for creating and deleting the modules required to perform data processing tasks as declared in the DSL.
The Admin Server is implemented using Spring's MVC framework and the Spring HATEOAS library to create REST representations that follow the HATEOAS (Hypermedia As The Engine Of Application State) principle. The Admin Server communicates with the Container Servers using a pluggable transport; the default uses Redis queues.

Container Server Architecture
The key components of data processing in Spring XD are

Streams
Streams define how event driven data is collected, processed, and stored or forwarded. For example, a stream might collect syslog data, filter, and store it in HDFS.

Jobs
Jobs define how coarse-grained, time-consuming batch processing steps are orchestrated. For example, a job could be defined to coordinate HDFS operations and the subsequent execution of multiple MapReduce processing tasks.

Taps
Taps are used to process data in a non-invasive way as data is being processed by a Stream or a Job. Much like wiretaps used on telephones, a Tap on a Stream lets you consume data at any point along the Stream's processing pipeline. The behavior of the original stream is unaffected by the presence of the Tap.
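
The idea of a tap can be pictured with the Unix tee command, which copies data flowing through a pipeline without changing what the rest of the pipeline receives. This is an analogy only, using ordinary shell tools rather than Spring XD; the variable names are chosen for this note.

```shell
# Analogy only: 'tee' plays the role of a tap on a pipeline.
tap_file=$(mktemp)

# Main pipeline: produce two lines, tap them, then upper-case them.
main_output=$(printf 'hello\nworld\n' | tee "$tap_file" | tr 'a-z' 'A-Z')

# The tap saw the data as it was at the tap point.
tap_output=$(cat "$tap_file")
rm -f "$tap_file"

echo "main stream produced: $main_output"
echo "tap observed: $tap_output"
```

The main pipeline produces the same output whether or not tee is present, which mirrors the statement that the original stream is unaffected by the Tap.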





Reposted from wangqiaowqo.iteye.com/blog/1996592