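Most applications today are distributed, and single-process systems have become rare. In a distributed system, calls to remote interfaces will sometimes fail, and how we handle a failure depends on the kind of result that comes back. What I want to discuss today is how to design a retry mechanism that is bounded by a time limit.
The basic idea is to retry in a loop while checking whether a given time limit, say 3 minutes, has been reached. After the first call fails, we keep calling the interface within those 3 minutes. If a call succeeds, we return its result; if every call fails until the 3 minutes are up, we fall back to a remedy such as sending an alert email or returning a default value. The question is how to design the repeated, intermittent calls within that window. The approach described below comes from the way the load balancer in Spring Cloud Ribbon selects a service provider.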
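First, naturally, comes the maximum retry time in milliseconds (500 ms here, rather than the 3 minutes used as an example above): maxRetryMillis = 500;
Next, modify the method that calls the interface: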
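public Result getResultFromRemoteServer(String param) {
    // Record the current system time
    long currentTime = System.currentTimeMillis();
    // Compute the deadline after which retries stop
    long deadTime = currentTime + maxRetryMillis;
    // Call the remote interface
    Result result = getResult(param);
    // Check whether the call returned a normal value
    if (result == null && System.currentTimeMillis() < deadTime) {
        // The call failed and there is still time left, so retry.
        // Schedule a timer task that interrupts this thread at the deadline.
        // The constructor takes a delay in milliseconds, not an absolute time
        // (the class is defined further below).
        InterruptTask task = new InterruptTask(deadTime - System.currentTimeMillis());
        // Keep looping until the current thread has been interrupted
        while (!Thread.interrupted()) {
            result = getResult(param);
            // On failure with time left, yield and try again; otherwise leave the loop
            if (result == null && System.currentTimeMillis() < deadTime) {
                Thread.yield(); // offer the CPU to other threads before the next attempt
            } else {
                break;
            }
        }
        task.cancel(); // we either have a value or ran out of time, so cancel the task
    }
    // Finally, act on the result
    if (result != null) {
        return result;
    } else {
        // No value was ever obtained: send an alert email or SMS to the person in charge here
        return null; // or return a default value
    }
}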
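OK, that is essentially the whole approach. Two points matter most. First, the task interrupts the thread in order to stop the retries. Second, Thread.yield() lets the thread give up the CPU between attempts instead of calling the interface back to back, which is meant to reduce load on our server and the number of concurrent calls hitting the remote interface. Note that Thread.yield() is only a hint to the scheduler and guarantees no delay, so on an otherwise idle machine the loop may still retry almost immediately.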
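If a real pause between attempts is wanted, one possible variation (not what the Ribbon-based code above does) is to swap Thread.yield() for a short Thread.sleep(). The following is only a minimal sketch under that assumption; DeadlineRetry, retryWithPause and pauseMillis are illustrative names and do not come from Ribbon.

```java
import java.util.function.Supplier;

class DeadlineRetry {

    /**
     * Calls {@code call} until it returns a non-null value or until
     * maxRetryMillis have elapsed. Returns null if every attempt failed.
     * Hypothetical helper: a sleep-based variant of the loop in the article.
     */
    static <T> T retryWithPause(Supplier<T> call, long maxRetryMillis, long pauseMillis) {
        long deadline = System.currentTimeMillis() + maxRetryMillis;
        T result = call.get(); // first attempt
        while (result == null && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(pauseMillis); // a real pause, unlike Thread.yield()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                break;
            }
            result = call.get(); // next attempt
        }
        return result;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated remote call: fails twice, then succeeds.
        Supplier<String> flaky = () -> ++attempts[0] < 3 ? null : "ok";
        String value = retryWithPause(flaky, 500, 20);
        System.out.println(value + " after " + attempts[0] + " attempts"); // ok after 3 attempts
    }
}
```

Because the loop checks the deadline itself, this variant needs no timer thread. The trade-off is that a single slow call can still overrun the deadline, since nothing interrupts it from outside.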
Here is the definition of InterruptTask:
import java.util.Timer;
import java.util.TimerTask;

public class InterruptTask extends TimerTask {

    static Timer timer = new Timer("InterruptTimer", true);

    protected Thread target = null;

    public InterruptTask(long millis) {
        target = Thread.currentThread();
        timer.schedule(this, millis);
    }

    /* Auto-scheduling constructor */
    public InterruptTask(Thread target, long millis) {
        this.target = target;
        timer.schedule(this, millis);
    }

    public boolean cancel() {
        try {
            /* This shouldn't throw exceptions, but... */
            return super.cancel();
        } catch (Exception e) {
            return false;
        }
    }

    public void run() {
        if ((target != null) && (target.isAlive())) {
            target.interrupt();
        }
    }
}
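To see the interrupt mechanism in isolation, here is a small self-contained sketch. It does not use the Netflix class; InterruptDemo, interruptAfter and spinUntilInterrupted are hypothetical names that only mirror the idea of InterruptTask using java.util.Timer directly.

```java
import java.util.Timer;
import java.util.TimerTask;

class InterruptDemo {

    // Daemon timer thread, so it never keeps the JVM alive on its own.
    static final Timer TIMER = new Timer("DemoInterruptTimer", true);

    /** Schedules a one-shot task that interrupts target after delayMillis. */
    static TimerTask interruptAfter(Thread target, long delayMillis) {
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                if (target.isAlive()) {
                    target.interrupt();
                }
            }
        };
        TIMER.schedule(task, delayMillis); // the second argument is a delay, not a timestamp
        return task;
    }

    /** Spins with Thread.yield() until the timer interrupts us; returns elapsed ms. */
    static long spinUntilInterrupted(long delayMillis) {
        long start = System.currentTimeMillis();
        TimerTask task = interruptAfter(Thread.currentThread(), delayMillis);
        // Thread.interrupted() returns the flag AND clears it.
        while (!Thread.interrupted()) {
            Thread.yield();
        }
        task.cancel(); // harmless if the task has already run
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        long elapsed = spinUntilInterrupted(100);
        System.out.println("interrupted after about " + elapsed + " ms");
        System.out.println("flag still set: " + Thread.currentThread().isInterrupted());
    }
}
```

Because Thread.interrupted() clears the flag when it returns true, the calling thread leaves the loop without a pending interrupt, which is why the retry method can go on to return a value normally.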
Next, the definition of TimerTask:
public abstract class TimerTask implements Runnable {
    /**
     * This object is used to control access to the TimerTask internals.
     */
    final Object lock = new Object();

    /**
     * The state of this task, chosen from the constants below.
     */
    int state = VIRGIN;

    /**
     * This task has not yet been scheduled.
     */
    static final int VIRGIN = 0;

    /**
     * This task is scheduled for execution. If it is a non-repeating task,
     * it has not yet been executed.
     */
    static final int SCHEDULED = 1;

    /**
     * This non-repeating task has already executed (or is currently
     * executing) and has not been cancelled.
     */
    static final int EXECUTED = 2;

    /**
     * This task has been cancelled (with a call to TimerTask.cancel).
     */
    static final int CANCELLED = 3;

    /**
     * Next execution time for this task in the format returned by
     * System.currentTimeMillis, assuming this task is scheduled for execution.
     * For repeating tasks, this field is updated prior to each task execution.
     */
    long nextExecutionTime;

    /**
     * Period in milliseconds for repeating tasks. A positive value indicates
     * fixed-rate execution. A negative value indicates fixed-delay execution.
     * A value of 0 indicates a non-repeating task.
     */
    long period = 0;

    /**
     * Creates a new timer task.
     */
    protected TimerTask() {
    }

    /**
     * The action to be performed by this timer task.
     */
    public abstract void run();

    /**
     * Cancels this timer task. If the task has been scheduled for one-time
     * execution and has not yet run, or has not yet been scheduled, it will
     * never run. If the task has been scheduled for repeated execution, it
     * will never run again. (If the task is running when this call occurs,
     * the task will run to completion, but will never run again.)
     *
     * <p>Note that calling this method from within the <tt>run</tt> method of
     * a repeating timer task absolutely guarantees that the timer task will
     * not run again.
     *
     * <p>This method may be called repeatedly; the second and subsequent
     * calls have no effect.
     *
     * @return true if this task is scheduled for one-time execution and has
     *         not yet run, or this task is scheduled for repeated execution.
     *         Returns false if the task was scheduled for one-time execution
     *         and has already run, or if the task was never scheduled, or if
     *         the task was already cancelled. (Loosely speaking, this method
     *         returns <tt>true</tt> if it prevents one or more scheduled
     *         executions from taking place.)
     */
    public boolean cancel() {
        synchronized(lock) {
            boolean result = (state == SCHEDULED);
            state = CANCELLED;
            return result;
        }
    }

    /**
     * Returns the <i>scheduled</i> execution time of the most recent
     * <i>actual</i> execution of this task. (If this method is invoked
     * while task execution is in progress, the return value is the scheduled
     * execution time of the ongoing task execution.)
     *
     * <p>This method is typically invoked from within a task's run method, to
     * determine whether the current execution of the task is sufficiently
     * timely to warrant performing the scheduled activity:
     * <pre>{@code
     *   public void run() {
     *       if (System.currentTimeMillis() - scheduledExecutionTime() >=
     *           MAX_TARDINESS)
     *               return;  // Too late; skip this execution.
     *       // Perform the task
     *   }
     * }</pre>
     * This method is typically <i>not</i> used in conjunction with
     * <i>fixed-delay execution</i> repeating tasks, as their scheduled
     * execution times are allowed to drift over time, and so are not terribly
     * significant.
     *
     * @return the time at which the most recent execution of this task was
     *         scheduled to occur, in the format returned by Date.getTime().
     *         The return value is undefined if the task has yet to commence
     *         its first execution.
     * @see Date#getTime()
     */
    public long scheduledExecutionTime() {
        synchronized(lock) {
            return (period < 0 ? nextExecutionTime + period
                               : nextExecutionTime - period);
        }
    }
}
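The return value of cancel() described in the Javadoc above can be checked with a short experiment. This is only an illustrative sketch using the standard java.util.Timer; CancelDemo, cancelResults and the timer name are made up for the example.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;

class CancelDemo {

    /** Builds a one-shot task that signals the latch when it runs. */
    static TimerTask signalling(CountDownLatch latch) {
        return new TimerTask() {
            @Override
            public void run() {
                latch.countDown();
            }
        };
    }

    /** Returns cancel() results for: a pending task, a finished task, an unscheduled task. */
    static boolean[] cancelResults() {
        Timer timer = new Timer("CancelDemoTimer", true);
        try {
            // 1. Scheduled far in the future and cancelled before it can run.
            TimerTask pending = signalling(new CountDownLatch(1));
            timer.schedule(pending, 60_000);
            boolean beforeRun = pending.cancel();

            // 2. Scheduled immediately; we wait until it has run, then cancel.
            CountDownLatch ran = new CountDownLatch(1);
            TimerTask finished = signalling(ran);
            timer.schedule(finished, 0);
            ran.await();
            boolean afterRun = finished.cancel();

            // 3. Never handed to a timer at all.
            boolean neverScheduled = signalling(new CountDownLatch(1)).cancel();

            return new boolean[] {beforeRun, afterRun, neverScheduled};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the task", e);
        } finally {
            timer.cancel(); // stop the demo timer thread
        }
    }

    public static void main(String[] args) {
        boolean[] r = cancelResults();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // true false false
    }
}
```

In the retry method this means task.cancel() returns true when a result arrived before the deadline and false when the timer had already fired, although the method above ignores the return value.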
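That wraps up the description of this approach. For a concrete instance of it, look at the choose method of RetryRule in the Spring Cloud Ribbon source code. The source of InterruptTask is in the package com.netflix.loadbalancer, and TimerTask is a JDK class in the java.util package.
To repeat, I learned this approach from the book Spring Cloud 微服务实战. If anything is unclear, consult the book, which is very well written, or leave a comment to discuss.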