The first step toward a better estimate of the off-the-shelf runtime for credit curve cooking is to get a sense of how long the valuation of a quoted SNAC CDS (standardized default swaps, no custom cashflows) takes. Getting a SNAC CDS valuation runtime estimate is the subject of this post. The outline of the argument is to take the SNAC valuation runtime estimate and use it to cost cooking the credit curve (bootstrapping the hazard rate term structure) from the quoted SNAC default swaps with terms 1Y, 3Y, and 5Y (see O’Kane and Turnbull pgs 12-14). Once we have a cooked credit curve we can run valuation and risk for the default swap inventory dependent on the given cooked credit curve. For a given credit, the serial computation runtime for a single CDS position valuation is an order of magnitude smaller than the credit curve cooking computation. Take a look at O’Kane and Turnbull pgs 5-10 to review the Jarrow-Turnbull reduced form model for valuing the contingent leg and the fee leg. See also the FINCAD document describing the modeling assumptions behind the ISDA CDS Standard Model. Recognize also that the ISDA code doesn’t have all the calendars and currency convention data you would expect in a complete production Credit P&L, but it’s OK as proxy code for getting estimates.

Vanilla default swaps have two legs, a Premium Leg and a Contingent Leg; the default swap PV (present value) is the difference between the Premium Leg PV and the Contingent Leg PV. If counterparty A is long protection in a default swap, they pay premium to get the protection of the contingent leg and are short the credit. We will assume the average maturity of a default swap is 5Y. We will proceed purely with static analysis of the code, with no peeking at the underlying mathematics for better optimizations: nothing but the code.

Valuing the Premium Leg

The basic computational task with the Premium Leg is to discount each of the scheduled premium cash flows off the risky curve, accounting for both the time value of money and the probability that the premium may never be paid due to termination of the default swap. Typically the main computational part of valuing the Premium Leg is valuing the accrual on default. The actual discounting of the premium cashflows to be paid on the quarterly paydates is straightforward and computationally inexpensive. Accruing on default is a standard convention and refers to the portion of the premium owed by the protection buyer in the event that a default occurs between premium paydates (as opposed to default arriving exactly on a paydate). The protection buyer owes the premium accrued up to the date that the relevant default is officially recognized. For computational efficiency we are going to push the accrued on default computation over to the Contingent Leg computation, because the two are so similar. This leaves about 20-30 cycles of computation for a 5Y default swap, plus an IOU to account for the more computationally expensive accrual on default valuation. Here is the relevant code from ISDA, from inside a loop over the paydates:

amount   = notional * couponRate * accTime;
survival = JpmcdsForwardZeroPrice(spreadCurve, today, accEndDate + obsOffset);
discount = JpmcdsForwardZeroPrice(discCurve, today, payDate);
myPv = amount * survival * discount;

Notice that the variables notional, couponRate, and n-1 of n assignments of accTime are known at compile time. The calls to the function JpmcdsForwardZeroPrice() for the spreadCurve and the discCurve are simply interpolations from the cooked curves, where the interpolation parameters (at least n-1 out of n) are known at compile time. Price them as assignments and assume that the cache will not be swamped by 20 or so doubles for the quarterly paying CDS. The product survival * discount per paydate is known at curve cooking time. The trade-off is between consuming a cycle per paydate and adding to the cache load. Let’s put it in the cache for now and assume the curve cooker will compute the product survival * discount. So there is slightly more than a cycle (call it 1.5 cycles) per quarterly paydate, with no cache pressure or wait state penalties. We will call it 30 cycles for a 5Y quarterly paying average default swap.
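As a rough sketch of what that accounting implies, the premium loop reduces to one multiply-add per paydate once the cooker has stored survival * discount. The struct PremiumFlow, its field names, and the function premium_leg_pv are hypothetical names for illustration, not part of the ISDA code, and pulling notional * couponRate out of the loop is an assumption that reorders the arithmetic in the excerpt above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-paydate inputs, assumed to be filled in by the curve cooker. */
typedef struct {
    double accTime;        /* accrual year fraction for the coupon period */
    double riskyDiscount;  /* precomputed survival * discount for the paydate */
} PremiumFlow;

/* PV of the scheduled premium flows only; accrual on default is excluded,
   since the post books it with the Contingent Leg. */
double premium_leg_pv(double notional, double couponRate,
                      const PremiumFlow *flows, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(flows != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        /* one multiply-add per paydate */
        sum += flows[i].accTime * flows[i].riskyDiscount;
    }
    return notional * couponRate * sum;
}
```

For a 5Y quarterly swap n would be about 20, which is where the 20 or so cached doubles mentioned above come from.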

Valuing the Contingent Leg

We need to account for the clock cycles required to:

  1. PV the default swap payout given the expected default arrival over the term of the CDS contract.
  2. PV the accrued premium owed by the protection buyer to the protection seller in the event the default arrives between premium cash flow paydates (the IOU from the Premium Leg).

Again, static code analysis only, no peeking at the mathematics. Here is the relevant ISDA code for item 1, integrating the value of the terminal CDS contingent payoff over the term of the CDS contract.

s1  = JpmcdsForwardZeroPrice(spreadCurve, today, startDate);
df1 = JpmcdsForwardZeroPrice(discCurve, today, MAX(today, startDate));
loss = 1.0 - recoveryRate;

for (i = 1; i < tl->fNumItems; ++i)
{
    double lambda;
    double fwdRate;
    double thisPv;

    s0  = s1;
    df0 = df1;
    s1  = JpmcdsForwardZeroPrice(spreadCurve, today, tl->fArray[i]);
    df1 = JpmcdsForwardZeroPrice(discCurve, today, tl->fArray[i]);
    t   = (double)(tl->fArray[i] - tl->fArray[i-1])/365.0;

    lambda  = log(s0/s1)/t;
    fwdRate = log(df0/df1)/t;

    thisPv  = loss * lambda / (lambda + fwdRate) *
              (1.0 - exp(-(lambda + fwdRate) * t)) * s0 * df0;
    myPv += thisPv;
}

Again, as in the case of the Premium Leg, all the calls to JpmcdsForwardZeroPrice() are interpolations known at credit curve cooking time, so we will account for them as variable assignments and assign the computational cost of interpolation to the curve cooker. The time consuming computation here and in the code below (for accrued on default) depends on the resolution of the time grid (fNumItems) needed to get the discrete summation to converge to the continuous integral value. If the integration time grid has M points per year and r is the 5Y swap rate (the USD risk free rate, currently 114 bps as of 19Jan12), then O’Kane and Turnbull (pg 10) show the percentage error in the discrete approximation is:

r / (2 * M)

In production P&L batches I have seen M=26 (biweekly integration grid points), but based on current levels M=4 (quarterly cashflow paydates) would bring the error to within the bid-ask spread. Let’s assume M=26 as the worst case and M=4 as the expected case for performance approximations.
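To put numbers on the two grids, here is a small sketch of the error arithmetic. It assumes the bound is r divided by twice the number of grid points per year, which is how M is used in this post. The function name grid_relative_error is hypothetical.

```c
#include <assert.h>

/* Relative discretization error r / (2 * M), with M taken as
   integration grid points per year (an assumption about the
   bound cited from O'Kane and Turnbull). */
double grid_relative_error(double r, int pointsPerYear)
{
    assert(pointsPerYear > 0);
    return r / (2.0 * (double)pointsPerYear);
}
```

With r = 0.0114, M=4 gives 0.0114 / 8 = 0.001425 (about 0.14%), and M=26 gives 0.0114 / 52, roughly 0.00022 (about 0.02%).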

Notice that the expensive floating point operations in these loops are the math.h functions log() and exp() and the divides. We will push the log and exp calls to the curve cooker, since the grid points for the integration as well as all the interpolations are known at curve cooking time. The variable lambda is known at credit curve cooking time and the variable fwdRate is known at Libor curve cooking time. Similarly, all the values of t are known at compile time, so we are not even going to multiply by the reciprocal of 365 inside the loop. We will also book the computational cost of the reciprocal 1.0/(lambda + fwdRate) to the curve cooker. So, with no divides and no math.h calls in the loop, we cost it at 4 fused multiply-add cycles per grid point after vectorizing, loop unrolling, and optimization. In the expected case, loop 1 costs us 4 cycles * (4*5) grid points, or 80 cycles. In the worst case, loop 1 costs us 4 cycles * (26*5) grid points, or 520 clock cycles.

Here is the relevant ISDA code for item 2, a very similar loop to the PV of the contingent payoff loop, right?

for (i = 1; i < tl->fNumItems; ++i)
{
    double lambda;
    double fwdRate;
    double thisPv;
    double t0;
    double t1;
    double lambdafwdRate;

    if (tl->fArray[i] <= stepinDate)
        continue;

    s1  = JpmcdsForwardZeroPrice(spreadCurve, today, tl->fArray[i]);
    df1 = JpmcdsForwardZeroPrice(discCurve, today, tl->fArray[i]);

    t0  = (double)(subStartDate + 0.5 - startDate)/365.0;
    t1  = (double)(tl->fArray[i] + 0.5 - startDate)/365.0;
    t   = t1-t0;

    lambda  = log(s0/s1)/t;
    fwdRate = log(df0/df1)/t;
    lambdafwdRate = lambda + fwdRate + 1.0e-50;

    thisPv  = lambda * accRate * s0 * df0 * (
        (t0 + 1.0/(lambdafwdRate))/(lambdafwdRate) -
        (t1 + 1.0/(lambdafwdRate))/(lambdafwdRate) *
        s1/s0 * df1/df0);

    myPv += thisPv;
    s0  = s1;
    df0 = df1;
    subStartDate = tl->fArray[i];
}

We will treat both loops simultaneously and assume that the loops will be fused. The same analysis applies to this loop: lambda, fwdRate, lambdafwdRate, and all the interpolations and ratios are known at curve cooking time, so they will be accounted for in the curve cooker, not in the CDS valuation. Net, 2 fused multiply-add cycles will get the accrued on default value per grid point. Expected case = 2 cycles * (4*5) grid points = 40 cycles; worst case = 2 cycles * (26*5) grid points, an additional 260 cycles.

CDS valuation estimates are then as follows:

Expected case = 30 + 80 + 40 = 150 clock cycles, 42 ns, 23MM valuations/second

Worst case = 30 + 520 + 260 = 810 clock cycles, 225 ns, 4.5MM valuations/second
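The cycle accounting and the conversion to wall-clock time can be written out as a short sketch. The post does not state a clock speed. The 3.6 GHz used in the note below is an assumption inferred from 150 cycles coming out at about 42 ns. The function names are hypothetical.

```c
#include <assert.h>

/* Cycle budget from this post: 1.5 cycles per quarterly paydate, plus
   4 cycles (contingent payoff) and 2 cycles (accrual on default)
   per integration grid point. */
double cds_cycle_estimate(int years, int gridPointsPerYear)
{
    assert(years > 0 && gridPointsPerYear > 0);
    return 1.5 * 4.0 * years
         + 4.0 * gridPointsPerYear * years
         + 2.0 * gridPointsPerYear * years;
}

/* Nanoseconds for a given cycle count at clockGHz cycles per nanosecond. */
double cycles_to_ns(double cycles, double clockGHz)
{
    assert(clockGHz > 0.0);
    return cycles / clockGHz;
}

/* Serial valuations per second on one core at the given clock. */
double valuations_per_second(double cycles, double clockGHz)
{
    assert(cycles > 0.0 && clockGHz > 0.0);
    return clockGHz * 1.0e9 / cycles;
}
```

At the assumed 3.6 GHz, cds_cycle_estimate(5, 4) is 150 cycles, or about 41.7 ns and roughly 24MM valuations/second, and cds_cycle_estimate(5, 26) is 810 cycles, or about 225 ns and roughly 4.4MM valuations/second. That is close to the rounded figures above.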

In another post we will go through the curve cooking accounting and deal with the cache traffic we are creating by allocating computation to the curve cooker. Informally, we think cooking is an order of magnitude more expensive than valuation, so we are thinking under 10 to 20 microseconds to cook a curve on a single off-the-shelf core in either the expected or the worst case. 50K curves cooked per second seems plausible. Let’s see how it goes; the cache penalties and fp pipeline bubbles could catch up with us.
