Zerohedge, The Second Act Of The JPM CIO Fiasco Has Arrived – Mismarking Hundreds Of Billions In Credit Default Swaps, here. Durden is off to the races! Bloomberg reports the London Whale positions were marked using CDS spreads distinct from the spreads the JPM CDS desk used for its own marks. Not really all that shocking, but Zerohedge is speculating on how to fill in the details. This could bring magnified public focus on MTM controls at the JPM CIO and unveil some more MTM loss, but it still is not likely to explain where the Party B P&L went. Let's see if this Bloomberg/Zerohedge angle gets any momentum. Even if this MTM mismarking story does not gain traction, it does highlight how vacant and clueless the NYT Hunch, Pounce, Kill story was. The Zerohedge MTM delta estimate for the mismarked positions is five times larger than the entire documented Saba “Kill,” and the “Kill” itself is massively dominated by the estimated DV01 of the London Whale position. Moreover, in the NYT v. Zerohedge hyperbole races, Durden wears the Daddy pants.
Bottom line: Jamie Dimon’s “tempest in a teapot” just became a fully-formed, perfect storm which suddenly threatens his very position, and could potentially lead to billions more in losses for his firm.
Take another look at Flynn's case for Dataflow in his Maxeler video, here. I think the argument playing out in the video is predicated on the observation that Moore's Law ceased delivering extra clock cycles for big commercial microprocessors around 2002. Let's accept that as true, even though I suspect that if I go check, David Patterson will put the “no more clock cycles for you” crossover closer to 2006. In the absence of any subsequent breakthroughs in compiler harvesting of Instruction Level Parallelism, Dataflow architecture now has a window of opportunity to demonstrate its advantages to the market unlike any opportunity Arvind would have experienced in the 80s and 90s. OK?
The other argument Flynn makes is that compiler-assisted parallel coding is hard and has typically not demonstrated parallel speedups scaling beyond 8 or so conventional cores. Again fine; that fits with my experience from the SGI compilers onward, so I have no issue. Parallel programming with pragmas is no way to write code for fun and profit.
The jump to “so therefore dataflow is good for a bunch of general purpose mathematics” needs further justification; maybe it is, I don't know. In particular, financial applications like closed-form expression evaluation, discounting, lattices, and Monte Carlo for big portfolios of trades/positions all hit the memory hierarchy slightly differently. The vectorization opportunities also vary in degree between these financial mathematics applications. I'm not sure about this, but I think there were science experiments conducted somewhere uptown showing you can teach dolphins Perl and Python and they can parallelize these position inventory calculations to speed; just don't let them try to do load balancing, because that will delay the project.
Perhaps Dataflow has some large performance advantage in 2012, but the costs of converting to FPGAs, or of waiting for the latest Xilinx FPGA parts for your supercomputer, more than compensate for the advantage. Certainly you do not expect a massive popular dataflow movement where all your friends, holding copies of Dataflow for Dummies, flock about you to find out how to program dataflow on their phone apps. That's not happening. Best case, you get a cool dataflow programming result that makes you a hero in the dataflow community, but as a reward you will have to simply smile inwardly to yourself knowingly, because really no one else will ever know what you did or what you are talking about. And what about debugging the production infrastructure and code; doesn't that kill you inside a little bit every time you think about it? So, maybe it's just me, but this Dataflow architecture performance advantage had better be pretty big for each of these specific financial computations to account for all the obvious really bad stuff.
Taking Flynn's argument at face value, I would guess there is a crossover time up to which Dataflow architecture maintains a significant performance advantage over off-the-shelf architecture, assuming everything else stays the same. I would think the crossover will be close to the time we see volume production of 16-core microprocessor chips, 2015? But that assumes nothing else changes. Big assumption.
Wall Street Journal – here Maxeler Makes Waves with Dataflow Design
HPCWire – here J.P. Morgan Deploys Maxeler Dataflow Supercomputer for Fixed Income Trading
Peter Cherasia, Head of Markets Strategies at J.P. Morgan, commented: “With the new Maxeler technology, J.P. Morgan’s trading businesses can now compute orders of magnitude more quickly, making it possible to improve our understanding and control of the profile of our complex trading risk.”
insideHPC – here JP Morgan Fires Up Maxeler FPGA Super
Compute Scotland – here Near realtime: JP Morgan & Maxeler
Apart from saving over a million dollars in infrastructure cost annually, why bother running your credit derivative P&L/Risk batch on a Mac Pro in roughly two seconds rather than on a supercomputer with FPGAs in roughly four minutes?
A. Rates, Mortgage, FX, Commodities P&L/Risk batches
Was it only the credit derivative desk in the Computerworld UK article that had this P&L and Risk batch performance problem in 2008, and is it now fixed?
B. Moore’s Law
More of a prediction than a law really, but it has predicted accurately for 40 years and is expected to hold for another decade. In technology, if you do nothing more than “ride the wave” of Moore's Law you are doing well; however, this observation's simplicity can be deceiving. If the plan is for the credit derivative batch to “ride the wave” with thousands of grid CPUs, FPGAs, and associated staff, you may have a challenge when scaling to larger problems, like counterparty valuation adjustment (CVA), that Moore's Law will not automatically address for you. For example, Moore's Law doesn't get you power and machine room space for 10,000 grid CPUs for the CVA simulation. That is closer to “not riding the wave.”
Supercomputers and FPGAs, along with GPUs, CUDA, CPU grids, and even dataflow architecture, are all reasonable technology bets that could very well be the best way to ride the Moore's Law wave through 2020. Using some or all of these technologies together in 2011 to run such a modest computation so slowly should elicit some follow-up questions.
What is left to get a runtime estimate for the credit derivative batch on an off-the-shelf computer? We need a cooked credit curve (bootstrapped hazard rate term structure) and an estimate of the computational cost of the IOUs from the previous valuation step. Let's account for the clock cycles needed to cook a USD credit curve out to 10 years from par CDS quotes at 1Y, 3Y, 5Y, 7Y, and 10Y and a previously cooked USD Libor curve. Then we will address the additional computational cost of the valuation IOUs. The final post in this series will cover the additional computational cost of perturbational risk (multiply the single-curve cooking estimates by 20-30x, corresponding to the number of distinct perturbed curves required for risk; ditto the valuation estimates) as well as the cost of valuation for non-standard inventory CDS trades (trades where the paydates are not standard and therefore introduce an interpolation computation charge).
Credit Curve Cooking:
Assuming we are given a cooked USD Libor curve, we can see from the valuation code that credit curve cooking only needs to take the par CDS quotes for the value date and return a set of assignments through JpmcdsForwardZeroPrice() for just the 20 paydates in the expected case (5Y CDS paying quarterly), or for both the 20 paydates and the 130 numerical integration grid points (biweekly grid) in the worst case. So, if we derive estimates of the convergence speed of the one-dimensional root finder, Brent's method, and then get estimates of the cost of the valuation IOUs, we will have a reasonably complete floating point performance picture. We will assume away the cost of USD Libor curve cooking and USD Libor interpolation since that computation cost is small and can in any case be amortized across all the credits cooked against USD Libor. So, valuation IOUs 2, 6, and 9 (see below) are assumed to cost less than one clock cycle of floating point execution. They are just variable assignments with some cost in cache pressure.
Here is the ISDA credit curve cooking code. Notice they use Brent's method and the convergence criterion looks like zero to ten decimal places. Given that you are cooking a credit curve that was also cooked the previous business day, you can assume the root is bracketed to 10bps, or three decimal places. Let's not even bother to account for the potential compute time efficiency the bootstrapping valuations could provide. Assume every Brent function evaluation costs as much as a 5Y CDS valuation.
/* we work with a continuously compounded curve since that is faster -
   but we will convert to annual compounded since that is traditional */
cdsCurve = JpmcdsMakeTCurve (today, ...);          /* remaining arguments elided in this excerpt */
if (cdsCurve == NULL) goto done;

context.discountCurve  = discountCurve;
context.cdsCurve       = cdsCurve;
context.recoveryRate   = recoveryRate;
context.stepinDate     = stepinDate;
context.cashSettleDate = cashSettleDate;

/* one root-finding problem per par CDS quote date */
for (i = 0; i < nbDate; ++i)
{
    guess = couponRates[i] / (1.0 - recoveryRate);

    cl = JpmcdsCdsContingentLegMake (MAX(today, startDate), ...);   /* arguments elided */
    if (cl == NULL) goto done;

    fl = JpmcdsCdsFeeLegMake (startDate, ...);                      /* arguments elided */
    if (fl == NULL) goto done;

    context.i  = i;
    context.cl = cl;
    context.fl = fl;

    if (JpmcdsRootFindBrent ((TObjectFunc)cdsBootstrapPointFunction,
                             ...,        /* context and guess arguments elided */
                             0.0,        /* boundLo */
                             1e10,       /* boundHi */
                             100,        /* numIterations */
                             0.0005,     /* initialXstep */
                             0,          /* initialFDeriv */
                             1e-10,      /* xacc */
                             1e-10,      /* facc */
                             &spread) != SUCCESS)
    {
        JpmcdsErrMsg ("%s: Could not add CDS maturity %s spread %.2fbp\n",
                      ...,               /* routine and maturity arguments elided */
                      1e4 * couponRates[i]);
        goto done;
    }

    cdsCurve->fArray[i].fRate = spread;
}
Assume Brent's method on average converges superlinearly like the secant method (the exponent is roughly 1.6, see Convergence of Secant Method). Assume also that one iteration of Brent's method costs one function evaluation (with the IOU computation time added) plus some nominal extra clocks, say +20%. Given the accuracy of the initial guess, the smooth objective function, and the expected superlinear convergence, about five iterations should be good enough on average (the number of correct digits grows roughly like a factor of 1.6 per iteration, and 1.6**5 is about 10, matching the 1e-10 tolerance). With par CDS quotes at 1Y, 3Y, 5Y, 7Y, and 10Y to fit, we expect to need about 20 CDS valuations in total. Thinking ahead to perturbational risk (in the next post), we could obviously bundle many perturbations into a single massive vectorized Brent's method call and drive the cooking cost massively lower; let's ignore this obvious optimization for now. What do we have so far?
CDS credit curve cooking estimates are then as follows:
Expected case = 1.2 * (20 * 150 + 20 * valuation IOU) clock cycles (quarterly paydates=gridpoints)
Worst case = 1.2 * (20 * 810 + 20 * valuation IOU) clock cycles (biweekly numerical integration gridpoints)
Expected case is running at 3600 cycles plus 24x the IOU cycle count. In other words, one microsecond plus some IOU accounting.
Valuation IOU Runtime Accounting:
With subtopic titles like “Valuation IOU Runtime Accounting” I am certain to cut my blog reading population in half – the price of a lonely optimization post. Recall our list of valuation IOUs. We assumed that some Rates process will exogenously produce a suitably cooked and interpolated USD Libor curve (discCurve) at the cost of some L2 cache pressure to us. Note we assume that on the code rewrite we vectorize and unroll loops wherever possible. So in our accounting we take the cycle counts from Intel's vectorized math library (MKL) for double precision: vector divides cost 5.4 cycles per element, vector exponentials cost 7.9 cycles per element, vector logs cost 9.9 cycles per element, and vector reciprocals cost 3.36 cycles per element. The vector lengths for these cycle counts are probably close to 1024, so the cycle counts may be slightly optimistic.
Valuation IOU Accounting list
- survival[paydates] = JpmcdsForwardZeroPrice(SpreadCurve,…) 4*paydates clocks
- discount[paydates] = JpmcdsForwardZeroPrice(discCurve,…) 0, assumed away
- vector1[paydates] = notional*couponRate*accTime[paydates] paydates clocks
- vector2[paydates] = survival[paydates]*discount[paydates] paydates clocks
- s[gridpts] = JpmcdsForwardZeroPrice(SpreadCurve,…) 4*paydates clocks
- df[gridpts] = JpmcdsForwardZeroPrice(discCurve,…) 0, assumed away
- t[gridpts] grid time deltas 0, assumed away
- lambda[gridpts] = log(s[gridpts-1]/s[gridpts])/t[gridpts] 15*gridpts clocks
- fwdRate[gridpts] = log(df[gridpts-1]/df[gridpts])/t[gridpts] 0, assumed away
- lambdafwdRate[gridpts] = lambda[gridpts] + fwdRate[gridpts] gridpts clocks
- vector3[gridpts] = exp(-t[gridpts]*lambdafwdRate[gridpts]) 10*gridpts clocks
- vector4[gridpts] = s[gridpts]*df[gridpts] gridpts clocks
- vector5[gridpts] = reciprocal(lambdafwdRate[gridpts]) 4*gridpts clocks
- t0[gridpts], t1[gridpts] coefficients for accrued interpolation 0, assumed away
- thisPv[gridpts] 4*gridpts clocks
Looks like the IOU cost in aggregate is 10*paydates clocks, or 200 clock cycles, plus 35*gridpts clocks (ugh!). The expected case assumes gridpoints = quarterly paydates, so all in 45*paydates clock cycles, or 900 clocks. Worst case: 200 clocks + 35*130 clocks, or 4750 clocks.
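To make the accounting concrete, here is a minimal C sketch of how the per-grid-point IOUs might be produced with Intel MKL's vector math calls (vdDiv, vdLn, vdExp, vdMul, vdInv). The function and every array name are hypothetical, not ISDA code; the point is only that each IOU above maps onto a streaming vector operation with roughly the per-element costs quoted.

#include <mkl.h>   /* Intel MKL vector math: vdDiv, vdLn, vdExp, vdMul, vdInv */

/* Hypothetical sketch of the per-grid-point valuation IOUs for one credit.
   s[] and df[] hold npts+1 interpolated survival probabilities and discount
   factors at the grid nodes; t[] holds the npts grid time deltas and
   invt[i] = 1.0/t[i] is pre-tabulated since the grid is fixed. */
static void value_ious(int npts,
                       const double *s, const double *df,
                       const double *t, const double *invt,
                       double *lambda, double *fwdRate, double *lambdafwdRate,
                       double *vector3, double *vector4, double *vector5,
                       double *work)
{
    int i;

    /* lambda[i] = log(s[i]/s[i+1]) / t[i]: one vector divide (~5.4 clocks/elem)
       plus one vector log (~9.9), with the multiply by the precomputed 1/t      */
    vdDiv(npts, s, s + 1, lambda);
    vdLn (npts, lambda, work);
    vdMul(npts, work, invt, lambda);

    /* fwdRate[i] = log(df[i]/df[i+1]) / t[i]: assumed away in the accounting
       above (known at Libor cooking time), shown here only for shape           */
    vdDiv(npts, df, df + 1, fwdRate);
    vdLn (npts, fwdRate, work);
    vdMul(npts, work, invt, fwdRate);

    for (i = 0; i < npts; ++i)                  /* lambda + fwdRate, ~1 clock/elem */
        lambdafwdRate[i] = lambda[i] + fwdRate[i];

    for (i = 0; i < npts; ++i)                  /* -t * (lambda + fwdRate)         */
        work[i] = -t[i] * lambdafwdRate[i];
    vdExp(npts, work, vector3);                 /* exp(.), ~7.9 clocks/elem        */

    vdMul(npts, s, df, vector4);                /* s * df, ~1 clock/elem           */
    vdInv(npts, lambdafwdRate, vector5);        /* 1/(lambda+fwdRate), ~3.36       */
}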
All in then, CDS credit curve cooking estimates are as follows:
Expected case = 1.2 * (20 * 150 + 20 * 900) clock cycles (quarterly paydates=gridpoints)
Worst case = 1.2 * (20 * 810 + 20 * 4750) clock cycles (biweekly numerical integration gridpoints)
Expected case looks like 25,200 clocks or 7 microseconds. Worst case looks like 133,440 cycles or 37 microseconds. Ok, we tentatively thought 20 microseconds worst case but recall we left a chunk of optimizations unaccounted for in the interests of brevity. But the worst case is about two times more expensive than we naively expected.
From here it is easy: full risk will be 20x to 30x the single-curve cooking estimate even if we are totally apathetic and clueless. Expected case is 7 * 20 microseconds, call it 140 microseconds, and worst case is 37 * 30 microseconds, call it 1.11 milliseconds. Since we are going to argue that non-standard CDS interpolation can be stored with the trade description, the incremental cycle count for non-standard deals is de minimis, the computational cost being amortized over the life of the position (5Y on average). Expected case then, for our target of 10K credit curves and 100K CDS, looks like 10K * 140 microseconds + 100K * 20 * 42 nanoseconds, or about one and a half seconds on a single core of a single microprocessor. If I have to run the Computerworld UK credit batch on the cheapest Mac Pro I can order from the Apple store today, I'll take the quad-core 2.8 GHz Nehalem for 2,499 USD (not gonna add 1,200 USD to get the 3.33 GHz Westmere). It'll cost me 5.8 seconds on a single core, but fortunately I purchased four cores (of raw power, according to the Apple website), so the Computerworld UK credit derivative batch still runs in a second and change for 2,499 USD plus shipping.
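As a sanity check on that arithmetic, here is a tiny self-contained C sketch; the constants are just the expected-case estimates derived above, and the only new assumption is the roughly 3.6 GHz core implied by the 150-clock/42 ns conversion.

#include <stdio.h>

/* Back-of-the-envelope check of the expected-case batch runtime, using the
   single-core estimates derived in this series.  Illustrative only. */
int main(void)
{
    const double curves          = 10e3;        /* credit curves in the batch       */
    const double positions       = 100e3;       /* CDS positions in the batch       */
    const double cook_full_risk  = 140e-6;      /* seconds per curve, expected case */
    const double value_full_risk = 20 * 42e-9;  /* seconds per position, expected   */

    double total = curves * cook_full_risk + positions * value_full_risk;
    printf("expected-case batch on one core: %.2f seconds\n", total);   /* ~1.5 s */
    return 0;
}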
The first step toward a better estimate of the off-the-shelf runtime for credit curve cooking is to get a sense of how long the valuation of a quoted SNAC CDS (standardized default swap, no custom cashflows) requires. Getting a SNAC CDS valuation runtime estimate is the subject of this post. The outline of the argument is to take the SNAC valuation runtime estimate and cook the credit curve (bootstrap the hazard rate term structure) from the quoted SNAC default swaps with terms 1Y, 3Y, and 5Y (see O'Kane and Turnbull pgs 12-14). Once we have a cooked credit curve we can run valuation and risk for the default swap inventory dependent on that cooked credit curve. For a given credit, the serial computation runtime for a single CDS position valuation is an order of magnitude smaller than the credit curve cooking computation. Take a look at O'Kane and Turnbull pgs 5-10 to review the Jarrow-Turnbull reduced form model for valuing the contingent leg and the fee leg. See the FINCAD document describing the modeling assumptions behind the ISDA CDS Standard Model, here. Recognize also that the ISDA code doesn't have all the calendars and currency convention data you would expect in a complete production Credit P&L system, but it's OK as a proxy code for getting estimates.
Vanilla default swaps have two legs, a Premium Leg and a Contingent Leg; the default swap PV (present value) is the difference between the Premium Leg PV and the Contingent Leg PV. If counterparty A is long protection in a default swap, they are paying premium to get the protection of the contingent leg and are short the credit. We will assume the average maturity of a default swap is 5Y. We will proceed purely with static analysis of the code, with no peeking at the underlying mathematics for better optimizations – nothing but the code.
Valuing the Premium Leg
The basic computational task with the Premium Leg is to discount each of the scheduled premium cash flows off the risky curve, accounting for both the time value of money and the probability that the premium may never be paid due to termination of the default swap. Typically the main computational part of valuing the Premium Leg is valuing the accrual on default. The actual discounting of the premium cashflows to be paid on the quarterly paydates is straightforward and computationally inexpensive. Accruing on default is a standard convention and refers to the portion of the premium owed by the protection buyer in the event that a default occurs between premium paydates (as opposed to default arriving exactly on a paydate). The protection buyer owes the premium accrued up to the date that the relevant default is officially recognized. For computational efficiency we are going to push the accrued-on-default computation over to the Contingent Leg computation, because the two are so similar. This leaves about 20-30 cycles of computation for a 5Y default swap, plus an IOU to account for the more computationally expensive accrual-on-default valuation. Here is the relevant ISDA code, from inside a loop over each paydate:
amount = notional * couponRate * accTime;
survival = JpmcdsForwardZeroPrice(spreadCurve, today, accEndDate + obsOffset);
discount = JpmcdsForwardZeroPrice(discCurve, today, payDate);
myPv = amount * survival * discount;
Notice that the variables notional, couponRate, and n-1 of the n assignments of accTime are known at compile time. The calls to JpmcdsForwardZeroPrice() for the spreadCurve and the discCurve are simply interpolations from the cooked curves where the interpolation parameters (at least n-1 out of n) are known at compile time. Price them as assignments and assume that the cache will not be swamped by 20 or so doubles for a quarterly paying CDS. The product survival * discount per paydate is known at curve cooking time. The trade-off is between consuming a cycle per paydate and adding to the cache load; let's put it in the cache for now and assume the curve cooker will compute the product survival*discount. So there is slightly more than a cycle (call it 1.5 cycles) per quarterly paydate, with no cache pressure or wait-state penalties. We will call it 30 cycles for a 5Y quarterly paying average default swap.
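As a concrete illustration, here is a minimal C sketch, not the ISDA code, of what the premium leg reduces to under these assumptions; the function name and the arrays amount and sd are hypothetical.

/* Hypothetical sketch: fee/premium leg PV for a 5Y quarterly-paying SNAC CDS.
   sd[i]     = survival[i] * discount[i], precomputed by the curve cooker.
   amount[i] = notional * couponRate * accTime[i], fixed for a standard schedule.
   Roughly one fused multiply-add per paydate (~20 paydates for a 5Y CDS). */
static double fee_leg_pv(int npay, const double *amount, const double *sd)
{
    double pv = 0.0;
    for (int i = 0; i < npay; ++i)
        pv += amount[i] * sd[i];
    return pv;
}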
Valuing the Contingent Leg
We need to account for the clock cycles required to:
1. PV the default swap payout given the expected default arrival over the term of the CDS contract.
2. PV the accrued premium owed by the protection buyer to the protection seller in the event the default arrives between premium cash flow paydates (the IOU from the premium leg).
Again, static code analysis only, no peeking at the mathematics. Here is the relevant ISDA code for 1, integrating the value of the terminal CDS contingent payoff over the term of the CDS contract:
s1  = JpmcdsForwardZeroPrice(spreadCurve, today, startDate);
df1 = JpmcdsForwardZeroPrice(discCurve, today, MAX(today, startDate));
loss = 1.0 - recoveryRate;

for (i = 1; i < tl->fNumItems; ++i)
{
    s0  = s1;
    df0 = df1;
    s1  = JpmcdsForwardZeroPrice(spreadCurve, today, tl->fArray[i]);
    df1 = JpmcdsForwardZeroPrice(discCurve, today, tl->fArray[i]);
    t   = (double)(tl->fArray[i] - tl->fArray[i-1])/365.0;
    lambda  = log(s0/s1)/t;
    fwdRate = log(df0/df1)/t;
    thisPv  = loss * lambda / (lambda + fwdRate) *
              (1.0 - exp(-(lambda + fwdRate) * t)) * s0 * df0;
    myPv += thisPv;
}
Again, as in the case of the premium leg, all the calls to JpmcdsForwardZeroPrice() are interpolations known at credit curve cooking time, so we will account for them as variable assignments and assign the computational cost of interpolation to the curve cooker. The time-consuming computation here, and in the code below (for accrued on default), depends on the resolution of the time grid (fNumItems) needed to get the discrete summation to converge to the continuous integral value. If the integration time grid has M points and r is the 5Y swap rate (the risk-free rate, USD currently 114 bps as of 19Jan12), then O'Kane and Turnbull (pg 10) show the percentage error in the discrete approximation is:
In production P&L batches I have seen M=26 (biweekly integration grid points), but at current rate levels M=4 (quarterly cashflow paydates) would bring the error to within the bid-ask spread. Let's assume M=26 worst case and M=4 expected case for the performance approximations.
Notice that the expensive floating point operations in these loops are the math.h functions log() and exp(), and the divides. We will push the log and exp calls to the curve cooker, since the grid points for the integration as well as all the interpolations are known at curve cooking time. The variable lambda is known at credit curve cooking time and the variable fwdRate is known at Libor curve cooking time. Similarly, all the values of t are known at compile time, so we are not even going to multiply by the reciprocal of 365 inside the loop. We will also book the computational cost of the reciprocal 1.0/(lambda + fwdRate) to the curve cooker. So, with no divides and no math.h calls in the loop, we cost it at 4 fused multiply-add cycles per grid point after vectorizing, loop unrolling, and optimization. In the expected case, loop 1 costs us 4 cycles * (4*5) grid points, or 80 cycles. In the worst case, loop 1 costs us 4 cycles * (26*5) grid points, or 520 clock cycles.
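For illustration, here is a hypothetical C sketch, not the ISDA code, of what loop 1 collapses to once the curve cooker has pre-tabulated the curve-dependent factors per grid point; only multiplies and adds remain, which is roughly where the 4 cycles per grid point figure comes from. The names w, e, and sd are assumptions about what the curve cooker would hand over.

/* Hypothetical sketch of the contingent payoff integration with the
   curve-dependent work hoisted into the curve cooker:
     w[i]  = lambda[i] / (lambda[i] + fwdRate[i])            (precomputed)
     e[i]  = 1.0 - exp(-(lambda[i] + fwdRate[i]) * t[i])     (precomputed)
     sd[i] = s0[i] * df0[i]                                  (precomputed)
   loss = 1 - recoveryRate is a per-trade constant. */
static double contingent_leg_pv(int ngrid, double loss,
                                const double *w, const double *e, const double *sd)
{
    double pv = 0.0;
    for (int i = 0; i < ngrid; ++i)
        pv += w[i] * e[i] * sd[i];   /* a few multiplies and an add per grid point */
    return loss * pv;                /* scale once by the loss given default */
}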
Here is the relevant ISDA code for 2, a very similar loop compared to the PV of the contingent payoff loop, right?
for (i = 1; i < tl->fNumItems; ++i)
{
    if (tl->fArray[i] <= stepinDate) continue;

    s1  = JpmcdsForwardZeroPrice(spreadCurve, today, tl->fArray[i]);
    df1 = JpmcdsForwardZeroPrice(discCurve, today, tl->fArray[i]);
    t0  = (double)(subStartDate + 0.5 - startDate)/365.0;
    t1  = (double)(tl->fArray[i] + 0.5 - startDate)/365.0;
    t   = t1 - t0;
    lambda  = log(s0/s1)/t;
    fwdRate = log(df0/df1)/t;
    lambdafwdRate = lambda + fwdRate + 1.0e-50;
    thisPv  = lambda * accRate * s0 * df0 * (
              (t0 + 1.0/(lambdafwdRate))/(lambdafwdRate) -
              (t1 + 1.0/(lambdafwdRate))/(lambdafwdRate) *
              s1/s0 * df1/df0);
    myPv += thisPv;

    s0 = s1;
    df0 = df1;
    subStartDate = tl->fArray[i];
}
We will treat both loops simultaneously and assume they will be fused. The same analysis applies to this loop: lambda, fwdRate, lambdafwdRate, and all the interpolations and ratios are known at curve cooking time, so they will be accounted for in the curve cooker, not in the CDS valuation. Net, 2 fused multiply-add cycles per grid point will get the accrued-on-default value. Expected case = 40 cycles; worst case, an additional 260 cycles.
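For completeness, here is a hypothetical sketch of the two loops fused, with the accrued-on-default coefficients likewise pre-tabulated by the curve cooker; per grid point this is a handful of fused multiply-adds, in line with the 4 + 2 cycles charged above. The pay and acc arrays are assumptions about the curve cooker's output, not ISDA code.

/* Hypothetical sketch of the fused contingent + accrued-on-default loop.
   Per grid point the curve cooker supplies (with L = lambda + fwdRate):
     pay[i] = lambda/L * (1 - exp(-L*t)) * s0*df0
     acc[i] = lambda * ( (t0 + 1/L)/L * s0*df0 - (t1 + 1/L)/L * s1*df1 )
   loss and accRate are per-trade constants. */
static double contingent_and_accrual_pv(int ngrid, double loss, double accRate,
                                        const double *pay, const double *acc)
{
    double pv = 0.0;
    for (int i = 0; i < ngrid; ++i)
        pv += loss * pay[i] + accRate * acc[i];   /* a handful of multiply-adds per grid point */
    return pv;
}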
CDS valuation estimates are then as follows
Expected case = 30 + 80 + 40 = 150 clock cycles, 42 ns, 23MM valuations/second
Worst case = 30 + 520 + 260 = 810 clock cycles, 225 ns, 4.5MM valuations/second
In another post we will go through the curve cooking accounting and deal with the cache traffic we are creating by allocating computation to the curve cooker. Informally, we think cooking is an order of magnitude more expensive than valuation, so we are thinking under 10-20 microseconds to cook a curve on a single off-the-shelf core in either the expected or worst case. 50K curves cooked per second seems plausible – let's see how it goes; the cache penalties and floating point pipeline bubbles could catch up with us.
Dominic O’Kane has/had his own web based calculator based on his 2008 book Modeling Single-name and Multi-name Credit Derivatives which is in turn based on a very good Lehman research report O’Kane published with Stuart Turnbull in 2003, Valuation of Credit Default Swaps.
Hull and White 2003 on Valuation of a CDO and an nth to Default CDS Without Monte Carlo Simulation. If there is a broker dealer running a PDE solver on their Credit Derivative inventory for daily P&L, find out who the head of quantitative research is there and bow before that guy because he has achieved Steve Jobs-level marketing skills.
Matlab CDS pricer, here.
BionicTurtle has a YouTube video of how to run a CDS valuation on a spreadsheet, here. It appears to be the tip of the iceberg of YouTube videos explaining Credit Derivatives.
The Rutherford 1908 quote is appropriate in so many contexts, and it is useful in computational runtime optimization in quantitative finance in 2012 as well. Previously, in Totally Serial, we showed that the 2009 global credit derivative inventory of 10MM default swap positions and 50K credit curves should take about 30 minutes to run full risk and valuation with a contemporary single microprocessor core and competitive code. Now in 2012, single core performance is improved, but not massively; if anything, the clocks offered today are a shade lower frequency than back in 2009, though you can always make a deal with the vendor to get overclocked boards for a few more dollars. Stated like this, it's too technical for the trade publication folks to grasp and apply. We need something more... have to think... Oh, let's quote the performance in iPhone equivalents, that will help, right?
An iPhone can run full risk and valuation for all the world's credit derivatives in about the same time as it takes to watch an American Idol special episode. Since folks today recognize that power consumption is an issue: for a single counterparty, even a very large one, you could run full risk and valuation for all the counterparty's default swap inventory on your iPhone plugged into a Panda Express electrical outlet while you wait for your takeout order. If you need disaster recovery/fault tolerance, have another person with an iPhone plugged into an outlet in a Waffle House in South Carolina run the code simultaneously, use FaceTime to synchronize the execution, and then 99 times out of 100 just send the Waffle House run results to /dev/null; the Panda Express results will be sufficient. We good?