Don Clark, WSJ, Google Joins Supercomputing Project, here. Vern hasn’t talked to many folks in the Street, I fear. 11,000x speedups even for just programming (never mind hardware changes) are not that rare on the Street. Recently, one senior BD manager quoted FX Black Scholes performance of one evaluation in 1.76 milliseconds on their production low latency FX cash trading system. Didn’t even flinch when he quoted the benchmark, as his boss smiled knowingly in the background. The thing is, the quoted benchmark performance is faster than the blink of an eye, which for many is like plenty infinite fast. You would need to have something approaching the skills and charm of Neil DeGrasse Tyson to directly convince them otherwise. Maxeler’s Award winning Credit batch QED.
I suppose I’m not convinced that I would want it to be otherwise.
The tests of the D-Wave system compared its performance on selected algorithms with a computer powered by a standard Intel Corp. chip. In some cases, D-Wave said, the quantum system was as much as 11,000 times faster.
“I pinch myself when I hear that,” said Vern Brownell, D-Wave’s chief executive. “When in the history of computer science has there been something that’s 11,000 times faster than the previous technology?”
Matt Levine, DealBreaker, Barclays Was Never Sure If It Cared Most About “Loving Success,” “100% Energy,” or “Think Smart”, here. These guys’ Kung Fu is good, very good.

Benjamen Walker, Too Much Information, 1-Apr Interview w Douglas Rushkoff, here. Listened to Walker’s podcast this morning – big format change – hope he gets back to talking over music.
One Hour with Douglas Rushkoff who talks about his new book “Present Shock”
Douglas Rushkoff, website, here. Kind of like this “recent versus relevant” analysis framework.
Rushkoff argues that the future is now and we’re contending with a fundamentally new challenge. Whereas Toffler said we were disoriented by a future that was careening toward us, Rushkoff argues that we no longer have a sense of a future, of goals, of direction at all. We have a completely new relationship to time; we live in an always-on “now,” where the priorities of this moment seem to be everything.
Pontus Eriksson, Sungard, Do you really know how to price an interest rate swap? here. All these papers are going to dive right into the idea of recovering risk-neutral pricing in light of Libor being a curve with a non trivial credit spread component. That is important. If you are coming from an Equity background, where long and short positions in a stock leave you flat, there is one big difference in the IRS world in determining if your portfolio is flat. Instead of a stock price you need a Swap Curve(s) and a pricing function for discounting expected cash flows and projecting future floating rates. Once you know the swap curve you can use Taylor’s Theorem (a lot) to show that small changes in the prices of the instruments used to construct the swap curve lead to corresponding changes in the price of the portfolio. If you can balance your portfolio so that the perturbations in the underlying leave your first and second derivatives (of the pricing function) small as well as the delta in the portfolio price being small then you are “flat.” Some rates folks will call this monitoring the Risk Along the Curve. There are lots of ways to get your swap portfolio flat overnight – think of it as a new fancy objective function. Sungard is kind of like Calypso.
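To make the Taylor’s Theorem point concrete, here is a minimal sketch of bump-and-reprice risk along the curve. Everything in it is an assumption for illustration: the four-point zero curve, the rate levels, the net cash flows, and the helper names (`price`, `bump`, `delta`, `gamma`, `taylor_pnl`) are all hypothetical, and the pricing function just discounts fixed cash flows rather than bootstrapping a real swap curve or projecting floating rates.

```python
import math

# Hypothetical toy curve: continuously compounded zero rates at a few tenors.
TENORS = [1.0, 2.0, 5.0, 10.0]            # years
RATES = [0.010, 0.015, 0.020, 0.025]      # made-up levels
# Hypothetical net cash flows at those tenors (longs positive, shorts negative).
CASH_FLOWS = [100.0, -250.0, 300.0, -120.0]


def price(rates):
    """Portfolio value: each net cash flow discounted off its own curve point."""
    return sum(cf * math.exp(-r * t)
               for cf, r, t in zip(CASH_FLOWS, rates, TENORS))


def bump(rates, i, h):
    """Copy of the curve with point i shifted by h."""
    out = list(rates)
    out[i] += h
    return out


def delta(rates, i, h=1e-4):
    """First derivative of price w.r.t. curve point i (central difference)."""
    return (price(bump(rates, i, h)) - price(bump(rates, i, -h))) / (2 * h)


def gamma(rates, i, h=1e-4):
    """Second derivative of price w.r.t. curve point i (central difference)."""
    up, mid, down = price(bump(rates, i, h)), price(rates), price(bump(rates, i, -h))
    return (up - 2 * mid + down) / h ** 2


def taylor_pnl(rates, shifts):
    """Second-order Taylor estimate of the price change for small curve shifts.

    Cross-gammas are omitted; in this toy they are exactly zero because each
    cash flow depends on a single curve point.
    """
    return sum(delta(rates, i) * s + 0.5 * gamma(rates, i) * s * s
               for i, s in enumerate(shifts))


shifts = [0.0005, -0.0003, 0.0002, 0.0001]   # small moves, in rate units
actual = price([r + s for r, s in zip(RATES, shifts)]) - price(RATES)
print("risk along the curve:", [round(delta(RATES, i), 2) for i in range(len(RATES))])
print("taylor:", taylor_pnl(RATES, shifts), "full reprice:", actual)
```

In this picture, being “flat” means the per-bucket deltas and gammas are all small, so the Taylor estimate of the price change stays near zero for any small curve move. A real desk would bump the market instruments and rebuild the curve instead of bumping zero rates directly.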
Similarly during the crisis, the spread between the EUR deposit rate and the overnight indexed swap (OIS) rate exploded, peaking at roughly the time of Lehman’s default. Since then the basis has shrunk, but it has not reached the levels seen before the crisis. This has led to the notion of OIS discounting.
JR Varma, An Introductory Note on Two Curve Discounting, here. References at the end are also good as I recall.
“Ten years ago if you had suggested that a sophisticated investment bank did not know how to value a plain vanilla interest rate swap, people would have laughed at you. But that isn’t too far from the case today.”
– Deus Ex Machiatto, June 23, 2010
Hull and White, Libor vs. OIS: The Derivatives Discounting Dilemma, here. This paper is new to me.
Traditionally practitioners have used LIBOR and LIBOR swap rates as proxies for risk-free rates when valuing derivatives. This practice has been called into question by the credit crisis that started in 2007. Many banks now consider that overnight indexed swap (OIS) rates should be used for discounting when collateralized portfolios are valued and that LIBOR should be used for discounting when portfolios are not collateralized. This paper critically examines this practice. We show that it is not generally possible to handle credit risk by changing the discount curve.
Robert X. Cringely, I, Cringely, Who’s your daddy? Intel swoons for Apple, here. If you write Wall Street style analytics (e.g., expression evaluation with great locality) you have to root for AVX2 to get to market before the mobile wave shuts down further floating point feature development. Smart architecture guy was asking how you can get reliable cycle count estimates on Wall Street analytics. Informally, I think the answer is – you know what the vectorized max performance is, you know the equations being evaluated, you know approximately how many adds and multiplies are required to retire the equations to be evaluated, and you know locality is not going to get so bad that you have to think deep thoughts about the memory hierarchy (i.e., the locality will give you something like 80% of the vectorized benchmark to retire ops) in many cases. I’m sure there are exceptions – but the gaussian copula for the London Whale is not one of them, for example. For that matter almost nothing in vanilla credit, rates, or FX analytics is an exception to this informal rule. Even the exotic stuff is most often just the Monte Carlo version of the vanilla valuation analytics with finite difference approximations. The thing that matters for expression evaluation is Locality, Locality, Locality. That, by the way, is one way you can easily tell when Execs are lost about what they are doing w specific analytics. Does not happen all that often but it is there if you look for it. Like the London Whale guys and the FPGA supercomputer running something about as fast as an optimized piece of code on an old iPhone, Happy Festivus.
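The informal cycle count rule can be written down in a few lines. This is only a back-of-envelope sketch: the function names are hypothetical, and the operation count per evaluation, the peak vectorized flops per cycle, and the clock rate are illustrative assumptions, not measurements. The only number taken from the argument above is the roughly 80% of vectorized peak that good locality is supposed to buy you.

```python
def estimate_cycles(n_evals, flops_per_eval, peak_flops_per_cycle, efficiency=0.8):
    """Cycles to retire n_evals expression evaluations.

    efficiency is the fraction of vectorized peak you expect to sustain;
    0.8 is the informal locality figure, not a guarantee.
    """
    return n_evals * flops_per_eval / (peak_flops_per_cycle * efficiency)


def cycles_to_seconds(cycles, clock_hz):
    """Convert a cycle count to wall-clock seconds at a fixed clock rate."""
    return cycles / clock_hz


# Illustrative assumptions only.
N_EVALS = 1_000_000          # size of the batch
FLOPS_PER_EVAL = 200         # adds and multiplies to retire one expression
PEAK_FLOPS_PER_CYCLE = 16    # assumed vectorized peak per core
CLOCK_HZ = 3.0e9             # assumed 3 GHz core

cycles = estimate_cycles(N_EVALS, FLOPS_PER_EVAL, PEAK_FLOPS_PER_CYCLE)
seconds = cycles_to_seconds(cycles, CLOCK_HZ)
print(f"{cycles:,.0f} cycles, {seconds * 1e3:.2f} ms for the batch")
```

If a measured run comes in far above an estimate like this, that is the hint to go look at the memory hierarchy; if it comes in close, locality is doing its job.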
Just days after I wrote a column saying Apple will dump Intel and make Macintosh computers with its own ARM-based processors, along comes a Wall Street analyst saying no, Intel will be taking over from Samsung making the Apple-designed iPhone and iPod chips and Apple will even switch to x86 silicon for future iPads. Well, who is correct?
Maybe both, maybe neither, but here’s what I think is happening.
Zerohedge, This Is What Happens When A Mega Bank Is Caught Red-Handed, here. Durden is really good at this.
Which brings up three important questions:
- Now that the “trading desk” that was responsible for up to 25% of JPM’s net income has been effectively closed, how will Jamie Dimon succeed in creating recurring profits in line with historical average and future expectations?
- What will happen to the other “VaRs” once they too are exposed, either after the loss is uncovered, or when regulators actually dare to do their job for once and truly dig through the banks’ books?
- Which other bank has a huge and heretofore undisclosed multi-billion derivative “easter egg” on its books?
For Question 3 we may have a suggestion.
Xilinx, High Performance Computing Using FPGAs, Sep 2010, here.
The shift to multicore CPUs forces application developers to adopt a parallel programming model to exploit CPU performance. Even using the newest multicore architectures, it is unclear whether the performance growth expected by the HPC end user can be delivered, especially when running the most data- and compute- intensive applications. CPU-based systems augmented with hardware accelerators as co-processors are emerging as an alternative to CPU-only systems. This has opened up opportunities for accelerators like Graphics Processing Units (GPUs), FPGAs, and other accelerator technologies to advance HPC to previously unattainable performance levels.
I buy the argument to a degree. As the number of cores per chip grows, the easy pipelining and parallelization opportunities will diminish. The argument is stronger if there are more cores per chip. At 8 cores or under per general purpose chip it’s sort of a futuristic theoretical argument. More than a few programmers can figure out how to code up a 4 to 8 stage pipeline for their application without massive automated assistance. But the FPGA opportunity does exist.
The convergence of storage and Ethernet networking is driving the adoption of 40G and 100G Ethernet in data centers. Traditionally, data is brought into the processor memory space via a PCIe network interface card. However, there is a mismatch of bandwidth between PCIe (x8, Gen3) versus the Ethernet 40G and 100G protocols; with this bandwidth mismatch, PCIe (x8, Gen3) NICs cannot support Ethernet 40G and 100G protocols. This mismatch creates the opportunity for the QPI protocol to be used in networking systems. This adoption of QPI in networking and storage is in addition to HPC.
I buy the FPGA application in the NIC space. I want my NIC to go directly to L3 pinned pages, yessir I do, 100G please.
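For a rough feel of the bandwidth mismatch, here is a sketch of the raw link arithmetic. It assumes the standard PCIe 3.0 figures of 8 GT/s per lane with 128b/130b encoding, and it models only raw link rate per direction. Packet headers, flow control, and DMA overhead are not modeled, so usable bandwidth is lower than what this prints. The function names are hypothetical.

```python
PCIE_GEN3_GT_PER_LANE = 8.0        # gigatransfers/s per lane (PCIe 3.0)
PCIE_GEN3_ENCODING = 128 / 130     # 128b/130b line coding


def pcie_gen3_raw_gbps(lanes):
    """Raw PCIe Gen3 bandwidth in Gb/s, one direction, before packet overhead."""
    return lanes * PCIE_GEN3_GT_PER_LANE * PCIE_GEN3_ENCODING


def raw_headroom_gbps(lanes, ethernet_gbps):
    """Raw PCIe bandwidth minus the Ethernet line rate; negative means starved."""
    return pcie_gen3_raw_gbps(lanes) - ethernet_gbps


for eth in (40, 100):
    print(f"x8 Gen3 vs {eth}G Ethernet: "
          f"{raw_headroom_gbps(8, eth):+.1f} Gb/s raw headroom")
```

On raw numbers alone the x8 Gen3 link clears a single 40G port and falls well short of 100G, which is where the case for a different attach point is easiest to see.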
Xilinx FPGAs double their device density from one generation to the next. Peak performance of FPGAs and processors can be estimated to show the impact of doubling the performance on FPGAs [Ref 6], [Ref 7]. This doubling of capacity directly results in increased FPGA compute capabilities.
The idea proposed here is that you want to be on the exponentially increasing density curve for the FPGAs in lieu of clock speed increases you are never going to see again. Sort of a complicated bet to make for mortals, maybe.
I like how they do the comparisons though. They say here is our Virtex-n basketball player and here is the best NBA Basketball player … and they show you crusty old Mike Bibby 2012. Then they say watch as the Virtex-n basketball player takes Mike Bibby down low in the post, and notice the Virtex-n basketball player is still growing exponentially. So you can imagine how much better he will do against Mike Bibby in the post next year. Finally they say that Mike Bibby was chosen as the best NBA player for this comparison by his father Henry, who was also a great NBA player.
FPGAs tend to consume power in tens of watts, compared to other multicores and GPUs that tend to consume power in hundreds of watts. One primary reason for lower power consumption in FPGAs is that the applications typically operate between 100–300 MHz on FPGAs compared to applications on high-performance processors executing between 2–3 GHz.
Silly making Lemonade out of Lemons argument, the minute I can have my FPGAs clocked at 3 GHz I throw away the 300MHz FPGAs, no?
Intel, An Introduction to the Intel QuickPath Interconnect, QPI, Jan 2009, here.
Xilinx Research Labs/NCSA, FPGA HPC – The road beyond processors, Jul 2007, here. Need more current references but I keep hearing the same themes in arguments for FPGA HPC, so let’s think about this for a bit:
FPGAs have an opening because you are not getting any more clocks from microprocessor fab shrinks: OK.
Power density: meh. Lots of FinQuant code can run on a handful of cores. The Low Latency HFT folks cannot really afford many L2 misses. The NSA boys are talking about supercomputers for crypto not binary protocol parsing.
Microprocessors have all functions that are hardened in silicon and you pay for them whether you use them or not and you can’t use that silicon for something else: Meh, don’t really care if I use all the silicon on my 300 USD microprocessor as long as the code is running close to optimal on the parts of the silicon useful to my application. It would be nice if I got more runtime performance for my 300USD, no doubt. This point is like Advil is bad because you don’t always need to finish the bottle after you blow out your ankle. Yeah, I understand the silicon real estate is the most expensive in the world.
Benchmarks: Black Scholes 18 msec FPGA @ 110 MHz Virtex-4, 203x faster than Opteron – 2.2 GHz: You Cannot be Serious! 3.7 microseconds per Black Scholes evaluation was competitive performance at the turn of the century. The relative speedup slides and quotations make me nervous. Oh, Celoxica provided the data – hey, Black Scholes runs in 36 nanoseconds on a single core of a dual core off-the-shelf general purpose microprocessor from 2007. So the Virtex-4 doing 1M Black Scholes evaluations in 18 milliseconds is flat to competitive code on a dual core general purpose off-the-shelf microprocessor in 2007.
Make it easy for the users to use this hardware and get “enough of a performance” increase to be useful: meh, it’s for applications that do not need to go fast, for now (2007)?
Do not try to be the fastest thing around when being as fast with less power is sufficient: meh, really do not care so much about the power thing
FPGA: Different operations map to different silicon allows massive pipelining; lots of parallelism: OK. So, why bother with the previous two points?
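The arithmetic behind the Black Scholes benchmark point above is worth spelling out. This sketch uses only the figures quoted in that item (18 msec, 203x, 36 nanoseconds, 1M evaluations, dual core). The one assumption added here is that the CPU code scales perfectly across both cores; the variable names are made up for the example.

```python
N_EVALS = 1_000_000              # batch size used in the comparison
FPGA_BATCH_S = 18e-3             # Virtex-4 time for the batch, as quoted
QUOTED_SPEEDUP = 203             # FPGA vs Opteron, as quoted
CPU_NS_PER_EVAL_ONE_CORE = 36    # competitive 2007 code, one core
CORES = 2                        # assumes perfect scaling across both cores

# What the slide implies about the Opteron baseline.
opteron_batch_s = FPGA_BATCH_S * QUOTED_SPEEDUP
opteron_us_per_eval = opteron_batch_s / N_EVALS * 1e6

# What the FPGA delivers per evaluation.
fpga_ns_per_eval = FPGA_BATCH_S / N_EVALS * 1e9

# What competitive dual core code needs for the same batch.
dual_core_batch_ms = N_EVALS * CPU_NS_PER_EVAL_ONE_CORE / CORES * 1e-6

print(f"implied Opteron baseline: {opteron_us_per_eval:.2f} us per evaluation")
print(f"FPGA: {fpga_ns_per_eval:.0f} ns per evaluation")
print(f"dual core CPU batch: {dual_core_batch_ms:.0f} ms")
```

So the 203x is measured against a baseline of roughly 3.7 microseconds per evaluation, and the FPGA batch time lands on top of what tuned dual core code would do. The speedup says more about the baseline than about the FPGA.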
Eggers/ U. Washington, CHiMPS, here. Eggers is reasonable.
There have been (at least) two hindrances to the widespread adoption of FPGAs by scientific application developers: having to code in a hardware description language, such as Verilog (with its accompanying hardware-based programming model) and poor FPGA memory performance for random memory accesses. CHiMPS, our C-to-FPGA synthesis compiler, solves both problems with one memory architecture, the many-cache memory model.
Many-cache organizes the small, distributed memories on an FPGA into application-specific caches, each targeting a particular data structure or region of memory in an application and each customized for the particular memory operations that access it.
CHiMPS provides all the traditional benefits we expect from caching. To reduce cache latency, CHiMPS duplicates the caches, so that they’re physically located near the hardware logic blocks that access them. To increase memory bandwidth, CHiMPS banks the caches to match the memory parallelism in the code. To increase task-level parallelism, CHiMPS duplicates caches (and their computation blocks) through loop unrolling and tiling. Despite the lack of FPGA support for cache coherency, CHiMPS facilitates data sharing among FPGA caches and between the FPGA and its CPU through a simple flushing of cached values. And in addition, to harness the potential of the massively parallel computation offered by FPGAs, CHiMPS compiles to a spatial dataflow execution model, and then provides a mechanism to order dependent memory operations to retain C memory ordering semantics.
CHiMPS’s compiler analyses automatically generate the caches from C source. The solution allows scientific programmers to retain their familiar programming environment and memory model, and at the same time provides performance that is on average 7.8x greater and power that is one fourth that of a CPU executing the same source code. The CHiMPS work has been published in the International Symposium on Computer Architecture (ISCA, 2009), the International Conference on Field Programmable Logic and Applications (FPL, 2008), and High-Performance Reconfigurable Computing Technology and Applications (HPRCTA, 2008), where it received the Best Paper Award.
BBC News magazine, Black-Scholes: The maths formula linked to the financial crash, here.
It’s not every day that someone writes down an equation that ends up changing the world. But it does happen sometimes, and the world doesn’t always change for the better. It has been argued that one formula known as Black-Scholes, along with its descendants, helped to blow up the financial world.
It doesn’t say if Scotland Yard had Scholes in for questioning yet. Oh, this story is sourced from Ian Stewart the math guy from Warwick.
Stewart says the lessons from Long-Term Capital Management were obvious. “It showed the danger of this kind of algorithmically-based trading if you don’t keep an eye on some of the indicators that the more conventional people would use,” he says. “They [Long-Term Capital Management] were committed, pretty much, to just ploughing ahead with the system they had. And it went wrong.”
Scholes says that’s not what happened at all. “It had nothing to do with equations and nothing to do with models,” he says. “I was not running the firm, let me be very clear about that. There was not an ability to withstand the shock that occurred in the market in the summer and fall of late 1998. So it was just a matter of risk-taking. It wasn’t a matter of modelling.”
Would it be a bad thing if John Meriwether and Myron Scholes attended a remedial applied maths course taught by Professor Stewart? Perhaps not; it could be awesome if there were YouTube video of the class.
One Div Zero: tentative add to our heroes list – James Iry “If cars were built like software then…well, I don’t know squat about building cars so who knows. It might be kinda cool. But probably not.” A Brief and Incomplete and Mostly Wrong History of Programming Languages
Zerohedge on the CDS market trade volume distribution and a proposal for CDS indices to be exchange traded, here.
Kamakura Corporation, risk management tools, and Jarrow is involved somehow. Look at the research and blogs, here.
Cloud Computing gets reviewed by US DOE, Argonne, and Lawrence Berkeley and gets a grade of meh, here from Clusterstock.
Salmon on Udacity, here. Stanford AI professor starts up online University UDACITY. Agreed this looks like it could grow. An AI course at Stanford gets 100K worldwide enrollment? wow. I would like to hear why the notion that Stanford, Harvard, Ptown, or Oxford should brand this is an obviously bad idea.
HPCWire Russell Fish @ Venray Technologies has an embedded Microprocessor in DRAM play, here. Problem is apart from Mortgages I doubt much Street P&L/Risk analytics is intrinsically memory bandwidth starved as opposed to processing starved. Super good at creating pipeline bubbles though.
Asymco analysis, here, highlights recent trends in computing. It looks increasingly as if you need to adapt to what the commercial market is giving you, even in floating point. I suspect that Joe and Suzy Sixpack will decide how you get your fp cycles. GPUs start to look more attractive in this light, right?
Oh and Terry Tao on Black Scholes, here
“Sandy Bridge-EP” Xeon E5 processors and their related “Romley” server platforms, are now in volume shipment, here
Overclocking insurance for Sandy Bridge from Intel, here
Whoa, more to think about now: Maxeler says Intel’s Knights Ferry simplicity might not suit HPC, here, at The Inquirer. The article by Lawrence Latif has a subtitle that reads “More effort yields better performance”! I think I like the Inquirer; it looks like a fancy version of the old Microprocessor Reports.
Check out the Thalesian Seminar: Stein from Bloomberg talking about CVA at the NY Public Library, 31 Jan, 6pm
FPGAs in HFT Thalesian Seminar Slides from eigen.systems
Intel Xeon E5-2690 Sandy Bridge-EP Performance Revealed, Tom’s Hardware, here
MJ Flynn, Accelerating computation with FPGAs, slides, @Berkeley video. Nice talk; audio starts to break up around 31:00 though
Maxeler Technologies, home page
JP Morgan FPGA Careers factsheet. So, that JP Morgan credit derivative batch @300K CDS and 30K credit curves is going to run in 2 milliseconds (1000x the cheapest MacPro at the Apple Store) but I might only get 200x in practice so it will be between 2-10 milliseconds on the 40-node hybrid FPGA machine. Do I get all Virtex6 chips or is that hybrid as well?
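The 2 to 10 millisecond range falls straight out of the quoted figures. A small sketch, using only the numbers in the item above (300K CDS, 2 milliseconds at 1000x, 200x in practice); the Mac Pro baseline of about 2 seconds is implied by those numbers, not stated separately, and the names are hypothetical.

```python
N_CDS = 300_000              # positions in the batch, as quoted
CLAIMED_SPEEDUP = 1000       # vs the cheapest Mac Pro, as quoted
CLAIMED_RUNTIME_MS = 2.0     # batch time at the claimed speedup
PRACTICAL_SPEEDUP = 200      # the "might only get 200x" case

# Baseline implied by the claim: 2 ms at 1000x means about 2 s on the Mac Pro.
mac_pro_baseline_s = CLAIMED_RUNTIME_MS * 1e-3 * CLAIMED_SPEEDUP


def batch_runtime_ms(baseline_s, speedup):
    """Batch runtime in milliseconds for a given speedup over the baseline."""
    return baseline_s / speedup * 1e3


best_ms = batch_runtime_ms(mac_pro_baseline_s, CLAIMED_SPEEDUP)
worst_ms = batch_runtime_ms(mac_pro_baseline_s, PRACTICAL_SPEEDUP)
baseline_us_per_cds = mac_pro_baseline_s / N_CDS * 1e6

print(f"expected range: {best_ms:.0f} to {worst_ms:.0f} ms")
print(f"implied Mac Pro cost: {baseline_us_per_cds:.2f} us per CDS")
```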
Xilinx Virtex-7 FPGA Family looks to be the fastest (2X, and 50% lower power); then there is Virtex-6.
Apart from saving over a million dollars infrastructure cost annually, why bother running your credit derivative P&L/Risk batch Mac Pro+2s rather than Supercomputer+FPGAs+4m?
A. Rates, Mortgage, FX, Commodities P&L/Risk batches
It’s only the credit derivative guys in the Computerworld UK article that had this P&L and Risk batch performance problem in 2008 and now it is fixed?
B. Moore’s Law
More of a prediction than a law really, but it has predicted accurately for 40 years and is expected to hold for another decade. In technology if you do nothing more than “ride the wave” of Moore’s Law you are doing well. However, this observation’s simplicity can be deceiving. If the plan is for the credit derivative batch to “ride the wave” with thousands of grid CPUs, FPGAs, and associated staff you may have a challenge when scaling to larger problems, like counterparty valuation adjustment (CVA), that Moore’s Law will not automatically address for you. For example, Moore’s Law doesn’t get you power and machine room space for 10,000 grid CPUs for the CVA simulation. That is closer to “not riding the wave.”
C. Simplicity
Supercomputers and FPGAs, in addition to GPUs, CUDA, and CPU grids, even dataflow architecture, are all reasonable technology bets that could very well be the best way to ride the Moore’s Law wave through 2020. Using some or all of these technologies together in 2011 to run such a modest computation so slowly should elicit some follow-up questions.
