You are currently browsing the tag archive for the ‘Supercomputers’ tag.
First, some background: Under Dodd-Frank, the CFTC was given the task of regulating the $300 trillion market for swaps in the U.S. The basic point was to bring light to a dark market and prevent another AIG by pushing as much of the over-the-counter swaps market as possible onto exchanges where prices and volume are posted. With about 80 percent of those swaps rules written, according to CFTC Chairman Gary Gensler, and a bunch of them now in effect, traders have begun “futurizing their swaps”—that is, trading futures contracts instead of entering into swaps deals. Some say that’s a clever way around Dodd-Frank. Others see it as merely a natural evolution of financial instruments.
Whatever the reason, it’s happening. And as arcane as the details may be, the potential consequences are enormous, as evidenced by Thursday’s packed house. The general consensus of those present was that Thursday was the most crowded CFTC hearing in recent memory. Lawyers and lobbyists lined the walls; congressional staffers and industry suits packed the chairs. More than 150 people crammed into the CFTC’s main conference room, and a healthy number of folks watched on TVs in the hallway outside.
Dodd-Frank has upended the derivatives market, and in the shakeout that follows, there will be winners and losers. Perhaps those with the most at stake are IntercontinentalExchange (ICE) and the Chicago Mercantile Exchange (CME), the two biggest futures exchanges in the U.S. As more companies and traders start favoring futures over swaps, the two exchanges stand to capture a much bigger portion of that activity. The potential losers? Dealers such as Goldman Sachs (GS) that have done a lot of swaps business. Standing at the back of the room, Chris Giancarlo, chair of the Wholesale Markets Brokers’ Association, likened the fight over swaps and futures to “the Maginot Line for the exchanges.”
Easley, López de Prado, & O’Hara, SSRN, The Volume Clock: Insights into the High Frequency Paradigm, here. LFT structural weaknesses:
Over the last two centuries, technological advantages have allowed some traders to be faster than others. We argue that, contrary to popular perception, speed is not the defining characteristic that sets High Frequency Trading (HFT) apart. HFT is the natural evolution of a new trading paradigm that is characterized by strategic decisions made in a volume-clock metric. Even if the speed advantage disappears, HFT will evolve to continue exploiting Low Frequency Trading’s (LFT) structural weaknesses. However, LFT practitioners are not defenseless against HFT players, and we offer options that can help them survive and adapt to this new environment.
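To make the volume-clock idea concrete, here is a minimal sketch (my own illustration, not code from the paper) of sampling a trade tape into bars of constant traded volume rather than constant calendar time:

```python
def volume_bars(trades, bar_volume):
    """Aggregate a tape of (price, size) trades into bars that each contain at least
    `bar_volume` of traded volume -- i.e., sample in volume time, not chronological time."""
    bars, prices, vol = [], [], 0
    for price, size in trades:
        prices.append(price)
        vol += size
        if vol >= bar_volume:
            bars.append({"open": prices[0], "high": max(prices),
                         "low": min(prices), "close": prices[-1], "volume": vol})
            prices, vol = [], 0
    return bars

# The same tape produces many bars when activity is heavy and few when it is quiet.
tape = [(100.0, 300), (100.1, 500), (100.2, 1200), (100.1, 400), (100.0, 2600)]
print(volume_bars(tape, bar_volume=1000))
```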
Nerval’s Lobster, Mars Rover Curiosity: Less Brain Power Than Apple’s iPhone 5, here. So, remember that 8-hour-to-238-second, 2011 award-winning credit valuation and risk computation on the million-dollar FPGA supercomputer with the Dataflow acceleration? That’s the Apollo Creed here, on a good day. There are days (e.g., July 27, 2018, when Mars and Earth will be about 57.6 million km apart) when you could send the entire credit portfolio to Mars, compute the full risk and valuation for the portfolio in the down time on the spare computer in the Mars Rover, send the results back to Earth, and finish in ~360 seconds. That’s just about 50% slower than the 2011 award-winning credit valuation and risk computation on the million-dollar FPGA supercomputer with the Dataflow acceleration. So the message here is, I guess: if your computing infrastructure is on Mars … and has less brain power than an iPhone 5 … then you are probably not going to be at the very top of the USD fixed/float vanilla swap league tables … on most days. But … if you own an iPhone 5 here on Earth … you have more brain power … than the 2011 award-winning credit valuation and risk computation on the million-dollar FPGA supercomputer with the Dataflow acceleration?
“To give the Mars Rover Curiosity the brains she needs to operate took 5 million lines of code. And while the Mars Science Laboratory team froze the code a year before the roaming laboratory landed on August 5, they kept sending software updates to the spacecraft during its 253-day, 352 million-mile flight. In its belly, Curiosity has two computers, a primary and a backup. Fun fact: Apple’s iPhone 5 has more processing power than this one-eyed explorer. ‘You’re carrying more processing power in your pocket than Curiosity,’ Ben Cichy, chief flight software engineer, told an audience at this year’s MacWorld.”
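For what it’s worth, here is the light-time arithmetic behind that thought experiment (my own back-of-the-envelope figures; the 57.6 million km quoted above is roughly the 2018 close-approach distance):

```python
# Rough signal round-trip time for the Mars detour described above.
distance_km = 57.6e6          # approximate Mars-Earth distance at the 2018 close approach
c_km_s = 299_792.458          # speed of light in vacuum, km/s

one_way = distance_km / c_km_s
print(f"one-way light time: {one_way:.0f} s, round trip: {2 * one_way:.0f} s")
# ~192 s each way, ~384 s round trip: the transmission time dominates, so even a
# near-instant computation on the rover lands the Mars detour in the same
# few-hundred-second ballpark as the 238-second FPGA-grid run here on Earth.
```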
Craigslist, I will legally change my name to yours for a WWDC ticket, here. I like how Gruber posts stuff that induces Karl Denninger (here) to call a market top on AAPL, Gruber records it in Claim Chowder on Daring Fireball, and then Gruber spikes the unfortunate Karl Denninger six months later. It’s like “Who killed Kenny?” in South Park. I am worried, however, that John Gruber is just an alias for Karl Denninger, which would make the world a smaller, less predictable, and meaner place, so I won’t think about that.
Turing’s Invisible Hand, I grade grad AI, here. Nice slides from the course.
This semester I have been co-teaching (with the awesome Martial Hebert) CMU’s graduate artificial intelligence (grad AI) course. It’s been a lot of fun teaching AI to a class where a significant fraction of the students build robots for a living (possibly some of the students are robots, especially the ones who got a perfect score on the midterm exam). Although the last class is on May 2, I already gave my last lecture, so this seems like a good time to share some thoughts.
My general impression is that many AI courses try to cover all of AI, broadly defined. Granted, you get Renaissance students who can write “hello world” in Prolog while reciting the advantages and disadvantages of iterative deepening depth-first search. On the downside, breadth comes at the expense of depth, and the students inevitably get the very wrong impression that AI is a shallow field. Another issue is that AI is so broad that some of its subdisciplines are only loosely related, and in particular someone specializing in, say, algorithmic economics, may not be passionate about teaching, say, logic (to give a completely hypothetical example).
Business Insider, BLANKFEIN: The Only Reason Goldman Got Into Trouble Is Because Our Competitors Sucked At Risk Management, here.
DealBreaker, Marvel At The Derivative On Its Derivatives That Credit Suisse Wrote To Itself, here. This looks like the mezz tranche CS awarded for end of year compensation a couple years back. I stopped reading Deal Breaker for a while, but Levine has been very solid recently.
Business Week, Stock Trading Is About to Get 5.2 Milliseconds Faster, here. 59.6 milliseconds NYC-to-London round-trip latency.
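As a sanity check on that 59.6 ms figure (my own back-of-the-envelope numbers, not from the article): light in optical fiber travels at roughly two-thirds of c, and the New York–London great-circle distance is about 5,570 km, so the theoretical round-trip floor for a fiber route is in the mid-50s of milliseconds.

```python
# Back-of-the-envelope floor on NYC-London round-trip latency over fiber.
great_circle_km = 5_570          # approximate New York - London great-circle distance
c_km_s = 299_792.458
fiber_speed = c_km_s / 1.47      # light slows by the fiber's refractive index (~1.47)

round_trip_ms = 2 * great_circle_km / fiber_speed * 1_000
print(f"theoretical fiber round trip: {round_trip_ms:.1f} ms")   # ~54.6 ms
# A real cable cannot follow the great circle exactly, so 59.6 ms is close to the physical limit.
```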
HPC Wire, Intel Makes a Deal for Cray’s Interconnect Technology, here. So Cray wants out of the interconnect hardware business.
Supercomputer maker Cray is methodically and inevitably shifting its technology focus from hardware to software. Another step in that direction played itself out this week in the company’s sale of its highly treasured supercomputing interconnect technology. On Tuesday evening, Cray and Intel announced that they signed a “definitive agreement” that would transfer the interconnect program and expertise to the x86 chipmaker.
Cluster Monkey, Cluster Interconnects, here.
This article will focus on interconnects that aren’t tied to vendor-specific node hardware, but can work in a variety of cluster nodes. While determining which interconnect to use is beyond the scope of this article, I can present what is available and make some basic comparisons. I’ll present information that I have obtained from the vendors’ websites, from information people have posted to the Beowulf mailing list, from the vendors themselves, and from various other places. I won’t make any judgments or conclusions about the various options because, simply, I can’t. The choice of an interconnect depends on your situation and there is no universal solution. I also intend to stay “vendor neutral” but will make observations where appropriate. Finally, I have created a table that presents various performance aspects of the interconnects. There is also a table with list prices for 8 nodes, 24 nodes, and 128 nodes to give you an idea of costs.
ExtremeTech, What can you do with a supercomputer? here. I think there may be a stronger result in here. If you are not doing Weather Forecasting, Nuke simulations, Crypto, Airplane Design, or Molecular Dynamics simulation, the thing you are computing on probably is not a supercomputer. If you need to discount cashflows off a curve, even if they are contingent cashflows, and your programmers tell you they need a supercomputer, here’s what you do. Ask the programmers if they are doing Weather Forecasting, Nuke simulations, Crypto, Airplane Design, or Molecular Dynamics. If the answer is no, you do not need a supercomputer. If the answer is yes, the problem you have is bigger than any supercomputer is going to address in 2012.
London Whale: DealBreaker, JPMorgan’s Voldemort Probably Isn’t That Magical, here; and More Voldemort v. Volcker, here. Salmon, Bruno Iksil and the CHIPS trade, here. Pollack/Alphaville, Thar she blows! here. Big market-moving trades in an off-the-run index (series 9 HG CDX, which was on-the-run when the Credit Crisis hit) by one of only a handful of US Corporate Loan sources are going to attract attention. Folks need to process the idea that the information distribution could be even more asymmetric than in the past (even with Volcker in place), e.g., default risk in the CDS market may be less than commonly expected if your US Corporate Loan folks are standing by to offer credit to the distressed. From Pollack:
Typically the most liquid tenors for credit indices are the 5-year and the 10-year. The 5-year contract for the CDX.NA.IG.9 will mature in December 2012, the 10-year in December 2017. This index is no spring chicken.
So why the recent increase in volume?
There are some perfectly logical reasons for the 9s to have high volume in general, so the first part of the above chart makes sense to us. This is because it was the last pre-full-blown-crisis index. It became awash in liquidity at the time by virtue of being the on-the-run index, and then the world came crashing down around it. Subsequent indices didn’t find their feet in the same way.
The 9s were used to build structured products; they were used to hedge other more customised structured products; and so on. It also had the last active set of tranches traded with reference to it. All of which leads us to think that a large amount of net notional for this index relative to others isn’t that weird.
But then… it increased like crazy in January of this year, and that’s the weird part. What went on in 2012? A whale swam in…?
At some point we are going to get a glimpse into how the Maxeler FPGA implementation of the Gaussian Copula for the JPM Credit batch fits into this story. Doubt that the $100bn notional CDX positions are marked against anything other than the standard default swap model with some specially quoted spreads. If there is something going on in series 9 tranches, though, then maybe you do need to optimize the Gaussian Copula code to run on a 20-FPGA supercomputer + 1,000 grid nodes in 238 seconds, rather than on a Mac Pro in 2 seconds.
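For readers who haven’t seen it, here is a minimal Monte Carlo sketch of the one-factor Gaussian copula being referenced (my own illustration with made-up parameters; production tranche pricers typically use semi-analytic recursions rather than simulation, but the dependence structure is the same):

```python
import numpy as np
from scipy.stats import norm

def copula_default_times(n_names, n_paths, rho, hazard, seed=0):
    """One-factor Gaussian copula: correlate the default times of n_names credits
    through a single common factor with asset correlation rho, assuming a flat
    hazard rate for every name (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n_paths, 1))            # systematic factor
    idio = rng.standard_normal((n_paths, n_names))        # idiosyncratic factors
    latent = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * idio
    u = norm.cdf(latent)                                  # map latent normals to uniforms
    return -np.log1p(-u) / hazard                         # invert the exponential default-time CDF

# Expected 5-year loss on a 3-7% tranche of a 125-name index, 40% recovery (toy numbers).
times = copula_default_times(n_names=125, n_paths=20_000, rho=0.3, hazard=0.02)
losses = 0.6 * (times < 5.0).mean(axis=1)                 # fractional portfolio loss per path
tranche = np.clip(losses - 0.03, 0.0, 0.04) / 0.04        # loss absorbed by the 3-7% tranche
print(f"expected tranche loss: {tranche.mean():.2%}")
```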
Success in high-performance computing (HPC) is often difficult to measure. Ultimately it depends on your goals and budget. Many casual practitioners assume that a good HPL (High Performance Linpack) benchmark and a few aisles of servers demonstrate a successful HPC installation. However, this notion could not be further from the truth, unless your goal is to build a multimillion dollar HPL machine. In reality, a successful and productive HPC effort requires serious planning and design before purchasing hardware. Indeed, the entire process requires integration skills that often extend far beyond those of a typical data center administrator.
The multivendor nature of today’s HPC market necessitates that the practitioner make many key decisions. In this paper, I outline several common pitfalls and how they might be avoided.
US Regulations: NYT/DealBook, Regulators to Ease a Rule on Derivatives Dealers, here; Business Insider, ANDREW LO: Thanks To The JOBS Act, Hedge Funds Will Be Able To Use Ads To Trick Prospective Investors, here; Huff Post, SEC Fails To Monitor More Than Half Of Stock Trading, Former Agency Lawyers Say, here.
I’m a former high frequency trader. And following the tradition of G.H. Hardy, I feel the need to make an apology for my former profession. Not an apology in the sense of a request for forgiveness of wrongs performed, but merely an intellectual justification of a field which is often misunderstood.
In this blog post, I’ll attempt to explain the basics of how high frequency trading works and why traders attempt to improve their latency. In future blog posts, I’ll attempt to justify the social value of HFT (under some circumstances), and describe other circumstances under which it is not very useful. Eventually I’ll even put forward a policy prescription which I believe could cause HFT to focus primarily on socially valuable activities.
At Yale, where I have been teaching for 25 years, I’ve been hearing a great deal lately from my students about financial innovations linked to social media. One such innovation, called crowdfunding, is embedded in the jobs bill signed into law by President Obama on Thursday. The idea involves Web sites that help many investors contribute small amounts of capital to projects that they read about online, and that might otherwise be starved for money.
I dismissed this the first time I read it, but now it seems sort of plausible. You connect Kickstarter to an HFT platform, or something like Almgren’s Quantitative Brokers, and the Kickstarter project is like a fixed-for-floating swap. For example,
I will pay 100 USD to be used as Liquidity premia to facilitate order execution between 9:35am and 9:36am on ARCA SPY trading platform XYZ for 12 Apr 2012
and in return
I receive 0.85% of the net arbitrage profits for trading platform XYZ between 9:36am and 9:37am on Arca SPY for 12 Apr 2012.
Roll through all the obvious parameterizations for different examples of swap/projects: different contracts/tickers; different exchanges/venues; different trading platforms; got your micros, millis, seconds, minutes; anytime during the trading day; pay in EUR and trade in USD; etc. Build multiple-leg structures to execute synchronously with limit order placement. Buy protection against specific corporate actions, Federal Reserve announcements, or foreign central bank news. Choose from the extensive list of prepackaged market microstructure strategies, or create your own. Oh, and one more thing: you can synchronize your execution strategies to the microsecond with your friends to bring added liquidity to new markets with complete privacy and confidentiality. Track your project P&L and backtest through the free iPad/iPhone/Android app. Project orders booked by COB T-1, project settlement T+3 through PayPal.
Get the trade documentation and the execution levels right to establish par value for the swap, and who says no? Technically, very feasible. Trade docs, not a problem. Regulatory? Get on board the Shiller 25-years-at-Yale, democracy, JOBS, and risk-mitigation train. The crowd-funded project needs a name … OK, “DynaFair” until something better comes along.
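To see how little machinery the “DynaFair” contract actually needs, here is a minimal sketch of the two legs as a data structure (the names, venues, and terms are entirely hypothetical, just riffing on the stylized example above):

```python
from dataclasses import dataclass

@dataclass
class FixedLeg:
    amount_usd: float      # liquidity premium paid up front
    venue: str             # e.g., "ARCA" on the hypothetical platform "XYZ"
    ticker: str
    window: tuple          # (start, end) of the minute being funded

@dataclass
class FloatingLeg:
    participation: float   # share of the platform's net arbitrage P&L received
    venue: str
    ticker: str
    window: tuple

# The stylized example from the post: pay 100 USD of liquidity premium for one minute,
# receive 0.85% of the platform's net arbitrage profits in the following minute.
pay = FixedLeg(100.0, "ARCA", "SPY", ("2012-04-12 09:35", "2012-04-12 09:36"))
receive = FloatingLeg(0.0085, "ARCA", "SPY", ("2012-04-12 09:36", "2012-04-12 09:37"))
```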
The way Shiller thinks about this, it’s all about risk — finance, after all, is all about mitigating risk. The more people who can invest in companies (especially start-ups) the more risk is spread around.
Not only that, but it means more people will have a vested interest in getting to understand Wall Street — “The faults in institutions that contributed to the recent financial crisis can be corrected,” said Shiller, “but only by those who are willing to get inside the mechanism and get to know it.”
NYT, Economix Blog, here.
Pingdom, WordPress Completely dominates top 100 blogs, here. Look at the list of the top 100 blogs: Blodget at 22 and Durden at 27; wow, they both beat Funny or Die.
HPC Wire, The Processors of Petascale, here.
The odd one out is the PowerXCell 8i, a Cell processor variant IBM designed for HPC duty. The PowerXCell 8i debuted in the world’s first petaflop system, Roadrunner, which was booted up in 2008. That system employed the processor as an accelerator alongside AMD Opteron CPUs, using IBM’s QS22 blade server. Each PowerXCell 8i consists of a Power Processing Element and eight Synergistic Processing Elements (SPEs) that together deliver 102.4 gigaflops — a whopping amount for a 2008-era processor.
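The 102.4 gigaflop figure falls straight out of the clock rate, assuming the usually cited 3.2 GHz parts and four double-precision flops per SPE per cycle (my arithmetic and my assumptions, not HPCwire’s):

```python
# Where the PowerXCell 8i's 102.4 DP gigaflops plausibly comes from (back-of-the-envelope).
spes = 8                 # Synergistic Processing Elements per chip
clock_ghz = 3.2          # typical PowerXCell 8i clock (assumed)
flops_per_cycle = 4      # two fused multiply-adds per SPE per cycle in double precision (assumed)
print(spes * clock_ghz * flops_per_cycle, "gigaflops")   # 102.4
```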
Unfortunately for IBM, GPU computing was coming onto the scene just as the PowerXCell 8i was debuting. Thanks to a concerted effort by NVIDIA to build GPU accelerators for HPC, not to mention a more favorable business model for graphics processors, IBM decided to abandon any follow-up to the PowerXCell 8i. (At one point IBM was said to have a 32-SPE PowerXCell on the drawing board.) Roadrunner would be the first and last Cell processor-based supercomputer.
Even though the PowerXCell design died an untimely death, there are some new processors on the horizon to take its place. The most well known is Intel’s Many Integrated Core (MIC) chip, a manycore x86 architecture based on the “Larrabee” prototype. The first implementation, known as Knights Corner, is expected to deliver more than a teraflop of double-precision floating point performance when it is introduced in late 2012 or early 2013.
Noahpinion, Thursday Roundup, here.
11. Barkley Rosser pulpifies Robert Samuelson. This is not difficult; Robert Samuelson is one of the worst economics writers in existence. He is also one of the most highly paid. (However, this is not surprising, from a physics standpoint…if Knowledge = Power, and Time = Money, and Power = Work/Time, then Money ~ 1/Knowledge…)
Wired, Code Not Physical Property, Court Rules in Goldman Sachs Espionage Case, here.
Princeton University Quant, slides for talks, here. Almgren’s slides are here.
Wall Street Journal – here Maxeler Makes Waves with Dataflow Design
HPCWire – here J.P. Morgan Deploys Maxeler Dataflow Supercomputer for Fixed Income Trading
Peter Cherasia, Head of Markets Strategies at J.P. Morgan, commented: “With the new Maxeler technology, J.P. Morgan’s trading businesses can now compute orders of magnitude more quickly, making it possible to improve our understanding and control of the profile of our complex trading risk.”
insideHPC – here JP Morgan Fires Up Maxeler FPGA Super
Compute Scotland – here Near realtime: JP Morgan & Maxeler
So, a major broker-dealer gets a 2011 industry award for running their overnight Risk and P&L credit derivative batch through a 1,000-node CPU grid and some FPGAs in 238 seconds (in 2008 the same computation at the same broker-dealer, presumably with different CDS inventory, took 8 hours, a great success). Then some blog posting claims that this same credit derivative batch could be run with some optimized C++ code on a $2,500 Mac Pro in under 2 seconds, in IEEE 754 double precision, tied out to the precision of the inputs. What’s going on? Does the credit derivative batch require a $1,000,000 CPU grid, FPGAs, and 238 seconds, or one $2,500 Mac Pro, some optimized code, and 2 seconds?
Very likely it’s the Mac Pro + 2 seconds; let’s think about how this could be wrong:
A. The disclosed data is materially wrong or we have misinterpreted it egregiously. The unusual event in all this is the public disclosure of the broker-dealer’s runtime experience and the code they were running. It is exceedingly rare to see such a public disclosure. That said, the 8-hour production credit derivative batch at a broker-dealer in 2008 is not the least bit surprising. The disclosure itself tells you the production code was once bloated enough to be optimized from 8 hours to 4 minutes. You think they nailed the code optimization on an absolute basis when the 4-minute result was enough to get a 2011 industry award? Really? The part about the production infrastructure being a supercomputer with thousands of grid CPUs and FPGAs, while consistent with other production processes we have seen and heard of running on the Street, is the part you really have to hope is not true.
B. The several-hundred-thousand-position credit derivative batch could be synthetic CDO tranches and standard tranches and require slightly more complicated computation than the batch of default swaps assumed in the previous analysis. But all the correlation traders (folks who trade CDOs and tranches) we knew in 2011 were back to trading vanilla credit derivatives and bonds. The credit derivative batch with active risk flowing through it in 2011 is the default swap batch (you can include index protection like CDX and ITRX in this batch as well). Who is going to spend three years improving the overnight process on a correlation credit derivative book that is managed part time by a junior trader with instructions to take on zero risk? No one.
C. The ISDA code just analyzed is not likely to be the same as the 2011 award-winning production credit derivative batch code. In fact, we know portions of the production credit derivative batch were translated into FPGA circuitry, so the code is really different, right? Well, over the last decade of CDS trading most of the broker-dealers evolved toward the same quantitative methodology for valuing default swaps. Standards for converting upfront fees to spreads (ISDA) and unwind fees (Bloomberg’s CDSW screen) have influenced broker-dealer CDS valuation methodology. We do not personally know exactly what quantitative analytics each of the broker-dealers runs in 2011, but Jarrow-Turnbull default arrival and Brent’s method for credit curve cooking covers a non-trivial subset of the broker-dealers (a toy sketch of that machinery follows after point D below). The ISDA code is not likely to be vastly different from the production code the other broker-dealers use in terms of quantitative method. Of course, in any shop there could be some undisclosed quantitative tweaks included in production, and the Mac Pro + 2 seconds analysis would be exposed in that event.
D. The computational performance analysis just presented could be flawed. We have only thought about this since seeing the Computerworld UK article and spent a couple of weekends working out the estimate details. We could have made a mistake or missed something. But even if we are off by a factor of 100 in our estimates (we are not), it’s still a $2,500 Mac Pro + 200 seconds versus $1,000,000 + 1,000 CPUs + FPGAs + 238 seconds.
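On point C: the quantitative core being described is small. A minimal sketch of a flat-hazard-rate default swap pricer, with Brent’s method used to “cook” the hazard rate out of a quoted spread, might look like the following (my own toy version, not the ISDA code or anyone’s production analytics; day counts, accrual on default, and full multi-tenor curve bootstrapping are omitted):

```python
import math
from scipy.optimize import brentq   # Brent's method, as referenced above

def cds_legs(hazard, spread, maturity, recovery=0.40, rate=0.02, freq=4):
    """PV of the premium and protection legs per unit notional, flat hazard
    rate and flat risk-free rate (illustrative assumptions only)."""
    dt = 1.0 / freq
    premium = protection = 0.0
    q_prev = 1.0
    for i in range(1, int(maturity * freq) + 1):
        t = i * dt
        df = math.exp(-rate * t)                              # risk-free discount factor
        q = math.exp(-hazard * t)                             # survival probability
        premium += spread * dt * df * q                       # coupon paid if still alive
        protection += (1.0 - recovery) * df * (q_prev - q)    # payout if default in the period
        q_prev = q
    return premium, protection

def cook_hazard(spread, maturity, **kw):
    """Solve for the flat hazard rate that makes the contract worth par (PV = 0)."""
    def pv_gap(h):
        prem, prot = cds_legs(h, spread, maturity, **kw)
        return prot - prem
    return brentq(pv_gap, 1e-8, 5.0)

print(f"implied hazard for a 100bp 5y quote: {cook_hazard(0.01, 5.0):.4%}")
```

And on point D, the back-of-the-envelope that drives the Mac Pro estimate (every input here is a rough assumption of ours): even with generous allowances for curve cooking and bump-and-reprice risk, the flop count of a vanilla default swap batch is small next to what one 2011-era workstation delivers.

```python
# Order-of-magnitude check on the Mac Pro claim (all inputs are rough assumptions).
positions = 300_000            # "several hundred thousand" default swaps
scenarios = 25                 # base valuation plus a couple dozen bumped-curve risk runs
flops_per_pricing = 20_000     # generous allowance per position per scenario
total_flops = positions * scenarios * flops_per_pricing          # 1.5e11 flops

sustained_flops_per_s = 80e9   # assume ~80 GFLOPS sustained on a 2011 dual-socket Mac Pro
print(f"~{total_flops / sustained_flops_per_s:.1f} s of pure arithmetic")   # ~1.9 s
# Memory traffic and curve setup add overhead, but nothing that calls for a 1,000-node grid.
```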
Innovative FPGA project dramatically sped up the bank’s risk analysis processes
JP Morgan has won an award for an IT deployment that enabled its supercomputer to run risk analyses in near real-time.
Earlier this year, the bank revealed how it reduced the time it took to run an end-of-day risk calculation from eight hours down to just 238 seconds.
This was enabled by the implementation of an application-led, High Performance Computing (HPC) system based on Field-Programmable Gate Array (FPGA) technology, developed with HPC solutions provider Maxeler Technologies.
It was this FPGA technology deployment that won the prize for ‘Most Cutting Edge IT Initiative’ at the American Financial Technology Awards 2011 this week.
JP Morgan beat off competition from finalists Royal Bank of Scotland (RBS) and the IntercontinentalExchange (ICE) to win the award.
RBS and ICE had submitted mobile applications, RBSMobile and ICE Mobile, respectively, which provided market information, analysis and trading capabilities to users in real-time.
So, if I am not mistaken, this is the same credit derivative overnight batch as in the 11-Jul Computerworld UK article authored by Anh Nguyen. Over a three-year project a major broker-dealer converted their credit derivative analytics batch from C++ to Java so it could be run through a Java compiler that would burn Field-Programmable Gate Arrays employing a dataflow architecture (see How It All Began, a dataflow retrospective by Jack Dennis) into a supercomputer that (in addition to a pool of several thousand CPUs) can execute end-of-day P&L and risk for several hundred thousand credit derivatives in 238 seconds. The FPGAs are idle for all but 12 seconds of the 238-second runtime, and it was the FPGA technology deployment that won the prize for “Most Cutting Edge IT Initiative” at the American Financial Technology Awards 2011.
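Just to spell out what that 12-seconds-of-238 detail implies (the arithmetic is mine, not from the award write-up):

```python
# If the FPGAs are busy for only 12 of the 238 seconds, the accelerated part of the
# batch is ~5% of the wall clock; Amdahl's-law style, the other ~95% is bounded by the
# CPU grid, data marshalling, and I/O, no matter how fast the dataflow engines get.
total_s, fpga_busy_s = 238.0, 12.0
print(f"FPGA share of wall clock: {fpga_busy_s / total_s:.1%}")            # ~5.0%
print(f"floor with infinitely fast FPGAs: {total_s - fpga_busy_s:.0f} s")  # 226 s
```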
The world’s largest genome sequencing center once needed four days to analyze data describing a human genome. Now it needs just six hours.
The trick is servers built with graphics chips — the sort of processors that were originally designed to draw images on your personal computer. They’re called graphics processing units, or GPUs — a term coined by chip giant Nvidia. This fall, BGI — a mega lab headquartered in Shenzhen, China — switched to servers that use GPUs built by Nvidia, and this slashed its genome analysis time by more than an order of magnitude.