
Business Insider, Presenting: The 101 Best Finance People To Follow on Twitter, here. Not so many FinQuants on the list. Nick Firoozye makes it for Nomura Fixed Income Strategy/Macro. Salmon, DeLong, and Zerohedge are buried in there.

Slashdot, Ellison Doesn’t Know If Java Is Free, here. Well … is anything really Free, when you think about it? Must have sucked to be Ellison, for like 5 seconds.

EETimes, Intel says 25% of shipments will be on 22-nm in Q2, here. Probably want to be on 22nm silicon before end of 2012.

Wall Street and Technology, Call-Out To Regulators: Banks Just Can’t Get Enough Stress Testing, here. This should be lifetime employment for a bunch of people, like the Maxeler FPGA RTL hackers.

“There appears to be room for improvement at virtually every firm, and at some firms the amount of work needed is still significant,” Fed Governor Daniel Tarullo said last week. However, almost all the North American respondents of Sybase’s survey had major reservations over the efficiency of current stress tests. 

Mathbabe, Reputational risk is insufficient for ratings agencies, here.  Mostly ok argument with one odd exception.

I see the need for ratings agencies – it’s a way of crowd sourcing due diligence, which makes sense, but only if we can trust the ratings agencies as an impartial third party. And I don’t want to seem like someone who doesn’t have faith in humanity, but my trust isn’t won by a system of perverse incentives that has already failed. Let’s just say I have hope for humanity but I also acknowledge our weaknesses.

Usually mathbabe has this stuff thought through reasonably carefully, maybe I am missing something. The idea that we need the rating agencies for crowd sourcing due diligence on a CDO tranche is curious. Doesn’t the spread of the CDO tranche already do the crowd sourcing due diligence in a somewhat classical fashion? Effectively the rating agency sets a second price (AAA, AA, B, etc.), in addition to the market price, for the CDO tranche. We had AAA senior tranches returning 40 bps over USDLIBOR, remember? That’s the part where some guy on the Blue Collar Comedy Tour, I forget which one, would say “Here’s your sign.” There may be some virtue to having some hysteresis in setting the second price through the rating agency, but I suspect the argument is more subtle than “crowd sourcing.”

Actually now that I think about it, given mathbabe’s math background, her argument is sort of lame. She could crush this. The interesting technical problem faced by the rating agencies is the online nature of the process where they are asked to issue a series of ratings on collections of previously rated securities and tranches defined over collections of previously rated securities. These ratings may lag the market price movement of the underlying previously rated securities. How should the rating agency determine ratings so the levels are consistent and minimize the arbitrage available as a consequence of their ratings? The reputation risk insufficiency argument is sort of nonsense if the rating agency does not even have an adequate optimization framework in place to give some sort of hint why everyone is throwing cash at them, lifting their tee-shirts over their heads, and yelling “The Arb is on!” It’s like debating an undergraduate’s reputation risk for claiming P=NP after the first day of computer algorithms for artists class.

Ritholtz, The Shiller Duality, here. Come on DynaFair! One Economist’s Mission to redeem the Field of Finance, here.

DeLong, Economics 1: Origins of the Financial Crisis and the Downturn, slides, here.

Techlaze, Richard Stallman To Launch His Own Fashion Line, here.

After making massive neckbeards a universal style statement, Richard Stallman, revered software freedom activist and computer programmer, has turned to the world of fashion. His new collection, titled RMS, is a luxury line of clothing specially designed for geeks and programmers. The launch follows on the success of GNU, a fragrance which he created and distributed for free. 

HPC Wire, Latest FPGAs Show Big Gains in Floating Point Performance, here. This isn’t totally crazy, the idea of pushing some floating point off to an FPGA, but there will need to be more adult supervision than is indicated by the article. Who is going to say no to more floating point instruction execution units?  It is good to see some Virtex-7 numbers.  The theoretical peak results are impressive but are perhaps better for finding applications that run well on FPGAs than finding FPGA configurations that run specific applications well.

I get the idea that FPGAs are going to offer greater flexibility in selecting the precision of the desired arithmetic – a good thing for optimizers, but again check the specific numerical behavior of your quant model, implementation, and application. I’m guessing Kahan and Higham have books full of optimized floating point precision stories filled with tragedy and the stains of end users’ dried tears. James Gosling was famously quoted in 1998: “95% of the folks out there are completely clueless about floating point.” This of course highlights the value of programmers who, at the very least, know what they don’t know.
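To make the “check your numerical behavior” point concrete, here is a minimal sketch, not tied to any particular quant model. It uses only the Python standard library. The helper names (`to_float32`, `running_sum`) and the cash flow data are made up for illustration; `to_float32` emulates IEEE 754 single precision by round-tripping through `struct`.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def running_sum(values, single_precision=False):
    """Accumulate left to right, optionally rounding to float32 after every add."""
    total = 0.0
    for v in values:
        if single_precision:
            total = to_float32(total + to_float32(v))
        else:
            total = total + v
    return total


# Hypothetical data: one million cash flows of 0.01 each.
cash_flows = [0.01] * 1_000_000

reference = math.fsum(cash_flows)  # correctly rounded double-precision sum
double_error = abs(running_sum(cash_flows) - reference)
single_error = abs(running_sum(cash_flows, single_precision=True) - reference)

print(f"double accumulation error: {double_error:.3e}")
print(f"single accumulation error: {single_error:.3e}")
```

The same inputs and the same loop give very different accumulated error depending on the precision of the accumulator, which is exactly the knob an FPGA design lets you turn.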

Tom’s Hardware,  Ivy Bridge CPU Torndown, Photographed, Tri-Gates Revealed, here. May 4 -18 release.

Tom’s Hardware, Does Your SSD’s File System Affect Performance? here. 380+ MB/s writes.

IEEE Spectrum, The World According to DARPA, here. Dugan goes to Google.

Wall Street and Technology, Prime Brokerages Consolidate After “Big Bang”, here.

In 2009, hedge funds with over $3 billion in assets had an average of 4.8 prime brokers, according to Tabb Group. Funds typically had just one broker before the crisis. But that disaggregation is reversing, falling to 3.9 in 2010 and to 2.9 brokers in 2011.

After the financial crisis, lower-tier players such as Deutsche Bank AG and Citigroup Inc tried – and succeeded – to take market share from the traditional duopoly of Morgan Stanley and Goldman Sachs, while a number of smaller primes ramped up their operations.

That jockeying for position came ahead of an expected renaissance that has so far had mixed blessings for the industry.

Hedge fund assets, which peaked at $2.2 trillion in 2007, rebounded dramatically after the financial crisis. They fell to $1.4 trillion in 2008, but are now around $2 trillion.

But successive crises and volatile, correlated markets have sapped investor confidence, hitting trading, which in turn is hurting prime brokerages. Average daily U.S. trading volume so far in 2012 is 6.83 billion shares, down from 7.84 billion in 2011.

HFT Throwdown: Bloomberg, High-Speed Trading Is Progress, Not Piracy, here and Ritholtz/Saluzzi, HFT Pirates And Their Academic Friends, here. SEC Concept Release, here.

SUMMARY: The Securities and Exchange Commission (“Commission”) is conducting a broad review of the current equity market structure. The review includes an evaluation of equity market structure performance in recent years and an assessment of whether market structure rules have kept pace with, among other things, changes in trading technology and practices. To help further its review, the Commission is publishing this concept release to invite public comment on a wide range of market structure issues, including high frequency trading, order routing, market data linkages, and undisplayed, or “dark,” liquidity. The Commission intends to use the public’s comments to help determine whether regulatory initiatives to improve the current equity market structure are needed and, if so, the specific nature of such initiatives.

Zerohedge, here. Nothing like Father Ted – Down with This Sort of Thing.

Intel AVX, here. Speculating that any competitive fast x86 code for FIX Protocol binary message coding and decoding will need Advanced Vector Extensions (AVX). AVX gives you SIMD execution on 256-bit registers and extends the SSE instruction set. Like anything else in the performance optimization world, the issue will be figuring out how to get the ICC optimizer to generate AVX code that decodes FIX Protocol messages at competitive speeds, in this case on Sandy Bridge.
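For orientation, this is roughly what the decoding job looks like before anyone vectorizes it. It is a scalar Python sketch that assumes the classic FIX tag=value encoding (fields separated by SOH, tag 10 holding the byte sum modulo 256 of everything before it). The function names and the sample order are hypothetical. An AVX version would presumably spend its effort scanning for the SOH and `=` delimiters many bytes at a time instead of one by one.

```python
SOH = b"\x01"  # FIX field delimiter


def fix_checksum(data: bytes) -> int:
    """Sum of all bytes modulo 256, the rule used for FIX tag 10."""
    return sum(data) % 256


def encode_fix(fields):
    """Build a tag=value message from (tag, value) pairs, adding tags 8, 9 and 10.

    `fields` is expected to start with MsgType (35) and to exclude 8, 9 and 10.
    """
    body = b"".join(f"{tag}={value}".encode("ascii") + SOH for tag, value in fields)
    head = b"8=FIX.4.2" + SOH + f"9={len(body)}".encode("ascii") + SOH
    checksum = fix_checksum(head + body)
    return head + body + f"10={checksum:03d}".encode("ascii") + SOH


def decode_fix(message: bytes):
    """Split a message into (tag, value) pairs and verify the checksum."""
    trailer_start = message.rfind(SOH + b"10=") + 1  # first byte of the 10= field
    fields = []
    for raw in message.split(SOH)[:-1]:  # last element is empty after the final SOH
        tag, _, value = raw.partition(b"=")
        fields.append((int(tag), value.decode("ascii")))
    expected = fix_checksum(message[:trailer_start])
    if fields[-1] != (10, f"{expected:03d}"):
        raise ValueError("FIX checksum mismatch")
    return fields


# Hypothetical new-order-single style message, for illustration only.
sample_fields = [(35, "D"), (49, "BUYSIDE"), (56, "SELLSIDE"),
                 (55, "IBM"), (54, "1"), (38, "100")]
message = encode_fix(sample_fields)
decoded = decode_fix(message)
```

Everything here is byte-at-a-time work on short ASCII fields, which is why the interesting question is whether the compiler can be coaxed into emitting wide SIMD scans for it.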

Intel, Intel® 64 and IA-32 Architectures Software Developer’s Manual, here. Volume 1 Basic Architecture.

FIX Protocol Tutorial, here.

Tech Power Up, Overclocking Ivy Bridge, here. Folks overclock Intel Core i7-3770K to ~6.9GHz and cool with liquid nitrogen.

Take a look again at the Flynn case for Dataflow in his Maxeler video, here. I think the argument that is playing out on the video is predicated on the observation that Moore’s Law ceased delivering extra clock cycles in about 2002 for big commercial microprocessors. Let’s accept that as true, even though I think if I go check, David Patterson will put that “no more clock cycles for you” crossover closer to 2006. In the absence of any subsequent breakthroughs in Instruction Level Parallelism harvesting by compilers, Dataflow architecture then has a window of opportunity to demonstrate its advantages to the market, unlike any previous opportunity that Arvind would have experienced in the 80s and 90s. OK?

The other argument that Flynn plays out is that compiler assisted parallel coding is challenging and typically has not demonstrated scaling parallel speed ups beyond 8 or so conventional cores. Again fine, fits with my experience from SGI compilers to now, I have no issue. Parallel programming with pragmas is no way to write code for fun and profit.

The jump to “so therefore dataflow is good for a bunch of general purpose mathematics” needs further justification. Maybe it is, I don’t know. In particular, financial applications like closed form expression evaluation, discounting, lattices, and Monte Carlo for big portfolios of trades/positions all hit the memory hierarchy slightly differently. The vectorization opportunities also vary in degree between these financial mathematics applications. I’m not sure about this but I think there were science experiments conducted somewhere uptown that showed you can teach dolphins Perl and Python and they can parallelize these position inventory calculations to speed, just don’t let them try to do load balancing because that will delay the project.

Perhaps Dataflow has some large performance advantage in 2012, but the costs of converting to FPGAs or waiting for the latest Xilinx FPGA parts in your supercomputer more than offset the advantage. Certainly you do not expect a massive popular dataflow movement where all your friends, holding copies of Dataflow for Dummies, flock about you to find out how to program dataflow on their phone apps. That’s not happening. Best case you are going to get a cool dataflow programming result that will make you a hero in the dataflow community, but as a reward you will have to simply smile inwardly to yourself knowingly, because really no one else will ever know what you did or what you are talking about. What about debugging the production infrastructure and code, doesn’t that kill you inside a little bit every time you think about it? So, maybe it’s just me, but this Dataflow architecture performance advantage better be pretty big for each of these specific financial computations to account for all the obvious really bad stuff.

Taking Flynn’s argument at face value, I would guess there is a crossover time beyond which Dataflow architecture no longer maintains a significant performance advantage over off-the-shelf architecture, assuming everything else stays the same. I would think the crossover will be closer to the time we see volume production of 16-core microprocessor chips, 2015? But that assumes nothing else changes. Big assumption.

Wall Street Journal, Maxeler Makes Waves with Dataflow Design, here.

HPCWire, J.P. Morgan Deploys Maxeler Dataflow Supercomputer for Fixed Income Trading, here.

Peter Cherasia, Head of Markets Strategies at J.P. Morgan, commented: “With the new Maxeler technology, J.P. Morgan’s trading businesses can now compute orders of magnitude more quickly, making it possible to improve our understanding and control of the profile of our complex trading risk.”

insideHPC, JP Morgan Fires Up Maxeler FPGA Super, here.

Compute Scotland, Near realtime: JP Morgan & Maxeler, here.

Asymco analysis, here, highlights recent trends in computing. It looks increasingly as if you need to adapt to what the commercial market is giving you, even in floating point. I suspect that Joe and Suzy Sixpack will decide how you get your fp cycles. GPUs start to look more attractive in this light, right?

Oh and Terry Tao on Black Scholes, here

“Sandy Bridge-EP” Xeon E5 processors and their related “Romley” server platforms, are now in volume shipment, here

Overclocking insurance for Sandy Bridge from Intel, here

Whoa, more to think about now: Maxeler says Intel’s Knights Ferry simplicity might not suit HPC, here, at The Inquirer. The article by Lawrence Latif has a subtitle that reads “More effort yields better performance”! I think I like The Inquirer; it looks like a fancy version of the old Microprocessor Report.

Check out the Thalesian Seminar: Stein from Bloomberg talking about CVA at the NY Public Library, 31 Jan, 6pm.

FPGAs in HFT Thalesian Seminar Slides from eigen.systems

Intel Xeon E5-2690 Sandy Bridge-EP Performance Revealed, Tom’s Hardware, here

MJ Flynn, Accelerating computation with FPGAs, slides, and @Berkeley video. Nice talk; the audio starts to break up around 31:00 though.

Maxeler Technologies, home page

JP Morgan FPGA Careers factsheet. So, that JP Morgan credit derivative batch @300K CDS and 30K credit curves is going to run in 2 milliseconds (1000x the cheapest MacPro at the Apple Store), but I might only get 200x in practice, so it will be between 2 and 10 milliseconds on the 40-node hybrid FPGA machine. Do I get all Virtex-6 chips or is that hybrid as well?

Xilinx Virtex-7 FPGA Family looks to be the fastest, at 2X and 50% lower power than Virtex-6.

Apart from saving over a million dollars in infrastructure cost annually, why bother running your credit derivative P&L/Risk batch on Mac Pro+2s rather than Supercomputer+FPGAs+4m?

A. Rates, Mortgage, FX, Commodities P&L/Risk batches

It’s only the credit derivative guys in the Computerworld UK article that had this P&L and Risk batch performance problem in 2008 and now it is fixed?

B. Moore’s Law

“The number of transistors incorporated in a chip will approximately double every 24 months.”  —Gordon Moore, Intel Co-Founder

More of a prediction than a law really, but it has predicted accurately for 40 years and is expected to hold for another decade. In technology, if you do nothing more than “ride the wave” of Moore’s Law you are doing well; however, this observation’s simplicity can be deceiving. If the plan is for the credit derivative batch to “ride the wave” with thousands of grid CPUs, FPGAs, and associated staff, you may have a challenge when scaling to larger problems, like counterparty valuation adjustment (CVA), that Moore’s Law will not automatically address for you. For example, Moore’s Law doesn’t get you power and machine room space for 10,000 grid CPUs for the CVA simulation. That is closer to “not riding the wave.”
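The doubling arithmetic behind “riding the wave” is easy to sketch. The snippet below takes the 24-month doubling period straight from the quote; the function names are made up, and treating a transistor budget as a stand-in for “how many grid CPUs fit on a chip” is a loud simplification for illustration, not a performance model.

```python
import math

DOUBLING_PERIOD_YEARS = 2.0  # "approximately double every 24 months"


def transistor_growth(years: float) -> float:
    """Multiplicative growth in transistor count after `years`."""
    return 2.0 ** (years / DOUBLING_PERIOD_YEARS)


def years_to_grow(factor: float) -> float:
    """Years of doubling needed to grow the transistor count by `factor`."""
    return DOUBLING_PERIOD_YEARS * math.log2(factor)


growth_40 = transistor_growth(40)   # the 40 years behind us: 2**20
growth_10 = transistor_growth(10)   # the next decade: 2**5
wait = years_to_grow(10_000)        # years for a 10,000x transistor budget

print(f"40 years: {growth_40:,.0f}x, 10 years: {growth_10:.0f}x")
print(f"years for a 10,000x budget: {wait:.1f}")
```

A decade of doubling buys 32x, which is a lot, but nowhere near the 10,000x you would need before the wave alone absorbed the CVA grid.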

C. Simplicity

Supercomputers and FPGAs, in addition to GPUs, CUDA, CPU grids, and even dataflow architecture, are all reasonable technology bets that could very well be the best way to ride the Moore’s Law wave through 2020. Using some or all of these technologies together in 2011 to run such a modest computation so slowly should elicit some follow up questions.

Graydon Carter interview with Michael Lewis, here. No pressure just Tom Wolfe and Gay Talese in the audience.

Tyler Cowen: Simulations and the Fermi paradox, here

Superlinear convergence of the Secant Method, here.
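A quick way to see the superlinear convergence for yourself is the sketch below. It runs the secant method on f(x) = x² − 2 (a test function chosen for this illustration, not taken from the linked post) and estimates the convergence order from successive errors. The estimates should hover around the golden ratio, about 1.618.

```python
import math


def secant(f, x0, x1, tol=1e-14, max_iter=50):
    """Return (root estimate, list of iterates) using the secant method."""
    iterates = [x0, x1]
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:  # flat secant line, cannot take another step
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        iterates.append(x2)
        if abs(x2 - x1) < tol:
            break
        x0, x1 = x1, x2
    return iterates[-1], iterates


def estimated_orders(iterates, root, floor=1e-12):
    """Estimate p in e[n+1] ~ C * e[n]**p from consecutive error triples.

    Errors at or below `floor` are dropped to stay clear of rounding noise;
    assumes consecutive errors differ, as they do in this example.
    """
    errors = [abs(x - root) for x in iterates]
    errors = [e for e in errors if e > floor]
    return [
        math.log(errors[i + 1] / errors[i]) / math.log(errors[i] / errors[i - 1])
        for i in range(1, len(errors) - 1)
    ]


root, iterates = secant(lambda x: x * x - 2.0, 1.0, 2.0)
orders = estimated_orders(iterates, math.sqrt(2.0))
print(root, orders[-1], (1 + math.sqrt(5)) / 2)
```

The early estimates bounce around because the iterates are still far from the root; only the last few are meaningful.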

Freeman Dyson interview, here.

Fast FFT in MIT News, here.

Felix Salmon on Fact Checking, here. Does the previous Credit Derivative post get classified as fact checking?

So, a major broker-dealer gets a 2011 industry award for running their overnight Risk and P&L credit derivative batch through a 1000-node CPU grid and some FPGAs in 238 seconds (in 2008 the same computation and same broker-dealer but presumably different CDS inventory took 8 hours, a great success). Then some blog posting claims that this same credit derivative batch could be run with some optimized C++ code on a $2500 Mac Pro in under 2 seconds IEEE 754 double precision tied out to the precision of the inputs. What’s going on? Does the credit derivative batch require a $1,000,000 CPU grid, FPGAs, and 238 seconds or one $2,500 MacPro, some optimized code, and 2 seconds?

It’s the MacPro + 2 seconds very likely, let’s think how this could be wrong:

A. The disclosed data is materially wrong or we have misinterpreted it egregiously. The unusual event in all this is the public disclosure of the broker-dealer’s runtime experience and the code they were running. It is exceedingly rare to see such a public disclosure. That said, the 8 hour production credit derivative batch at a broker-dealer in 2008 is not the least bit surprising. The disclosure itself tells you the production code was once bloated enough to be optimized from 8 hours to 4 minutes. You think they nailed the code optimization on an absolute basis when the 4 minutes result was enough to get a 2011 industry award, really? The part about the production infrastructure being a supercomputer with thousands of grid CPUs and FPGAs, while consistent with other production processes we have seen and heard of running on the Street, is the part you really have to hope is not true.

B. The several hundred thousand position credit derivative batch could be Synthetic CDO tranches and standard tranches and require slightly more complicated computation than the batch of default swaps assumed in the previous analysis.  But all the correlation traders (folks that trade CDOs and tranches) we know in 2011 were back to trading vanilla credit derivatives and bonds. The credit derivative batch with active risk flowing through in 2011 is the default swap batch (you can include index protection like CDX and ITRX in this batch as well). Who is going to spend three years improving the overnight process on a correlation credit derivative book that is managed part time by a junior trader with instructions to take on zero risk? No one.

C. The ISDA code just analyzed is not likely to be the same as the 2011 award winning production credit derivative batch code. In fact, we know portions of the production credit derivative batch were translated into FPGA circuitry, so the code is real different, right? Well, over the last decade of CDS trading most of the broker-dealers evolved to the same quantitative methodology for valuing default swaps. Standards for converting upfront fees to spreads (ISDA) and unwind fees (Bloomberg’s CDSW screen) have influenced broker-dealer CDS valuation methodology. We do not personally know exactly what quantitative analytics each one of the broker-dealers runs in 2011, but Jarrow-Turnbull default arrival and Brent’s method for credit curve cooking covers a non-trivial subset of the broker-dealers. The ISDA code is not likely to be vastly different from the production code the other broker-dealers use in terms of quantitative method. Of course, in any shop there could be some undisclosed quantitative tweaks included in production and the MacPro + 2 seconds analysis case would be exposed in that event.
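To give a feel for how small the per-curve computation is, here is a toy version of “credit curve cooking”: back out a flat hazard rate from a single CDS par spread with a root finder. This is a sketch under loud assumptions, not the ISDA code or anyone’s production analytics. It assumes constant-intensity default arrival, a flat continuously compounded discount rate, quarterly premiums, no accrued premium on default, and a 40% recovery. Plain bisection stands in for Brent’s method to keep the example dependency-free (with SciPy installed, `scipy.optimize.brentq` would be the natural substitute). All function names and parameter values are hypothetical.

```python
import math


def par_spread(hazard, maturity=5.0, rate=0.02, recovery=0.4, freq=4):
    """Par CDS spread under a flat hazard rate and a flat discount rate."""
    dt = 1.0 / freq
    annuity = 0.0     # PV of 1 unit of running spread, paid while the name survives
    protection = 0.0  # PV of the loss payment, made at the end of the default period
    for i in range(1, int(round(maturity * freq)) + 1):
        t_prev, t = (i - 1) * dt, i * dt
        discount = math.exp(-rate * t)
        surv_prev, surv = math.exp(-hazard * t_prev), math.exp(-hazard * t)
        annuity += dt * discount * surv
        protection += (1.0 - recovery) * discount * (surv_prev - surv)
    return protection / annuity


def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Bisection root finder; a simple stand-in for Brent's method."""
    f_lo = f(lo)
    if f_lo == 0.0:
        return lo
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def implied_hazard(market_spread, **kwargs):
    """Flat hazard rate that reprices the quoted par spread."""
    return bisect(lambda h: par_spread(h, **kwargs) - market_spread, 0.0, 5.0)


hazard = implied_hazard(0.01)  # 100 bp quoted spread
print(hazard, 0.01 / (1 - 0.4))  # close to the spread / (1 - recovery) rule of thumb
```

Each solve is a few dozen evaluations of a 20-term sum. A real curve has several maturities bootstrapped in sequence, but the order of magnitude of the work per curve does not change much.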

D. The computational performance analysis just presented could be flawed. We have only thought about this since seeing the Computerworld UK article and spent a couple weekends working out the estimate details. We could have made a mistake or missed something. But even if we are off by a factor of 100 in our estimates (we are not) it’s still $2500 MacPro + 200 seconds versus $1,000,000 + 1000 CPUs + FPGAs + 238 seconds.
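The back-of-the-envelope comparison is simple enough to write down. Every figure below comes from the text above (the 300K position count is the one quoted from the factsheet); only the variable names are invented.

```python
grid_seconds_2008 = 8 * 3600   # "8 hours" in 2008
grid_seconds_2011 = 238        # award-winning 2011 runtime
mac_seconds = 2                # blog estimate for optimized C++ on a Mac Pro
grid_cost = 1_000_000          # dollars, CPU grid
mac_cost = 2_500               # dollars, Mac Pro
positions = 300_000            # "300K CDS"
fudge = 100                    # "off by a factor of 100"

speedup_2008_to_2011 = grid_seconds_2008 / grid_seconds_2011  # about 121x
mac_vs_grid = grid_seconds_2011 / mac_seconds                 # 119x
cost_ratio = grid_cost / mac_cost                             # 400x
pessimistic_mac_seconds = mac_seconds * fudge                 # 200 seconds

grid_positions_per_second = positions / grid_seconds_2011     # about 1,260
mac_positions_per_second = positions / mac_seconds            # 150,000

print(f"2008 -> 2011 speedup: {speedup_2008_to_2011:.0f}x")
print(f"Mac Pro estimate vs 2011 grid: {mac_vs_grid:.0f}x faster, {cost_ratio:.0f}x cheaper")
print(f"pessimistic Mac Pro runtime: {pessimistic_mac_seconds} s vs {grid_seconds_2011} s")
```

Even with the factor of 100 thrown in, the single box still finishes ahead of the grid.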
