Need some background numbers for L3 latency on random page faults? See SiSoftware, here. To think about how fast you can parse ArcaDirect packets we need some cache latencies, and those folks provide them for Sandy Bridge silicon as follows: L1 3 clocks, L2 11 clocks, sequential L3 12-14 clocks, random L3 28-38 clocks. So if RDMA is loading the ArcaDirect payload directly into L3, the L3-to-L1 payload feed is not going to fall under the full random-latency category; sequential seems more likely. You are going to drop 20+ clocks (5+ ns) getting the first payload into L1. Thereafter the L1 loads will overlap with parsing of the previous payload via standard fetch-ahead, so unless you can decompress and parse in under roughly 5 ns, the L1 payload feed latency will be fully pipelined. The packet size itself must be bounded by roughly 1.25 KB, since it's a 10 Gb/s line and Algo-Logic/DB are telling you they can process a payload a microsecond (10 Gb/s × 1 µs = 10,000 bits ≈ 1,250 bytes). So the current payload and the next payload fit in L1 directly if necessary.
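The arithmetic above is easy to get wrong by a factor of two, so here it is as a back-of-envelope sketch. The 3.3 GHz clock is an assumption (a typical Sandy Bridge frequency, not stated in the source); the latency clock counts are the SiSoftware figures quoted above.

```python
# Back-of-envelope check of the latency and packet-size claims.
# ASSUMPTION: 3.3 GHz core clock (typical Sandy Bridge; not from the source).
CLOCK_GHZ = 3.3
NS_PER_CLOCK = 1.0 / CLOCK_GHZ

# SiSoftware Sandy Bridge figures quoted in the text, in clocks.
L1_CLOCKS = 3
L2_CLOCKS = 11
L3_SEQ_CLOCKS = (12, 14)
L3_RAND_CLOCKS = (28, 38)

# First-payload cost: ~20+ clocks to pull the payload from L3 into L1.
first_payload_ns = 20 * NS_PER_CLOCK  # ~6 ns at 3.3 GHz

# Packet-size bound: 10 Gb/s line, one payload per microsecond.
line_rate_bps = 10e9
payload_interval_s = 1e-6
max_payload_bytes = line_rate_bps * payload_interval_s / 8  # 1250 bytes

print(f"first payload L3->L1: ~{first_payload_ns:.1f} ns")
print(f"max payload size:     {max_payload_bytes:.0f} bytes")
```

Two such payloads (~2.5 KB) fit comfortably in a 32 KB L1 data cache, which is what makes the "current plus next payload in L1" observation work.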
Then this really becomes a throughput exercise, since at the Arca open you are going to flirt with peak packet transmission levels for anywhere from 20 seconds to 20 minutes; the exact duration doesn't really matter. Even if the peak and near-peak transmission lasted only 1-2 seconds, you would still think of this as more of a throughput exercise than a low-latency exercise: comparatively, the bursts go on "forever." The main complications, as Leber et al. point out in their paper on HFT acceleration using FPGAs, include the decompression and this:
“The FAST protocol is applied by the feed handler to transfer pricing information to the market participants. To reduce the overhead, multiple FAST messages are encapsulated in one UDP frame. These messages do not contain any size information nor do they define a framing which aggravates decoding. Instead, each message is defined by a template which needs to be known in advance to be able to decode the stream. Most feed handlers define their own FAST protocol by providing independent template specifications. Care has to be taken as a single decoding mistake requires dropping the entire UDP frame. Templates define a set of fields, sequences and groups, where groups are a set of fields that can only occur once and sequences are a set of fields that can occur multiple times.”
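To make the "no size information, no framing" point concrete: FAST fields are stop-bit encoded, seven value bits per byte, with the high bit set on the final byte of a field. The decoder only knows where one field ends and the next begins by watching for that bit, which is why a single decoding mistake desynchronizes the rest of the UDP frame. A minimal sketch of the integer decode (field operators, presence maps, and templates omitted):

```python
def decode_stop_bit_uint(data: bytes, offset: int = 0):
    """Decode one FAST stop-bit-encoded unsigned integer.

    Each byte contributes its low 7 bits to the value; a set high
    bit (0x80) marks the final byte of the field. Returns the value
    and the offset of the next field. A real feed handler layers
    templates, presence maps, and copy/delta operators on top.
    """
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:  # stop bit set: this was the last byte
            return value, offset

# One byte, stop bit set: 0x81 -> value 1, next field at offset 1.
print(decode_stop_bit_uint(bytes([0x81])))
# Two bytes: 0x01 then 0x82 -> (1 << 7) | 2 = 130, next field at 2.
print(decode_stop_bit_uint(bytes([0x01, 0x82])))
```

Note there is no length prefix anywhere: if the decoder misreads even one byte, every subsequent stop bit lands in the wrong place, which is exactly why a single mistake forces dropping the entire UDP frame.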
See also IBM's Subramoni et al., "Streaming, Low-latency Communication in On-line Trading Systems," here.