Theta EdgeCloud Boosts LLM Speed by Splitting GPU Work

Splitting AI workloads across GPUs boosts performance and keeps response times steady

  • Benchmark testing shows splitting AI workloads between separate GPUs speeds up large language model inference.
  • The approach keeps response times steady even as user prompts become much longer.
  • The architecture outperformed a commercial competitor on key metrics like first-token latency.
  • Disaggregation prevents heavy users with long queries from slowing down the system for everyone else.

The team behind Theta EdgeCloud recently completed a benchmark demonstrating a more efficient method for serving large language models. Their tests split the two distinct phases of LLM inference, prefill and decode, across separate pools of NVIDIA H200 GPUs to improve performance.
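To make the two phases concrete, here is a minimal Python sketch of a split pipeline. It is an illustration only, not Theta EdgeCloud's code: the names `PrefillResult`, `prefill`, `decode`, and `serve` are hypothetical, a plain in-process queue stands in for whatever transport connects the two pools, and placeholder strings stand in for real model output.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class PrefillResult:
    """Hypothetical stand-in for the state that prefill hands to decode."""
    request_id: int
    prompt_tokens: list[str]


def prefill(request_id: int, prompt: str) -> PrefillResult:
    # Prefill: the whole prompt is processed in a single pass.
    return PrefillResult(request_id, prompt.split())


def decode(state: PrefillResult, max_new_tokens: int) -> list[str]:
    # Decode: the response is produced one token per step.
    # A real model would sample a token here; we emit placeholders.
    return [f"<r{state.request_id}-t{step}>" for step in range(max_new_tokens)]


def serve(prompts: list[str], max_new_tokens: int = 3) -> dict[int, list[str]]:
    handoff = Queue()  # assumption: stands in for the link between pools

    # "Prefill pool": only ever runs prompt processing.
    for request_id, prompt in enumerate(prompts):
        handoff.put(prefill(request_id, prompt))

    # "Decode pool": only ever runs token generation.
    responses = {}
    while not handoff.empty():
        state = handoff.get()
        responses[state.request_id] = decode(state, max_new_tokens)
    return responses


if __name__ == "__main__":
    print(serve(["hello world", "a much longer prompt here"], max_new_tokens=2))
```

The sketch runs both stages one after the other in a single process. The point is only the separation of roles: each stage does one kind of work, and the only thing passed between them is the state built from the prompt.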

Prefill, the compute-heavy prompt processing phase, was handled on one set of hardware. Meanwhile, the memory-sensitive decode phase, which generates the response, ran on another. These pools communicated over a high-speed RDMA network link, transferring the model’s working memory between them. This architecture prevents the two workload types from competing for resources on the same GPUs.
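The contention argument can be illustrated with a toy timeline model in Python. Everything in it is an assumption made for illustration: the function names, the 20 ms decode step, and the 400 ms prefill are hypothetical values, not figures from the benchmark, and the shared-GPU case is modeled as a simple "prefill blocks decode" rule rather than any real scheduler.

```python
def token_times(num_tokens, step_ms, prefill_arrival_ms=None,
                prefill_ms=0.0, colocated=True):
    """Completion time (ms) of each decode step for one ongoing request.

    If another user's prompt arrives at prefill_arrival_ms and the GPU is
    shared (colocated=True), the next decode step waits for that prefill.
    With separate pools (colocated=False), decode is unaffected.
    """
    times = []
    now = 0.0
    prefill_pending = colocated and prefill_arrival_ms is not None
    for _ in range(num_tokens):
        if prefill_pending and now >= prefill_arrival_ms:
            now += prefill_ms  # shared GPU is busy with the other prompt
            prefill_pending = False
        now += step_ms
        times.append(now)
    return times


def max_gap(times):
    """Longest wait between consecutive tokens, counting from time 0."""
    previous = [0.0] + times[:-1]
    return max(t - p for t, p in zip(times, previous))


if __name__ == "__main__":
    shared = token_times(5, 20.0, prefill_arrival_ms=50.0,
                         prefill_ms=400.0, colocated=True)
    split = token_times(5, 20.0, prefill_arrival_ms=50.0,
                        prefill_ms=400.0, colocated=False)
    print("shared GPU:", shared, "max gap", max_gap(shared))
    print("split pools:", split, "max gap", max_gap(split))
```

Under these made-up numbers, the ongoing request on the shared GPU sees one 420 ms pause between tokens, while the request on a dedicated decode pool keeps its 20 ms cadence. The model ignores the cost of moving state between pools, which a real deployment has to pay.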

Consequently, response times remained remarkably consistent even as prompts grew longer. For instance, the time to first token was around 783ms for a 1,000-word prompt and only 794ms for a 4,000-word prompt. This steadiness makes performance more predictable under real-world conditions where query length varies. The deployment’s results were compared to a commercial offering from Together.ai.
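For scale, the two reported first-token figures can be compared directly. The short Python calculation below uses only the approximate numbers quoted above; the helper name `relative_increase` is ours.

```python
def relative_increase(before_ms: float, after_ms: float) -> float:
    """Fractional change from before_ms to after_ms."""
    return (after_ms - before_ms) / before_ms


ttft_short_ms = 783.0  # reported for the 1,000-word prompt (approximate)
ttft_long_ms = 794.0   # reported for the 4,000-word prompt (approximate)

delta_ms = ttft_long_ms - ttft_short_ms
growth = relative_increase(ttft_short_ms, ttft_long_ms)

print(f"{delta_ms:.0f} ms slower, {growth:.1%} increase for a 4x longer prompt")
# prints "11 ms slower, 1.4% increase for a 4x longer prompt"
```

In other words, quadrupling the prompt length added about 11 ms, or roughly 1.4%, to the time to first token in the reported results.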

Under matched workloads, the EdgeCloud setup outperformed Together.ai's serverless endpoint on first-token latency and burst performance. It also demonstrated stronger throughput under steady load. Together.ai, however, held a slight edge during very long continuous tests, so the comparison was not one-sided. The work is part of a broader effort to get more output from existing, scarce GPU supply.
