Advanced

Earnings Guidance Delta

Trade the change in management language, not just the EPS headline

Why This Exists

Most earnings reactions focus on beat or miss. A more informative signal often comes from the difference between what management says now and what it said last quarter. GPT-style extraction is useful here because it can compare tone, detail, and guidance language across many names at scale without collapsing everything into a naive positive/negative score.

Where the Edge Lives

The market prices the headline first. It takes longer to price the revision path. A company can beat estimates and still be a weak setup if demand language deteriorates, gross margin commentary softens, or management dodges follow-up questions. Another company can miss slightly but become a strong long if guidance language materially improves and analysts are still anchored to the old narrative.

This is why sophisticated earnings strategies focus less on "beat or miss" and more on what changed in the expected future path.

  • Guidance drift: is the company quietly derisking the next quarter?
  • Tone asymmetry: are prepared remarks upbeat while Q&A turns cautious?
  • Revision potential: will analysts raise or cut estimates after reading the call?
  • Cross-sectional ranking: which names improved most relative to peers?

Signal Components

A robust signal compares four things:

  • Current guidance versus prior guidance
  • Current language versus prior language
  • Management framing versus sell-side consensus framing
  • Prepared remarks versus Q&A tone and detail
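The first component, current guidance versus prior guidance, is simple arithmetic once the ranges have been pulled out of the release. A minimal sketch, assuming both ranges refer to the same metric and fiscal period (for example, full-year revenue guidance restated each quarter); `GuidanceRange`, `guidance_change_pct`, and `range_width_change` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuidanceRange:
    """Guided low/high for one metric and period (assumed same units on both calls)."""
    low: float
    high: float

    @property
    def midpoint(self) -> float:
        return (self.low + self.high) / 2.0

    @property
    def width(self) -> float:
        return self.high - self.low


def guidance_change_pct(prior: GuidanceRange, current: GuidanceRange) -> float:
    """Percent change of the guidance midpoint versus the prior call."""
    if prior.midpoint == 0:
        raise ValueError("prior midpoint is zero; percent change is undefined")
    return 100.0 * (current.midpoint - prior.midpoint) / abs(prior.midpoint)


def range_width_change(prior: GuidanceRange, current: GuidanceRange) -> float:
    """Positive when the range widened, which can signal lower visibility."""
    return current.width - prior.width


# Hypothetical example: full-year revenue guide moves from 100-104 to 103-105.
prior = GuidanceRange(100.0, 104.0)
current = GuidanceRange(103.0, 105.0)
print(round(guidance_change_pct(prior, current), 2))  # midpoint moves 102 -> 104
print(range_width_change(prior, current))             # range narrowed by 2
```

Tracking width separately from the midpoint matters because a company can hold the midpoint while quietly widening the range.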

High-Value Language Categories

  • Revenue outlook: acceleration, deceleration, funnel strength, bookings, pipeline quality
  • Margins: pricing power, promotional activity, mix shift, freight, labor, input costs
  • Demand texture: enterprise optimization, macro caution, elongating sales cycles, channel inventory
  • Capital allocation: buybacks, capex, hiring, restructuring, project timing
  • Inventory and backlog: normalization can be bullish or bearish depending on the industry
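One way to use these categories is as a cheap pre-filter that routes transcript sentences to category-specific extraction prompts or evidence lists. Keyword matching is only a routing aid here, not a signal: it is exactly the phrase-level shortcut the scoring model must avoid. The keyword lists and the `tag_sentence` / `route` names are hypothetical and deliberately small.

```python
import re

# Hypothetical keyword lists drawn from the categories above.
CATEGORY_KEYWORDS = {
    "revenue_outlook": ["acceleration", "deceleration", "funnel", "bookings", "pipeline"],
    "margins": ["pricing power", "promotional", "mix shift", "freight", "labor",
                "input cost", "gross margin"],
    "demand_texture": ["optimization", "macro", "sales cycle", "channel inventory"],
    "capital_allocation": ["buyback", "capex", "hiring", "restructuring"],
    "inventory_backlog": ["inventory", "backlog", "normalization"],
}

# \b anchors each keyword at a word start, so "labor" does not fire inside
# "collaboration", while plurals such as "buybacks" or "input costs" still match.
PATTERNS = {
    category: re.compile("|".join(r"\b" + re.escape(k) for k in keywords), re.IGNORECASE)
    for category, keywords in CATEGORY_KEYWORDS.items()
}


def tag_sentence(sentence: str) -> set[str]:
    """Return every category whose keywords appear in the sentence."""
    return {category for category, pattern in PATTERNS.items() if pattern.search(sentence)}


def route(sentences: list[str]) -> dict[str, list[str]]:
    """Group sentences by category; sentences that match nothing are dropped."""
    routed = {category: [] for category in CATEGORY_KEYWORDS}
    for sentence in sentences:
        for category in tag_sentence(sentence):
            routed[category].append(sentence)
    return routed
```

A sentence can land in several buckets (e.g. "channel inventory" feeds both demand texture and inventory), which is usually what you want for evidence collection.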

Structured Extraction Schema

Force outputs into a versioned schema so they are testable and comparable.

{
  "revenue_outlook_delta": -2 to 2,
  "margin_outlook_delta": -2 to 2,
  "capex_intensity_delta": -2 to 2,
  "demand_tone_delta": -2 to 2,
  "inventory_commentary": "improving | stable | worsening",
  "prepared_remarks_score": -2 to 2,
  "qa_score": -2 to 2,
  "management_credibility": 0 to 1,
  "confidence": 0 to 1,
  "evidence_spans": ["quote 1", "quote 2"]
}
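Because the ranges and the enum are part of the contract, it is worth rejecting any extraction that violates them before it reaches the ranking layer. A minimal validator sketch, assuming each model response has already been parsed into a Python dict with the field names above. The transcript check (every evidence quote must appear verbatim in the source text) is an added safeguard against invented evidence, not part of the schema itself; `validate_extraction` is a hypothetical name.

```python
import math

DELTA_FIELDS = (
    "revenue_outlook_delta",
    "margin_outlook_delta",
    "capex_intensity_delta",
    "demand_tone_delta",
    "prepared_remarks_score",
    "qa_score",
)
UNIT_FIELDS = ("management_credibility", "confidence")
INVENTORY_VALUES = {"improving", "stable", "worsening"}


def _in_range(value, low, high):
    """True for a real, finite number (not a bool) inside [low, high]."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
        and low <= value <= high
    )


def validate_extraction(record, transcript=None):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for name in DELTA_FIELDS:
        if not _in_range(record.get(name), -2, 2):
            errors.append(f"{name} must be in [-2, 2], got {record.get(name)!r}")
    for name in UNIT_FIELDS:
        if not _in_range(record.get(name), 0, 1):
            errors.append(f"{name} must be in [0, 1], got {record.get(name)!r}")
    if record.get("inventory_commentary") not in INVENTORY_VALUES:
        errors.append("inventory_commentary must be improving, stable, or worsening")
    spans = record.get("evidence_spans")
    if not isinstance(spans, list) or not spans:
        errors.append("evidence_spans must be a non-empty list")
    elif transcript is not None:
        # Quotes that do not appear verbatim are a cheap tell for invented evidence.
        for quote in spans:
            if not isinstance(quote, str) or quote not in transcript:
                errors.append(f"evidence span not found in transcript: {quote!r}")
    return errors
```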

Why Q&A Deserves Its Own Field

Prepared remarks are curated. Q&A reveals pressure points. If management repeats the same script but becomes evasive when asked about order timing, inventory, channel health, or promotional intensity, that matters more than generic optimism in the opening remarks.
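Keeping the two scores separate lets you measure the gap directly. The sketch below derives two quantities from the schema fields: the within-call asymmetry (Q&A minus prepared remarks) and the change in Q&A tone versus the prior call. Both definitions, the function names, and the `gap` threshold are assumptions for illustration, not part of the schema.

```python
def tone_asymmetry(prepared_remarks_score: float, qa_score: float) -> float:
    """Negative when Q&A is more cautious than the scripted remarks."""
    return qa_score - prepared_remarks_score


def qa_delta(current_qa_score: float, prior_qa_score: float) -> float:
    """Change in Q&A tone versus the prior call."""
    return current_qa_score - prior_qa_score


def scripted_optimism(prepared_remarks_score: float, qa_score: float, gap: float = 1.5) -> bool:
    """Flag upbeat prepared remarks that the Q&A does not back up (threshold is arbitrary)."""
    return (
        prepared_remarks_score >= 1
        and tone_asymmetry(prepared_remarks_score, qa_score) <= -gap
    )
```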

Ranking and Portfolio Logic

A good ranking engine blends text extraction with market context. The text model proposes candidates. The ranking model decides whether the trade is still worth taking.

Basic Long Ranking Inputs

  • Strong positive guidance delta
  • Improved Q&A score
  • Positive or improving estimate revisions
  • High relative volume
  • Gap not so large that the edge is already consumed

Basic Short Ranking Inputs

  • Deteriorating demand or margin language
  • Weak Q&A versus prepared remarks
  • Consensus still too optimistic
  • Bearish setup hidden behind an optical beat
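The two input lists can be turned into a first-pass direction call. A minimal sketch, assuming the text-derived inputs are already on the schema's -2 to 2 scale and that `revision_trend` is positive when analysts are raising estimates; every threshold, the `CallSignals` type, and `candidate_side` are hypothetical. Relative volume and gap size are left out here and treated as separate eligibility filters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallSignals:
    ticker: str
    guidance_delta: float   # e.g. revenue_outlook_delta, -2..2
    qa_delta: float         # current qa_score minus prior call's qa_score
    margin_delta: float     # margin_outlook_delta, -2..2
    demand_delta: float     # demand_tone_delta, -2..2
    tone_asymmetry: float   # qa_score minus prepared_remarks_score
    revision_trend: float   # > 0 analysts raising, < 0 cutting (hypothetical units)


def candidate_side(s: CallSignals) -> Optional[str]:
    """Return 'long', 'short', or None. All thresholds are illustrative."""
    if s.guidance_delta >= 1 and s.qa_delta > 0 and s.revision_trend >= 0:
        return "long"
    deteriorating = s.demand_delta <= -1 or s.margin_delta <= -1
    # Revisions not yet cut is a rough proxy for "consensus still too optimistic".
    if deteriorating and s.tone_asymmetry < 0 and s.revision_trend >= 0:
        return "short"
    return None
```

Note that the short rule does not look at the headline at all, which is how it can catch a bearish setup behind an optical beat.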

Simple Composite Score

Earnings Alpha =
    0.30 * guidance_delta
  + 0.20 * qa_delta
  + 0.15 * margin_delta
  + 0.15 * demand_delta
  + 0.10 * estimate_revision_trend
  + 0.10 * relative_volume_confirmation

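The exact weights matter less than two implementation details. First, standardize each input across the cohort (for example, as a z-score) before blending; the raw inputs sit on different scales, and the one with the widest range would otherwise dominate the sum. Second, rank names cross-sectionally against peers reporting in the same earnings window. The approach tends to work better as a top-decile / bottom-decile system than as isolated single-name conviction.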
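The composite above can be sketched with per-feature cross-sectional z-scores and a top/bottom decile cut. This assumes `cohort` maps each ticker reporting in the same window to already-computed inputs under the names used in the formula; how `estimate_revision_trend` and `relative_volume_confirmation` are measured is left open, as in the text. Function names are hypothetical.

```python
from statistics import mean, pstdev

# Weights copied from the composite formula above.
WEIGHTS = {
    "guidance_delta": 0.30,
    "qa_delta": 0.20,
    "margin_delta": 0.15,
    "demand_delta": 0.15,
    "estimate_revision_trend": 0.10,
    "relative_volume_confirmation": 0.10,
}


def zscores(values: list[float]) -> list[float]:
    """Cross-sectional z-scores; a constant feature contributes zero."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def earnings_alpha(cohort: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weighted sum of per-feature z-scores for every name in one earnings window."""
    tickers = list(cohort)
    scores = {t: 0.0 for t in tickers}
    for feature, weight in WEIGHTS.items():
        for t, z in zip(tickers, zscores([cohort[t][feature] for t in tickers])):
            scores[t] += weight * z
    return scores


def top_bottom(scores: dict[str, float], fraction: float = 0.10):
    """Top and bottom `fraction` of names (at least one each)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two names to form long and short legs")
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[::-1][:k]
```

Z-scoring per window means a score only says how a name compares with the other names reporting alongside it, which is the point of the cross-sectional design.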

Execution Rules

The most common mistake is entering too quickly and paying for the entire signal in the overnight gap.

Practical Hold Rules

Entry: day 1 close or day 2 open after earnings
Hold: 5 to 15 trading days
Exit: signal decay, revision reversal, or stop based on ATR / gap fill
Portfolio: top 5 longs, top 5 shorts, sector-normalized
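The portfolio and exit lines can be sketched as two small functions. Sector normalization here means subtracting each sector's mean composite score before ranking, so a name competes only with its sector peers; a single-name sector demeans to zero, which is a known weakness of this approach. The exit check uses the time limit as a stand-in for signal decay, and treats `revisions_reversed` as an externally supplied flag. All names and defaults (`atr_mult=2.0`, `max_days=15`) are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def sector_neutral(scores: dict[str, float], sectors: dict[str, str]) -> dict[str, float]:
    """Subtract each sector's mean score so names compete within their sector."""
    by_sector = defaultdict(list)
    for ticker, score in scores.items():
        by_sector[sectors[ticker]].append(score)
    sector_mean = {sector: mean(vals) for sector, vals in by_sector.items()}
    return {t: s - sector_mean[sectors[t]] for t, s in scores.items()}


def build_book(scores: dict[str, float], sectors: dict[str, str], n: int = 5):
    """Top-n longs and bottom-n shorts after sector demeaning."""
    adjusted = sector_neutral(scores, sectors)
    ranked = sorted(adjusted, key=adjusted.get, reverse=True)
    n = min(n, len(ranked) // 2)  # never let a name land on both sides
    return ranked[:n], ranked[::-1][:n]


def exit_reason(side, entry_price, price, atr, days_held, pre_earnings_close,
                revisions_reversed, atr_mult=2.0, max_days=15) -> Optional[str]:
    """Return why to exit, or None to keep holding."""
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    if revisions_reversed:
        return "revision_reversal"
    sign = 1 if side == "long" else -1
    # Gap fill: price has travelled back through the pre-earnings close.
    if sign * (price - pre_earnings_close) <= 0:
        return "gap_fill"
    if sign * (price - entry_price) <= -atr_mult * atr:
        return "atr_stop"
    if days_held >= max_days:
        return "time_stop"
    return None
```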

What to Avoid

  • Parabolic gaps where the best part of the move is gone
  • Low-liquidity names where slippage overwhelms expected edge
  • Companies with strong prepared remarks but obviously weak Q&A
  • Names where sector peers and suppliers contradict management optimism
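The avoid list maps naturally onto an eligibility filter that runs before ranking. A minimal sketch: gap size is measured in ATR units (both as percentages of price), liquidity as average daily dollar volume, and `peer_demand_tone` is assumed to come from running the same extraction on peers' and suppliers' calls and averaging. The thresholds and the `avoid_reasons` name are hypothetical.

```python
def avoid_reasons(
    gap_pct: float,
    atr_pct: float,
    avg_daily_dollar_volume: float,
    prepared_remarks_score: float,
    qa_score: float,
    own_demand_tone: float,
    peer_demand_tone: float,
    max_gap_atr: float = 3.0,
    min_dollar_volume: float = 20e6,
) -> list[str]:
    """Return the reasons to skip a name; an empty list means it passes."""
    reasons = []
    if atr_pct > 0 and abs(gap_pct) / atr_pct > max_gap_atr:
        reasons.append("parabolic_gap")
    if avg_daily_dollar_volume < min_dollar_volume:
        reasons.append("illiquid")
    if prepared_remarks_score >= 1 and qa_score <= -1:
        reasons.append("weak_qa_behind_strong_script")
    if own_demand_tone >= 1 and peer_demand_tone <= -1:
        reasons.append("peers_contradict_management")
    return reasons
```

Returning reasons rather than a boolean makes it easy to log why names were excluded and to check later whether a filter is costing more return than it saves.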

The cleanest implementation is usually to enter after the first reaction, then harvest revision drift over the next one to three weeks.

Validation Framework

  • Test 1-day, 5-day, 10-day, and 20-day forward returns
  • Sector-normalize every ranking
  • Separate large gaps from modest gaps
  • Track hit rate, average win, average loss, and turnover
  • Replay prior calls whenever the prompt changes
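The return-side checks can be sketched in a few lines. This assumes a daily close series per name with the entry on a known bar; entering at the day-2 open would need open prices instead, which this simplification ignores. Short returns are sign-flipped so that positive always means the trade made money. All function names are hypothetical.

```python
from statistics import mean
from typing import Optional

HORIZONS = (1, 5, 10, 20)  # trading days, as listed above


def forward_return(closes: list[float], entry_idx: int, horizon: int) -> Optional[float]:
    """Simple return from the entry close to the close `horizon` bars later."""
    exit_idx = entry_idx + horizon
    if exit_idx >= len(closes):
        return None  # not enough history yet; exclude rather than guess
    return closes[exit_idx] / closes[entry_idx] - 1.0


def signed(side: str, ret: float) -> float:
    """Flip short returns so that a positive number always means a profit."""
    return ret if side == "long" else -ret


def summarize(trade_returns: list[float]) -> dict:
    """Hit rate plus average win and average loss for one horizon."""
    wins = [r for r in trade_returns if r > 0]
    losses = [r for r in trade_returns if r < 0]
    return {
        "n": len(trade_returns),
        "hit_rate": len(wins) / len(trade_returns) if trade_returns else 0.0,
        "avg_win": mean(wins) if wins else 0.0,
        "avg_loss": mean(losses) if losses else 0.0,
    }


def turnover(previous_book: set[str], new_book: set[str]) -> float:
    """Share of the new book that was not held in the previous one."""
    return len(new_book - previous_book) / len(new_book) if new_book else 0.0
```

Running `summarize` separately for each entry in `HORIZONS`, and separately for large-gap and modest-gap names, covers most of the list above.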

One useful robustness test is to compare the model against simple baselines such as headline sentiment, estimate revision changes, and raw post-earnings drift. If the GPT extraction layer does not add information beyond those baselines, it is complexity without alpha.
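One common way to run this comparison is the rank information coefficient: the Spearman correlation between a signal and the returns that followed it, computed per earnings window. The sketch below implements Spearman directly (average ranks for ties) so it has no dependencies. A higher IC than the baselines is suggestive rather than proof of incremental information; if the GPT score is itself highly rank-correlated with a baseline, it is probably restating it. Function names are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def rank_ic(signal: list[float], forward_returns: list[float]) -> float:
    """Spearman rank correlation between a signal and subsequent returns."""
    return pearson(average_ranks(signal), average_ranks(forward_returns))


# Usage, with all series aligned name-by-name for one earnings window:
#   rank_ic(gpt_scores, fwd_10d)  versus  rank_ic(headline_sentiment, fwd_10d)
#   rank_ic(gpt_scores, headline_sentiment) near 1.0 means little new information
```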

Where People Blow Up

  • Headline overfitting: the model keys off phrases like "strong demand" without checking whether guidance actually changed
  • No baseline: you must compare to the prior call, not just score one transcript in isolation
  • No sector normalization: the same phrase can mean different things in semiconductors and consumer staples, and the two sectors trade in different volatility regimes
  • No execution discipline: chasing overnight gaps can consume the entire edge
  • No version control: if prompt changes alter rankings materially, you do not have a stable process
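Version control can be as light as stamping every stored extraction with an ID derived from the exact prompt and schema version, then replaying historical calls under the old and new prompt and measuring how much the top of the ranking moves. A minimal sketch; `prompt_version_id`, `top_k_overlap`, and the `"v1"` tag are hypothetical, and top-k overlap is only one possible stability measure.

```python
import hashlib


def prompt_version_id(prompt_text: str, schema_version: str) -> str:
    """Deterministic ID tying every stored extraction to the exact prompt and schema."""
    digest = hashlib.sha256(f"{schema_version}\n{prompt_text}".encode("utf-8")).hexdigest()
    return f"{schema_version}-{digest[:12]}"


def top_k_overlap(scores_a: dict[str, float], scores_b: dict[str, float], k: int) -> float:
    """Fraction of the top-k names shared by two rankings of the same cohort."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b) / k


# Replay the same historical calls under the old and new prompt, then compare.
old = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.0}
new = {"A": 3.0, "C": 2.0, "B": 1.0, "D": 0.0}
print(prompt_version_id("example prompt text", "v1"))
print(top_k_overlap(old, new, k=2))  # A stays in the top 2; B and C swap
```

A low overlap after a wording-only prompt change is the warning sign described above: the process is reacting to the prompt rather than to the calls.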

Best Use Case

This works best as a cross-sectional ranking system around earnings season, not as a single-stock oracle. The alpha usually comes from relative ranking, estimate drift, and disciplined execution after the first headline move.