Advanced

Earnings Guidance Delta

Trade the change in management language, not just the EPS headline

⏱️ 40 min read Post-Earnings Drift Text Revision Alpha

Why This Exists

Most earnings reactions focus on beat or miss. Better alpha often comes from the difference between what management says now and what it said last quarter. GPT-style extraction is useful because it can compare tone, detail, and guidance language at scale across many names without collapsing everything into a naive positive/negative score.

Where the Edge Lives
Signal Components
Structured Extraction Schema
Ranking and Portfolio Logic
Execution Rules
Validation Framework
Where People Blow Up

Where the Edge Lives

The market prices the headline first. It takes longer to price the revision path. A company can beat estimates and still be a weak setup if demand language deteriorates, gross margin commentary softens, or management dodges follow-up questions. Another company can miss slightly but become a strong long if guidance language materially improves and analysts are still anchored to the old narrative.

This is why sophisticated earnings strategies focus less on "beat or miss" and more on what changed in the expected future path.

Guidance drift: is the company quietly derisking the next quarter?
Tone asymmetry: are prepared remarks upbeat while Q&A turns cautious?
Revision potential: will analysts raise or cut estimates after reading the call?
Cross-sectional ranking: which names improved most relative to peers?

Signal Components

A robust signal compares four things:

Current guidance versus prior guidance
Current language versus prior language
Management framing versus sell-side consensus framing
Prepared remarks versus Q&A tone and detail
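The current-versus-prior comparison can be sketched as a plain difference of per-call scores. This is a minimal sketch, assuming each call has already been scored per category on the same -2 to 2 scale; the function names, category keys, and clipping are illustrative assumptions, not a prescribed method.

```python
from typing import Dict


def clip(value: float, low: float = -2.0, high: float = 2.0) -> float:
    """Keep a delta inside the assumed -2..2 band."""
    return max(low, min(high, value))


def call_delta(current: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Current minus prior for every category scored on both calls."""
    shared = current.keys() & prior.keys()
    return {key: clip(current[key] - prior[key]) for key in sorted(shared)}


# Hypothetical per-call scores for one company, two consecutive quarters.
prior_q = {"revenue_outlook": 1.0, "margin_outlook": 0.0, "demand_tone": 1.0}
current_q = {"revenue_outlook": 0.0, "margin_outlook": 1.0, "demand_tone": -2.0}

deltas = call_delta(current_q, prior_q)
print(deltas)
```

Categories that appear on only one call are dropped rather than guessed, so a missing prior call yields an empty result instead of a misleading delta.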

High-Value Language Categories

Revenue outlook: acceleration, deceleration, funnel strength, bookings, pipeline quality
Margins: pricing power, promotional activity, mix shift, freight, labor, input costs
Demand texture: enterprise optimization, macro caution, elongating sales cycles, channel inventory
Capital allocation: buybacks, capex, hiring, restructuring, project timing
Inventory and backlog: normalization can be bullish or bearish depending on industry

Structured Extraction Schema

Force outputs into a fixed schema, and version that schema alongside the prompt, so results are testable and comparable across runs.

{
  "revenue_outlook_delta": -2 to 2,
  "margin_outlook_delta": -2 to 2,
  "capex_intensity_delta": -2 to 2,
  "demand_tone_delta": -2 to 2,
  "inventory_commentary": "improving | stable | worsening",
  "prepared_remarks_score": -2 to 2,
  "qa_score": -2 to 2,
  "management_credibility": 0 to 1,
  "confidence": 0 to 1,
  "evidence_spans": ["quote 1", "quote 2"]
}
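One way to make the schema testable is to reject any extraction that falls outside the stated ranges before it reaches the ranking step. The following is a minimal sketch: the field names and ranges come from the schema above, while `validate_extraction` and the rule that at least one evidence quote is required are assumptions added for illustration.

```python
from typing import Any, Dict, List

DELTA_FIELDS = (
    "revenue_outlook_delta", "margin_outlook_delta", "capex_intensity_delta",
    "demand_tone_delta", "prepared_remarks_score", "qa_score",
)
UNIT_FIELDS = ("management_credibility", "confidence")
INVENTORY_VALUES = {"improving", "stable", "worsening"}


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_extraction(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    errors: List[str] = []
    for name in DELTA_FIELDS:
        value = record.get(name)
        if not _is_number(value) or not -2 <= value <= 2:
            errors.append(f"{name} must be a number in [-2, 2]")
    for name in UNIT_FIELDS:
        value = record.get(name)
        if not _is_number(value) or not 0 <= value <= 1:
            errors.append(f"{name} must be a number in [0, 1]")
    inventory = record.get("inventory_commentary")
    if not isinstance(inventory, str) or inventory not in INVENTORY_VALUES:
        errors.append("inventory_commentary must be improving, stable, or worsening")
    spans = record.get("evidence_spans")
    # Assumption: an extraction with no supporting quote is not trusted.
    if not isinstance(spans, list) or not spans or not all(
        isinstance(span, str) and span for span in spans
    ):
        errors.append("evidence_spans must be a non-empty list of quotes")
    return errors


sample = {
    "revenue_outlook_delta": 1, "margin_outlook_delta": -1,
    "capex_intensity_delta": 0, "demand_tone_delta": 1,
    "inventory_commentary": "improving",
    "prepared_remarks_score": 2, "qa_score": 1,
    "management_credibility": 0.7, "confidence": 0.8,
    "evidence_spans": ["quote 1", "quote 2"],
}
print(validate_extraction(sample))
```

Returning a list of problems instead of raising on the first one makes it easier to log every schema violation when a prompt change starts producing malformed output.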

Why Q&A Deserves Its Own Field

Prepared remarks are curated. Q&A reveals pressure points. If management repeats the same script but becomes evasive when asked about order timing, inventory, channel health, or promotional intensity, that matters more than generic optimism in the opening remarks.
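Keeping the two scores separate makes the gap between them measurable. The sketch below assumes both scores use the schema's -2 to 2 scale; `tone_asymmetry`, `is_scripted_optimism`, and the `min_gap` threshold of 1.0 are hypothetical choices, not calibrated values.

```python
def tone_asymmetry(prepared_remarks_score: float, qa_score: float) -> float:
    """Negative values mean Q&A was more cautious than the prepared remarks."""
    return qa_score - prepared_remarks_score


def is_scripted_optimism(
    prepared_remarks_score: float, qa_score: float, min_gap: float = 1.0
) -> bool:
    """Flag calls that open upbeat but weaken under analyst questioning."""
    gap = tone_asymmetry(prepared_remarks_score, qa_score)
    return prepared_remarks_score > 0 and gap <= -min_gap


print(tone_asymmetry(2, -1))        # upbeat script, cautious Q&A
print(is_scripted_optimism(2, -1))
print(is_scripted_optimism(1, 1))   # consistent tone, no flag
```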

Ranking and Portfolio Logic

A good ranking engine blends text extraction with market context. The text model proposes candidates. The ranking model decides whether the trade is still worth taking.

Basic Long Ranking Inputs

Strong positive guidance delta
Improved Q&A score
Positive or improving estimate revisions
High relative volume
Gap not so large that the edge is already consumed

Basic Short Ranking Inputs

Deteriorating demand or margin language
Weak Q&A versus prepared remarks
Consensus still too optimistic
Bearish setup hidden behind an optical beat

Simple Composite Score

Earnings Alpha =
    0.30 * guidance_delta
  + 0.20 * qa_delta
  + 0.15 * margin_delta
  + 0.15 * demand_delta
  + 0.10 * estimate_revision_trend
  + 0.10 * relative_volume_confirmation

The exact weights matter less than the ranking step: score every name, then rank it cross-sectionally against peers reporting in the same earnings window. The score is better suited to a top-decile / bottom-decile system than to isolated single-name conviction.
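The composite score and the peer ranking can be sketched together. The weights and input names are taken from the formula above; the assumptions are that all six inputs have already been put on comparable scales, that peers are grouped by sector, and that a z-score within the sector is an acceptable form of normalization. Tickers and sector labels are placeholders.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List

WEIGHTS = {
    "guidance_delta": 0.30,
    "qa_delta": 0.20,
    "margin_delta": 0.15,
    "demand_delta": 0.15,
    "estimate_revision_trend": 0.10,
    "relative_volume_confirmation": 0.10,
}


def earnings_alpha(features: Dict[str, float]) -> float:
    """Weighted sum of the six inputs; raises KeyError if one is missing."""
    return sum(weight * features[name] for name, weight in WEIGHTS.items())


def sector_zscores(scores: Dict[str, float], sectors: Dict[str, str]) -> Dict[str, float]:
    """Z-score each name against the other names in its own sector."""
    by_sector = defaultdict(list)
    for ticker, score in scores.items():
        by_sector[sectors[ticker]].append(score)
    result = {}
    for ticker, score in scores.items():
        group = by_sector[sectors[ticker]]
        spread = pstdev(group)
        # A sector with one name, or identical scores, carries no relative signal.
        result[ticker] = 0.0 if spread == 0 else (score - mean(group)) / spread
    return result


def rank_names(zscores: Dict[str, float]) -> List[str]:
    """Best to worst; the two ends are the long and short candidates."""
    return sorted(zscores, key=zscores.get, reverse=True)


raw_scores = {"AAA": 1.0, "BBB": 3.0, "CCC": 0.5, "DDD": 0.5}
sectors = {"AAA": "semis", "BBB": "semis", "CCC": "staples", "DDD": "staples"}
z = sector_zscores(raw_scores, sectors)
print(rank_names(z))
```

Normalizing inside the sector before ranking keeps a loud sector from crowding every slot at the top or bottom of the list.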

Execution Rules

The most common mistake is entering too quickly and paying for the entire signal in the overnight gap.

Practical Hold Rules

Entry: day 1 close or day 2 open after earnings
Hold: 5 to 15 trading days
Exit: signal decay, revision reversal, or stop based on ATR / gap fill
Portfolio: top 5 longs, top 5 shorts, sector-normalized

What to Avoid

Parabolic gaps where the best part of the move is gone
Low-liquidity names where slippage overwhelms expected edge
Long entries in companies with strong prepared remarks but obviously weak Q&A
Names where sector peers and suppliers contradict management optimism

The cleanest implementation is usually to enter after the first reaction, then harvest revision drift over the next one to three weeks.
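The hold rules and the avoid list can be combined into a single selection step. This is a sketch under stated assumptions: the `Candidate` fields, the 10% gap cap, and the $5M dollar-volume floor are hypothetical placeholders rather than recommended values, and the score is assumed to be sector-normalized already.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_GAP_PCT = 10.0              # hypothetical cap on the gap in the trade direction
MIN_DOLLAR_VOLUME = 5_000_000   # hypothetical liquidity floor


@dataclass
class Candidate:
    ticker: str
    score: float          # sector-normalized composite score
    gap_pct: float        # signed post-earnings gap, in percent
    dollar_volume: float  # average daily traded value


def build_book(candidates: List[Candidate], n_per_side: int = 5) -> Tuple[List[str], List[str]]:
    """Pick the top longs and shorts after liquidity and gap filters."""
    liquid = [c for c in candidates if c.dollar_volume >= MIN_DOLLAR_VOLUME]
    # Skip longs that already gapped up hard and shorts that already gapped down hard.
    longs = [c for c in liquid if c.score > 0 and c.gap_pct <= MAX_GAP_PCT]
    shorts = [c for c in liquid if c.score < 0 and c.gap_pct >= -MAX_GAP_PCT]
    longs.sort(key=lambda c: c.score, reverse=True)
    shorts.sort(key=lambda c: c.score)
    return (
        [c.ticker for c in longs[:n_per_side]],
        [c.ticker for c in shorts[:n_per_side]],
    )


universe = [
    Candidate("AAA", 1.5, 3.0, 50e6),
    Candidate("BBB", 2.0, 18.0, 80e6),    # gap too large for a long
    Candidate("CCC", 1.0, 2.0, 1e6),      # too illiquid
    Candidate("DDD", -1.2, -4.0, 30e6),
    Candidate("EEE", -2.0, -15.0, 40e6),  # gap too large for a short
    Candidate("FFF", 0.4, -1.0, 20e6),
]
long_book, short_book = build_book(universe)
print(long_book, short_book)
```

The gap filter is applied in the direction of the trade, so a name that gapped down on a call the model scores positively still qualifies as a long.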

Validation Framework

Test 1-day, 5-day, 10-day, and 20-day forward returns
Sector-normalize every ranking
Separate large gaps from modest gaps
Track hit rate, average win, average loss, and turnover
Replay prior calls whenever the prompt changes

One useful robustness test is to compare the model against simple baselines such as headline sentiment, estimate revision changes, and raw post-earnings drift. If the GPT extraction layer does not add information beyond those baselines, it is complexity without alpha.
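The forward-return and hit-rate checks can be sketched with the standard library alone. The horizons match the list above; the price series is made-up illustration data, and `forward_return` and `trade_stats` are hypothetical helper names. Running the same functions on a baseline signal gives the comparison the robustness test calls for.

```python
from statistics import mean
from typing import Dict, List, Optional

HORIZONS = (1, 5, 10, 20)  # trading days, as listed in the validation framework


def forward_return(closes: List[float], entry_index: int, horizon: int) -> Optional[float]:
    """Simple return from the entry close to the close `horizon` days later."""
    exit_index = entry_index + horizon
    if exit_index >= len(closes):
        return None  # not enough history yet for this horizon
    return closes[exit_index] / closes[entry_index] - 1.0


def trade_stats(returns: List[float]) -> Dict[str, float]:
    """Hit rate, average win, and average loss for a list of trade returns."""
    wins = [r for r in returns if r > 0]
    losses = [r for r in returns if r <= 0]
    return {
        "hit_rate": len(wins) / len(returns) if returns else 0.0,
        "avg_win": mean(wins) if wins else 0.0,
        "avg_loss": mean(losses) if losses else 0.0,
    }


closes = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]  # illustrative prices only
by_horizon = {h: forward_return(closes, 0, h) for h in HORIZONS}
stats = trade_stats([0.02, -0.01, 0.04, -0.03])
print(by_horizon)
print(stats)
```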

Where People Blow Up

Headline overfitting: the model keys off phrases like "strong demand" without checking whether guidance actually changed
No baseline: you must compare to the prior call, not just score one transcript in isolation
No sector normalization: the same phrase carries different weight in semiconductors than in staples, and the two sectors trade in different volatility regimes
No execution discipline: chasing overnight gaps can consume the entire edge
No version control: if prompt changes alter rankings materially, you do not have a stable process
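Prompt stability can be checked by replaying the same calls under two prompt versions and comparing the resulting rankings. The sketch below uses Spearman rank correlation as the stability measure, which is an assumed choice; the scores, tickers, and any acceptance threshold are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("each ranking needs at least two distinct scores")
    return cov / sqrt(var_x * var_y)


def rank_stability(old: Dict[str, float], new: Dict[str, float]) -> float:
    """Spearman correlation of scores over names scored by both prompt versions."""
    shared = sorted(old.keys() & new.keys())
    return pearson(ranks([old[t] for t in shared]), ranks([new[t] for t in shared]))


prompt_v1 = {"AAA": 0.9, "BBB": 0.1, "CCC": 0.5}
prompt_v2 = {"AAA": 0.8, "BBB": 0.2, "CCC": 0.6}   # same ordering, new values
prompt_v3 = {"AAA": 0.1, "BBB": 0.9, "CCC": 0.5}   # ordering reversed
print(rank_stability(prompt_v1, prompt_v2))
print(rank_stability(prompt_v1, prompt_v3))
```

A value near 1 means the prompt change left the ordering intact; a sharp drop is the signal to treat the new prompt as a different strategy and revalidate it.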

Best Use Case

This works best as a cross-sectional ranking system around earnings season, not as a single-stock oracle. The alpha usually comes from relative ranking, estimate drift, and disciplined execution after the first headline move.