Q1 vs Q2: Deep Dive

Mathematical foundations, calibration metrics, and practical interpretation of aleatoric and epistemic uncertainty

Quick Overview

Q1 - Aleatoric Uncertainty

  • Quantile: 25% (pessimistic)
  • Measures: Data noise and ambiguity
  • Reducibility: Irreducible
  • Threshold: 0.35

Q2 - Epistemic Uncertainty

  • Quantile: 75% (optimistic)
  • Measures: Model ignorance
  • Reducibility: Reducible with more data
  • Threshold: 0.35

Mathematical Definition

Q1 - Quantile 25%

Q1 represents the 25th percentile of the uncertainty distribution. It's a pessimistic estimate of uncertainty.

Q1 ∈ [0, 1]
# Interpretation (calibrated):
# In 100 similar claims with Q1 = x:
# → ~25 have veracity < x
# → ~75 have veracity > x

Example

Claim: "The Earth is flat"

Q1 prediction: 0.02 (very low aleatoric uncertainty)

In 100 claims similar to this with Q1=0.02:
→ ~25 have veracity < 0.02 (very false)
→ ~75 have veracity > 0.02 (somewhat true or completely true)

Q2 - Quantile 75%

Q2 represents the 75th percentile of the uncertainty distribution. It's an optimistic estimate of confidence.

Q2 ∈ [0, 1]
Q2 = f(embeddings, Q1) # Conditioned on Q1
# Interpretation (calibrated):
# In 100 similar claims with Q2 = y:
# → ~75 have veracity < y
# → ~25 have veracity > y

Example

Claim: "Vaccines prevent diseases"

Q2 prediction: 0.92 (high epistemic confidence)

In 100 claims similar to this with Q2=0.92:
→ ~75 have veracity < 0.92
→ ~25 have veracity > 0.92 (highly accurate)
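
This calibrated-quantile reading can be checked empirically on a labeled evaluation set: roughly 25% of veracity labels should fall below Q1 and roughly 75% below Q2. A minimal sketch, assuming you already have arrays of Q1/Q2 predictions and ground-truth veracity scores (the arrays below are made up for illustration):

import numpy as np

# Hypothetical evaluation arrays: one entry per claim
q1_pred = np.array([0.02, 0.30, 0.15, 0.40])   # predicted 25% quantiles
q2_pred = np.array([0.55, 0.80, 0.92, 0.70])   # predicted 75% quantiles
veracity = np.array([0.10, 0.45, 0.95, 0.60])  # ground-truth veracity in [0, 1]

# If Q1 and Q2 are well calibrated, these empirical coverages should
# land near 0.25 and 0.75 respectively on a large enough evaluation set.
coverage_q1 = np.mean(veracity < q1_pred)
coverage_q2 = np.mean(veracity < q2_pred)

print(f"P(veracity < Q1) = {coverage_q1:.2f}  (target ~0.25)")
print(f"P(veracity < Q2) = {coverage_q2:.2f}  (target ~0.75)")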

Why Q2 is Conditioned on Q1

Key Insight

Conditioning Q2 on Q1 improves calibration by 21%. When a question is very ambiguous (high Q1), the model should account for that when assessing its own knowledge (Q2).

❌ Without Conditioning

Q1 and Q2 predicted independently:

q1 = Q1Gate(embeddings)
q2 = Q2Gate(embeddings)

Problem: Q2 doesn't know about question ambiguity, leading to poor calibration.

✅ With Conditioning

Q2 conditioned on Q1:

q1 = Q1Gate(embeddings)
q2 = Q2Gate(embeddings, q1)

Benefit: Q2 adjusts based on Q1, improving calibration by 21%.

# Q2 Gate Architecture (Simplified)
import torch
import torch.nn as nn

class Q2Gate(nn.Module):
    def __init__(self):
        super().__init__()
        # 384-dim embedding + 1 extra input feature for Q1
        self.feature_net = nn.Linear(384 + 1, 256)
        self.q2_head = nn.Sequential(
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 1)
        )

    def forward(self, embeddings, q1):
        # Concatenate Q1 as an extra feature so Q2 is conditioned on it
        combined = torch.cat([embeddings, q1.unsqueeze(1)], dim=1)
        features = self.feature_net(combined)
        q2 = torch.sigmoid(self.q2_head(features))
        return q2
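
A quick usage sketch for the gate above (the batch size and randomly generated inputs are assumptions for illustration):

gate = Q2Gate()
embeddings = torch.randn(8, 384)  # batch of 8 sentence embeddings
q1 = torch.rand(8)                # Q1 predictions for the same batch
q2 = gate(embeddings, q1)         # shape (8, 1), values in (0, 1)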

Calibration Metrics

AletheionGuard uses multiple metrics to ensure Q1 and Q2 are well-calibrated:

ECE (Expected Calibration Error)

Measures the gap between predicted confidence and observed accuracy, averaged across confidence bins, with each bin weighted by its share of the samples.

ECE = Σ_bins (n_bin / N) × |conf_bin - acc_bin|
# Target: ECE < 0.10 (ideally < 0.08)
AletheionGuard Level 1: ECE ~0.07-0.10 (30-50% better than Level 0)
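
As an illustration, ECE can be computed as follows. This is a generic sketch with equal-width bins, not necessarily the exact implementation AletheionGuard uses:

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width bins, each bin weighted by its sample count."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)  # 1.0 if the prediction was right, else 0.0
    # Assign each sample to one of n_bins equal-width bins over [0, 1]
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if not in_bin.any():
            continue
        conf_bin = confidences[in_bin].mean()  # average predicted confidence in the bin
        acc_bin = correct[in_bin].mean()       # observed accuracy in the bin
        ece += in_bin.mean() * abs(conf_bin - acc_bin)  # weight by bin fraction n_bin / N
    return ece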

RCE (Relative Calibration Error)

Measures relative error of calibration as a percentage of observed accuracy.

RCE = |predicted_confidence - observed_accuracy| / observed_accuracy
# Target: RCE < 0.05 (5% error)
Threshold for Production: RCE < 0.05 to be considered "calibrated"
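
A direct translation of the formula (a sketch; here RCE is computed globally over the evaluation set, and the inputs are assumed to be arrays of per-sample confidences and 0/1 correctness flags):

import numpy as np

def relative_calibration_error(confidences, correct):
    """|mean predicted confidence - observed accuracy| / observed accuracy."""
    predicted_confidence = np.mean(confidences)
    observed_accuracy = np.mean(correct)  # correct: 1.0 if right, else 0.0
    return abs(predicted_confidence - observed_accuracy) / observed_accuracy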

Brier Score

Mean squared difference between predicted probabilities and actual outcomes.

Brier = (1/N) × Σ (p_pred - p_true)²
# Lower is better
Used alongside RCE for comprehensive calibration assessment.
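
In code this is simply a mean squared error over probabilities (generic sketch):

import numpy as np

def brier_score(p_pred, p_true):
    """Mean squared difference between predicted probabilities and outcomes."""
    return np.mean((np.asarray(p_pred) - np.asarray(p_true)) ** 2)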

Uncertainty Correlation

Correlation between epistemic uncertainty and actual error rate.

corr = correlation(epistemic_uncertainty, actual_error)
# Target: > 0.5
Interpretation: When the model is uncertain (high Q2), it should actually be wrong more often.
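
This can be measured with a Pearson correlation between Q2 and a per-sample error signal. A minimal sketch with made-up arrays:

import numpy as np

# Hypothetical per-sample arrays
q2 = np.array([0.10, 0.45, 0.80, 0.25])            # epistemic uncertainty per claim
actual_error = np.array([0.05, 0.40, 0.70, 0.20])  # e.g. |prediction - ground truth|

# Pearson correlation; target is > 0.5
corr = np.corrcoef(q2, actual_error)[0, 1]
print(f"Uncertainty correlation: {corr:.2f}")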

Target Thresholds

Metric               Target    Level 0       Level 1
Q1 MSE               < 0.05    ~0.06         ~0.048
Q2 MSE               < 0.05    ~0.057        ~0.045
RCE                  < 0.05    ~0.06         ~0.042
ECE                  < 0.10    ~0.10-0.15    ~0.07-0.10
Uncertainty Corr.    > 0.5     ~0.52         ~0.61

Practical Interpretation

Reading Q1 Values

Q1 < 0.20
Low aleatoric uncertainty. Question is unambiguous, has a clear correct answer.
0.20 ≤ Q1 < 0.35
Moderate aleatoric uncertainty. Some ambiguity in the question or data.
Q1 ≥ 0.35
High aleatoric uncertainty. Question is ambiguous, admits multiple valid answers. Verdict: "MAYBE"

Reading Q2 Values

Q2 < 0.20
Low epistemic uncertainty. Model has strong knowledge, low hallucination risk.
0.20 ≤ Q2 < 0.35
Moderate epistemic uncertainty. Model has some knowledge but not complete confidence.
Q2 ≥ 0.35
High epistemic uncertainty. Model lacks knowledge, high hallucination risk. Verdict: "REFUSED"

Combined Interpretation

Low Q1, Low Q2: Ideal case. Clear question, model knows the answer. → ACCEPT
High Q1, Low Q2: Question is ambiguous but model has knowledge. → MAYBE (ask for clarification)
Low Q1, High Q2: Clear question but model lacks knowledge. → REFUSED (escalate to expert)
High Q1, High Q2: Ambiguous question and model lacks knowledge. → REFUSED (highest uncertainty)
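
Taken together, the thresholds amount to a simple decision rule. The sketch below reproduces that mapping with the 0.35 thresholds from the overview; it is illustrative only, and in practice you would read result.verdict as shown in the next section:

def combined_verdict(q1: float, q2: float, threshold: float = 0.35) -> str:
    """Map (Q1, Q2) to a verdict using the documented 0.35 thresholds."""
    if q2 >= threshold:
        # Model lacks knowledge: refuse regardless of question ambiguity
        return "REFUSED"
    if q1 >= threshold:
        # Question is ambiguous but the model has knowledge: ask for clarification
        return "MAYBE"
    # Clear question and the model knows the answer
    return "ACCEPT"

print(combined_verdict(0.10, 0.08))  # ACCEPT
print(combined_verdict(0.50, 0.12))  # MAYBE
print(combined_verdict(0.15, 0.60))  # REFUSED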

Code Example

from aletheion_guard import EpistemicAuditor

auditor = EpistemicAuditor()

# Example 1: Low Q1, Low Q2
result = auditor.evaluate("Paris is the capital of France")
print(f"Q1: {result.q1:.3f}, Q2: {result.q2:.3f}")
print(f"Verdict: {result.verdict}")  # ACCEPT

# Example 2: High Q1 (ambiguous)
result = auditor.evaluate("What's the capital of the Netherlands?")
print(f"Q1: {result.q1:.3f}, Q2: {result.q2:.3f}")
print(f"Verdict: {result.verdict}")  # MAYBE

# Example 3: High Q2 (model doesn't know)
result = auditor.evaluate("What will Bitcoin cost tomorrow?")
print(f"Q1: {result.q1:.3f}, Q2: {result.q2:.3f}")
print(f"Verdict: {result.verdict}")  # REFUSED

# Access calibration info
print(f"RCE: {result.rce:.3f}")
print(f"Is calibrated: {result.calibrated}")  # True if RCE < 0.05

Performance Characteristics

Latency

  • Embedding: ~10ms
  • Q1/Q2 inference: ~5ms
  • Calibration: ~3ms
  • Total: 20-30ms per response

Throughput

  • Single: 50 req/sec
  • Batch 32: 500 req/sec
  • Batch 128: 1000+ req/sec
  • Production: ~200-400 req/sec sustained

Training Loss Function

AletheionGuard Level 1 uses Pyramidal VARO loss to train Q1 and Q2 gates:

L = λ₁ × MSE(q1, q1_true)
  + λ₂ × MSE(q2, q2_true)
  + λ₃ × MSE(height, height_true)
  + λ₄ × calibration_loss(q2, error)
  + λ₅ × fractal_loss(height, sqrt(q1² + q2²))

Component Breakdown

  • λ₁: Q1 accuracy weight
  • λ₂: Q2 accuracy weight
  • λ₃: Height regression weight
  • λ₄: Calibration weight (RCE)
  • λ₅: Fractal constraint weight

Typical Values

  • λ₁: 1.0
  • λ₂: 1.2 (slightly higher)
  • λ₃: 0.8
  • λ₄: 1.5 (prioritize calibration)
  • λ₅: 0.5
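
As a rough sketch, here is how such a weighted sum could be assembled in PyTorch with the typical λ values above. The calibration and fractal terms are simple stand-ins (assumed MSE forms), not the exact Pyramidal VARO definitions:

import torch
import torch.nn.functional as F

def pyramidal_varo_loss(q1, q2, height, q1_true, q2_true, height_true, error,
                        lambdas=(1.0, 1.2, 0.8, 1.5, 0.5)):
    """Weighted sum of the five loss components (illustrative stand-in terms)."""
    l1, l2, l3, l4, l5 = lambdas
    loss = l1 * F.mse_loss(q1, q1_true)
    loss += l2 * F.mse_loss(q2, q2_true)
    loss += l3 * F.mse_loss(height, height_true)
    # Calibration term: Q2 should track the observed error (stand-in for the RCE-based term)
    loss += l4 * F.mse_loss(q2, error)
    # Fractal constraint: height should follow sqrt(Q1² + Q2²)
    loss += l5 * F.mse_loss(height, torch.sqrt(q1**2 + q2**2))
    return loss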

Next Steps

Questions about Q1 and Q2?

Our team can help you understand uncertainty quantification for your use case.