gogo2/ANNOTATE/UNICODE_AND_SHAPE_FIXES.md
2025-10-31 03:14:35 +02:00


Unicode and Shape Fixes

Issues Fixed

1. Unicode Encoding Error (Windows)

Error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2713' in position 61
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 63

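Cause: The Windows console uses the cp1252 encoding, which has no code points for characters such as ✓ (checkmark, U+2713) and → (arrow, U+2192). Writing them to the log stream therefore fails. 'charmap' is the codec name Python reports for cp1252.

Fix: Replaced the Unicode characters in the log messages with ASCII equivalents.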

# Before
logger.info(f"    ✓ Fetched {len(market_state['timeframes'])} primary timeframes")
logger.info(f"      → {before_count} before signal, {after_count} after signal")

# After
logger.info(f"    [OK] Fetched {len(market_state['timeframes'])} primary timeframes")
logger.info(f"      -> {before_count} before signal, {after_count} after signal")
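The failure can be reproduced on any platform by encoding a message with cp1252 directly, which is roughly what a cp1252-encoded stream does when the log handler writes to it. This is a minimal sketch; `encodes_in` is a hypothetical helper, not part of the codebase:

```python
def encodes_in(text: str, encoding: str = "cp1252") -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


before = "    \u2713 Fetched 4 primary timeframes"  # "    ✓ Fetched ..."
after = "    [OK] Fetched 4 primary timeframes"

print(encodes_in(before))  # False
print(encodes_in(after))   # True

try:
    before.encode("cp1252")
except UnicodeEncodeError as exc:
    # Same failure as in the Windows log: the cp1252 codec reports itself as 'charmap'
    print(exc.encoding, exc.start)  # charmap 4
```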

2. BCELoss Shape Mismatch Warning

Warning:

Using a target size (torch.Size([1])) that is different to the input size (torch.Size([1, 1]))

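Cause: trade_success is created with shape [1, 1], but the target that reaches BCELoss has shape [1]. The .to(device) call in batch processing was the first suspect, but .to() only moves (or casts) a tensor and never changes its shape, so the dimension must be dropped by another step in batch processing (for example a squeeze() or an integer index).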

Fix: Added explicit shape enforcement before BCELoss

# In train_step() method
# BCELoss needs input and target of identical shape: promote [batch_size] to [batch_size, 1]
if trade_target.dim() == 1:
    trade_target = trade_target.unsqueeze(-1)
if confidence_pred.dim() == 1:
    confidence_pred = confidence_pred.unsqueeze(-1)

# Final shape verification
if confidence_pred.shape != trade_target.shape:
    # Force the target into the prediction's shape; view() raises if the element counts differ
    trade_target = trade_target.view(confidence_pred.shape)

Result: Both tensors have shape [batch_size, 1] before BCELoss. If their element counts differ, view() raises an error instead of letting the mismatch through.
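The same enforcement can be pulled into a small standalone helper and exercised on the exact shapes from the warning. This is a sketch, not the code in advanced_transformer_trading.py: `align_bce_shapes` is a hypothetical name, and it uses reshape() rather than view() so that non-contiguous targets are also handled (both raise if the element counts differ).

```python
import torch
import torch.nn as nn


def align_bce_shapes(pred: torch.Tensor, target: torch.Tensor):
    """Return (pred, target) with matching shapes, promoting [batch_size] to [batch_size, 1]."""
    if pred.dim() == 1:
        pred = pred.unsqueeze(-1)      # [B] -> [B, 1]
    if target.dim() == 1:
        target = target.unsqueeze(-1)  # [B] -> [B, 1]
    if pred.shape != target.shape:
        # reshape() raises a RuntimeError if the element counts differ
        target = target.reshape(pred.shape)
    return pred, target


confidence_pred = torch.tensor([[0.8]])  # model output after sigmoid, shape [1, 1]
trade_target = torch.tensor([1.0])       # flattened target, shape [1]

pred, target = align_bce_shapes(confidence_pred, trade_target)
loss = nn.BCELoss()(pred, target)
print(pred.shape, target.shape)  # torch.Size([1, 1]) torch.Size([1, 1])
print(round(loss.item(), 4))     # 0.2231
```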


Training Output (Fixed)

Fetching HISTORICAL market state for ETH/USDT at 2025-10-30 19:59:00+00:00
   Primary symbol: ETH/USDT - Timeframes: ['1s', '1m', '1h', '1d']
   Secondary symbol: BTC/USDT - Timeframe: 1m
   Candles per batch: 600

   Fetching primary symbol data: ETH/USDT
       ETH/USDT 1s: 600 candles
       ETH/USDT 1m: 735 candles
       ETH/USDT 1h: 995 candles
       ETH/USDT 1d: 600 candles

   Fetching secondary symbol data: BTC/USDT (1m)
       BTC/USDT 1m: 731 candles

    [OK] Fetched 4 primary timeframes (2930 total candles)
    [OK] Fetched 1 secondary timeframes (731 total candles)

   Test case 4: ENTRY sample - LONG @ 3680.1
   Test case 4: Added 15 NO_TRADE samples (±15 candles)
      -> 0 before signal, 15 after signal

 Prepared 351 training samples from 5 test cases
   ENTRY samples: 5
   HOLD samples: 331
   EXIT samples: 0
   NO_TRADE samples: 15
   Ratio: 1:3.0 (entry:no_trade)

 Starting Transformer training...
    Converting annotation data to transformer format...
     Converted 351 samples to 9525 training batches
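The summary lines in the log agree with the per-item counts. A quick check, using only numbers copied from the output above:

```python
# Counts copied from the training log above.
primary_candles = {"1s": 600, "1m": 735, "1h": 995, "1d": 600}
samples = {"ENTRY": 5, "HOLD": 331, "EXIT": 0, "NO_TRADE": 15}

total_candles = sum(primary_candles.values())
total_samples = sum(samples.values())
ratio = samples["NO_TRADE"] / samples["ENTRY"]

print(total_candles)     # 2930
print(total_samples)     # 351
print(f"1:{ratio:.1f}")  # 1:3.0
```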

Files Modified

  1. ANNOTATE/core/real_training_adapter.py

    • Line 502: Changed ✓ to [OK]
    • Line 503: Changed ✓ to [OK]
    • Line 618: Changed → to ->
  2. NN/models/advanced_transformer_trading.py

    • Lines 973-991: Enhanced shape enforcement for BCELoss
    • Added explicit unsqueeze operations
    • Added final shape verification with view()

Verification

Unicode Fix

  • No more UnicodeEncodeError on Windows
  • Logs display correctly in Windows console
  • ASCII characters work on all platforms
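To keep the Unicode fix from regressing, a test could scan the source for log-message literals that cp1252 cannot encode. This is a hypothetical check, not part of the repository; it assumes log calls take the form `<something>.info(...)` (any .debug/.info/.warning/.error/.critical/.exception call is checked) and uses only the standard-library ast module.

```python
import ast

LOG_METHODS = {"debug", "info", "warning", "error", "critical", "exception"}


def find_unencodable_log_text(source: str, encoding: str = "cp1252"):
    """Return (line, character) pairs for log-message literals that `encoding` cannot represent."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_log_call = bool(
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in LOG_METHODS
            and node.args
        )
        if not is_log_call:
            continue
        # Literal pieces of the message: a plain string, or the constant parts of an f-string.
        for piece in ast.walk(node.args[0]):
            if isinstance(piece, ast.Constant) and isinstance(piece.value, str):
                for ch in piece.value:
                    try:
                        ch.encode(encoding)
                    except UnicodeEncodeError:
                        problems.append((node.lineno, ch))
    return problems


sample = 'logger.info(f"    \u2713 Fetched {n} primary timeframes")\nlogger.info(f"      -> {a} before signal")\n'
problems = find_unencodable_log_text(sample)
print([(line, f"U+{ord(ch):04X}") for line, ch in problems])  # [(1, 'U+2713')]
```

Pointed at the modified file, for example with `Path("ANNOTATE/core/real_training_adapter.py").read_text(encoding="utf-8")`, an empty result would mean no log literal can trigger the cp1252 error.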

Shape Fix

  • No more BCELoss shape mismatch warning
  • Both tensors have shape [batch_size, 1]
  • Training proceeds without warnings

Notes

Unicode in Logs

When logging on Windows, avoid these characters:

  • ✓ (U+2713) - Use [OK] in log messages (✓ is fine in code comments, which never reach the console)
  • ✗ (U+2717) - Use [X] or [FAIL]
  • → (U+2192) - Use ->
  • ← (U+2190) - Use <-
  • • (U+2022) - Use * or - (cp1252 itself has •, but other console code pages such as cp437 do not)
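Instead of relying on every new log call to avoid these characters, the replacement can also be done centrally. A minimal sketch, assuming the standard logging module: a hypothetical AsciiFallbackFilter, attached to the console handler, rewrites the characters listed above to their ASCII fallbacks before the record is formatted.

```python
import io
import logging

# ASCII fallbacks for the characters listed above (keys are single characters).
ASCII_FALLBACKS = str.maketrans({
    "\u2713": "[OK]",  # check mark
    "\u2717": "[X]",   # ballot X
    "\u2192": "->",    # rightwards arrow
    "\u2190": "<-",    # leftwards arrow
    "\u2022": "*",     # bullet
})


class AsciiFallbackFilter(logging.Filter):
    """Hypothetical handler filter that rewrites known symbols to ASCII."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge msg % args first so symbols inside arguments are replaced too.
        record.msg = record.getMessage().translate(ASCII_FALLBACKS)
        record.args = ()
        return True


# Usage: a StringIO stands in for the console stream.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(AsciiFallbackFilter())
demo_logger = logging.getLogger("ascii_fallback_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.info("    \u2713 Fetched %d primary timeframes", 4)
print(stream.getvalue(), end="")  # "    [OK] Fetched 4 primary timeframes"
```

Because the filter mutates the shared LogRecord, any handler that runs after it (for example a UTF-8 file handler) also receives the ASCII text.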

Tensor Shapes in PyTorch

BCELoss is strict about shapes:

  • Input and target MUST have identical shapes
  • Use .view() to force a reshape if needed (it raises if the element counts differ; .reshape() also handles non-contiguous tensors)
  • Always verify shapes before loss calculation
  • .to(device) never changes a tensor's shape; if a dimension disappears, check squeeze(), integer indexing, or view(-1) calls
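A quick way to find where a dimension is lost is to print shapes after each step. The sketch below runs on CPU so it needs no GPU; trade_success here is only a stand-in for the real target tensor. It shows that .to() on its own keeps the shape, while indexing, squeeze() and view(-1) can drop size-1 dimensions.

```python
import torch

trade_success = torch.tensor([[1.0]])  # shape [1, 1]

# Moving between devices (or casting) keeps the shape.
moved = trade_success.to("cpu")
print(moved.shape)                      # torch.Size([1, 1])

# Common ways a [1, 1] tensor turns into [1] (or a 0-d tensor):
print(trade_success[0].shape)           # torch.Size([1])
print(trade_success.squeeze(-1).shape)  # torch.Size([1])
print(trade_success.view(-1).shape)     # torch.Size([1])
print(trade_success.squeeze().shape)    # torch.Size([])
```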

Summary

Fixed Unicode encoding errors for Windows compatibility
Fixed BCELoss shape mismatch warning
Training now runs cleanly without warnings
All platforms supported (Windows, Linux, macOS)