Troubleshooting Guide - Common Issues & Solutions¶

Solutions to common problems when working with Dagster in honey-duck.

Table of Contents¶

Asset Errors
Dependency Issues
Data Validation Errors
IO Manager Issues
Performance Issues
UI/Server Issues
Environment Issues

Asset Errors¶

Error: "DagsterInvalidDefinitionError: Cannot annotate context parameter"¶

# WRONG
@dg.asset
def my_asset(context: dg.AssetExecutionContext) -> dict:
    pass

Problem: Some Dagster decorators don't allow typed context parameters.

Solution:

# For regular assets - use type annotation
@dg.asset
def my_asset(context: dg.AssetExecutionContext) -> pl.DataFrame:
    pass

# For dynamic assets - remove type annotation
@dg.asset(deps=["other_asset"])
def notification_asset(context) -> dict:  # ← No type annotation
    pass

Error: "Asset 'sales_transform' has no materializations"¶

Problem: You're trying to use an asset that hasn't been materialized yet.

Solution:

# Materialize the missing asset first
uv run dg launch --assets sales_transform

# Or materialize with all dependencies
uv run dg launch --assets sales_output --select +sales_output

In UI: Click "Materialize" button on the asset page.

Error: "No such asset: 'sales_data'"¶

# WRONG - Parameter name doesn't match asset name
@dg.asset
def my_output(context, sales_data: pl.DataFrame):  # Looking for 'sales_data'
    pass

Problem: Parameter name must exactly match the asset name.

Solution:

# CORRECT - Parameter matches asset name
@dg.asset
def my_output(context, sales_transform: pl.DataFrame):  # Matches 'sales_transform'
    pass

Check available assets:

uv run dg list defs

Dependency Issues¶

Error: "Circular dependency detected"¶

Problem: Assets depend on each other in a loop:

A → B → C → A  # ← Circular!

Solution:

Find the cycle:
```
uv run dg list defs --show-deps
```

Break the cycle by refactoring:

# Instead of:
# A depends on B
# B depends on A  ← Circular!

# Do:
# C = shared logic
# A depends on C
# B depends on C

Error: "ImportError: cannot import name 'my_asset'"¶

Problem: Missing import or circular import.

Solution:

Check the import:

# In definitions.py
from honey_duck.defs.polars.my_assets import my_asset  # ← Verify file/folder

Avoid circular imports:

# WRONG - Don't import definitions in asset files
from honey_duck.defs.definitions import defs

# CORRECT - Only import utilities
from honey_duck.defs.shared.resources import PathsResource

Verify file exists:

ls src/honey_duck/defs/polars/my_assets.py

Data Validation Errors¶

Error: "MissingTableError: Table 'sales_raw' not found in schema 'raw'"¶

Problem: Harvest data hasn't been loaded or table name is wrong.

Solution:

Check if harvest data exists:
```
ls data/output/dlt/harvest_parquet/
```

Materialize harvest assets first:

uv run dg launch --assets dlt_harvest_sales

Check table names:

from cogapp_libs.dagster import read_harvest_tables_lazy

# This validates table exists and shows available tables in error message
tables = read_harvest_tables_lazy(
    paths.harvest_dir,
    ("sales_raw", []),  # ← Will list available if wrong
    asset_name="my_asset",
)

Error message shows available tables:

MissingTableError: Table 'sales_raw' not found.
Available tables: ['artworks_raw', 'artists_raw', 'media']

Error: "MissingColumnError: Column 'sale_price' not found in table 'sales_raw'"¶

Problem: Column name is wrong or doesn't exist in data.

Solution:

Check actual columns:

uv run python -c "
import polars as pl
df = pl.read_parquet('data/output/dlt/harvest_parquet/sales_raw.parquet')
print(df.columns)
"

Fix column name:

# Error says available columns
MissingColumnError: Column 'sale_price' not found.
Available columns: ['sale_id', 'sale_price_usd', 'artwork_id']

# Use correct name
("sales_raw", ["sale_price_usd"])  # ← Not 'sale_price'

Use helper validation:

from cogapp_libs.dagster import read_harvest_tables_lazy

# Validates columns exist
tables = read_harvest_tables_lazy(
    paths.harvest_dir,
    ("sales_raw", ["sale_id", "sale_price_usd"]),  # ← Auto-validated
    asset_name="my_asset",
)

Error: "ColumnNotFoundError" (Polars)¶

# Polars error during transformation
polars.exceptions.ColumnNotFoundError: sale_price

Problem: Typo in column name or column was dropped in earlier operation.

Solution:

Add debug logging:

@dg.asset
def my_asset(context: dg.AssetExecutionContext):
    result = load_data()

    # Debug: print columns
    context.log.info(f"Columns available: {result.columns}")

    # Now use correct column
    result = result.filter(pl.col("sale_price_usd") > 100)
    return result

Check for typos:

# Common typos:
pl.col("sale_price")      # Missing _usd
pl.col("sale_price_usd")  # Correct

IO Manager Issues¶

Error: "Failed to pickle DataFrame"¶

Problem: Trying to use wrong IO manager for data type.

Solution:

Ensure asset returns Polars DataFrame for PolarsParquetIOManager:

@dg.asset
def my_asset(context) -> pl.DataFrame:  # ← Polars, not Pandas
    # If using DuckDB
    result = conn.sql("SELECT * FROM table").pl()  # ← Use .pl() not .df()
    return result

Error: "FileNotFoundError: No such file or directory: 'data/output/storage/...'"¶

Problem: IO manager trying to load asset that doesn't exist.

Solution:

Materialize upstream asset:

uv run dg launch --assets upstream_asset

Or materialize with dependencies:

uv run dg launch --assets my_asset --select +my_asset

Check file system:

# Check if asset files exist
ls -R data/output/storage/

Performance Issues¶

Issue: "Asset taking too long to materialize"¶

Symptoms: Asset runs for minutes/hours.

Solutions:

Use lazy evaluation (Polars):

# SLOW - Collects too early
df = pl.read_parquet(path).collect()  # ← Loads all
df = df.filter(pl.col("price") > 1000)  # Works on full dataset

# FAST - Push down filters
df = pl.scan_parquet(path)  # ← Lazy
df = df.filter(pl.col("price") > 1000)  # Filter planned
df = df.collect()  # ← Only loads filtered data

Profile your code:

from cogapp_libs.dagster import track_timing

@dg.asset
def my_asset(context):
    with track_timing(context, "loading"):
        data = load_data()

    with track_timing(context, "transforming"):
        result = transform(data)

    # Check metadata to see which part is slow
    return result

Check data size:

@dg.asset
def my_asset(context):
    result = transform()

    context.log.info(f"Result size: {len(result):,} rows")
    context.log.info(f"Memory: {result.estimated_size('mb'):.2f} MB")

    return result

Issue: "Out of memory errors"¶

Solutions:

Stream data with lazy operations:

# Instead of loading entire file
df = pl.scan_parquet(path).filter(...).collect()

Process in batches:

@dg.asset
def process_large_data(context):
    total_rows = 0

    for batch in pl.read_parquet(path, use_pyarrow=True, n_rows=10000):
        processed = transform(batch)
        total_rows += len(processed)

    context.log.info(f"Processed {total_rows:,} rows")

Use DuckDB for large datasets:

# DuckDB streams results
conn.sql("""
    SELECT * FROM 'large_file.parquet'
    WHERE price > 1000
""").pl()  # Only loads filtered results

UI/Server Issues¶

Error: "Failed to connect to Dagster"¶

Problem: Dagster server not running or wrong port.

Solution:

Start the server:
```
uv run dg dev
```

Check the port:

# Default port
http://localhost:3000

# Custom port
uv run dg dev --port 3001
http://localhost:3001

Check process:

# Is Dagster running?
ps aux | grep dagster

# Kill stuck process
pkill -f dagster

Error: "Code location failed to load"¶

Problem: Python syntax error or import error in definitions.

Solution:

Check error in UI: Look at code location error message.

Test import directly:

uv run python -c "from honey_duck.defs.definitions import defs"

Common causes:

# Missing comma
assets=[
    asset1
    asset2  # Missing comma
]

# Indentation error
@dg.asset
def my_asset(context):
result = ...  # Wrong indentation

# Circular import
from honey_duck.defs.definitions import defs  # In asset file

Issue: "UI is slow or unresponsive"¶

Solutions:

Clear browser cache: Ctrl+Shift+R (Chrome/Firefox)
Restart Dagster:
```
pkill -f dagster
uv run dg dev
```

Check for too many runs:

# Delete old runs
uv run dagster run delete --all -j my_job

Environment Issues¶

Error: "ModuleNotFoundError: No module named 'polars'"¶

Problem: Dependencies not installed.

Solution:

# Install dependencies
uv sync

# Verify installation
uv run python -c "import polars; print(polars.__version__)"

Error: "Permission denied" when writing files¶

Problem: No write permissions to output directory.

Solution:

Check permissions:
```
ls -la data/output/
```
Fix permissions:
```
chmod -R u+w data/output/
```

Or create directory:

mkdir -p data/output/json data/output/storage data/output/dlt

Error: "Database is locked" (DuckDB)¶

Problem: Multiple processes trying to write to DuckDB at once.

Solution:

In honey-duck, output assets have deps=["other_output"] to enforce ordering:

@dg.asset(
    deps=["artworks_output"],  # ← Wait for artworks_output first
)
def sales_output(context, sales_transform: pl.DataFrame):
    # Won't run until artworks_output completes
    pass

Why: DuckDB only allows one writer at a time.

Getting More Help¶

Enable Debug Logging¶

# Set log level
export DAGSTER_CLI_LOG_LEVEL=DEBUG
uv run dg dev

# Or in code
@dg.asset
def my_asset(context):
    context.log.set_level(logging.DEBUG)
    context.log.debug("Detailed debug info")

Check Dagster Logs¶

# View recent run logs
uv run dagster run logs <run_id>

# Server logs when running dg dev
# → Check terminal output

Validate Your Code¶

# Check Python syntax
uv run python -m py_compile src/honey_duck/defs/polars/my_assets.py

# Check imports
uv run python -c "from honey_duck.defs.definitions import defs"

# Run tests
uv run pytest -xvs

Still Stuck?¶

Check Dagster Docs: https://docs.dagster.io
Check honey-duck docs: Getting Started
View example implementations: src/honey_duck/defs/polars/assets.py
Read error messages carefully: They often include helpful suggestions!

Common Error Message Patterns¶

Pattern 1: "Expected X, got Y"¶

→ Type mismatch: Check return type annotations

Pattern 2: "Asset 'X' not found"¶

→ Typo or missing import: Check asset name spelling

Pattern 3: "Table/Column 'X' not found"¶

→ Data validation error: Use helper functions for auto-validation

Pattern 4: "Failed to load"¶

→ Missing materialization: Materialize upstream assets first

Pattern 5: "Circular dependency"¶

→ Design issue: Refactor to break the cycle

Pro Tip: Most errors have helpful messages. Read them carefully -- they often tell you exactly what's wrong.