Data Validation

DataValidator

Validates OHLCV DataFrames for quality issues.

from replaybt import DataValidator

# df is a pandas DataFrame of OHLCV bars loaded beforehand
validator = DataValidator(df, symbol="ETH", timeframe="1m")
issues = validator.validate()  # List[DataIssue]
print(validator.report())      # formatted report

Constructor

DataValidator(
    df: pd.DataFrame,
    symbol: str = "",
    timeframe: str = "1m",
)

Methods

| Method | Returns | Description |
|--------|---------|-------------|
| validate() | List[DataIssue] | Run all checks |
| report() | str | Formatted report |

Checks

| Check | Severity | Description |
|-------|----------|-------------|
| Missing columns | ERROR | OHLCV columns not present |
| Duplicate timestamps | ERROR | Same timestamp twice |
| Time gaps | WARNING | Missing bars |
| OHLC violations | WARNING | High < Low, etc. |
| Negative values | ERROR | Negative prices/volume |
| Zero values | WARNING | Zero prices |
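
To make the checks above concrete, here is a minimal pure-Python sketch of what each one looks for. It is not replaybt's implementation: the `find_issues` helper, the dict-based bar layout, the column names, and the check identifiers are all assumptions made for illustration, and the library's own checks run against a DataFrame.

```python
from datetime import datetime, timedelta

# Assumed column names; the real validator may expect different ones.
REQUIRED = ("timestamp", "open", "high", "low", "close", "volume")


def find_issues(bars, step=timedelta(minutes=1)):
    """Return (severity, check, row_index) tuples for a list of bar dicts.

    Hypothetical helper that mirrors the checks listed in the table.
    """
    issues = []
    prev_ts = None
    seen = set()
    for i, bar in enumerate(bars):
        # Missing columns: every OHLCV field must be present.
        if any(key not in bar for key in REQUIRED):
            issues.append(("ERROR", "missing_columns", i))
            continue
        ts = bar["timestamp"]
        if ts in seen:
            # Duplicate timestamps: the same timestamp appears twice.
            issues.append(("ERROR", "duplicate_timestamp", i))
        elif prev_ts is not None and ts - prev_ts > step:
            # Time gaps: more than one expected step since the previous bar.
            issues.append(("WARNING", "time_gap", i))
        seen.add(ts)
        prev_ts = ts
        prices = [bar[k] for k in ("open", "high", "low", "close")]
        # OHLC violations: high must be the maximum and low the minimum.
        if bar["high"] < max(prices) or bar["low"] > min(prices):
            issues.append(("WARNING", "ohlc_violation", i))
        # Negative values: prices and volume can never be below zero.
        if min(prices) < 0 or bar["volume"] < 0:
            issues.append(("ERROR", "negative_value", i))
        # Zero values: a zero price is suspicious but not necessarily fatal.
        if any(p == 0 for p in prices):
            issues.append(("WARNING", "zero_value", i))
    return issues


t0 = datetime(2024, 1, 1)
sample = [
    {"timestamp": t0, "open": 100, "high": 101, "low": 99, "close": 100, "volume": 5},
    # Same timestamp as the previous bar.
    {"timestamp": t0, "open": 100, "high": 101, "low": 99, "close": 100, "volume": 5},
    # Four bars missing before this one, and high is below open.
    {"timestamp": t0 + timedelta(minutes=5), "open": 100, "high": 99,
     "low": 98, "close": 98.5, "volume": 3},
]
found = find_issues(sample)
```

The ERROR/WARNING split follows the table: problems that make a bar unusable are errors, while gaps and implausible values are flagged without being treated as fatal.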

DataIssue

A single data quality issue.

| Field | Type | Description |
|-------|------|-------------|
| severity | str | "ERROR", "WARNING", "INFO" |
| check | str | Issue type identifier |
| message | str | Human-readable description |
| row_index | Optional[int] | Row where issue occurs |
| timestamp | Optional[datetime] | Timestamp of the row |
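
A common next step is to split the returned issues by severity, for example to abort a backtest on any ERROR while only logging warnings. The sketch below uses a stand-in `Issue` dataclass that mirrors the documented fields so it runs without replaybt installed. The `None` defaults, the helper names, and the sample check identifiers are assumptions, not part of the library.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Issue:
    """Hypothetical stand-in with the same fields as the documented DataIssue."""
    severity: str
    check: str
    message: str
    row_index: Optional[int] = None      # default is an assumption
    timestamp: Optional[datetime] = None  # default is an assumption


def errors_only(issues: List[Issue]) -> List[Issue]:
    """Keep only the issues that should block further processing."""
    return [issue for issue in issues if issue.severity == "ERROR"]


def count_by_severity(issues: List[Issue]) -> Dict[str, int]:
    """Count how many issues were reported at each severity level."""
    return dict(Counter(issue.severity for issue in issues))


# Sample data; the check identifiers here are made up for illustration.
sample_issues = [
    Issue("ERROR", "duplicate_timestamp", "Same timestamp twice",
          row_index=10, timestamp=datetime(2024, 1, 1, 0, 10)),
    Issue("WARNING", "time_gap", "Missing bars", row_index=42),
    Issue("WARNING", "zero_value", "Zero price", row_index=57),
]

blocking = errors_only(sample_issues)
summary = count_by_severity(sample_issues)
```

With real `DataIssue` objects, the same filtering should apply to the list returned by `validate()`, since it relies only on the documented `severity` field.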

ValidatedProvider

Wraps a provider and validates bars on the fly.

# CSVProvider is assumed to be importable from the package root
from replaybt import CSVProvider, ValidatedProvider

data = ValidatedProvider(
    inner=CSVProvider("ETH_1m.csv", symbol_name="ETH"),
    stop_on_error=False,
)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| inner | DataProvider | required | Provider to validate |
| symbol | str | "" | Symbol name |
| timeframe | str | "1m" | Expected timeframe |
| stop_on_error | bool | False | Raise on first ERROR |
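
The wrapping pattern and the effect of `stop_on_error` can be sketched in plain Python. The `ValidatingWrapper` class below is a hypothetical stand-in, not replaybt's code: it wraps any iterable of bar dicts, applies a single ERROR-level check (negative values), and either records the problem or raises. The use of `ValueError` and the `issues` attribute are assumptions, since the source does not say which exception `ValidatedProvider` raises or how it exposes collected issues.

```python
from typing import Dict, Iterable, Iterator, List


class ValidatingWrapper:
    """Hypothetical stand-in: checks each bar as it is yielded from `inner`."""

    def __init__(self, inner: Iterable[Dict[str, float]], stop_on_error: bool = False):
        self.inner = inner
        self.stop_on_error = stop_on_error
        self.issues: List[str] = []  # messages collected when not stopping

    def __iter__(self) -> Iterator[Dict[str, float]]:
        for index, bar in enumerate(self.inner):
            # ERROR-level check: no price or volume may be negative.
            if any(value < 0 for value in bar.values()):
                message = f"bar {index}: negative value in {bar}"
                if self.stop_on_error:
                    # Fail fast on the first ERROR (exception type is assumed).
                    raise ValueError(message)
                self.issues.append(message)
            # The bar is passed through unchanged either way.
            yield bar


sample_bars = [
    {"open": 100.0, "high": 101.0, "low": 99.0, "close": 100.5, "volume": 12.0},
    {"open": 100.5, "high": 102.0, "low": 100.0, "close": -1.0, "volume": 8.0},
]

# Lenient mode: every bar is yielded and the problem is recorded.
lenient = ValidatingWrapper(sample_bars, stop_on_error=False)
passed = list(lenient)

# Strict mode: iteration stops at the first bad bar.
try:
    list(ValidatingWrapper(sample_bars, stop_on_error=True))
    raised = False
except ValueError:
    raised = True
```

Validating inside the iterator means bad data surfaces at the bar where it occurs, without a separate pass over the whole dataset before the run starts.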

Convenience Functions

from replaybt import validate_dataframe, validate_provider

# Validate DataFrame
issues = validate_dataframe(df, symbol="ETH", timeframe="1m")

# Validate provider
issues = validate_provider(my_provider, symbol="ETH")