The PII Masking Benchmark evaluates models on their ability to detect and mask Personally Identifiable Information (PII) in text. Models are ranked by their character-level F2 score for PII detection, averaged across three test splits drawn from public datasets. F2 weights recall more heavily than precision, so missing PII is penalized more than over-masking.
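
As a rough illustration of the ranking metric, here is a minimal sketch of a character-level F2 score averaged over splits. It is not the leaderboard's actual evaluation code. The span format (half-open `(start, end)` character offsets), the function names `span_chars`, `char_fbeta`, and `benchmark_score`, the score of 1.0 for an input with no gold and no predicted PII, and the plain unweighted mean over splits are all assumptions made for this example.

```python
from typing import Iterable, List, Set, Tuple

# Assumed span format: half-open [start, end) character offsets into the text.
Span = Tuple[int, int]


def span_chars(spans: Iterable[Span]) -> Set[int]:
    """Expand spans into the set of character positions they cover."""
    chars: Set[int] = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def char_fbeta(gold: Iterable[Span], pred: Iterable[Span], beta: float = 2.0) -> float:
    """Character-level F-beta between gold and predicted PII spans.

    With beta=2, recall counts more than precision.
    """
    gold_chars = span_chars(gold)
    pred_chars = span_chars(pred)
    tp = len(gold_chars & pred_chars)  # PII characters correctly flagged
    fp = len(pred_chars - gold_chars)  # non-PII characters flagged
    fn = len(gold_chars - pred_chars)  # PII characters missed
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Assumed convention: nothing to find and nothing predicted scores 1.0.
    return (1 + b2) * tp / denom if denom else 1.0


def benchmark_score(split_scores: List[float]) -> float:
    """Unweighted mean of per-split scores (assumed aggregation)."""
    return sum(split_scores) / len(split_scores)


if __name__ == "__main__":
    gold = [(0, 10)]
    print(round(char_fbeta(gold, [(0, 5)]), 4))   # half of the PII missed
    print(round(char_fbeta(gold, [(0, 20)]), 4))  # all PII found, twice as much masked
```

Under these assumptions, a prediction that misses half of a 10-character PII span scores about 0.56, while one that covers the whole span but masks 10 extra characters scores about 0.83, which shows how F2 favors recall.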
Please help me improve this leaderboard: suggestions for new test datasets, models, metrics, or anything else are welcome!