XML with Wrong Encoding Declaration
Download a free XML file whose XML declaration says UTF-8 but whose content is actually encoded as ISO-8859-1. This kind of mismatch is common in XML files from legacy systems, older databases, and Windows applications. Use it to test how your XML parser detects and handles encoding mismatches.
What Is Broken
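The XML declaration says encoding="UTF-8", but the file contains bytes that are valid ISO-8859-1 and do not form valid UTF-8 sequences. For example, the accented characters é, ü, and ñ are stored as the single bytes 0xE9, 0xFC, and 0xF1, whereas UTF-8 encodes each of them as two bytes (0xC3 0xA9, 0xC3 0xBC, and 0xC3 0xB1).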
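The byte-level difference can be checked directly. The following is a minimal Python sketch, not part of the sample file itself; the helper name `decodes_as` is made up for illustration.

```python
# "José" as ISO-8859-1 would store it: é is the single byte 0xE9.
latin1_bytes = b"Jos\xe9"


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a legal byte sequence in the given encoding.

    Hypothetical helper for this example only.
    """
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(latin1_bytes, "iso-8859-1"))  # True
print(decodes_as(latin1_bytes, "utf-8"))       # False
print("José".encode("utf-8"))                  # b'Jos\xc3\xa9'
```

Every byte value is a valid ISO-8859-1 character, so a successful ISO-8859-1 decode proves nothing on its own. The failed UTF-8 decode is the useful signal.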
Broken Example
<?xml version="1.0" encoding="UTF-8"?>
<contacts>
  <person>
    <name>José García</name> <!-- é and í are ISO-8859-1 bytes -->
    <city>Zürich</city> <!-- ü is 0xFC in ISO-8859-1 -->
    <note>Señor García's file</note> <!-- ñ is 0xF1 -->
  </person>
</contacts>
Why It Matters
Encoding mismatches cause mojibake (garbled text), parsing failures, and data corruption. They are especially common when migrating data between systems with different default encodings.
Expected Parser / Validator Behavior
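The XML 1.0 specification treats a byte sequence that is not legal in the declared encoding as a fatal error, so a conforming parser should reject the document with an encoding or well-formedness error. Lenient parsers may attempt to auto-detect the actual encoding. Well-behaved applications should report the mismatch and suggest re-encoding.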
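As one illustration of strict behavior, here is a small sketch using Python's standard `xml.etree.ElementTree`. Other parsers may report the problem differently or try to recover. The inline document and the helper `try_parse` are assumptions made for this example, not the downloadable file.

```python
import xml.etree.ElementTree as ET

# Declared UTF-8, but "é" is stored as the single ISO-8859-1 byte 0xE9.
broken = b'<?xml version="1.0" encoding="UTF-8"?><name>Jos\xe9</name>'


def try_parse(data: bytes):
    """Return (text, None) on success or (None, error message) on failure.

    Hypothetical helper for this example only.
    """
    try:
        return ET.fromstring(data).text, None
    except ET.ParseError as exc:
        return None, str(exc)


text, error = try_parse(broken)
print("parsed:", text, "| error:", error)

# Changing the declaration to match the real bytes lets the same data parse.
relabeled = broken.replace(b'encoding="UTF-8"', b'encoding="ISO-8859-1"')
text, error = try_parse(relabeled)
print("parsed:", text, "| error:", error)
```

The exact error message depends on the parser, so code that handles this case should catch the exception type and not match on the message string.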
Frequently Asked Questions
What is an encoding mismatch?
The XML declaration claims one encoding (UTF-8) but the file bytes are actually in a different encoding (ISO-8859-1). This causes parsers to misinterpret character sequences.
How do I fix encoding issues?
Either re-encode the file to match the declaration (convert to actual UTF-8), or update the declaration to match the actual encoding (change to encoding="ISO-8859-1").
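The first option, re-encoding, can be sketched in Python as follows. This assumes the bytes really are ISO-8859-1; files from Windows applications may be Windows-1252 instead, in which case the `actual_encoding` argument would need to change. The function name `reencode_to_utf8` is made up for this example.

```python
def reencode_to_utf8(data: bytes, actual_encoding: str = "iso-8859-1") -> bytes:
    """Decode with the encoding the bytes are really in, then re-encode as UTF-8.

    Hypothetical helper; the declaration is left untouched because it
    already says UTF-8.
    """
    return data.decode(actual_encoding).encode("utf-8")


# Declared UTF-8, but "ü" is stored as the single ISO-8859-1 byte 0xFC.
broken = b'<?xml version="1.0" encoding="UTF-8"?><city>Z\xfcrich</city>'
fixed = reencode_to_utf8(broken)

print(fixed)  # ü is now the two bytes 0xC3 0xBC
```

For a file on disk, the same idea applies: read it in binary mode, convert, and write the result back in binary mode, so that no implicit platform encoding gets involved.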