XML Data Processing
Learn to read and process XML data in Python
What is XML?
XML stands for eXtensible Markup Language. It's a text format for storing structured data using tags.
What XML looks like:
<person>
    <name>John</name>
    <age>25</age>
    <city>New York</city>
</person>
Where XML is used:
- Configuration files
- Data exchange between systems
- Web services (SOAP APIs)
- Office documents (docx and xlsx files are ZIP archives that contain XML files)
- RSS feeds
XML vs JSON:
- XML: More verbose, tag-based, supports attributes; common in older and enterprise systems
- JSON: Simpler and lighter; common in modern web APIs
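To make the comparison concrete, here is a small sketch that parses the same record from XML and from JSON. The sample strings and variable names are made up for illustration. One practical difference it shows: XML element text always arrives as a string, while JSON keeps numbers as numbers.

```python
import json
import xml.etree.ElementTree as ET

# The same (hypothetical) record in both formats
xml_text = "<person><name>John</name><age>25</age></person>"
json_text = '{"name": "John", "age": 25}'

# XML: build a dict from the child tags; every value is a string
person_from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}

# JSON: json.loads() already returns a dict with typed values
person_from_json = json.loads(json_text)

print(person_from_xml)   # {'name': 'John', 'age': '25'}
print(person_from_json)  # {'name': 'John', 'age': 25}
```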
The xml.etree.ElementTree Module
Python has built-in XML support.
import xml.etree.ElementTree as ET
What this does: Imports the XML parser under a shorter name (ET).
Reading XML String
import xml.etree.ElementTree as ET
xml_string = """
<person>
<name>John</name>
<age>25</age>
<city>New York</city>
</person>
"""
root = ET.fromstring(xml_string)
print("Tag:", root.tag)
print("Name:", root.find("name").text)
print("Age:", root.find("age").text)
What this does:
- fromstring() parses XML text
- root is the top element
- find() locates child elements
- .text gets content inside tags
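fromstring() only works on well-formed XML. As a minimal sketch of what happens otherwise, the made-up string below has a missing closing tag, and the parser raises ET.ParseError, which you can catch:

```python
import xml.etree.ElementTree as ET

# Hypothetical broken input: <name> is never closed
bad_xml = "<person><name>John</person>"

try:
    ET.fromstring(bad_xml)
    parsed_ok = True
except ET.ParseError as error:
    # The error message includes the line and column of the problem
    parsed_ok = False
    print("Could not parse XML:", error)

print("Parsed successfully:", parsed_ok)
```

Catching the error lets a program report bad input instead of crashing.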
Reading XML File
import xml.etree.ElementTree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
print("Root tag:", root.tag)
for child in root:
    print("Child:", child.tag, "Value:", child.text)
What this does:
- parse() reads and parses an XML file
- getroot() returns the top element
- The for loop visits each direct child of the root
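If you do not have a data.xml file at hand, the sketch below creates one in a temporary folder first and then reads it back with parse(). The file contents and the variable names are assumptions made for this example only.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical file contents for the demonstration
sample_xml = "<person><name>John</name><age>25</age></person>"

with tempfile.TemporaryDirectory() as folder:
    # Write the sample to a temporary data.xml
    xml_path = Path(folder) / "data.xml"
    xml_path.write_text(sample_xml, encoding="utf-8")

    # Read it back the same way as any other XML file
    tree = ET.parse(xml_path)
    root = tree.getroot()
    children = [(child.tag, child.text) for child in root]

print(children)  # [('name', 'John'), ('age', '25')]
```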
Finding Elements
Find First Match
import xml.etree.ElementTree as ET
tree = ET.parse("students.xml")
root = tree.getroot()
first_student = root.find("student")
name = first_student.find("name").text
print("First student:", name)
What find() does: Returns the first child element that matches the tag name, or None if there is no match.
Find All Matches
import xml.etree.ElementTree as ET
tree = ET.parse("students.xml")
root = tree.getroot()
students = root.findall("student")
for student in students:
    name = student.find("name").text
    grade = student.find("grade").text
    print("Student:", name, "Grade:", grade)
What findall() does: Returns a list of all matching elements.
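A related method, findtext(), combines find() and .text and accepts a default for missing elements. The sketch below uses a made-up students list in which the second student has no grade element; the data and names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second student has no <grade>
xml_string = """
<students>
    <student><name>John</name><grade>A</grade></student>
    <student><name>Maria</name></student>
</students>
"""
root = ET.fromstring(xml_string)

results = []
for student in root.findall("student"):
    # findtext() returns the default instead of failing on a missing element
    name = student.findtext("name", default="(no name)")
    grade = student.findtext("grade", default="N/A")
    results.append((name, grade))
    print("Student:", name, "Grade:", grade)
```

With find("grade").text, the second student would raise an AttributeError because find() returns None.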
Reading XML Attributes
XML tags can have attributes.
XML with attributes:
<student id="1" status="active">
    <name>John</name>
</student>
Reading attributes:
import xml.etree.ElementTree as ET
tree = ET.parse("students.xml")
root = tree.getroot()
for student in root.findall("student"):
    student_id = student.get("id")
    status = student.get("status")
    name = student.find("name").text
    print("ID:", student_id)
    print("Status:", status)
    print("Name:", name)
    print()
What .get() does: Gets an attribute value from an element.
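Two details about attributes are worth a short sketch: all attributes of an element are available as a dictionary through .attrib, and .get() accepts a default for attributes that are missing. The email attribute below is hypothetical and deliberately absent from the sample.

```python
import xml.etree.ElementTree as ET

student = ET.fromstring(
    '<student id="1" status="active"><name>John</name></student>'
)

# All attributes at once, as a dictionary of strings
print(student.attrib)  # {'id': '1', 'status': 'active'}

# Attribute values are strings, so convert when you need a number
student_id = int(student.get("id"))

# A missing attribute returns the default instead of None
email = student.get("email", "not provided")

print(student_id, email)  # 1 not provided
```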
Nested XML
XML can have multiple levels.
Example XML:
<school>
    <classroom>
        <student>
            <name>John</name>
            <subjects>
                <subject>Math</subject>
                <subject>Science</subject>
            </subjects>
        </student>
    </classroom>
</school>
Reading nested data:
import xml.etree.ElementTree as ET
tree = ET.parse("school.xml")
root = tree.getroot()
classroom = root.find("classroom")
student = classroom.find("student")
name = student.find("name").text
print("Student:", name)
print("Subjects:")
subjects = student.find("subjects")
for subject in subjects.findall("subject"):
    print("-", subject.text)
What this does: Navigates through multiple levels to get data.
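The same navigation can also be written as a single path with slashes between the levels. This is a sketch using the school example as an inline string, so it runs without a school.xml file:

```python
import xml.etree.ElementTree as ET

xml_string = """
<school>
    <classroom>
        <student>
            <name>John</name>
            <subjects>
                <subject>Math</subject>
                <subject>Science</subject>
            </subjects>
        </student>
    </classroom>
</school>
"""
root = ET.fromstring(xml_string)

# One path from the root down to every subject element
subjects = [
    subject.text
    for subject in root.findall("classroom/student/subjects/subject")
]
print(subjects)  # ['Math', 'Science']
```

The step-by-step version is easier to debug when a level might be missing; the path version is shorter when the structure is known.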
Using XPath
XPath is a path syntax for finding elements. ElementTree supports a limited subset of it, which covers the most common searches.
import xml.etree.ElementTree as ET
tree = ET.parse("students.xml")
root = tree.getroot()
names = root.findall(".//name")
for name in names:
    print(name.text)
What .// means: Find all elements with this tag anywhere below the current element.
More XPath examples:
root.findall("./student")
root.findall("./student/name")
root.findall(".//student[@status='active']")
XPath patterns:
- . current element
- .. parent element
- .// all descendants
- [@attr='value'] filter by attribute
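The patterns above can be tried out on a small inline document. The students and their attributes below are made up for illustration; each line exercises one pattern from the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data
xml_string = """
<students>
    <student id="1" status="active"><name>John</name></student>
    <student id="2" status="inactive"><name>Maria</name></student>
</students>
"""
root = ET.fromstring(xml_string)

# .  direct children of the current element
direct_students = root.findall("./student")

# .//  all descendants with this tag
all_names = [name.text for name in root.findall(".//name")]

# [@attr='value']  keep only elements whose attribute matches
active_ids = [s.get("id") for s in root.findall(".//student[@status='active']")]

# ..  step up from each name element to its parent
parent_tags = [parent.tag for parent in root.findall(".//name/..")]

print(len(direct_students))  # 2
print(all_names)             # ['John', 'Maria']
print(active_ids)            # ['1']
print(parent_tags)           # ['student', 'student']
```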
Creating XML
Build XML from Python.
import xml.etree.ElementTree as ET
root = ET.Element("students")
student1 = ET.SubElement(root, "student")
student1.set("id", "1")
name1 = ET.SubElement(student1, "name")
name1.text = "John"
age1 = ET.SubElement(student1, "age")
age1.text = "20"
tree = ET.ElementTree(root)
tree.write("output.xml", encoding="utf-8", xml_declaration=True)
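print("XML file created")
What this creates (shown here with line breaks and indentation for readability; ElementTree writes the elements without extra whitespace unless you call ET.indent(), available in Python 3.9 and later):
<?xml version='1.0' encoding='utf-8'?>
<students>
    <student id="1">
        <name>John</name>
        <age>20</age>
    </student>
</students>
Practice Example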
The scenario: Process a product catalog XML file.
Example XML (products.xml):
<catalog>
    <product id="1" category="Electronics">
        <name>Laptop</name>
        <price>999.99</price>
        <stock>5</stock>
    </product>
    <product id="2" category="Electronics">
        <name>Phone</name>
        <price>599.99</price>
        <stock>10</stock>
    </product>
    <product id="3" category="Accessories">
        <name>Mouse</name>
        <price>25.99</price>
        <stock>50</stock>
    </product>
</catalog>
Python program:
import xml.etree.ElementTree as ET
tree = ET.parse("products.xml")
root = tree.getroot()
print("Product Catalog")
print("=" * 40)
total_value = 0
product_count = 0
for product in root.findall("product"):
    product_id = product.get("id")
    category = product.get("category")
    name = product.find("name").text
    price = float(product.find("price").text)
    stock = int(product.find("stock").text)
    value = price * stock
    total_value = total_value + value
    product_count = product_count + 1
    print("Product ID:", product_id)
    print("Name:", name)
    print("Category:", category)
    print("Price:", price)
    print("Stock:", stock)
    print("Value:", value)
    print()
print("=" * 40)
print("Total products:", product_count)
print("Total inventory value:", total_value)
electronics = root.findall(".//product[@category='Electronics']")
print("Electronics count:", len(electronics))
What this program does:
- Parses XML file
- Loops through all products
- Extracts attributes and child elements
- Calculates inventory value
- Uses XPath to filter by category
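As one possible extension of the program above, the sketch below totals the inventory value per category instead of for the whole catalog. It reuses the catalog data as an inline string so it runs on its own; the value_by_category name is an assumption of this example, not part of the original program.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

catalog_xml = """
<catalog>
    <product id="1" category="Electronics">
        <name>Laptop</name><price>999.99</price><stock>5</stock>
    </product>
    <product id="2" category="Electronics">
        <name>Phone</name><price>599.99</price><stock>10</stock>
    </product>
    <product id="3" category="Accessories">
        <name>Mouse</name><price>25.99</price><stock>50</stock>
    </product>
</catalog>
"""
root = ET.fromstring(catalog_xml)

# Running total of price * stock for each category attribute
value_by_category = defaultdict(float)
for product in root.findall("product"):
    price = float(product.findtext("price"))
    stock = int(product.findtext("stock"))
    value_by_category[product.get("category")] += price * stock

for category, value in value_by_category.items():
    # round() hides tiny floating-point differences in the totals
    print(category, "inventory value:", round(value, 2))
```

For real money calculations, the decimal module avoids floating-point rounding altogether.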
Converting XML to Dictionary
import xml.etree.ElementTree as ET
tree = ET.parse("student.xml")
root = tree.getroot()
student_dict = {}
for child in root:
    student_dict[child.tag] = child.text
print(student_dict)
What this creates: {'name': 'John', 'age': '25', 'city': 'New York'}
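The same idea extends to a file with several records. This is a minimal sketch, assuming a flat structure where each student has attributes and simple child elements; the element_to_dict helper and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data
xml_string = """
<students>
    <student id="1"><name>John</name><age>20</age></student>
    <student id="2"><name>Maria</name><age>22</age></student>
</students>
"""
root = ET.fromstring(xml_string)


def element_to_dict(element):
    """Convert one flat element into a dict of its attributes and children."""
    record = dict(element.attrib)      # start with the attributes
    for child in element:
        record[child.tag] = child.text  # add each child tag and its text
    return record


students = [element_to_dict(student) for student in root.findall("student")]
print(students)
# [{'id': '1', 'name': 'John', 'age': '20'}, {'id': '2', 'name': 'Maria', 'age': '22'}]
```

This sketch does not handle nested elements or repeated tags: a repeated child tag, or a child with the same name as an attribute, overwrites the earlier value.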
Key Points to Remember
XML uses tags to structure data. Tags come in pairs: opening and closing. An empty element can also be written as a single self-closing tag, such as <note/>.
ET.parse() reads XML files, ET.fromstring() reads XML strings.
find() gets first match, findall() gets all matches. Use .text to get content, .get() for attributes.
ElementTree's XPath support (such as the .// pattern) helps find elements anywhere in the tree.
XML is more verbose than JSON but still widely used in enterprise systems.
Common Mistakes
Mistake 1: Forgetting .text
name = root.find("name")  # This is an Element object
name = root.find("name").text  # This is the actual text
Mistake 2: Wrong method
students = root.find("student")  # Only gets the first one
students = root.findall("student")  # Gets all of them
Mistake 3: Not checking if element exists
name = root.find("name").text  # AttributeError if name doesn't exist!
Better:
name_element = root.find("name")
if name_element is not None:
    name = name_element.text
What's Next?
You now know XML basics. Next, you'll learn about Introduction to APIs - how to connect your Python programs to web services and get data from the internet.