How to parse xml using python

May 15, 2022 Python 2 Comments

Spread the love

Last updated:10th June, 2022

How to parse xml using python

Python is a modern programming language that developers use to perform variety of tasks. Parsing XML file is a feature of Python often used. In this tutorial, we will learn to parse XML using python. In a previous tutorials, we learned to parse XML using nodejs and php. Python can also be used to join wav audio files using python and Upload files using Python and Flask.

Following tasks are performed in this tutorial:

1. Create books XML file

2. Install Python and Pip

3. Install lxml package

4. Create an XML script to parse XML file

5. Run the script

First, create a project folder named parse-xml-file-using-python.

1. Create books XML file

Open the Microsoft Books XML URL. Copy the XML and save it on your project folder parse-xml-file-using-python.

2. Install Python and Pip

To install Python, Visit Download Python page and install on your system.

Install Pip on Windows

Pip is a package management system for python package and libraries. Execute the command to download the get-pip.py file.

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

To install pip, execute command below.

python get-pip.py

To verify Pip installation, type command below.

pip help

3. Install lxml package

lxml package is used to parse XML files, to install this package, type the command below on command line.

pip install lxml

4. Create an XML script to parse xml file

Open the project folder, create file parse-xml-using-python.py. Add code below into it.

from lxml import etree

def parseBookXML(xmlFile):
    with open(xmlFile) as fobj:
        xml = fobj.read()
    root = etree.fromstring(xml)
    book_dict = {}
    books = []

    for book in root.getchildren():
        for elem in book.getchildren():
            if elem.text:
                text = elem.text
            else:
                text = ''
            if elem.tag == 'title':
                book_dict[elem.tag] = text
            if elem.tag == 'genre':
                book_dict[elem.tag] = text
            if elem.tag == 'price':
                book_dict[elem.tag] = text
            if elem.tag == 'publish_date':
                book_dict[elem.tag] = text
            if elem.tag == 'description':
                book_dict[elem.tag] = text
            if elem.tag == 'author':
                book_dict[elem.tag] = text
        if book.tag == "book":
            books.append(book_dict)
            book_dict = {}
    return books

books_list = parseBookXML("books.xml")

print(books_list)

In the first line etree is imported from lxml package. Next, a function parseBookXML, and it accepts a parameter xmlFile. Inside function, using with open method, the XML file is opened as fobj.

The root element of XML file is fetched using etree’s fromstring method. The book_dict object and books array is defined.

Inside Books root element, there are multiple books elements. The book element is fetched using root.getchildren method. Each book element has child elements.

The child element of book are returned by book.getChildren. The text element tag is assigned to text variable if its value exists.

Next, if element tag is an title, genre, publish_date, description, or author, text is appended to books dictionary. The books dictionary to appended to the books array.

In the last, books list is returned. The parseBookXML method is called and parsed books list is returned.

5. Running the script

In order to run the script, Open command line. Navigate to the project folder, and type command.

python .\parse-xml-using-python.py

This command will output, the parsed xml below.

[

{'author': 'Gambardella, Matthew', 'title': "XML Developer's Guide", 'genre': 'Computer', 'price': '44.95', 'publish_date': '2000-10-01', 'description': 'An in-depth look at creating applications \n      with XML.'},
    
{'author': 'Ralls, Kim', 'title': 'Midnight Rain', 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2000-12-16', 'description': 'A former architect battles corporate zombies, \n      an evil sorceress, and her own childhood to become queen \n      of the world.'}, 

{'author': 'Corets, Eva', 'title': 'Maeve Ascendant', 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2000-11-17', 'description': 'After the collapse of a nanotechnology \n      society in England, the young survivors lay the \n      foundation for a new society.'}, 

{'author': 'Corets, Eva', 'title': "Oberon's Legacy", 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2001-03-10', 'description': 'In post-apocalypse England, the mysterious \n      agent known only as Oberon helps to create a new life \n      for the inhabitants of London. Sequel to Maeve \n      Ascendant.'}, 

{'author': 'Corets, Eva', 'title': 'The Sundered Grail', 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2001-09-10', 'description': "The two daughters of Maeve, half-sisters, \n      battle one another for control of England. Sequel to \n      Oberon's Legacy."}, 

{'author': 'Randall, Cynthia', 'title': 'Lover Birds', 'genre': 'Romance', 'price': '4.95', 'publish_date': '2000-09-02', 'description': 'When Carla meets Paul at an ornithology \n      conference, tempers fly as feathers get ruffled.'}, 
   
{'author': 'Thurman, Paula', 'title': 'Splish Splash', 'genre': 'Romance', 'price': '4.95', 'publish_date': '2000-11-02', 'description': 'A deep sea diver finds true love twenty \n      thousand leagues beneath the sea.'}, 
  
{'author': 'Knorr, Stefan', 'title': 'Creepy Crawlies', 'genre': 'Horror', 'price': '4.95', 'publish_date':'2000-12-06', 'description': 'An anthology of horror stories about roaches,\n      centipedes, scorpions  and other insects.'}, 
  
{'author': 'Kress, Peter', 'title': 'Paradox Lost', 'genre': 'Science Fiction', 'price': '6.95', 'publish_date': '2000-11-02', 'description': 'After an inadvertant trip through a Heisenberg\n      Uncertainty Device, James Salway discovers the problems \n      of being quantum.'}, 
  
{'author': "O'Brien, Tim", 'title': 'Microsoft .NET: The Programming Bible', 'genre': 'Computer', 'price': '36.95', 'publish_date': '2000-12-09', 'description': "Microsoft's .NET initiative is explored in \n      detail in this deep programmer's reference."}, 
  
{'author': "O'Brien, Tim", 'title': 'MSXML3: A Comprehensive Guide', 'genre': 'Computer', 'price': '36.95', 'publish_date': '2000-12-01', 'description': 'The Microsoft MSXML3 parser is covered in \n      detail, with attention to XML DOM interfaces, XSLT processing, \n      SAX and more.'}, 
    
{'author': 'Galos, Mike', 'title': 'Visual Studio 7: A Comprehensive Guide', 'genre': 'Computer', 'price': '49.95', 'publish_date': '2001-04-16', 'description': 'Microsoft Visual Studio 7 is explored in depth,\n      looking at how Visual Basic, Visual C++, C#, and ASP+ are \n      integrated into a comprehensive development \n      environment.'}

]

Source code for the how to parse XML using python

You can find or clone the source code for repository on this repo link.

Learning Python

2022 Complete Python Bootcamp From Zero to Hero in Python

Learn Python like a Professional Start from the basics and go all the way to creating your own applications and games

Summary

In this tutorial, we explored that how to parse the XML file using python. LXML package is used to parse the XML file. The source code and link to the best course is also included.