How to parse xml using python
Python is a modern programming language that developers use to perform variety of tasks. Parsing XML file is a feature of Python often used. In this tutorial, we will learn to parse XML using python. In a previous tutorials, we learned to parse XML using nodejs and php. Python can also be used to join wav audio files using python and Upload files using Python and Flask.
Following tasks are performed in this tutorial:
1. Create books XML file
2. Install Python and Pip
3. Install lxml package
4. Create an XML script to parse XML file
5. Run the script
First, create a project folder named parse-xml-file-using-python.
1. Create books XML file
Open the Microsoft Books XML URL. Copy the XML and save it on your project folder parse-xml-file-using-python.
2. Install Python and Pip
To install Python, Visit Download Python page and install on your system.
Install Pip on Windows
Pip is a package management system for python package and libraries. Execute the command to download the get-pip.py file.
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
To install pip, execute command below.
python get-pip.py
To verify Pip installation, type command below.
pip help
3. Install lxml package
lxml package is used to parse XML files, to install this package, type the command below on command line.
pip install lxml
4. Create an XML script to parse xml file
Open the project folder, create file parse-xml-using-python.py. Add code below into it.
from lxml import etree def parseBookXML(xmlFile): with open(xmlFile) as fobj: xml = fobj.read() root = etree.fromstring(xml) book_dict = {} books = [] for book in root.getchildren(): for elem in book.getchildren(): if elem.text: text = elem.text else: text = '' if elem.tag == 'title': book_dict[elem.tag] = text if elem.tag == 'genre': book_dict[elem.tag] = text if elem.tag == 'price': book_dict[elem.tag] = text if elem.tag == 'publish_date': book_dict[elem.tag] = text if elem.tag == 'description': book_dict[elem.tag] = text if elem.tag == 'author': book_dict[elem.tag] = text if book.tag == "book": books.append(book_dict) book_dict = {} return books books_list = parseBookXML("books.xml") print(books_list)
In the first line etree is imported from lxml package. Next, a function parseBookXML, and it accepts a parameter xmlFile. Inside function, using with open method, the XML file is opened as fobj.
The root element of XML file is fetched using etree’s fromstring method. The book_dict object and books array is defined.
Inside Books root element, there are multiple books elements. The book element is fetched using root.getchildren method. Each book element has child elements.
The child element of book are returned by book.getChildren. The text element tag is assigned to text variable if its value exists.
Next, if element tag is an title, genre, publish_date, description, or author, text is appended to books dictionary. The books dictionary to appended to the books array.
In the last, books list is returned. The parseBookXML method is called and parsed books list is returned.
5. Running the script
In order to run the script, Open command line. Navigate to the project folder, and type command.
python .\parse-xml-using-python.py
This command will output, the parsed xml below.
[ {'author': 'Gambardella, Matthew', 'title': "XML Developer's Guide", 'genre': 'Computer', 'price': '44.95', 'publish_date': '2000-10-01', 'description': 'An in-depth look at creating applications \n with XML.'}, {'author': 'Ralls, Kim', 'title': 'Midnight Rain', 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2000-12-16', 'description': 'A former architect battles corporate zombies, \n an evil sorceress, and her own childhood to become queen \n of the world.'}, {'author': 'Corets, Eva', 'title': 'Maeve Ascendant', 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2000-11-17', 'description': 'After the collapse of a nanotechnology \n society in England, the young survivors lay the \n foundation for a new society.'}, {'author': 'Corets, Eva', 'title': "Oberon's Legacy", 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2001-03-10', 'description': 'In post-apocalypse England, the mysterious \n agent known only as Oberon helps to create a new life \n for the inhabitants of London. Sequel to Maeve \n Ascendant.'}, {'author': 'Corets, Eva', 'title': 'The Sundered Grail', 'genre': 'Fantasy', 'price': '5.95', 'publish_date': '2001-09-10', 'description': "The two daughters of Maeve, half-sisters, \n battle one another for control of England. Sequel to \n Oberon's Legacy."}, {'author': 'Randall, Cynthia', 'title': 'Lover Birds', 'genre': 'Romance', 'price': '4.95', 'publish_date': '2000-09-02', 'description': 'When Carla meets Paul at an ornithology \n conference, tempers fly as feathers get ruffled.'}, {'author': 'Thurman, Paula', 'title': 'Splish Splash', 'genre': 'Romance', 'price': '4.95', 'publish_date': '2000-11-02', 'description': 'A deep sea diver finds true love twenty \n thousand leagues beneath the sea.'}, {'author': 'Knorr, Stefan', 'title': 'Creepy Crawlies', 'genre': 'Horror', 'price': '4.95', 'publish_date':'2000-12-06', 'description': 'An anthology of horror stories about roaches,\n centipedes, scorpions and other insects.'}, {'author': 'Kress, Peter', 'title': 'Paradox Lost', 'genre': 'Science Fiction', 'price': '6.95', 'publish_date': '2000-11-02', 'description': 'After an inadvertant trip through a Heisenberg\n Uncertainty Device, James Salway discovers the problems \n of being quantum.'}, {'author': "O'Brien, Tim", 'title': 'Microsoft .NET: The Programming Bible', 'genre': 'Computer', 'price': '36.95', 'publish_date': '2000-12-09', 'description': "Microsoft's .NET initiative is explored in \n detail in this deep programmer's reference."}, {'author': "O'Brien, Tim", 'title': 'MSXML3: A Comprehensive Guide', 'genre': 'Computer', 'price': '36.95', 'publish_date': '2000-12-01', 'description': 'The Microsoft MSXML3 parser is covered in \n detail, with attention to XML DOM interfaces, XSLT processing, \n SAX and more.'}, {'author': 'Galos, Mike', 'title': 'Visual Studio 7: A Comprehensive Guide', 'genre': 'Computer', 'price': '49.95', 'publish_date': '2001-04-16', 'description': 'Microsoft Visual Studio 7 is explored in depth,\n looking at how Visual Basic, Visual C++, C#, and ASP+ are \n integrated into a comprehensive development \n environment.'} ]
Source code for the how to parse XML using python
You can find or clone the source code for repository on this repo link.
Learning Python
Summary
In this tutorial, we explored that how to parse the XML file using python. LXML package is used to parse the XML file. The source code and link to the best course is also included.
Previous Article:
Related Articles:
- How to join wav audio files using python – a beginner tutorial
- Generate XML with NodeJS and MySQL using XML builder
- Generate XML with NodeJS and MySQL using XML builder
- Latest Web Development Courses To Make You Pro Developer
- 11 Best Freelancing Platforms for Web Developers in 2021
Next Article: