Scourgify
Pset 6 - Scourgify
The Challenge:
The challenge involves cleaning and reformatting data from a CSV file containing student information. In the input, each student's full name is stored in a single name column as "last, first" (for example, "Potter, Harry") and enclosed in double quotes because it contains a comma. The student's house is in a separate house column. The goal is to split each name into a first and a last name and write a new CSV file with columns for first name, last name, and house, in that order.
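To make the transformation concrete, here is a minimal sketch of what happens to a single row. It assumes the input columns are named name and house (as the code documentation describes) and that every name contains exactly one comma followed by a space. The function name clean_row and the sample row are illustrative, not part of the challenge.

```python
def clean_row(row: dict[str, str]) -> dict[str, str]:
    """Turn {"name": "Last, First", "house": ...} into first/last/house keys."""
    # Assumes exactly one ", " in the name; otherwise the unpacking raises ValueError.
    last, first = row["name"].split(", ")
    return {"first": first, "last": last, "house": row["house"]}


print(clean_row({"name": "Potter, Harry", "house": "Gryffindor"}))
# {'first': 'Harry', 'last': 'Potter', 'house': 'Gryffindor'}
```

Splitting on ", " rather than "," keeps the space out of the first name, so no extra strip() is needed.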
The program, implemented in a file called scourgify.py, expects two command-line arguments: the name of an existing CSV file to read as input and the name of a new CSV file to write as output. The program uses the csv module, specifically DictReader and DictWriter, to handle the CSV operations.
Additionally, the program performs error handling:
If the user provides fewer than two command-line arguments, the program exits with the error message "Too few command-line arguments"; if the user provides more than two, it exits with "Too many command-line arguments".
If the specified input file cannot be read, the program exits with an error message: "Could not read [input file]."
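The argument-count checks can be sketched as below. Because sys.argv[0] is the script name, "two file names" means len(sys.argv) == 3. The helper name check_args is hypothetical. Passing a string to sys.exit prints that string to standard error and ends the program with exit status 1.

```python
import sys


def check_args(argv: list[str]) -> None:
    """Exit with the challenge's messages unless exactly two file names are given."""
    # argv[0] is the script itself, so two file names means len(argv) == 3.
    if len(argv) < 3:
        sys.exit("Too few command-line arguments")
    if len(argv) > 3:
        sys.exit("Too many command-line arguments")
```

In scourgify.py this would be called as check_args(sys.argv) before any file is opened.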
The challenge also provides a set of instructions on setting up the environment, creating the Python file (scourgify.py), and downloading the sample CSV file (before.csv) for testing.
The provided hints suggest using the csv module's DictReader and DictWriter classes, with guidance on passing fieldnames to DictWriter and calling writeheader() to write the header row.
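Here is a short illustration of the two classes the hints mention. It uses an in-memory io.StringIO buffer instead of real files, an assumption made only to keep the example self-contained. It shows that DictReader keeps the quoted "Potter, Harry" together as one field, and that DictWriter writes the column names from fieldnames when writeheader() is called.

```python
import csv
import io

# Reading: the quoted field keeps its comma, so "name" holds the full "last, first" string.
source = io.StringIO('name,house\n"Potter, Harry",Gryffindor\n')
rows = list(csv.DictReader(source))
print(rows)  # [{'name': 'Potter, Harry', 'house': 'Gryffindor'}]

# Writing: fieldnames fixes the column order; writeheader() emits it as the first row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["first", "last", "house"])
writer.writeheader()
writer.writerow({"first": "Harry", "last": "Potter", "house": "Gryffindor"})
print(buffer.getvalue())
```

DictWriter ends each row with "\r\n" by default, which is why the csv documentation recommends opening real files with newline="".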
To test the program manually, users are instructed to run the program with various command-line arguments and verify that it produces the expected output or error messages.
In summary, the challenge focuses on data cleaning and CSV manipulation in Python, with an emphasis on handling command-line arguments and using the csv module to read and write structured data.
Solution:
import csv
import sys
Code Documentation:
This Python code imports the csv and sys modules and initializes an empty list named "data", which later holds one dictionary per student.
It then checks the number of command-line arguments in sys.argv, which includes the script name itself. If there are fewer than three, the script exits with the message "Too few command-line arguments". If there are more than three, it exits with the message "Too many command-line arguments".
If there are exactly three arguments, the script checks whether the second argument (sys.argv[1], the input file name) ends with ".csv". If it does not, the script exits with the message "Not a CSV file". Otherwise, it opens the input file and creates an output file named by the third argument (sys.argv[2]).
The script reads the input file with csv.DictReader and appends each row to the "data" list as a dictionary whose 'name' and 'house' keys hold the values from the corresponding columns of the input CSV file.
The script then uses the csv.DictWriter class to write the 'first', 'last', and 'house' values for each dictionary in "data" to the output file. The first and last names come from splitting the 'name' value at the comma and swapping the order of the two resulting strings, since the input stores names as "last, first".
If the input file cannot be found, the script exits with the message "Could not read {sys.argv[1]}", where {sys.argv[1]} is replaced by the input file name.
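Only the two import lines of the solution appear above, so here is a minimal sketch of a complete program that follows the code documentation step by step: the argument-count checks, the ".csv" check, reading rows into data with DictReader, and writing first, last, and house with DictWriter. It is a reconstruction, not the author's original code. Wrapping the logic in a main(argv) function, writing a header row with writeheader(), and splitting on ", " (so the first name has no leading space) are choices made here.

```python
import csv
import sys


def main(argv: list[str]) -> None:
    # argv[0] is the script name; expect the input and output file names after it.
    if len(argv) < 3:
        sys.exit("Too few command-line arguments")
    if len(argv) > 3:
        sys.exit("Too many command-line arguments")
    if not argv[1].endswith(".csv"):
        sys.exit("Not a CSV file")

    data = []  # one {"name": ..., "house": ...} dict per student
    try:
        with open(argv[1], newline="") as infile:
            for row in csv.DictReader(infile):
                data.append({"name": row["name"], "house": row["house"]})
    except FileNotFoundError:
        sys.exit(f"Could not read {argv[1]}")

    with open(argv[2], "w", newline="") as outfile:
        writer = csv.DictWriter(outfile, fieldnames=["first", "last", "house"])
        writer.writeheader()
        for student in data:
            # "Potter, Harry" -> last = "Potter", first = "Harry"
            last, first = student["name"].split(", ")
            writer.writerow({"first": first, "last": last, "house": student["house"]})
```

To run it as scourgify.py (for example, python scourgify.py before.csv after.csv), add an `if __name__ == "__main__": main(sys.argv)` guard at the end of the file. The sketch reads the whole input before opening the output, so a missing input file does not leave an empty output file behind.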