-->

CS50 Python | Week 6 | File I/O | Problem Set 6 | Scourgify

Scourgify


The Challenge:

The challenge involves cleaning and reformatting data from a CSV file containing student information. Each row initially stores a student's full name in a single column, written as last name and first name separated by a comma and a space and enclosed in double quotes, followed by a second column for the house. The goal is to split each name into first and last names and create a new CSV file with columns for first name, last name, and house.
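As a rough illustration of the name transformation described above, the following sketch splits a single "last, first" value into its parts. The helper name split_name is hypothetical and not part of the problem set; it assumes every name contains exactly one comma followed by a space.

```python
def split_name(name):
    """Turn 'last, first' into a (first, last) tuple."""
    # Split on the comma-plus-space separator; assumes it appears exactly once
    last, first = name.split(", ")
    return first, last


print(split_name("Potter, Harry"))  # ('Harry', 'Potter')
```

A name without the separator would raise a ValueError here, so real input may need a more forgiving approach.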

The program, implemented in a file called scourgify.py, expects two command-line arguments: the name of an existing CSV file to read as input and the name of a new CSV file to write as output. The program uses the csv module, specifically DictReader and DictWriter, to handle the CSV operations.
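To make the role of DictReader more concrete, here is a minimal sketch of how it treats the quoted name column. It reads from an in-memory io.StringIO object instead of a real file, and the two data rows are illustrative rather than copied from before.csv.

```python
import csv
import io

# In-memory stand-in for the input file; the two data rows are illustrative
sample = io.StringIO(
    'name,house\n'
    '"Abbott, Hannah",Hufflepuff\n'
    '"Bell, Katie",Gryffindor\n'
)

# DictReader uses the first line as the keys of each row dictionary
rows = list(csv.DictReader(sample))

for row in rows:
    # The quoted name arrives as one value, with its comma intact
    print(row["name"], "|", row["house"])
```

Because the csv module understands the double quotes, the comma inside a name is not mistaken for a column separator.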

Additionally, the program performs error handling:

If the user does not provide exactly two command-line arguments, the program exits with an error message, such as "Too few command-line arguments" or "Too many command-line arguments".

If the specified input file cannot be read, the program exits with an error message: "Could not read [input file]."
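One way to picture these checks is as a small function that returns the message to exit with. This is only a sketch: the name check_arguments is hypothetical, the "Too many" wording is borrowed from the solution later in this post, and testing with os.path.isfile is a simplification of "cannot be read".

```python
import os


def check_arguments(argv):
    """Return an error message for bad arguments, or None if they look fine."""
    # argv[0] is the script name, so two user arguments means len(argv) == 3
    if len(argv) < 3:
        return "Too few command-line arguments"
    if len(argv) > 3:
        return "Too many command-line arguments"
    # Simplification: treat a missing file as "cannot be read"
    if not os.path.isfile(argv[1]):
        return f"Could not read {argv[1]}"
    return None


# Only the script name and one argument were given
print(check_arguments(["scourgify.py", "before.csv"]))
```

In a real script, the list passed in would be sys.argv, and a non-empty result would be handed to sys.exit.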

The challenge also provides a set of instructions on setting up the environment, creating the Python file (scourgify.py), and downloading the sample CSV file (before.csv) for testing.

The provided hints suggest using the csv module's DictReader and DictWriter classes, and give some guidance on specifying fieldnames and writing the header row.
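The fieldnames and header handling mentioned in the hints can be sketched as follows. The example writes to an in-memory io.StringIO object rather than a real output file, and the single row is illustrative.

```python
import csv
import io

# In-memory stand-in for the output file
output = io.StringIO()

# fieldnames fixes both the header text and the column order
writer = csv.DictWriter(output, fieldnames=["first", "last", "house"])
writer.writeheader()
writer.writerow({"first": "Hannah", "last": "Abbott", "house": "Hufflepuff"})

# splitlines() drops the line endings the csv module adds
lines = output.getvalue().splitlines()
print(lines)
```

DictWriter needs fieldnames up front because dictionaries alone do not tell it which columns to write or in what order.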

To test the program manually, users are instructed to run the program with various command-line arguments and verify that it produces the expected output or error messages. 

In summary, the challenge focuses on data cleaning and CSV manipulation in Python, with an emphasis on handling command-line arguments and utilizing the csv module for efficient processing.

Solution:

import csv
import sys


# Input columns:  name, house  (name is stored as "last, first")
# Output columns: first, last, house

# Check the number and format of command-line arguments

if len(sys.argv) < 3:
    sys.exit('Too few command-line arguments')
elif len(sys.argv) > 3:
    sys.exit('Too many command-line arguments')
elif not sys.argv[1].endswith('.csv'):
    sys.exit('Not a CSV file')

# Assign the names of the input and output files to variables
input_file = sys.argv[1]
output_file = sys.argv[2]

# Initialize an empty list to store the rows of the input CSV file as dictionaries
data = []

try:
    # Open the input file in read mode
    with open(input_file, newline='') as file:
        # Open the output file in write mode
        with open(output_file, 'w', newline='') as file2:
            reader = csv.DictReader(file)
            for row in reader:
                # Add each row to the list as a dictionary
                data.append({'name': row['name'], 'house': row['house']})
            # Specify the header names
            writer = csv.DictWriter(file2, fieldnames=['first', 'last', 'house'])
            # Write the header names to the new file
            writer.writeheader()

            # Iterate through the dictionaries in data
            for entry in data:
                # Split "last, first" on the comma and space, so any spaces
                # inside a name are preserved
                last_name, first_name = entry['name'].split(', ')
                house = entry['house']

                # Write the row in the order first, last, house
                writer.writerow({'first': first_name, 'last': last_name, 'house': house})
# Exit if the input file does not exist
except FileNotFoundError:
    sys.exit(f'Could not read {sys.argv[1]}')


Code Documentation:

This Python code imports the csv and sys modules and creates an empty list named "data", which later holds one dictionary per row of the input file.

It then checks the number of command-line arguments passed to the script. Because sys.argv also contains the script name, a correct invocation has exactly three entries. If there are fewer than three, the script exits with the message "Too few command-line arguments". If there are more than three, it exits with the message "Too many command-line arguments".

If there are exactly three entries, the script checks whether sys.argv[1] (the first user-supplied argument, which should be the name of the input CSV file) ends with ".csv". If it does not, the script exits with the message "Not a CSV file". If it does, the script opens the input file and creates the output file named by sys.argv[2].

The script reads the input file with a csv.DictReader object and adds each row to the "data" list as a dictionary whose 'name' and 'house' keys hold the values from the corresponding columns of the input CSV file.

The script then uses a csv.DictWriter object to write the 'first', 'last', and 'house' values for each dictionary in "data" to the output file, after splitting the 'name' value at the comma into last and first names and swapping their order.
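The intermediate list is not strictly required. As an alternative sketch (not the solution above), each row could be converted and written as soon as it is read. The function name convert is hypothetical, and in-memory io.StringIO objects stand in for the real input and output files.

```python
import csv
import io


def convert(source, target):
    """Copy rows from source to target, splitting 'last, first' names."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=["first", "last", "house"])
    writer.writeheader()
    for row in reader:
        # Assumes each name contains exactly one ', ' separator
        last, first = row["name"].split(", ")
        writer.writerow({"first": first, "last": last, "house": row["house"]})


# Illustrative one-row input
source = io.StringIO('name,house\n"Potter, Harry",Gryffindor\n')
target = io.StringIO()
convert(source, target)

result = target.getvalue().splitlines()
print(result)
```

Passing open file objects into a function like this also makes the conversion easier to test without touching the command line.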

If the input file cannot be found, the script exits with the message "Could not read" followed by the name of the input file.

