Pandas Data Structures: Series and DataFrame to Manage and Analyze Data
Student Exam Scores Analysis
import pandas as pd
# Create a Pandas Series for student names
students = pd.Series(['Alice', 'Bob', 'Charlie', 'David', 'Eva'])
# Create a DataFrame for their scores
scores = pd.DataFrame({
    'Math': [85, 92, 78, 90, 88],
    'Science': [88, 84, 91, 89, 93],
    'English': [80, 79, 85, 94, 92]
})
# Add the student names to the DataFrame
scores['Student'] = students
# Set "Student" as the index
scores.set_index('Student', inplace=True)
# Display the DataFrame
print("Student Exam Scores:")
print(scores)
# Calculate the average score for each student
scores['Average'] = scores.mean(axis=1)
print("\nAverage Scores:")
print(scores[['Average']])
# Find the student with the highest average score
top_student = scores['Average'].idxmax()
top_average = scores['Average'].max()
print(f"\nTop student: {top_student} with an average score of {top_average:.2f}")
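# Find the subject with the highest average score
# (restrict to the subject columns so the derived 'Average' column is excluded)
subject_avg = scores[['Math', 'Science', 'English']].mean(axis=0)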
top_subject = subject_avg.idxmax()
print(f"\nSubject with the highest average score: {top_subject} ({subject_avg[top_subject]:.2f})")
# Add a column to indicate if a student passed all subjects (Pass if score >= 50)
scores['Passed All'] = scores[['Math', 'Science', 'English']].apply(lambda row: all(row >= 50), axis=1)
print("\nScores with Pass Status:")
print(scores)
# Save the DataFrame to a CSV file
scores.to_csv('student_scores.csv', index=True)
print("\nData saved to 'student_scores.csv'")
Program Highlights
1. Pandas Series: Used for the student names.
2. Pandas DataFrame: Used for managing and analyzing scores for three subjects.
3. Calculations:
- Computes average scores for each student.
- Identifies the student with the highest average.
- Finds the subject with the highest average score.
4. Logical Operation: Adds a "Passed All" column to indicate if a student passed all subjects.
5. Data Export: Saves the DataFrame to a CSV file for future use.
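Every student in the example passes, so the "Passed All" column never shows a False value. The sketch below illustrates the same check on a small, hypothetical table in which one invented student (Frank) fails English. It uses a vectorized comparison followed by `.all(axis=1)`, which is an alternative to the row-wise `apply` in the program and should give the same result.

```python
import pandas as pd

# Hypothetical data: Frank and his scores are invented to show a failing case.
demo = pd.DataFrame(
    {"Math": [85, 60], "Science": [88, 72], "English": [80, 45]},
    index=pd.Index(["Alice", "Frank"], name="Student"),
)

PASS_MARK = 50  # same threshold as the program above
subjects = ["Math", "Science", "English"]

# Compare every subject score to the pass mark, then require True across each row.
passed_all = (demo[subjects] >= PASS_MARK).all(axis=1)
demo["Passed All"] = passed_all

print(demo)
```

Selecting the subject columns explicitly keeps derived columns, such as an average, out of the pass check.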
Sample Output
Student Exam Scores:
         Math  Science  English
Student
Alice      85       88       80
Bob        92       84       79
Charlie    78       91       85
David      90       89       94
Eva        88       93       92

Average Scores:
           Average
Student
Alice    84.333333
Bob      85.000000
Charlie  84.666667
David    91.000000
Eva      91.000000

Top student: David with an average score of 91.00

Subject with the highest average score: Science (89.00)

Scores with Pass Status:
         Math  Science  English    Average  Passed All
Student
Alice      85       88       80  84.333333        True
Bob        92       84       79  85.000000        True
Charlie    78       91       85  84.666667        True
David      90       89       94  91.000000        True
Eva        88       93       92  91.000000        True

Data saved to 'student_scores.csv'
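Note that David and Eva share the same average (91.00) in the output above, yet only David is reported. That is because `idxmax()` returns the first label at which the maximum occurs. If ties matter, one possible approach is to select every label whose value equals the maximum, as in this minimal sketch (the variable names are illustrative):

```python
import pandas as pd

scores = pd.DataFrame(
    {
        "Math": [85, 92, 78, 90, 88],
        "Science": [88, 84, 91, 89, 93],
        "English": [80, 79, 85, 94, 92],
    },
    index=pd.Index(["Alice", "Bob", "Charlie", "David", "Eva"], name="Student"),
)
averages = scores.mean(axis=1)

# idxmax() reports only the first label holding the maximum value.
first_top = averages.idxmax()

# Boolean indexing keeps every label tied for the maximum.
# Exact equality is fine here because both tied rows have the same sum;
# for less tidy floats, a tolerance-based comparison may be safer.
top_students = averages[averages == averages.max()].index.tolist()

print(first_top)
print(top_students)
```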
Usage
- Pandas Series and DataFrame make it easy to store, manipulate, and analyze tabular data.
- The program handles tasks such as:
  - Computing averages.
  - Finding the highest-scoring student and subject.
  - Adding calculated columns.
- Exporting to a CSV file is useful for real-world applications where data persistence is needed.
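To use the saved file later, the CSV has to be read back with the student names restored as the index. The sketch below shows one way to do that with `pd.read_csv` and `index_col`. It writes to a temporary directory (an assumption made here so the example leaves no files behind) and uses a two-student subset of the data.

```python
import os
import tempfile

import pandas as pd

scores = pd.DataFrame(
    {"Math": [85, 92], "Science": [88, 84], "English": [80, 79]},
    index=pd.Index(["Alice", "Bob"], name="Student"),
)

# Write to a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "student_scores.csv")
    scores.to_csv(path, index=True)

    # index_col tells pandas which CSV column to use as the row index.
    restored = pd.read_csv(path, index_col="Student")

print(restored)
```

Without `index_col`, the names would come back as an ordinary "Student" column alongside a default integer index.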
This example is suitable for learning and can be extended for more advanced use cases.
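As one possible extension, sorting and filtering the same scores might look like the following sketch. The Math threshold of 88 is an arbitrary value chosen for illustration.

```python
import pandas as pd

subjects = ["Math", "Science", "English"]
scores = pd.DataFrame(
    {
        "Math": [85, 92, 78, 90, 88],
        "Science": [88, 84, 91, 89, 93],
        "English": [80, 79, 85, 94, 92],
    },
    index=pd.Index(["Alice", "Bob", "Charlie", "David", "Eva"], name="Student"),
)
scores["Average"] = scores[subjects].mean(axis=1)

# Sort: rank students from highest to lowest average.
ranked = scores.sort_values("Average", ascending=False)

# Filter: keep only students whose Math score meets an illustrative threshold.
strong_math = scores[scores["Math"] >= 88]

print(ranked)
print(strong_math)
```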