
Pandas Scratchpad – I

This post is a scratchpad for day-to-day Pandas commands.
pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
1. A few quick ways to create a Pandas DataFrame
import pandas as pd

DataFrame from Dict of List:
data = {'Name':['Iron Man', 'Deadpool', 'Captian America', 'Thor', 'Hulk', 'Spider Man'], 'Age':[48, 30, 100, 150, 50, 22]}
df = pd.DataFrame(data)

DataFrame from List of List:
data = [['Iron Man', 48], ['Deadpool', 30], ['Captian America', 100], ['Thor', 150], ['Hulk', 50], ['Spider Man', 22]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])

DataFrame from List of Dict:
data = [{'Name':'Iron Man', 'Age': 48}, {'Name':'Deadpool', 'Age': 30}, {'Name':'Captian America', 'Age': 100},
        {'Name':'Thor', 'Age': 150}, {'Name':'Hulk', 'Age': 50}, {'Name':'Spider Man', 'Age': 22}]
df = pd.DataFrame(data)

DataFrame using zip function:
Name = ['Iron Man', 'Deadpool', 'Captian America', 'Thor', 'Hulk', 'Spider Man']
Age = [48, 30, 100, 150, 50, 22]
data = list(zip(Name, Age))
df = pd.DataFrame(data, columns = ['Name', 'Age'])
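The four approaches should all produce the same DataFrame. A minimal sketch to check that, using a shortened version of the data (the variable names here are illustrative, not from the original screenshots):

```python
import pandas as pd

names = ['Iron Man', 'Deadpool', 'Thor']
ages = [48, 30, 150]

# Dict of lists: keys become column names.
df_dict_of_list = pd.DataFrame({'Name': names, 'Age': ages})

# List of lists: each inner list is a row, so column names are passed explicitly.
df_list_of_list = pd.DataFrame([[n, a] for n, a in zip(names, ages)],
                               columns=['Name', 'Age'])

# List of dicts: each dict is a row, keys become column names.
df_list_of_dict = pd.DataFrame([{'Name': n, 'Age': a} for n, a in zip(names, ages)])

# zip: pairs the two lists into row tuples.
df_using_zip = pd.DataFrame(list(zip(names, ages)), columns=['Name', 'Age'])

# DataFrame.equals compares shape, values, and dtypes.
all_equal = all(
    df_dict_of_list.equals(other)
    for other in (df_list_of_list, df_list_of_dict, df_using_zip)
)
print(all_equal)
```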

2. Reading Data from CSV
[Image: df_read_csv.png]
While reading a CSV file with pandas, you might hit an error like:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

In such cases, pass the encoding parameter with the file's actual encoding. For example:

df = pd.read_csv('avengers.csv', encoding = "ISO-8859-1")
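If you do not know the encoding up front, one option is to try the default first and fall back only on failure. The sketch below is an illustration under assumptions: the sample file name and its contents are made up, and ISO-8859-1 is assumed to be the right fallback for your data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file: 'é' is the single byte 0xe9 in ISO-8859-1,
# which is not a valid UTF-8 sequence here.
csv_path = os.path.join(tempfile.mkdtemp(), 'avengers_latin1.csv')
with open(csv_path, 'w', encoding='ISO-8859-1', newline='') as f:
    f.write('Name,Age\nAndré,48\n')

try:
    df = pd.read_csv(csv_path)  # pandas decodes as UTF-8 by default
    used_encoding = 'utf-8'
except UnicodeDecodeError:
    # Fall back to the encoding assumed for this file.
    df = pd.read_csv(csv_path, encoding='ISO-8859-1')
    used_encoding = 'ISO-8859-1'

print(used_encoding)
print(df)
```

ISO-8859-1 maps every byte to a character, so the fallback never raises, but it can silently produce wrong characters if the file is really in another encoding.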

3. Converting CSV to JSON
Default way
[Image: read_csv_to_json.png]

cat /tmp/avengers.json
{"Name":{"0":"Iron Man","1":"Deadpool","2":"Captian America","3":"Thor","4":"Hulk","5":"Spider Man","6":"Batman"},"Age":{"0":48.0,"1":30.0,"2":100.0,"3":150.0,"4":50.0,"5":22.0,"6":null},"Avenger":{"0":"Y","1":"Y","2":"Y","3":"Y","4":"Y","5":"Y","6":"N"}}

With “orient” parameter
[Image: to_json_orient_records.png]

cat /tmp/avengers_orient_record.json
[{"Name":"Iron Man","Age":48.0,"Avenger":"Y"},{"Name":"Deadpool","Age":30.0,"Avenger":"Y"},{"Name":"Captian America","Age":100.0,"Avenger":"Y"},{"Name":"Thor","Age":150.0,"Avenger":"Y"},{"Name":"Hulk","Age":50.0,"Avenger":"Y"},{"Name":"Spider Man","Age":22.0,"Avenger":"Y"},{"Name":"Batman","Age":null,"Avenger":"N"}]

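The two files store the same data in different layouts. By default, to_json writes the DataFrame column by column: each column name maps to a dictionary of row index to value. With orient='records', the output is a list of dictionaries, one per row.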
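The original screenshots are not reproduced here, so the following is a minimal sketch of the two calls, assuming a small DataFrame shaped like the one in the outputs above. Calling to_json without a path returns the JSON as a string; pass a path such as '/tmp/avengers.json' to write a file instead.

```python
import json

import pandas as pd

df = pd.DataFrame({
    'Name': ['Iron Man', 'Batman'],
    'Age': [48, None],          # the missing value makes this a float column
    'Avenger': ['Y', 'N'],
})

# Default layout: {column -> {row index -> value}}
default_json = df.to_json()

# orient='records': [{column -> value}, ...], one dict per row
records_json = df.to_json(orient='records')

by_column = json.loads(default_json)
by_row = json.loads(records_json)

print(by_column['Name'])
print(by_row[0])
```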
4. Reading JSON
[Image: read_json.png]
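A minimal sketch of reading records-oriented JSON back into a DataFrame. The JSON string here is a shortened, made-up stand-in for the file; for a file on disk you would pass its path to pd.read_json instead of the StringIO wrapper.

```python
from io import StringIO

import pandas as pd

# Shortened sample in the orient='records' layout.
records_json = (
    '[{"Name":"Iron Man","Age":48.0,"Avenger":"Y"},'
    '{"Name":"Batman","Age":null,"Avenger":"N"}]'
)

# Wrapping the string in StringIO gives read_json a file-like object.
df = pd.read_json(StringIO(records_json), orient='records')
print(df)
```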
5. Change DataType
[Image: Change_dtype]
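The screenshot for this step is missing, so this is a sketch of one common way to change a column's type, using astype. The column choices are assumptions based on the Avengers data used earlier.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Iron Man', 'Thor'],
    'Age': [48.0, 150.0],
    'Avenger': ['Y', 'Y'],
})

# Float to integer.
df['Age'] = df['Age'].astype('int64')

# String column with few distinct values to category.
df['Avenger'] = df['Avenger'].astype('category')

print(df.dtypes)
```

Casting to an integer dtype raises an error if the column contains missing values, so fill or drop them first.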
6. Describe DataFrame
By default, describe() summarizes only the numeric columns of a DataFrame.
[Image: describe_default]
To describe all the columns:
[Image: describe_all]
To describe columns with the category datatype:
[Image: df_category]
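The three variants can be sketched as follows, assuming a small DataFrame in which Avenger has been converted to the category dtype (the data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Iron Man', 'Deadpool', 'Batman'],
    'Age': [48, 30, 150],
    'Avenger': ['Y', 'Y', 'N'],
})
df['Avenger'] = df['Avenger'].astype('category')

# Default: numeric columns only (count, mean, std, min, quartiles, max).
numeric_summary = df.describe()

# All columns; statistics that do not apply to a column show up as NaN.
full_summary = df.describe(include='all')

# Only category columns (count, unique, top, freq).
category_summary = df.describe(include=['category'])

print(numeric_summary)
print(full_summary)
print(category_summary)
```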
7. Count distinct observations
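The original screenshot is not available. A likely candidate for this step is nunique, which counts distinct values per column or for a single column; the sketch below assumes that and uses made-up data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Iron Man', 'Deadpool', 'Batman'],
    'Age': [48, 30, None],
    'Avenger': ['Y', 'Y', 'N'],
})

# Distinct values per column; missing values are ignored by default.
distinct_per_column = df.nunique()

# Distinct values in one column.
distinct_avenger = df['Avenger'].nunique()

# Count the missing value as its own distinct entry.
distinct_age_with_nan = df['Age'].nunique(dropna=False)

print(distinct_per_column)
```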
8. Count of Unique values
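Assuming this step refers to counting how often each unique value occurs, value_counts is the usual tool. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Iron Man', 'Deadpool', 'Batman'],
    'Avenger': ['Y', 'Y', 'N'],
})

# Number of rows for each unique value, most frequent first.
counts = df['Avenger'].value_counts()
print(counts)
```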
9. Null values
Total null values for a column:
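One way to get this count is to chain isnull with sum; the sketch below assumes an Age column with one missing value.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Iron Man', 'Deadpool', 'Batman'],
    'Age': [48, 30, None],
})

# isnull() marks missing entries as True; sum() counts the True values.
null_count_age = df['Age'].isnull().sum()

# The same idea applied to every column at once.
null_count_per_column = df.isnull().sum()

print(null_count_age)
print(null_count_per_column)
```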
10. Pandas Profiling
To generate a profile report of a DataFrame, use pandas-profiling. The report contains an overview, per-variable details, and Pearson and Spearman correlations, which helps with quick analysis of the data.
