Pandas Scratchpad – I

This post is a scratchpad for day-to-day Pandas commands.
pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
1. A few quick ways to create a Pandas DataFrame
# All snippets below assume pandas is imported as pd.
import pandas as pd

DataFrame from Dict of List –
data = {'Name': ['Iron Man', 'Deadpool', 'Captian America', 'Thor', 'Hulk', 'Spider Man'], 'Age': [48, 30, 100, 150, 50, 22]}
df = pd.DataFrame(data, columns=['Name', 'Age'])

DataFrame from List of List –
data = [['Iron Man', 48], ['Deadpool', 30], ['Captian America', 100], ['Thor', 150], ['Hulk', 50], ['Spider Man', 22]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

DataFrame from List of Dict –
data = [{'Name': 'Iron Man', 'Age': 48}, {'Name': 'Deadpool', 'Age': 30}, {'Name': 'Captian America', 'Age': 100},
        {'Name': 'Thor', 'Age': 150}, {'Name': 'Hulk', 'Age': 50}, {'Name': 'Spider Man', 'Age': 22}]
df = pd.DataFrame(data, columns=['Name', 'Age'])

DataFrame using zip function –
Name = ['Iron Man', 'Deadpool', 'Captian America', 'Thor', 'Hulk', 'Spider Man']
Age = [48, 30, 100, 150, 50, 22]
data = list(zip(Name, Age))  # list of (Name, Age) tuples
df = pd.DataFrame(data, columns=['Name', 'Age'])

2. Reading Data from CSV
While reading a CSV file with pandas, you might hit an error like

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte

In such cases, pass the encoding parameter that matches the file. For example –

df = pd.read_csv('avengers.csv', encoding = "ISO-8859-1")
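
When the encoding of a file is not known in advance, one option is to try UTF-8 first and fall back only when decoding fails. The sketch below assumes a hypothetical helper named read_csv_with_fallback and a tiny sample file written on the spot; the ISO-8859-1 fallback is taken from the example above.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, fallback="ISO-8859-1"):
    """Hypothetical helper: try UTF-8 first, then retry with a fallback encoding."""
    try:
        return pd.read_csv(path, encoding="utf-8")
    except UnicodeDecodeError:
        return pd.read_csv(path, encoding=fallback)


# Write a small Latin-1 file; its 0xe9 byte ("é") is not valid UTF-8 in this position.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write("Name,Age\nAndré,30\n".encode("ISO-8859-1"))
    sample_path = tmp.name

df = read_csv_with_fallback(sample_path)
os.remove(sample_path)
print(df)
```

ISO-8859-1 can decode any byte sequence, so the fallback itself will not raise, but it may produce wrong characters if the file was actually written in a different encoding.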

3. Converting CSV to JSON
Default way

cat /tmp/avengers.json
{"Name":{"0":"Iron Man","1":"Deadpool","2":"Captian America","3":"Thor","4":"Hulk","5":"Spider Man","6":"Batman"},"Age":{"0":48.0,"1":30.0,"2":100.0,"3":150.0,"4":50.0,"5":22.0,"6":null},"Avenger":{"0":"Y","1":"Y","2":"Y","3":"Y","4":"Y","5":"Y","6":"N"}}

With “orient” parameter

cat /tmp/avengers_orient_record.json
[{"Name":"Iron Man","Age":48.0,"Avenger":"Y"},{"Name":"Deadpool","Age":30.0,"Avenger":"Y"},{"Name":"Captian America","Age":100.0,"Avenger":"Y"},{"Name":"Thor","Age":150.0,"Avenger":"Y"},{"Name":"Hulk","Age":50.0,"Avenger":"Y"},{"Name":"Spider Man","Age":22.0,"Avenger":"Y"},{"Name":"Batman","Age":null,"Avenger":"N"}]

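The two files hold the same data in different layouts. By default, the JSON is a dictionary keyed by column name, where each column maps the row index to a value. With orient=records, the JSON is a list of dictionaries, one per row.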
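
The commands that produced the two files are not shown above. A minimal sketch of what they might look like, assuming a DataFrame df with the Name, Age and Avenger columns seen in the output; the two sample rows are a shortened stand-in, and the JSON is built as strings so the layouts can be compared without writing to /tmp.

```python
import json

import pandas as pd

# Shortened stand-in for the avengers data (an assumption, not the full file).
df = pd.DataFrame({
    "Name": ["Iron Man", "Batman"],
    "Age": [48.0, None],
    "Avenger": ["Y", "N"],
})

# Default orient for a DataFrame is "columns": {column -> {index -> value}}.
default_json = df.to_json()

# orient="records" produces a list with one dictionary per row.
records_json = df.to_json(orient="records")

# Parse both strings back to compare their shapes.
by_column = json.loads(default_json)
by_row = json.loads(records_json)
print(by_column)
print(by_row)
```

Passing a path as the first argument, for example df.to_json('/tmp/avengers.json'), writes the same text to a file instead of returning it.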
4. Reading JSON
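A minimal sketch of reading both JSON layouts from the previous section with pd.read_json. The JSON strings are shortened copies of the outputs above and are wrapped in StringIO so the example runs without any files; with real files, the path would be passed instead.

```python
from io import StringIO

import pandas as pd

# Shortened copies of the two layouts shown in section 3.
columns_json = '{"Name":{"0":"Iron Man","1":"Batman"},"Age":{"0":48.0,"1":null},"Avenger":{"0":"Y","1":"N"}}'
records_json = '[{"Name":"Iron Man","Age":48.0,"Avenger":"Y"},{"Name":"Batman","Age":null,"Avenger":"N"}]'

# The default orient for a DataFrame is "columns".
df_columns = pd.read_json(StringIO(columns_json))

# For a list of row dictionaries, state the orient explicitly.
df_records = pd.read_json(StringIO(records_json), orient="records")

print(df_columns)
print(df_records)
```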
5. Change DataType
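A minimal sketch of changing column datatypes with astype, assuming a small sample DataFrame with the same column names as the avengers data. The choice of int64 and category as target types is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor"],
    "Age": [48.0, 30.0, 150.0],
    "Avenger": ["Y", "Y", "Y"],
})

# Convert a single column by reassigning it.
df["Age"] = df["Age"].astype("int64")

# Convert one or more columns at once with a {column: dtype} mapping.
df = df.astype({"Avenger": "category"})

print(df.dtypes)
```

Converting a float column to int64 raises an error if the column contains missing values, so those need to be filled or dropped first.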
6. Describe DataFrame
By default, describe() summarizes only the numeric columns of a DataFrame.
To describe all the columns, pass include='all'.
To describe only the columns with the category datatype, pass include=['category'].
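
The three variants can be sketched as follows, assuming a small sample DataFrame in which Avenger is stored as a category column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Batman"],
    "Age": [48, 30, 40],
    "Avenger": pd.Categorical(["Y", "Y", "N"]),
})

# Numeric columns only (the default).
numeric_summary = df.describe()

# Every column; statistics that do not apply to a column are left as NaN.
full_summary = df.describe(include="all")

# Only columns with the category datatype.
category_summary = df.describe(include=["category"])

print(numeric_summary)
print(full_summary)
print(category_summary)
```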
7. Count distinct observations
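A minimal sketch using nunique, assuming a small sample DataFrame shaped like the avengers data. By default missing values are not counted; dropna=False includes them.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Age": [48.0, 30.0, 150.0, None],
    "Avenger": ["Y", "Y", "Y", "N"],
})

# Number of distinct values in every column (NaN excluded).
distinct_per_column = df.nunique()

# Number of distinct values in a single column.
distinct_avenger = df["Avenger"].nunique()

# Count NaN as its own value.
distinct_age_with_nan = df["Age"].nunique(dropna=False)

print(distinct_per_column)
```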
8. Count of Unique values
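A minimal sketch using value_counts, assuming a small sample DataFrame shaped like the avengers data. It returns how often each unique value occurs, and normalize=True turns the counts into fractions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Thor", "Batman"],
    "Avenger": ["Y", "Y", "Y", "N"],
})

# Occurrences of each unique value, most frequent first.
counts = df["Avenger"].value_counts()

# The same counts expressed as fractions of the total.
shares = df["Avenger"].value_counts(normalize=True)

print(counts)
print(shares)
```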
9. Null values
Total null values for a column
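A minimal sketch using isnull and sum, assuming a small sample DataFrame with one missing Age, as in the Batman row of the JSON output above.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Batman"],
    "Age": [48.0, 30.0, None],
    "Avenger": ["Y", "Y", "N"],
})

# isnull() marks missing cells as True; sum() counts the True values.
null_age = df["Age"].isnull().sum()

# The same idea applied to the whole DataFrame gives one count per column.
nulls_per_column = df.isnull().sum()

print(null_age)
print(nulls_per_column)
```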
10. Pandas Profiling
To generate a profile report of a DataFrame, use pandas-profiling. The report contains an overview, per-variable details, and Pearson and Spearman correlations, which help with quick analysis of the data.
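
A rough sketch of generating such a report, assuming the package is installed. The helper name write_profile, the report title and the output file name are hypothetical. The package was later renamed to ydata-profiling, so the helper tries both import names.

```python
import pandas as pd


def write_profile(df, path, title="DataFrame profile"):
    """Hypothetical helper: build a profile report and save it as HTML."""
    try:
        # Import name used by pandas-profiling releases.
        from pandas_profiling import ProfileReport
    except ImportError:
        # Import name after the rename to ydata-profiling.
        from ydata_profiling import ProfileReport

    report = ProfileReport(df, title=title)
    report.to_file(path)
    return report


df = pd.DataFrame({
    "Name": ["Iron Man", "Deadpool", "Batman"],
    "Age": [48.0, 30.0, None],
    "Avenger": ["Y", "Y", "N"],
})

# Uncomment once one of the two packages is installed:
# write_profile(df, "avengers_profile.html")
```

The import happens inside the function so that the rest of a script still runs on machines where the profiling package is missing.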
