DATA CLEANING STEPS WITH PANDAS

import chardet
import pandas as pd

# Read the first 100,000 bytes in binary mode and let chardet guess the encoding
with open("FILE_NAME.csv", 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
print(result)
{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
df = pd.read_csv("FILE_NAME.csv", index_col=0)
  • Note: index_col=False can be used to force pandas to not use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.
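The effect of index_col=False is easiest to see on a tiny malformed file. The sketch below uses made-up inline data (the string `data` is an assumption for illustration, not the article's CSV) in which every data row ends with an extra delimiter:

```python
import io

import pandas as pd

# Hypothetical malformed CSV: each data row has a trailing comma,
# so the rows have one more field than the header.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default behaviour: pandas treats the first column as the index,
# which shifts every value one column to the left.
shifted = pd.read_csv(io.StringIO(data))

# index_col=False keeps the first column as data and ignores the empty trailing field.
fixed = pd.read_csv(io.StringIO(data), index_col=False)

print(shifted)
print(fixed)
```

In the first frame the values 4 and 8 end up in the index; in the second they stay in column a.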
df.columns
Index(['columns1', 'columns2', 'columns3', 'columns4'], dtype='object')
df.head(5)
df.head(5).T
# drop_duplicates returns a new DataFrame, so assign the result to keep it
df = df.drop_duplicates(subset=["COLUMNS1", "COLUMNS2"])
df.tail()
df.isna().sum()
columns1    0
columns2    0
columns3    0
columns4    0
columns5    0
dtype: int64
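To see what the duplicate and missing-value checks report, here is a minimal sketch on a small made-up frame. The column names and values are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical sample data: one repeated (name, city) pair and one missing score.
df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "cid"],
    "city": ["oslo", "rome", "oslo", "lima"],
    "score": [1.0, None, 3.0, 4.0],
})

# drop_duplicates returns a new DataFrame; by default the first occurrence is kept.
deduped = df.drop_duplicates(subset=["name", "city"])

# Count the missing values in each column.
missing = deduped.isna().sum()

print(deduped.shape)
print(missing)
```

The original df still has four rows afterwards, because drop_duplicates does not modify it in place unless inplace=True is passed.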
df.shape
df.columns = df.columns.str.lower()
df = df.apply(lambda x: x.astype(str).str.lower())
df.COLUMNS_NAME.unique()
df.COLUMNS_NAME.nunique()
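# Map the suffixes k/K and m/M to scientific notation, e.g. "10k" -> "10e3"
replace_value = {"[kK]": "e3", "[mM]": "e6"}
df1 = df["COLUMNS_NAME"].copy()
# Evaluate each string as a number, then cast to integers
df1 = df1.replace(replace_value, regex=True).map(pd.eval).astype(int)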
# drop() returns a new DataFrame, so a separate copy() is not needed
df_removed_last = df.drop("COLUMNS_NAME", axis=1)
df_merged = pd.concat([df_removed_last, df1],axis=1)
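The suffix conversion above can be checked on a few sample values. This is a sketch under stated assumptions: the series name `amount` and its values are made up, and pd.to_numeric is swapped in for pd.eval as a variant that parses numbers only instead of evaluating arbitrary strings.

```python
import pandas as pd

# Hypothetical sample values; "amount" is an illustrative name, not from the original data.
amount = pd.Series(["10k", "2.5M", "300"], name="amount")

# Turn the suffixes into scientific notation: "10k" -> "10e3", "2.5M" -> "2.5e6".
as_sci = amount.replace({"[kK]": "e3", "[mM]": "e6"}, regex=True)

# pd.to_numeric parses the strings as floats; the values are whole numbers, so cast to int.
as_int = pd.to_numeric(as_sci).astype(int)

print(as_int.tolist())
```

Values without a suffix, such as "300", pass through the replacement unchanged and are parsed as ordinary numbers.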

Conclusion:

These are basic steps that need to be performed on most dataframes. If, like me, you want to save time and do not want to copy everything, please find a link to Google Colab, where all the code is written; you only need to import a CSV file and most of the steps will be performed automatically.
