Pandas
This page is currently very much in draft form and is focused on commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than that!
File Types
Pandas can load data from a text file or from an Excel spreadsheet.
Text Files
For text files, you need to figure out two things:
- How are individual data points separated in the file? (tabs, commas, spaces, etc.)
  - If separated by commas, use pd.read_csv("file") to load the data into a data frame.
  - If separated by tabs, use pd.read_table("file") to load the data into a data frame.
  - If separated by some other character, use pd.read_csv("file", sep="X"), where X is replaced by whatever is between data points. If the separator is more than one character, also add engine="python" to the command, since pandas generally treats multi-character separators as regular expressions, which need the Python parsing engine.
    - Note: if there are spaces in addition to the separator, pandas will skip the spaces when reading the numerical values but not when reading the headers!
- Do the columns have headers (column labels) or not?
  - If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels.
  - If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_csv() or pd.read_table() command.
- The following shows examples of different ways to load text files. Note that once the program runs, t_h, c_h, and o_h will all be the same, and t_nh, c_nh, and o_nh will all be the same.
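A minimal sketch of such a program is given here. It is an illustration, not the page's original listing: the sample values, the "::" separator, and the as_file helper are assumptions, and io.StringIO objects stand in for files on disk so the example runs on its own. The last few lines illustrate the note about spaces, using the skipinitialspace option of pd.read_csv(), which is not otherwise covered on this page.

```python
import io

import pandas as pd

# Hypothetical sample data: one header row followed by three data rows.
rows_h = ["time,volts", "0,0.5", "1,1.25", "2,2.0"]
rows_nh = rows_h[1:]  # the same data without the header row


def as_file(rows, sep):
    """Return an in-memory text file whose fields are joined by sep."""
    return io.StringIO("\n".join(r.replace(",", sep) for r in rows) + "\n")


# Files WITH column headers: tab, comma, and "other" separators.
t_h = pd.read_table(as_file(rows_h, "\t"))
c_h = pd.read_csv(as_file(rows_h, ","))
# "::" is longer than one character, so engine="python" is added.
o_h = pd.read_csv(as_file(rows_h, "::"), sep="::", engine="python")

# Files WITHOUT column headers: header=None keeps the first row as data.
t_nh = pd.read_table(as_file(rows_nh, "\t"), header=None)
c_nh = pd.read_csv(as_file(rows_nh, ","), header=None)
o_nh = pd.read_csv(as_file(rows_nh, "::"), sep="::", engine="python", header=None)

print(t_h.equals(c_h) and c_h.equals(o_h))
print(t_nh.equals(c_nh) and c_nh.equals(o_nh))

# Spaces after the commas: the numbers still parse as numbers,
# but the second header keeps its leading space unless told otherwise.
spaced = "time, volts\n0, 0.5\n1, 1.25\n"
s_raw = pd.read_csv(io.StringIO(spaced))
s_clean = pd.read_csv(io.StringIO(spaced), skipinitialspace=True)
print(list(s_raw.columns))
print(list(s_clean.columns))
```

With real files, each as_file(...) call would simply be replaced by the file name as a string. When header=None is used, pandas labels the columns with integers starting at 0.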
Excel Files
For Excel files, you need to figure out two things:
- Does the file have one sheet or more than one sheet?
  - If there is only one sheet, use pd.read_excel("file") to load the data into a data frame.
  - If there are multiple sheets, include sheet_name=X, where X can be an integer indicating which sheet (in order from left to right, with 0 being furthest left) or a string with a sheet name. You can also load multiple sheets at once; that is not covered yet.
- Do the columns have headers (column labels) or not?
  - If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels.
  - If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_excel()