Pandas
This page is currently very much in draft form and is focused on commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than that!
File Types
Pandas can load data from a text file or from an Excel spreadsheet.
Text Files
For text files, you need to figure out two things:
- How are individual data points separated in the file? (tabs, commas, spaces, etc.)
  - If separated by commas, use pd.read_csv("file") to load the data frame
  - If separated by tabs, use pd.read_table("file") to load the data frame
  - If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever is between data points; if the separator is more than one character, you will also need to add engine="python" to the command (pandas treats a multi-character separator as a regular expression, which only the Python parsing engine supports).
  - Note - if there are spaces in addition to other symbols, pandas will skip the spaces for the numerical information but not for the headers! If you look at the example files in the Trinket below, all extra spaces have been removed.
- Do the columns have headers (column labels) or not?
  - If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels
  - If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_csv() or pd.read_table() command.
- The following shows examples and different ways to load text files. Note that once the program runs:
  - t_h, c_h, and o_h will all be the same
  - t_nh, c_nh, and o_nh will all be the same
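A minimal sketch of those loads, using hypothetical in-memory data rather than the example files from the Trinket. The variable names follow the list above, on the assumption that t, c, and o stand for tab, comma, and other separators, and that h and nh stand for header and no header. The sample values, the "::" separator, and the as_file helper are made up for illustration. The last two lines show the header-spacing issue from the note; skipinitialspace=True is a pandas option not otherwise covered on this page.

```python
import io
import pandas as pd

# Hypothetical sample data: the same table written with three different separators.
comma_text = "time,volts\n0.0,1.5\n0.5,2.25\n1.0,3.0\n"
tab_text = comma_text.replace(",", "\t")
other_text = comma_text.replace(",", "::")  # two-character separator


def as_file(text, keep_header=True):
    """Wrap text in a file-like object, optionally dropping the first (header) line."""
    if not keep_header:
        text = text.split("\n", 1)[1]
    return io.StringIO(text)


# First row holds the column labels (the pandas default).
c_h = pd.read_csv(as_file(comma_text))
t_h = pd.read_table(as_file(tab_text))
o_h = pd.read_csv(as_file(other_text), sep="::", engine="python")

# No header row: header=None keeps the first row as data.
c_nh = pd.read_csv(as_file(comma_text, keep_header=False), header=None)
t_nh = pd.read_table(as_file(tab_text, keep_header=False), header=None)
o_nh = pd.read_csv(as_file(other_text, keep_header=False),
                   sep="::", engine="python", header=None)

# Each group of three data frames should hold identical contents.
print(c_h.equals(t_h) and c_h.equals(o_h))
print(c_nh.equals(t_nh) and c_nh.equals(o_nh))

# Spaces after the separator end up inside the header labels ...
spaced = "time, volts\n0.0, 1.5\n"
print(list(pd.read_csv(io.StringIO(spaced)).columns))
# ... unless skipinitialspace=True is passed.
print(list(pd.read_csv(io.StringIO(spaced), skipinitialspace=True).columns))
```

With real files, replace each as_file(...) call with the file name as a string, for example pd.read_csv("file").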
Excel Files
For Excel files, you need to figure out two things:
- Does the file have one sheet or more than one sheet?
  - If there is only one sheet, use pd.read_excel("file") to load the data frame
  - If there are multiple sheets, include sheet_name=X where X can be an integer indicating which sheet (in order from left to right, with 0 being furthest left) or a string with a sheet name. You can also load multiple sheets at once - that is not covered yet.
- Do the columns have headers (column labels) or not?
  - If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels
  - If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_excel() command.
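A minimal sketch of the Excel options above. So that it runs without an existing spreadsheet, it first builds a hypothetical two-sheet workbook in memory; the sheet names run1 and run2 and the data values are made up for illustration. It assumes the openpyxl package is installed, which pandas uses to read and write .xlsx files.

```python
import io
import pandas as pd

# Build a hypothetical workbook in memory (assumes openpyxl is installed).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    # Sheet 0 ("run1") has a header row.
    pd.DataFrame({"time": [0.0, 0.5], "volts": [1.5, 2.25]}).to_excel(
        writer, sheet_name="run1", index=False)
    # Sheet 1 ("run2") has no header row, only data.
    pd.DataFrame([[10, 20], [30, 40]]).to_excel(
        writer, sheet_name="run2", index=False, header=False)
workbook_bytes = buffer.getvalue()

# Default: leftmost sheet, first row used as column labels.
first = pd.read_excel(io.BytesIO(workbook_bytes))

# Pick a sheet by position (0 is furthest left) or by name; header=None keeps row one as data.
by_position = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name=1, header=None)
by_name = pd.read_excel(io.BytesIO(workbook_bytes), sheet_name="run2", header=None)

print(first)
print(by_position.equals(by_name))
```

With a real spreadsheet, replace each io.BytesIO(workbook_bytes) with the file name as a string, for example pd.read_excel("file", sheet_name=1).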