Pandas

From PrattWiki
Revision as of 19:30, 19 January 2020 by DukeEgr93 (talk | contribs)
Jump to navigation Jump to search

This page is currently very much in draft form and is focused on commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than that!

File Types

Pandas can load data from a text file or from an Excel spreadsheet.

Text Files

For text files, you need to figure out two things:

  • How are individual data points separated in the file? (tabs, commas, spaces, etc)
    • If separated by commas, use pd.read_csv("file") to load data frame
    • If separated by tabs, use pd.read_table("file") to load data frame
    • If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever is between data points; if the separator is more than one character, you will also need to add engine="python" to the command.
    • Note - if there are spaces in addition to other symbols, pandas will skip the spaces for the numerical information but not for the headers!
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_csv() or pd.read_table() command.
  • The following shows examples and different ways to load text files. Note that once the program runs:
  • t_h, c_h, and o_h will all be the same
  • t_nh, c_nh, and o_nh will all be the same


Excel Files

For Excel files, you need to figure out two things:

  • Does the file have one sheet or more than one sheet?
    • If there is only one sheet, use pd.read_excel("file") to load data frame
    • If there are multiple sheets, include sheet_name=X where X can be an integer indicating which sheet (in order from left to right, with 0 being furthest left) or a string with a sheet name. You can also load multiple sheets at once - that is not covered yet.
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_excel()