Pandas

From PrattWiki

Revision as of 18:30, 19 January 2020

This page is a draft and focuses on the commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than that! The commands below assume Pandas has been imported under its usual alias with import pandas as pd.

File Types

Pandas can load data from a text file or from an Excel spreadsheet.

Text Files

For text files, you need to figure out two things:

  • How are individual data points separated in the file? (tabs, commas, spaces, etc.)
    • If separated by commas, use pd.read_csv("file") to load the data into a data frame
    • If separated by tabs, use pd.read_table("file") to load the data into a data frame
    • If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever character sits between data points
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_csv() or pd.read_table() command.
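
The choices above can be sketched in a short example. To keep it self-contained, the "files" here are strings wrapped in io.StringIO rather than files on disk; the column names (time, voltage) and the semicolon separator are made up for illustration. With real data, the file name would go where io.StringIO(...) appears.

```python
import io

import pandas as pd

# Hypothetical file contents, invented for this illustration
comma_text = "time,voltage\n0.0,1.5\n0.1,1.7\n"   # commas, with a header row
tab_text = "time\tvoltage\n0.0\t1.5\n0.1\t1.7\n"  # tabs, with a header row
semi_text = "0.0;1.5\n0.1;1.7\n"                  # semicolons, no header row

# Comma-separated: first row becomes the column labels
df_comma = pd.read_csv(io.StringIO(comma_text))

# Tab-separated: first row becomes the column labels
df_tab = pd.read_table(io.StringIO(tab_text))

# Some other separator and no header row: every row is kept as data
# and the columns are labeled 0, 1, ...
df_semi = pd.read_csv(io.StringIO(semi_text), sep=";", header=None)

print(df_comma)
print(df_tab)
print(df_semi)
```

Without header=None, the last call would treat the first row of numbers as column labels and the data frame would come up one row short.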

Excel Files

For Excel files, you need to figure out two things:

  • Does the file have one sheet or more than one sheet?
    • If there is only one sheet, use pd.read_excel("file") to load the data into a data frame
    • If there are multiple sheets, include sheet_name=X where X is either an integer indicating which sheet to load (counting from left to right, with 0 being the leftmost) or a string containing the sheet name. Loading multiple sheets at once is also possible but is not covered here yet.
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_excel() command.
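
As a rough sketch of the Excel options above, the example below first writes a small two-sheet workbook to a temporary folder and then reads it back. The workbook name (demo.xlsx), the sheet names (run1, raw), and the data are all made up for illustration, and the example assumes an Excel engine such as openpyxl is installed alongside Pandas.

```python
import os
import tempfile

import pandas as pd

# Build a hypothetical two-sheet workbook so the example can run on its own
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
with pd.ExcelWriter(path) as writer:
    # Leftmost sheet (index 0): has a header row
    pd.DataFrame({"time": [0.0, 0.1], "voltage": [1.5, 1.7]}).to_excel(
        writer, sheet_name="run1", index=False
    )
    # Second sheet (index 1): numbers only, no header row
    pd.DataFrame([[1, 2], [3, 4]]).to_excel(
        writer, sheet_name="raw", index=False, header=False
    )

# No sheet_name given: the leftmost sheet is loaded, first row becomes labels
first = pd.read_excel(path)

# Pick a sheet by position (0 is leftmost) or by name; header=None keeps row 1 as data
by_index = pd.read_excel(path, sheet_name=1, header=None)
by_name = pd.read_excel(path, sheet_name="raw", header=None)

print(first)
print(by_index)
```

Selecting by name is usually safer than selecting by position when sheets may be reordered in the workbook.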