Difference between revisions of "Pandas"

From PrattWiki
Jump to navigation Jump to search
(Created page with "This page is currently very much in draft form and is focused on commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than tha...")
 
Line 8: Line 8:
 
** If separated by commas, use pd.read_csv("file") to load data frame
 
** If separated by commas, use pd.read_csv("file") to load data frame
 
** If separated by tabs, use pd.read_table("file") to load data frame
 
** If separated by tabs, use pd.read_table("file") to load data frame
** If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever is between data points
+
** If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever is between data points; if the separator is more than one character, you will also need to add <code>engine="python"</code> to the command.
 +
** Note - if there are spaces in addition to other symbols, pandas will skip the spaces for the numerical information but '''not''' for the headers!
 
* Do the columns have headers (column labels) or not?
 
* Do the columns have headers (column labels) or not?
 
** If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels
 
** If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels
 
** If the first row of the file should be included in the data set and does not contain column headers, add <code>header=None</code> to the <code>pd.read_csv()</code> or <code>pd.read_table()</code> command.
 
** If the first row of the file should be included in the data set and does not contain column headers, add <code>header=None</code> to the <code>pd.read_csv()</code> or <code>pd.read_table()</code> command.
 +
* The following shows examples and different ways to load text files.  Note that once the program runs:
 +
* <code>t_h</code>, <code>c_h</code>, and <code>o_h</code> will all be the same
 +
* <code>t_nh</code>, <code>c_nh</code>, and <code>o_nh</code> will all be the same
 +
 +
<html><iframe src="https://trinket.io/embed/python3/2373a73e09?start=result&runMode=console" width="100%" height="600" frameborder="0" marginwidth="0" marginheight="0" allowfullscreen></iframe></html>
 +
 +
 +
 
=== Excel Files ===
 
=== Excel Files ===
 
For Excel files, you need to figure out two things:
 
For Excel files, you need to figure out two things:

Revision as of 19:30, 19 January 2020

This page is currently very much in draft form and is focused on commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than that!

File Types

Pandas can load data from a text file or from an Excel spreadsheet.

Text Files

For text files, you need to figure out two things:

  • How are individual data points separated in the file? (tabs, commas, spaces, etc)
    • If separated by commas, use pd.read_csv("file") to load data frame
    • If separated by tabs, use pd.read_table("file") to load data frame
    • If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever is between data points; if the separator is more than one character, you will also need to add engine="python" to the command.
    • Note - if there are spaces in addition to other symbols, pandas will skip the spaces for the numerical information but not for the headers!
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_csv() or pd.read_table() command.
  • The following shows examples and different ways to load text files. Note that once the program runs:
  • t_h, c_h, and o_h will all be the same
  • t_nh, c_nh, and o_nh will all be the same


Excel Files

For Excel files, you need to figure out two things:

  • Does the file have one sheet or more than one sheet?
    • If there is only one sheet, use pd.read_excel("file") to load data frame
    • If there are multiple sheets, include sheet_name=X where X can be an integer indicating which sheet (in order from left to right, with 0 being furthest left) or a string with a sheet name. You can also load multiple sheets at once - that is not covered yet.
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_excel()