Difference between revisions of "Pandas"

From PrattWiki
Jump to navigation Jump to search
(Excel Files)
 
(One intermediate revision by the same user not shown)
Line 26: Line 26:
 
* Do the columns have headers (column labels) or not?
 
* Do the columns have headers (column labels) or not?
 
** If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels
 
** If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels
** If the first row of the file should be included in the data set and does not contain column headers, add <code>header=None</code> to the <code>pd.read_excel()
+
** If the first row of the file should be included in the data set and does not contain column headers, add <code>header=None</code> to the <code>pd.read_excel()</code>
  
 
== Loading With Headers ==
 
== Loading With Headers ==
If you load a data set with headers, the headers become keys for the different series.   
+
 
 +
If you load a data set with headers, the headers become keys for the different series.  This can be very useful in extracting information (for example, in making plots) and can also make the code much more readable.
 +
 
 
<html>
 
<html>
 
<iframe src="https://trinket.io/embed/python3/d7e4662299" width="100%" height="600" frameborder="0" marginwidth="0" marginheight="0" allowfullscreen></iframe>
 
<iframe src="https://trinket.io/embed/python3/d7e4662299" width="100%" height="600" frameborder="0" marginwidth="0" marginheight="0" allowfullscreen></iframe>
 
</html>
 
</html>

Latest revision as of 21:04, 2 March 2020

This page is currently very much in draft form and is focused on commands needed to get numerical data from a file into Python. The Pandas package can do *much* more than that!

File Types

Pandas can load data from a text file or from an Excel spreadsheet.

Text Files

For text files, you need to figure out two things:

  • How are individual data points separated in the file? (tabs, commas, spaces, etc)
    • If separated by commas, use pd.read_csv("file") to load data frame
    • If separated by tabs, use pd.read_table("file") to load data frame
    • If separated by some other character, use pd.read_csv("file", sep="X") where X is replaced by whatever is between data points; if the separator is more than one character, you will also need to add engine="python" to the command.
    • Note - if there are spaces in addition to other symbols, pandas will skip the spaces for the numerical information but not for the headers! If you look at the example files in the Trinket below, all extra spaces have been removed.
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, both pd.read_csv() and pd.read_table() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_csv() or pd.read_table() command.
  • The following shows examples and different ways to load text files. Note that once the program runs:
    • t_h, c_h, and o_h will all be the same
    • t_nh, c_nh, and o_nh will all be the same

Excel Files

For Excel files, you need to figure out two things:

  • Does the file have one sheet or more than one sheet?
    • If there is only one sheet, use pd.read_excel("file") to load data frame
    • If there are multiple sheets, include sheet_name=X where X can be an integer indicating which sheet (in order from left to right, with 0 being furthest left) or a string with a sheet name. You can also load multiple sheets at once - that is not covered yet.
  • Do the columns have headers (column labels) or not?
    • If the first row of the file has column headers, pd.read_excel() will assign the first row as column labels
    • If the first row of the file should be included in the data set and does not contain column headers, add header=None to the pd.read_excel()

Loading With Headers

If you load a data set with headers, the headers become keys for the different series. This can be very useful in extracting information (for example, in making plots) and can also make the code much more readable.