Wednesday 27 December 2017

Starting with Python - Pandas

In this first post on Pandas I just want to discuss why Pandas is cool rather than go into too much detail. After all, there are books and documentation that go into far more depth than I can. Pandas comes with the Anaconda distribution, so assuming you took this route to install Python, you are good to go.

Here are some of the things you can do with Pandas in Python:
  • Deduplicate data
  • Cleanse data
  • Join data (though personally I find that a simple import into SQL Server and a query against it is sometimes better)
  • Easily import / export data to databases, Excel, CSV and HTML (all of which I have needed)
  • Manipulate data and perform calculations
  • Get unique values
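
As a rough illustration of a few of the items above, here is a minimal sketch. The data, column names and the 20% rate are all made up for the example, and this is just one way of doing each task, not the only one.

```python
import pandas as pd

# Made-up sample data; the column names are purely illustrative.
orders = pd.DataFrame({
    "customer": [" alice", "Bob", "Bob", "carol ", None],
    "amount": [10.0, 25.5, 25.5, 7.25, 3.0],
})

# Cleanse: drop rows with no customer, trim whitespace, tidy up the case.
clean = orders.dropna(subset=["customer"]).copy()
clean["customer"] = clean["customer"].str.strip().str.title()

# Deduplicate: keep the first of any fully identical rows.
clean = clean.drop_duplicates()

# Calculate: add a derived column (the 20% uplift is an arbitrary example).
clean["amount_with_tax"] = (clean["amount"] * 1.2).round(2)

# Get unique values from a column.
customers = clean["customer"].unique().tolist()

# Join: look up a region for each customer from a second (made-up) table.
regions = pd.DataFrame({
    "customer": ["Alice", "Bob", "Carol"],
    "region": ["North", "South", "North"],
})
joined = clean.merge(regions, on="customer", how="left")

print(customers)
print(joined)
```

The left join keeps every row from the cleaned orders even if a customer has no matching region, which is usually what I want in a report.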

The list is much, much longer; however, the above are all things that it is commonly used for. For the data analysis and report automation that I have done, I have used Pandas in every single case. If you Google it you can probably find examples of what you want to do, and this wealth of information is part of what makes Pandas so useful. Here is an example of some of the steps I have used Pandas for in a single report:
  1. Read an Excel file
  2. Import data to SQL Server
  3. Extract data from SQL Server
  4. Format data, mainly converting dates to the correct format
  5. Sort values
  6. Perform a join
  7. Get unique values
  8. Iterate through these values to create tabs on an Excel spreadsheet
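
The steps above can be sketched roughly as follows. This is only an outline under some assumptions: an in-memory SQLite database stands in for SQL Server so the example runs anywhere, a small made-up DataFrame stands in for the Excel file, and the table, column names and the write_tabs helper are hypothetical.

```python
import sqlite3

import pandas as pd

# Step 1 stand-in: in a real report this frame would come from pd.read_excel().
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sold_on": ["27/12/2017", "03/01/2018", "15/11/2017", "09/12/2017"],
    "amount": [120.0, 80.5, 42.0, 99.9],
})
managers = pd.DataFrame({
    "region": ["North", "South", "East"],
    "manager": ["Ann", "Raj", "Li"],
})

# Steps 2 and 3: import to a database, then extract with a query.
# SQLite is used here purely as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
sales.to_sql("sales", conn, index=False)
extracted = pd.read_sql_query("SELECT region, sold_on, amount FROM sales", conn)
conn.close()

# Step 4: convert the day/month/year text into proper dates.
extracted["sold_on"] = pd.to_datetime(extracted["sold_on"], format="%d/%m/%Y")

# Step 5: sort values.
extracted = extracted.sort_values("sold_on")

# Step 6: perform a join.
report = extracted.merge(managers, on="region", how="left")

# Step 7: get unique values.
regions = report["region"].unique()

# Step 8: iterate through the values, one DataFrame per future tab.
tabs = {region: report[report["region"] == region] for region in regions}


def write_tabs(frames, path):
    """Write each DataFrame to its own worksheet (needs an Excel engine such as openpyxl)."""
    with pd.ExcelWriter(path) as writer:
        for name, frame in frames.items():
            frame.to_excel(writer, sheet_name=str(name), index=False)
```

Calling write_tabs(tabs, path) with a path ending in .xlsx would then produce one tab per region, provided an Excel engine is installed.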


Over time each of these subjects will be covered individually, with examples and in more detail.
