@Eric Ringold mentioned Python and Pandas already, but I thought I'd add my two cents, as I've been a very heavy user of Python/Pandas for years (and a casual contributor to the Pandas source code), and I've used it for many different things, including looking at energy model output.
The following is my own opinion, and I'm heavily biased towards Python.
I'd go 100% with Python and Pandas, personally. Pandas (which was purposely built to handle time series) offers the same concept as R data frames (and the R data.table package, if I'm not mistaken). It's amazing: it makes tasks like up/downsampling, filtering, and filling gaps really easy (DataFrame.resample, DataFrame.fillna, and DataFrame.interpolate, among other methods), and it's fast (most core functions are vectorized and written in Cython). I almost never use Excel anymore...
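To give a flavor of those methods, here's a minimal sketch on made-up hourly meter data (the series and its values are hypothetical, not real model output): fill a two-hour gap, then downsample to 6-hour totals and upsample to 15-minute steps.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly electricity readings (kWh) for one day
index = pd.date_range("2023-01-01 00:00", periods=24, freq="h")
kwh = pd.Series(np.arange(24, dtype=float), index=index, name="kwh")
kwh.iloc[[5, 6]] = np.nan  # simulate two missing readings

# Gap filling: interpolate linearly in time, or use fillna / ffill
filled = kwh.interpolate(method="time")
zero_filled = kwh.fillna(0.0)
carried = kwh.ffill()  # carry the last valid reading forward

# Downsampling: hourly -> 6-hour totals
six_hourly = filled.resample("6h").sum()

# Upsampling: hourly -> 15-minute, linearly interpolated between hourly points
quarter_hourly = filled.resample("15min").interpolate()

print(six_hourly)
print(quarter_hourly.head())
```

Which gap-filling method makes sense depends on the variable: interpolating is usually reasonable for temperatures, while for metered energy you may prefer to flag the gap rather than invent values.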
It's true that R still has more statistical support today. But that doesn't mean Python lacks it: most functions 99% of us will ever need are already in the statsmodels package or another like it.
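As a quick illustration of statsmodels, here's a sketch of a simple linear regression of daily gas use against heating degree-days using its R-like formula interface (the data and column names are made up for the example).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical daily data: heating degree-days (HDD) and metered gas use (kWh)
df = pd.DataFrame({"hdd": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]})
noise = [0.4, -0.3, 0.2, -0.5, 0.1, 0.3, -0.2, 0.0, 0.2, -0.2]
df["gas_kwh"] = 50.0 + 2.5 * df["hdd"] + noise

# Ordinary least squares: gas_kwh = b0 + b1 * hdd
model = smf.ols("gas_kwh ~ hdd", data=df).fit()

print(model.params)  # intercept (base load) and slope (kWh per HDD)
print(model.summary())
```

If you're coming from R, the formula syntax (`y ~ x`) should feel familiar.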
On the other hand, Python is a more versatile, general-purpose language than R (meaning there are more well-developed and maintained packages for very different tasks, so it's usually easier to do something non-statistical with Python), so if you have to learn something now, go with Python.
For example, say you want to use scripting to modify an EnergyPlus input file. I doubt you'll find an R package to do that (could be wrong...), but in Python you'll find eppy!
If you have never used Python before, and especially if you're on Windows (*nix users usually have Python installed by default), one of the simplest ways to get started is to install the Anaconda Distribution, which makes it really easy to install a Python distribution with hundreds of scientific packages and a package manager. The (smart) alternative would be to use virtualenv, but I wouldn't start with that: if it looks complicated to you, just leave it for your next machine.
Then, to learn a decent amount of Python, I'd start by taking at least an online interactive tutorial (there are plenty out there, learnpython.org for example), and perhaps an online course, such as the free University of Michigan Programming for Everybody on Coursera. I mention this specific one because I went through it quickly when I wanted to recommend a course to colleagues, and it's a very good course (perhaps too basic in some respects if you have extensive programming experience in other languages; I can't really tell, since I took it while already experienced in Python).
Then look into Pandas, and use Jupyter notebook for exploratory work.
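A typical first pass in a notebook might look something like the sketch below, assuming a CSV export with a timestamp column (the contents and column names are hypothetical, embedded as a string so the snippet runs on its own; with a real file you'd pass its path to `read_csv` instead).

```python
import io
import pandas as pd

# Hypothetical CSV export: two days at 6-hour steps
csv_text = """timestamp,outdoor_temp_c,elec_kwh
2023-01-02 00:00,-3.0,10.0
2023-01-02 06:00,-1.0,30.0
2023-01-02 12:00,4.0,50.0
2023-01-02 18:00,1.0,40.0
2023-01-03 00:00,-2.0,12.0
2023-01-03 06:00,0.0,34.0
2023-01-03 12:00,5.0,46.0
2023-01-03 18:00,2.0,44.0
"""

# Parse the timestamps and use them as a DatetimeIndex
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"], index_col="timestamp")

summary = df.describe()  # count, mean, std, min, quartiles, max per column
profile = df.groupby(df.index.hour)["elec_kwh"].mean()  # average profile by hour of day
daily = df.resample("D").agg({"outdoor_temp_c": "mean", "elec_kwh": "sum"})

print(summary)
print(profile)
print(daily)
```

In a notebook, each of these would typically go in its own cell so you can look at the result before deciding what to do next.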
Other packages worth looking at: I also really like Seaborn for plotting, and when I want to create interactive graphs that I can share with anyone, I tend to use Plotly, since its API is easy to use and it'll upload the graph to their platform so you can share the link (you can also plot offline directly in a notebook; also look at cufflinks, which adds methods to Pandas DataFrames to create Plotly graphs). Obviously Matplotlib is the core library behind most plotting you'll do, but Pandas has easy convenience functions to create typical graphs from DataFrames/Series.
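Here's a sketch of those Pandas convenience plotting functions on top of Matplotlib. The data and column names are hypothetical, and the Agg backend and the `end_uses.png` file name are only there so it also runs as a plain script; in a notebook you'd just let the figures display inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical two days of hourly end-use data (kWh)
index = pd.date_range("2023-07-01", periods=48, freq="h")
hour = index.hour.to_numpy()
df = pd.DataFrame(
    {
        "cooling_kwh": np.clip(20 * np.sin((hour - 6) / 24 * 2 * np.pi), 0, None),
        "lighting_kwh": np.where((hour >= 8) & (hour < 18), 5.0, 1.0),
    },
    index=index,
)

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
df.plot(ax=axes[0], title="Hourly end uses")  # line plot, one line per column
df.resample("D").sum().plot.bar(ax=axes[1], title="Daily totals")  # grouped bars per day
fig.tight_layout()
fig.savefig("end_uses.png")
```

Passing `ax=` lets Pandas draw into Matplotlib axes you created yourself, which is handy when you want to combine several views in one figure.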