How to start learning R (part 2)

Let’s keep talking about R! This is the second part of the post about how to start learning this programming language. In the first part, I tried to highlight that the most important thing about learning R is wanting to learn, whatever your background. It doesn’t matter whether you have a technical or scientific background (although it helps), as long as you are willing to learn.

If you already know some programming or statistics, the Data Science Specialization offered by Johns Hopkins University through Coursera will be much easier for you. If you have no previous knowledge of either, don’t worry: you’ll learn. Just focus on wanting to learn, and that will keep your motivation up.

Having said that, in this second part I explain each of the courses in more detail. In the previous post I went through what the 9 courses have in common, but I forgot to add a couple of tips that helped me with the quizzes and assignments:

– The phrase we are all tired of hearing: Google is your best friend. If there is something you don’t know, google it. You could swap Google for Yahoo or your local library, it’s up to you. The point is that although most concepts are explained in the weekly videos, some specific parts of the quizzes and assignments require a bit of research to solve. That isn’t hard, because if there is one thing always within our reach, it’s information.

– The forums and the other students in the course are also very good friends: every time I’ve asked anything in one of the forums, it’s been answered almost immediately. I know it’s online, but I’ve noticed a lot of fellowship and eagerness to help.

The specialization has 9 courses that can be taken independently of each other. However, they are related, and having done one course makes the following ones easier. I’ve done 2 each month, in the same order in which they appear on Coursera, which is also the order in which they are explained in this post.

1. The Data Scientist’s Toolbox:

This is the easiest course, so it’s worth starting the second one at the same time. The videos in the first week explain what the other courses are about, and those in the second week cover the tools used throughout the specialization (Git, RStudio, etc.). It has 3 quizzes and a very simple assignment.

2. R Programming:

Probably the most difficult course for me, as I didn’t have any previous programming knowledge (HTML and CSS don’t count…). The videos are easy to understand, but my problems began when I started the first assignment. This course has 3 assignments: 1 in the 2nd week, 1 in the 3rd and 1 in the 4th. Something I didn’t know is that watching the videos of week 2 is very helpful for the first assignment, and watching the videos of week 3 helps with the second one (and so on). You don’t usually need to do that in the other courses, but in this one it’s useful.

In the end I finished R Programming with distinction (awarded when your grade is above 90%). For me, everything started to make sense when I realised that a programming language is like any other language, and sometimes you just have to think logically. Obviously, that only gets you as far as simple functions: to gain an advanced knowledge of R (as of any language), you need more study and more experience. Still, there are many (beautiful) things you can do with a basic knowledge of the language.

The next 3 courses, up to Statistical Inference, weren’t difficult.

3. Getting and Cleaning Data:

Here you learn to clean and organise data sets: if they contain NAs (missing values), or if you need to filter the data or take a subset of it, this course explains how. It has 3 quizzes and a mandatory assignment. There is also an optional swirl assignment that gives you extra points and helps to clarify the concepts learnt.

4. Exploratory Data Analysis:

This course shows how to build plots in R to analyse data. It’s very important because plotting comes up again in all the following courses, and it matters a great deal in any data analysis. As an analyst, making plots is essential at the start of an analysis, both to detect outliers and to understand the relationships between the different variables. Visual representation is always mentioned as a way of communicating findings, but it is just as important for you, the analyst, when you start exploring the data.

It has 2 quizzes and 2 projects.

5. Reproducible Research:

This course shows how important it is that the analysis and the code are available to other people so they can reproduce the same research. It has 2 assignments and 2 quizzes.

6. Statistical Inference:

This course has a different teacher from the previous 5, and things get complicated again. Many people think it’s hard because of the teacher, but for me it’s mostly down to the difficulty of the subject, and only to a much smaller extent to the way it’s explained. Brian Caffo teaches this course, while Roger D. Peng taught the previous ones. In the first part of the post you can find my opinion of the 3 teachers.

Statistical Inference is like going back over the notes you took in your university statistics classes. In my case, I took statistics as an optional subject during my Advertising degree, and then as a mandatory subject in my Market Research degree. The course covers probability, variance, distributions, confidence intervals, t-tests, p-values, etc. The videos and quizzes require more time than in the previous courses. I took this one at the same time as Reproducible Research, and tried to finish Reproducible Research in the first weeks so I could spend more time on Statistical Inference.

It has 4 quizzes, 1 mandatory assignment and 1 swirl assignment. The videos are available on Brian Caffo’s YouTube channel. There are also some videos called Homework, created to help with the quizzes (which are quite difficult).

7. Regression Models:

This course, together with the next one (Practical Machine Learning) and R Programming, was my favourite, and these were also the ones that took me the most time. If you like statistics, you’ll really enjoy them. If I could, though, I’d remove the mathematical part used to explain some of the functions. This time it’s optional (unlike in Statistical Inference), but it can make what’s explained in the videos confusing, and the course doesn’t go deep enough for you to really need the maths behind it. Despite that, regression is a beautiful subject, and this course is about analysing the relationship between a dependent variable and one or more independent variables. It focuses heavily on linear models.

This course has 4 quizzes, 1 mandatory assignment, and 1 swirl assignment. As with Statistical Inference, the videos are available on Brian’s YouTube channel.

8. Practical Machine Learning:

Jeff Leek teaches this course. You could say it’s the practical side of Regression Models: it focuses on building predictive models in R, with techniques such as random forests, boosting and forecasting. It requires more work than the first courses, but it’s totally worth it. It has 4 quizzes and 1 project.

9. Developing Data Products:

The ninth and last course, and the one I’m currently taking. After the previous 3, this one feels much easier. To summarise briefly, it shows tools to build interactive applications and dashboards with Shiny, rCharts or googleVis, and presentations with Slidify or RStudio Presenter. It has 3 quizzes and 1 assignment.

The specialization ends with a final project (the Data Science Capstone), which only becomes available once you finish the 9 courses. I haven’t been able to sign up for it yet. I’ll share how it goes when I do!

I don’t have much more to add, except to stress again that this specialization is for anyone who really wants to do it, even if your background is not technical. If you want to develop your career as an analyst, I totally recommend it. I won’t lie: it takes time and some parts aren’t easy, but nothing is impossible, and it’s great to learn something new that you feel passionate about.

As I said in the previous post, if you’re interested in doing the specialization or have any questions, I’m always available on Twitter, LinkedIn, this blog (you can leave a comment here) or by email 🙂

Digital Analytics and a bit of R

by Bárbara Mackey