Should I learn Python now I know R?

In response to a query from a colleague:

It depends what you need. I would still recommend R as the first programming language to learn for anyone working in epidemiology or academia - I would say it is the best option for biostatistics, data visualisation, report automation and cutting-edge statistical methods.

Once you have got to a reasonable level in R, learning some Python is not difficult and can be valuable. Python is a general-purpose language with much broader applications than R, and it has the edge in some areas such as machine learning, data engineering and app/web app development. Python is probably one of the easiest programming languages to learn (R can be quirky and inconsistent), and a lot of what you learned with R is transferable. There is a Python module (the equivalent of an R package) called pandas which allows you to write Python code that is often similar to tidyverse or data.table R code; see e.g. https://gist.github.com/conormm/fd8b1980c28dd21cfaf6975c86c74d07.
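To give a flavour of how familiar pandas can feel, here is a minimal sketch of a filter / group / summarise pipeline written as a method chain, much like a tidyverse pipe. The data frame and its column names (region, age, admitted) are made up for illustration, and this is only one of several ways to write it in pandas.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration
cases = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "age": [34, 71, 52, 68, 80],
    "admitted": [0, 1, 0, 1, 1],
})

# Roughly the tidyverse equivalent of:
#   cases |> filter(age >= 50) |> group_by(region) |>
#     summarise(n = n(), admitted = sum(admitted)) |>
#     mutate(pct_admitted = 100 * admitted / n) |> arrange(region)
summary = (
    cases
    .query("age >= 50")                      # filter()
    .groupby("region", as_index=False)       # group_by()
    .agg(n=("age", "size"),                  # summarise(): rows per group
         admitted=("admitted", "sum"))       # and admissions per group
    .assign(pct_admitted=lambda d: 100 * d["admitted"] / d["n"])  # mutate()
    .sort_values("region")                   # arrange()
    .reset_index(drop=True)
)

print(summary)
```

Wrapping the chain in parentheses lets each step sit on its own line, which reads much like a sequence of piped dplyr verbs.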

I wouldn’t recommend that everyone learns both (I know both and still manage to do everything I need in R, though I can read and run Python code when I need to). But once you are a competent R coder, possible advantages of learning some Python as well could include:

  • It opens up a world of things you can’t (or shouldn’t try to) do in R.
  • It helps with collaboration across the diverse data science communities. Data science is increasingly bilingual. Some projects might need both. It is not really either/or any more.
  • It might make you a better programmer. In the Python world there is generally more emphasis on code quality than in the R world.
  • You will possibly learn more about computer science through using Python.
  • You can easily mix R code with Python and get the best of both worlds – see e.g. the reticulate package.
  • For junior data scientists, having both Python and R skills increases employability across different sectors.

I started off in base R, then changed to Python for a few years, then came back to R when the tidyverse and data.table were coming in. I have only used data.table since then. I suppose your other option, if you are happy with your tidyverse level, would be to learn data.table. Its main advantages are that it runs much faster than tidyverse, especially with larger data sets, and that it does more with less code. But if you don’t need speed and don’t work with large data sets, there is probably little to gain.