Data science is a multidisciplinary field that requires programming, statistics and domain knowledge. At times you can get by without domain knowledge, because chances are that you will be working with a subject matter expert. But the other two, programming and statistics, are must-knows if you want to be anywhere near data science.
If you are unsure of your programming or statistics prowess, you can take the following quiz, just to find out where you need to focus.
When it comes to the programming part of data science, Python and R are two extensively used languages. Both are open source and free to use, and both offer a lot of libraries for data science. Knowing both R and Python is a real bonus, but for now, we will focus on Python.
Let’s skip the usual history of Python and stuff like that and start writing code. But where will you write the code?
Prepare your computer for Python
There is a hard way to set up Python, install the data science libraries and install an IDE or text editor, and there is an easy way. Let’s take the easy way. I am not even going to talk about the difficult way. Most people who want to venture into data science are non-programmers and are immediately put off by the initial setup.
That’s enough now. Download and install Anaconda to set up Python and the other required stuff on your computer.
Make sure you select the installer for your operating system. Click the download button under the Python 3.6 version. Python comes in two flavours, Python 2 and Python 3, and there are a few differences between the two.
Since we are on nitro boost at the moment, let’s skip the differences between Python 2 and Python 3 as well. Once you have downloaded the Anaconda setup file, install it. The installation process is straightforward.
Once you have installed Anaconda
Now that we have installed Anaconda, let’s start programming. When you install Anaconda, a tool called Jupyter Notebook is installed along with it. We are going to use Jupyter Notebook to learn Python.
You can open Jupyter Notebook from the Start Menu if you are using Windows, but if you are on Linux or Mac, you need to execute the command ‘jupyter-notebook’ in the terminal. Jupyter Notebook opens in your browser, and this is how it should look. Of course, the directories will be different.
Click the ‘New’ button at the top right of Jupyter Notebook. A drop-down will open. Click Python 3 to open a new Python 3 notebook. If you want to create the notebook in a certain directory, first navigate to that directory and then click ‘New’.
Once you create a new notebook, you should see the following.
You can change the name of the notebook by clicking the text ‘Untitled’ and typing the desired name. You can save the notebook with the shortcut Control + S, or Command + S on a Mac.
What comes after this?
Now we are done with the preparation part. Whew! That was kind of tedious. It is time to start writing the code now.
Write the following code in the notebook.
The green highlighted part is called a cell in Jupyter Notebook. When you select a cell, it is highlighted in green. You need to press Shift + Enter to execute the cell. You can type multiple lines in a cell by pressing Enter.
Now press Shift + Enter to execute the cell. You should see the following output. What you have done here is enter a Python command and execute it. The output of the cell is printed below the cell, and a new cell is automatically created.
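In case the screenshot is not visible, here is a minimal sketch of what such a cell could contain. The numbers 2 and 3 are my assumption, and any two numbers behave the same way.

```python
# Assumed example values; the numbers in the original screenshot may differ.
result = 2 + 3

# In a notebook, typing just `2 + 3` also shows the value below the cell.
print(result)  # prints 5
```

Type it into the first cell and press Shift + Enter to see the result below the cell.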
Now that you have added two numbers, how about trying other operations like multiplication, division and subtraction? Check the following table for your reference.
Now that you have carried out a single operation on two numbers, how about multiple operations on multiple numbers? For example, 5*2-6/(2+1).
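As a quick sketch, the four basic operations can be tried in a single cell. The variable names and the values 7 and 2 are only illustrative choices of mine.

```python
# Hypothetical example values, chosen only for illustration.
a = 7
b = 2

addition = a + b        # 9
subtraction = a - b     # 5
multiplication = a * b  # 14
division = a / b        # 3.5, because / always gives a decimal result in Python 3

print(addition, subtraction, multiplication, division)
```

Change the values of a and b and run the cell again to see how each result changes.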
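Just remember that Python follows the rules of mathematics when evaluating such expressions. Brackets come first, then multiplication and division (equal in precedence, evaluated left to right), and then addition and subtraction (also equal in precedence, evaluated left to right).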
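To see the order of evaluation in action, here is the expression from above worked through step by step. The second expression is an extra example of my own, added to show what happens when two operators have the same precedence.

```python
# The expression from the text, evaluated step by step:
# 1. Brackets first:          (2 + 1) gives 3
# 2. * and /, left to right:  5 * 2 gives 10, then 6 / 3 gives 2.0
# 3. + and -, left to right:  10 - 2.0 gives 8.0
value = 5 * 2 - 6 / (2 + 1)
print(value)  # prints 8.0

# Operators of equal precedence run left to right:
# this is (8 / 4) * 2, not 8 / (4 * 2).
left_to_right = 8 / 4 * 2
print(left_to_right)  # prints 4.0
```

When in doubt, add brackets. They make the intended order obvious to anyone reading the code.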
Comments are an important part of any programming language. In Python, whatever is typed after # is not executed. Also, text enclosed in """ """ (three double quotes, opened and closed) is often used like a comment, although strictly speaking Python treats it as a string rather than ignoring it.
Check the following image.
When you executed the first cell, there was no output. But when you executed the second cell, there was an output, because three double quotes, opened and closed, are also used to write a multiline string. This will be covered in the next part of this series.
Remember, all this is very easy only if you type the code yourself. Just reading the code is not going to help you, more so if you are a non-programmer.
Click here to see the Jupyter Notebook I created for your reference.
You have taken the first step towards learning Python for data science. If you have any questions, just shoot in the comments.
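The difference between the two can be sketched as follows. The variable names and the text are made up for illustration and are not taken from the original screenshot.

```python
# This line is a comment; Python ignores everything after the # sign.
total = 1 + 1  # a comment can also follow code on the same line

# Triple double quotes create a string that can span several lines.
text = """This is a multiline string.
It spans two lines."""

print(total)
print(text)
```

The # lines never produce output. The triple-quoted text is a real value, which is why a notebook displays it when it is the last thing in a cell.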