Intro

TRANSCRIPT

Are you working on a data science side project -- but you don't know how to get the actual raw data for it?

You are not alone. This is a very typical problem, but don't worry: I have a solution (well, actually five different solutions) for it.

And I'll show them to you in my online course called the Data Source course.

Hi, I'm Tomi Mester from data36.com, and I see many aspiring and junior data scientists starting to work on hobby projects. That's awesome! We all know why these projects are awesome:

  • they are great for practicing your Python and SQL skills,
  • they are an excellent way to boost your CV and stand out when it comes to job hunting,
  • and they are a lot of fun!

But as I recently learned, for beginner data scientists there is a huge roadblock in these projects right at the very first step: they simply don't know how to get access to datasets they could work with.

If you are watching this video, this is probably an issue for you as well…

Well, not anymore because I've created the Data Source course to solve this very problem.

In the Data Source course -- well, as the name suggests -- I'll give you access to a lot of different data sources.

I put them into five modules:

1) The simplest solution. I'll just go ahead and give you immediate access to datasets from a few of my real-life projects. These datasets are unique and can't be found anywhere else on the internet. And more importantly, they are from real life -- so you'll see all kinds of exciting things in them that you usually can't see in pre-prepared datasets from other online courses. In this module, I've published three datasets -- and I might expand this in the future:

  1. An educational online game's usage data -- including the usage data of more than 100,000 rounds generated by real players.
  2. A blueberry plant's growth data -- including moisture levels, light levels and photos of the plants.
  3. A slice of the Data36 blog's traffic data -- the log of the article reads on all SQL-related articles in a five-month period -- which means 130,000+ rows, 1,000,000+ data points to be analyzed.


2) In the second module, I'll show you how APIs work. With an API, you can query real-life data from different online applications. In this course specifically, I'll show you:

  1. how the OpenWeather API works, so you'll be able to get various weather data from different locations,
  2. how the Twitter API works, so you'll be able to download and analyze tweets,
  3. how the Yahoo Finance API works, so you'll be able to get stock prices, and
  4. how the Coinbase API works, so you'll be able to query and analyze the prices of cryptocurrencies.
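
To make the idea of "querying an API" concrete, here is a minimal Python sketch of what such a request could look like for weather data. It is not the code from the course. It assumes the OpenWeather "current weather" endpoint, a placeholder API key (YOUR_API_KEY), and helper names (build_weather_url, fetch_weather, summarize) that are made up for this illustration. The SAMPLE payload is also invented, so the parsing step can be tried without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: OpenWeather's "current weather" API.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key, units="metric"):
    """Build the request URL for the current weather of one city."""
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"{BASE_URL}?{query}"


def fetch_weather(city, api_key):
    """Send the request and decode the JSON answer (needs network and a valid key)."""
    with urlopen(build_weather_url(city, api_key), timeout=10) as response:
        return json.load(response)


def summarize(payload):
    """Pick a few fields out of the decoded JSON response."""
    return {
        "city": payload["name"],
        "temp": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
    }


# Invented payload that only mimics the shape of a response; not real weather data.
SAMPLE = {"name": "Budapest", "main": {"temp": 21.5, "humidity": 40}}

print(build_weather_url("Budapest", "YOUR_API_KEY"))
print(summarize(SAMPLE))
```

With a real key, summarize(fetch_weather("Budapest", key)) would return the same kind of dictionary for live data. The other APIs follow the same pattern of building a URL, sending a request and parsing the JSON answer, although their endpoints and authentication differ.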


3) I'll also give you access to a few randomly generated datasets. I created these with Python-based random generator scripts that I built specifically for this course. This includes a simpler dataset called "dogs vs. cats" and a dataset of a simulated e-commerce shop.

As an extra, I'll also give you access to the random-generator Python scripts themselves, so you will be able to modify and re-use them to create as much raw data as you need -- and also read my Python code and, based on that, figure out how to build similar things for yourself.
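
For a feel of what a random generator script can look like, here is a small sketch in the spirit of a "dogs vs. cats" dataset. It is not the script from the course: the column names (pet_id, species, weight_kg, age_years), the average weights and the function names are all assumptions chosen for this illustration.

```python
import csv
import io
import random


def generate_pets(n_rows, seed=42):
    """Generate n_rows of made-up pet records; the same seed gives the same data."""
    rng = random.Random(seed)
    rows = []
    for pet_id in range(1, n_rows + 1):
        species = rng.choice(["dog", "cat"])
        # Assumed averages: dogs are heavier than cats in this toy dataset.
        mean_weight = 20.0 if species == "dog" else 4.5
        weight = max(0.5, round(rng.gauss(mean_weight, mean_weight * 0.25), 1))
        rows.append(
            {
                "pet_id": pet_id,
                "species": species,
                "weight_kg": weight,
                "age_years": rng.randint(0, 15),
            }
        )
    return rows


def to_csv(rows):
    """Serialize the generated rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(generate_pets(5)))
```

Changing n_rows gives as many rows as you need, and fixing the seed keeps the dataset reproducible between runs, which is handy when you want to compare analyses.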


4) I'll show you a web scraping example. You can already find pretty detailed step-by-step web scraping tutorials on my blog, data36.com. But in the course, I'll show you one more example, well, in fact, the most popular web scraping example: scraping Wikipedia. If you learn how to get access to Wikipedia pages with Python, you can get access to the raw data of over 6 million articles.
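
As a rough illustration of the scraping idea, here is a minimal sketch that downloads a page and collects the links to other Wikipedia articles. It uses only Python's standard library to stay self-contained; the course itself may well use other libraries. The class and function names, the User-Agent string and the SAMPLE_HTML snippet are assumptions made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class WikiLinkParser(HTMLParser):
    """Collect the href of every link that points to another article."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Keep article links only; skip namespaced pages such as "File:" or "Help:".
        if href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)


def fetch_html(url):
    """Download a page as text (needs network access)."""
    request = Request(url, headers={"User-Agent": "data-side-project-example/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def extract_article_links(html):
    """Parse an HTML string and return the article links found in it."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links


# Tiny hand-written snippet, so the parsing step can be tried offline.
SAMPLE_HTML = (
    '<p><a href="/wiki/Data_science">Data science</a> '
    '<a href="/wiki/File:Logo.png">logo</a> '
    '<a href="https://example.com">external</a></p>'
)

print(extract_article_links(SAMPLE_HTML))
```

For a live page, extract_article_links(fetch_html("https://en.wikipedia.org/wiki/Data_science")) would return the article links found on that page. When scraping at scale, it is worth adding a pause between requests and checking the site's terms of use.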


5) And if these four are not enough, in the last module, I'll also give you an exhaustive list of open datasets, so you can go ahead and browse for more raw data from all around the internet.


So, I guess you get the point. If getting raw data was a problem before, after this course your only issue will be having access to too much data.

Let me just highlight an important thing that you might suspect already.

By taking this course, you won't just get the fish, you'll learn fishing, too.

These five modules -- these five different ways of getting access to datasets -- are great starting points.

But I really hope that after finishing this course:

  • you won't stop at those APIs that I show you,
  • you won't stop at those datasets that I generated for you,
  • and you won't stop at those few web pages that I scrape in the course...

...because you'll understand how to apply this knowledge to other data sources -- and build, query and get various types of raw data by yourself.

Okay, I hope you are excited because I'm excited!

Now, it's time to dig deeper!