Highest Rated Comments


UMichiganAI (5 karma)

There are a couple of reasons. First, Python is wonderful specifically for data science. There are lots of great libraries for machine learning (scikit-learn), natural language processing (nltk), network analysis (networkx), and basic visualization (matplotlib). Python's data analysis and cleaning capabilities are also great: I (Chris) regularly write pandas manipulations to clean and transform research data.
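For readers wondering what "pandas manipulations to clean and transform research data" look like in practice, here is a minimal sketch. It is not taken from the course. The dataset, the column names, and the `clean` helper are hypothetical, chosen to show common fixes: messy headers, inconsistent text, numbers stored as strings, duplicate rows, and missing values.

```python
import pandas as pd

# Hypothetical raw survey data with typical problems: messy column names,
# inconsistent category text, scores stored as strings, a duplicated row,
# and an unparseable score ("n/a").
raw = pd.DataFrame({
    " Participant ID ": [1, 2, 2, 3, 4],
    "Condition": ["control", "Treatment ", "Treatment ", "CONTROL", "treatment"],
    "Score": ["12", "15", "15", "n/a", "18"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a tidied copy of the (hypothetical) survey table."""
    out = df.copy()
    # Normalize column names: trim whitespace, lowercase, snake_case.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Normalize category labels so "Treatment " and "treatment" match.
    out["condition"] = out["condition"].str.strip().str.lower()
    # Convert scores to numbers; entries such as "n/a" become NaN.
    out["score"] = pd.to_numeric(out["score"], errors="coerce")
    # Drop exact duplicate rows, then rows without a usable score.
    out = out.drop_duplicates().dropna(subset=["score"])
    return out.reset_index(drop=True)


tidy = clean(raw)
# Transform: mean score per experimental condition.
summary = tidy.groupby("condition")["score"].mean()
print(summary)
```

The cleaning steps are kept in one function so they can be re-run unchanged whenever a new batch of raw data arrives.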

Python is also a general-purpose programming language. If you're a software developer, that means you get a full toolkit, including multiprocessing and cloud computing libraries, rather than just a specialized stats language.

We also looked at what free educational data science material already exists. There are lots of great resources in R, but I think the Python world was a little underrepresented, so we figured we would share our workflows (though I think all of us use a variety of tools when solving data science problems!).

UMichiganAI (3 karma)

Today has been filled with insightful conversations about data science and Python. Thanks to all who participated in this AMA by posing questions and sharing their thoughts!

If we haven’t gotten to your question yet, we apologize and will try to circle back to it soon. For those interested in learning more about our work, check out our Applied Data Science with Python Specialization on Coursera: https://www.coursera.org/specializations/data-science-python

UMichiganAI (3 karma)

Chris here: Python is certainly one of the top data science languages, along with R. There are many other tools, of course: SPSS, SAS, STATA, etc. Python is particularly nice because of its large toolkit support (nltk, networkx, scikit-learn, pandas, matplotlib) for data science workflows.

For publicity, we rely on word of mouth, the Coursera portal, and of course activities like a Reddit AMA. For supplementary material, I feel there are plenty of solid data science resources on the web that we can link learners out to. Kaggle is a great example: someone might take this specialization and then get involved in Kaggle competitions to hone their skills.

UMichiganAI (2 karma)

All here: We are very interested in this demographic, and we talked at some length during course planning about how to support these learners. This course is fairly introductory, so whether it's enough depends on the kind of job you are seeking and on what other background (current employment, previous academic background, etc.) you have. For instance, if you're a programmer looking to shift from (say) front-end development to business intelligence, we hope this specialization is for you. That's of course just one example of a job seeker!

We also hope to support students who are thinking of going to graduate school and want some solid skills to put on their applications.

We don't have an omnibus capstone. Instead, each of the courses ends in a larger project assignment. In my conversations with learners who had done data science MOOCs, I found that even when they paid for the specialization, they tended not to do the separate capstone project. So we wanted to try larger projects on a per-course basis to see whether this would help learners build a compelling portfolio!

In the end, I think the best bet for a job seeker is to differentiate themselves by applying their skills to a novel project that is wholly their own!

UMichiganAI (2 karma)

Chris here: R is great, and a number of us use it regularly in our research work. Whether you can replace it with Python depends on what you're doing in R. R's got amazing stats libraries, and you can pretty much be guaranteed that when a new approach comes out, especially a statistical one, there will be an R library for it. I think data scientists need to be a bit of a polyglot. At the University of Michigan, students in the School of Information learn both R and Python, depending on the course.

I'll be teaching some data vis in the second of the five courses, looking at matplotlib and maybe seaborn and bokeh. The focus will be on charting and graphing rather than 3D or highly interactive visualizations. We'll cover things like heatmaps, scatterplots, and violin plots, as well as some theory: Tufte and Cairo references will abound! This is still under development, so feel free to influence our path!
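To give a rough idea of the chart types mentioned above (scatterplots, violin plots, heatmaps), here is a minimal sketch using plain matplotlib and NumPy. It is not course material, since the course was still being designed. All data are randomly generated placeholders. The off-screen "Agg" backend and the in-memory PNG buffer are assumptions chosen so that the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)

# Placeholder data for each chart type (hypothetical, randomly generated).
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)
groups = [rng.normal(loc=mu, size=100) for mu in (0, 1, 3)]
grid = rng.random((5, 5))

fig, (ax_scatter, ax_violin, ax_heat) = plt.subplots(1, 3, figsize=(12, 4))

# Scatterplot: relationship between two continuous variables.
ax_scatter.scatter(x, y, s=10, alpha=0.6)
ax_scatter.set(title="Scatterplot", xlabel="x", ylabel="y")

# Violin plot: the shape of each group's distribution, with medians marked.
ax_violin.violinplot(groups, showmedians=True)
ax_violin.set_xticks([1, 2, 3])
ax_violin.set_xticklabels(["A", "B", "C"])
ax_violin.set_title("Violin plot")

# Heatmap: matrix values mapped to color; the colorbar acts as the legend.
im = ax_heat.imshow(grid, cmap="viridis")
fig.colorbar(im, ax=ax_heat)
ax_heat.set_title("Heatmap")

fig.tight_layout()

# Save to an in-memory buffer; pass a filename instead to write to disk.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
```

The seaborn library mentioned above offers higher-level functions for the same chart types, but the raw matplotlib calls show what is happening underneath.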