More and More

Thursday, January 21, 2016

Interview with a Data Scientist Part 1

This “interview” sprang out of a random chat with an old friend. I hadn't spoken with him in a few years and saw his name in my address book. Figured I would drop him a quick message and catch up a bit. As all good conversations between old friends should go, we went far and wide on what was going on in life, family, hobbies, etc. I had no idea what he was doing professionally when I sent the message, so it was a great treat for me to get to talk to someone in the field of Data Science that I knew BEFORE he started off. It makes it a conversation between friends and NOT just grilling someone.

Note: I have cleaned it up a bit, but not much. Punctuation and grammar are what they are. A couple of comments have been rearranged to help with continuity, some haven’t. Names and specific references have been changed, since the people mentioned weren’t involved in the conversation to defend their good names.


Z: what are you doing for your company?


K: im a senior data scientist for university relations. i find patterns in donor giving behavior
Z: nice. I actually just started a data science course.


K: which one/where?
Z: coursera/Johns Hopkins


K: ahh yeah i've seen that one
it's a very en vogue title right now
Z: yeah.
it fits my "everything is interesting/ why the hell did that just happen," thought process.
plus i get to ask insane questions of data.


K: which classes in specific? yah that's the allure, but what people most often miss is that you need a heavily quantitative background for any of those questions to have any meaning
Z: goal is to bend it toward business intelligence before too long. get fully out of sales


K: word, sounds like a good plan
Z: want something on the analytic/project side that still lets me use my business ed/exp
and a better paycheck is a bonus


K: so do you know any advanced statistics? econometrics? that is key. otherwise it will be tough to get a foothold. i hire and work with a lot of DS and there is a big gap between those that have taken online stuff to try to break into it and those who get in naturally
yah i took the course for a bit
Z: used to be better at stats. it was the only math that ever made sense.
have a better than most economics background, but never put the 2 together.
Admittedly i am intentionally diving in to make some big changes in life.
love economics, was good at stats once, and am a computer geek at heart.
as well as a scientists curious mind.


K: well, good luck with it all. if i were you i would find a sample project not in the course. data you’ve worked with before, and dive in on your own. it'll be much more powerful and impactful and work in tangent to the coursework. the thing is, people that hire analysts don't actually care if anyone has done 1 or 100 of those online classes so you'll need a project to point to. and you can just say 'hey i did this on my own at my last job' and they will be impressed
Z: cool. i have been told that before and started looking through the publically available datasets for opps just like that. create a portfolio.
a friend that hires for positions like this said that he doesn't even look at resumes that don't have a github/portfolio attached.


K: yah you can easily start with BLS, ACS, stuff like that
eh, he's an idiot then
because most people can't share production code
and git stuff is meaningless unless you're a consultant
Z: i don't think anyone is gonna roll over on a $500 online series. i want the programming and fundamentals so i can start digging.
hmmm... good to know


K: yah. i learned r through having projects/google. i found the JHU course teaches you tons you never use.
a good project is to build a basic web scraper in python and do some twitter analysis
Z: never used based on industry they are in vs what you are in or just too esoteric?


K: it would be 1. a huge dataset, 2. give you exposure to a scripting language and 3. it's easy
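An aside on the scraper idea: below is a minimal sketch of what "basic" could look like, using only the Python standard library. It is not K's code. The names `LinkCollector`, `extract_links`, and `scrape` are made up for illustration, and it only collects link targets from a page. The Twitter part is left out, since that would normally go through Twitter's API rather than raw page scraping.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def scrape(url):
    """Download a page and return its links. Needs network access."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html)


if __name__ == "__main__":
    sample = '<p><a href="/a">A</a> and <a href="https://example.com/b">B</a></p>'
    print(extract_links(sample))
```

Keeping the parsing in `extract_links` separate from the download in `scrape` means the parsing can be tried on a saved page or a plain string before pointing it at a live site.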
so R is weird
much of the base functionality is slow as fuck
so people have written packages to do specific operations much better
Z: right on.


K: so like the apply family for transforming matrices has been replaced by Data Table and DPlyr
both of which are wrappers for SQL
Z: just did some of my first sql coding last week.  didn't feel totally foreign so that was a confidence boost.


K: so like when i use R, which is all day every day. i never ever use apply. its simply too slow
but for whatever reason jhu teaches it.
for loops are another thing r is slow at
so if i need a loop i use python
Z: python is the scripting language i am going to start with, for a number of reasons.


K: and some of the more conditional logic is a bit tough to write because R has really esoteric syntax
yah don't get in too deep with it. i'm trying to find a book to recommend real quick that walks you through all of what i'm talking about
Z: o'reilly has a web scraping with python that i saw


K: fuck. i'll have to find it later, meeting. best of luck and don't let anyone tell you there is some template for DS/analyst work. i mean, my background was in the social sciences not programming. there are lots of different types of DS but the key commonality is not what skills someone "knows", but the ability to ask the right questions before even starting with the 'analysis' portion. so demonstrate that and you'll be leaps ahead of a lot of people
right now what i see are shitloads of new programmers that have all the coding skills, but are so overeager to apply fancy methods they dont stop and think about what they're even doing and their work, while done well, usually is worthless
Z: thanks. questioning is a strong suit.


K: same here, but my biggest professional blocker was probably not pre-sorting through the questions in my mind and questioning too much. that turns people off a bit.
as i got older i just got too lazy to do that out loud haha
Z: overloads them?


K: yes, people don't like to be questioned. they don't like to think
and often they don't have the time to think through things
so you get one shot
maybe 1 paragraph
Z: hit em up front?


K: i was always giving them pages and pages. too much
K: i wrote many 150-300pg papers in grad school. easy. the 2 page memos were the worst
Z: punch em in the mouth and leave?
K: i wish!
K: oh and if you wanna do business analysis, i'd honestly spend much more time on sql than r
most companies use sas or tableau
and sql is so versatile
if you can write good sql you often don't need many data cleansing tools. specifically window functions and the sql analysis tools (rank, lag, lead, etc)
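For anyone who hasn't seen window functions yet, here is a small sketch of `LAG` and `RANK` run through Python's built-in `sqlite3` module. The `gifts` table and its rows are invented for illustration (loosely inspired by the donor-giving work K mentions), and it assumes the SQLite bundled with your Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Made-up sample data: (donor, gift_date, amount)
SAMPLE_GIFTS = [
    ("alice", "2014-01-10", 50.0),
    ("alice", "2015-01-12", 75.0),
    ("alice", "2016-01-09", 60.0),
    ("bob", "2015-06-01", 200.0),
]


def gift_history(rows):
    """Return each gift with the donor's previous gift amount and a
    per-donor rank by gift size (1 = largest)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gifts (donor TEXT, gift_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO gifts VALUES (?, ?, ?)", rows)
    query = """
        SELECT donor,
               gift_date,
               amount,
               -- previous gift by the same donor, NULL for their first gift
               LAG(amount) OVER (PARTITION BY donor ORDER BY gift_date) AS prev_amount,
               -- rank of this gift among the donor's gifts, largest first
               RANK() OVER (PARTITION BY donor ORDER BY amount DESC) AS amount_rank
        FROM gifts
        ORDER BY donor, gift_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in gift_history(SAMPLE_GIFTS):
        print(row)
```

The point is that "compare each row to the previous one" and "rank within a group" both happen inside the query, with no separate cleanup step afterward.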
Z: good advice. I have access to the full MS BI suite through work. Benefits of working for a for profit that has to draw people in.
K: nice! abuse it!
it's good you are taking initiative and learning
Z: either that or i start selling cars
when i started looking at new avenues i had basically 4 criteria. use my business background, same or better money, can see myself involved in 15-20 years, and flexibility to work from home occasionally.

data/analysis seems to be meeting all of those
