More and More

Friday, January 22, 2016

Interview with a Data Scientist Part Deux

Here is Part 2 of my talk with an old friend turned Data Scientist. You can find Part 1 here. In typical fashion he comes right out with the straight dirt. Without knowing it, he got right to the point of a problem I was having: would Data Science or Business Intelligence be the better path to take? This conversation was a turning point for me. It clarified a couple of vital points and also gave me perspective on what the market and industry are like.


Note: Once again I kept the editing minimal and only changed the order of a few posts to improve readability. Names and whatnot were changed to protect the innocent… although RocketMan’s innocence may be up for questioning considering his association with the likes of me.

Bolding is mine.


K: Hey man, got to thinking about this yesterday.  Please do not be offended by what I'm about to say.  I was thinking about your career pivot and would highly recommend against the data science/analyst route.  I think you should pursue the business analyst/BI route, which will emphasize more sql, tableau, SAS, very basic stats.  Coursera has some great series on the subject my wife recently took.  From a DS perspective I checked your linkedin and you just don't have the academic/hard skills background that is table stakes.  From there there is no experience in applying anything related plus age is working against you.  For every one of you there are 20 of me who are younger, trained in various applied statistical modeling methods, and have experience applying these skills at a very high level.  I'm just thinking from a practical perspective it will be very frustrating since the JHU course even combined with your intelligence (and know that i will always respect you) isn't going to necessarily make up for this
Z: i appreciate your honesty. There is a reason i reach out to people that i know when i look into these things.
What course did she go through?
K: https://www.coursera.org/specializations/excel-mysql
i'd build out the sql more
Z: you have also potentially saved me a lot of heartache and headache from trying to crash through the side of a building instead of going in the front door.
K: i mean right now if you can't tell me when to apply a logistic vs linear regression model without looking it up, that's a big concern and dorks like me have had to do this for years. but you have the sales, people, and business side, which honestly will be longer lasting once the ds trend dies down. gartner already has it sliding down its hypecycle
Z: good to know, and again thank you.
K: tableau isn't the sexiest, but it's the most popular BI tool at the moment. maybe doing a course in R that uses some of the packages. datacamp is great for that they have a few courses that i think kick ass vs JHU
and then read statistics for dummies, SQL or postgres for dummies, you'll be set
Their courses are fast, cheap, to the point
and in the r packages you would use on the job as a BI if you ever used r, which you likely won't
since most companies never run r in production
Z: they (Datacamp) look to be almost exclusively R
K: yup, a bit of python, but their r courses are the  most directly useful i've found
they skip the fluff of the base functions you'll never use or need to know
you would need to know and use them if that's all you worked in developing models, but chances are you won't do that
plus, a good DS or BA will always start with the most basic tool first. if my dataset isn't over 1mil rows, i use excel before anything
or i'll do everything in sql, then excel. r for modeling, plotting, further exploration. sometimes an R extension such as Plotly.
Z: right on. BA/BI is much more in line with what i am interested in. actionable analysis, if you will.
K: so word of advice, don't use that term professionally
it's one of those things people say that has zero real meaning
Z: point taken.
fall over from accounting
K: oh man from govt work i have so many similar terms
'capacity building'
like, wtf does that actually mean
Z: duh... building capacity...
cuz reversing the words isn't just the equivalent of saying it louder
K: haha
another bad one is 'finding insights'
we used to have a buzzword dictionary at my old company of things to pepper proposals and reports with
"actionable research"
Z: that is hilarious
so in BA/BI still suggest scraping with python?
K: so my wife took that course you sent. i looked at her syllabi and assignments and thought that it was honestly pretty good for an applied course. mainly in that it did not add complexity she wont use in her analyst role
Z: the data warehousing?
K: oh the book 'automate the boring stuff' is great!
Z: adding it to my amazon list
K: i use Beautifulsoup to scrape
Z: ran across that the other day
K: DS is almost an annoying title at the moment
the underlying definition seems to have been diluted significantly
i mean, before 2011 the term didn't really exist. literally. it was googled under 1000 times
ha ha
the best definition i've heard is that "Data Science is statistics done on a Mac"
Z: heard a data scientist was a data analyst that lives in San Fran
K: a statistics student that can use the adobe creative suite
Z: nice
stats + pretty graphics
K: they gave me the title at work and i took it bc median salary is higher for that than 'data analyst'
LOL
i proposed 'dungeonmaster of data'
Z: data overlord?
K: oracle of awesome
Z: nice
K: we should've just been rocket scientists like RocketMan
always going to need rockets
Z: speaking of which he says hello
K: oh i was gonna ask how he's doing
quick indeed search says BI pays well
not as well as DS, but still pretty good if you get the right gig
i really lucked out
they thought i was much better than i was and i did a lot of learning on the job
Z: finding a lot in the 70-80 range
K: sounds about right for entry level analysts
add PM to that and get a bump
Z: quick requirements scan makes it look like i need to learn to use SAP products
K: eh, yes and no
Z: expand pls
K: SAP BI tools are &@^#$%*!
oracle and MS the hitters
Hana is expensive, about to be outdated. the DBs use basic nosql or sql interfaces
the thing is you need to know the querying language. from there it's easy
so like we use oracle, which uses plsql for most stuff
aka sql with like 3 extra commands
Z: ha ha
MS is T-SQL i believe
K: yah it's all essentially the same
SAP has some map reduce stuff, but it's the same as any other mapreduce language
Z: which version of oracle are you guys using?
K: couldnt even tell you.
i usually use a local DB on my computer
and then do major operations through a proprietary tool that uses GPU mapping to vectorize functions
if you wanna make sick money
go learn cobol and work for a bank
it's the oldest, most boring language. but most needed right now
Z: ha ha ha i know some people that do that.
K: dude, they're all retiring!
and young guns don't learn that shit
my buddy got a job for B of A as a cobol developer for $195k starting. granted he is an awesome CS masters student, but still
FAA, SEC, IRS, basically everything in government and finance is all still cobol main frames. healthcare too
Z: ha ha nice. i can't remember the language but i once had to unearth a guy who could write in the previous embedded language and confirm against C++
jovial
new code quit being developed in the late 60s early 70s
helicopter control systems
guy made 80+ an hour
K: that's it?
that's pretty low for something that proprietary
Z: military contract
K: ah gotcha
Z: rural tx
K: makes sense
i miss billing hourly sometimes just to get an idea of what my time was worth
yah govt work is the worst. never fun, always pays atrocious wages
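
Note: K's litmus test above (knowing when to apply a logistic vs. a linear regression model) comes down to the kind of outcome being predicted. A linear model fits a continuous number, while a logistic model fits the probability of a yes/no outcome. Here is a minimal sketch in plain Python. It is not anything K uses on the job, and the data, function names, learning rate, and step count are all made up for illustration.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Logistic regression by plain gradient descent; returns (weight, bias).

    lr and steps are arbitrary illustrative values, not tuned settings.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Continuous outcome (hypothetical: months on the job -> deals closed): linear.
slope, intercept = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])

# Yes/no outcome (hypothetical: prior gifts -> gave again, 1 or 0): logistic.
prior_gifts = [1, 2, 3, 4, 5, 6, 7, 8]
gave_again = [0, 0, 1, 0, 1, 0, 1, 1]
w, b = fit_logistic(prior_gifts, gave_again)

print("linear:", slope, intercept)
print("P(gives again | 1 prior gift): ", sigmoid(w * 1 + b))
print("P(gives again | 8 prior gifts):", sigmoid(w * 8 + b))
```

The linear fit returns a slope and intercept you can read directly as "each extra month adds this many deals." The logistic fit returns a probability between 0 and 1, which is what you want when the question is "will this happen or not."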

Thursday, January 21, 2016

Interview with a Data Scientist Part 1

This “interview” sprang out of a random chat with an old friend. I hadn't spoken with him in a few years and saw his name in my address book. Figured I would drop him a quick message and catch up a bit. As all good conversations between old friends should go, we went far and wide on what was going on in life, family, hobbies, etc. I had no idea what he was doing professionally when I sent the message, so it was a great treat for me to get to talk to someone in the field of Data Science that I knew BEFORE he started off. It makes it a conversation between friends and NOT just grilling someone.

Note: I have cleaned it up a bit, but not much. Punctuation and grammar are what they are. A couple of comments have been rearranged to help with continuity, some haven’t. Names and specific references have been changed, since the people mentioned weren’t involved and couldn’t defend their good names.


Z: what are you doing for your company?


K: im a senior data scientist for university relations. i find patterns in donor giving behavior
Z: nice. I actually just started a data science course.


K: which one/where?
Z: coursera/Johns Hopkins


K: ahh yeah i've seen that one
it's a very en vogue title right now
Z: yeah.
it fits my "everything is interesting/ why the hell did that just happen," thought process.
plus i get to ask insane questions of data.


K: which classes in specific? yah that's the allure, but what people most often miss is that you need a heavily quantitative background for any of those questions to have any meaning
Z: goal is to bend it toward business intelligence before too long. get fully out of sales


K: word, sounds like a good plan
Z: want something on the analytic/project side that still lets me use my business ed/exp
and a better paycheck is a bonus


K: so do you know any advanced statistics? econometrics? that is key. otherwise it will be tough to get a foothold. i hire and work with a lot of DS and there is a big gap between those that have taken online stuff to try to break into it and those who get in naturally
yah i took the course for a bit
Z: used to be better at stats. it was the only math that ever made sense.
have a better than most economics background, but never put the 2 together.
Admittedly i am intentionally diving in to make some big changes in life.
love economics, was good at stats once, and am a computer geek at heart.
as well as a scientist's curious mind.


K: well, good luck with it all. if i were you i would find a sample project not in the course. data you’ve worked with before, and dive in on your own. it'll be much more powerful and impactful and work in tangent to the coursework. the thing is, people that hire analysts don't actually care if anyone has done 1 or 100 of those online classes so you'll need a project to point to. and you can just say 'hey i did this on my own at my last job' and they will be impressed
Z: cool. i have been told that before and started looking through the publicly available datasets for opps just like that. create a portfolio.
a friend that hires for positions like this said that he doesn't even look at resumes that don't have a github/portfolio attached.


K: yah you can easily start with BLS, ACS, stuff like that
eh, he's an idiot then
because most people can't share production code
and git stuff is meaningless unless you're a consultant
Z: i don't think anyone is gonna roll over on a $500 online series. i want the programming and fundamentals so i can start digging.
hmmm... good to know


K: yah. i learned r through having projects/google. i found the JHU course teaches you tons you never use.
a good project is to build a basic web scraper in python and do some twitter analysis
Z: never used based on industry they are in vs what you are in or just too esoteric?


K: it would be 1. a huge dataset, 2. give you exposure to a scripting language and 3. it's easy
so R is weird
much of the base functionality is slow as fuck
so people have written packages to do specific operations much better
Z: right on.


K: so like the apply family for transforming matrices has been replaced by Data Table and DPlyr
both of which are wrappers for SQL
Z: just did some of my first sql coding last week.  didn't feel totally foreign so that was a confidence boost.


K: so like when i use R, which is all day every day. i never ever use apply. its simply too slow
but for whatever reason jhu teaches it.
for loops are another thing r is slow at
so if i need a loop i use python
Z: python is the scripting language i am going to start with, for a number of reasons.


K: and some of the more conditional logic is a bit tough to write because R has really esoteric syntax
yah don't get in too deep with it. i'm trying to find a book to recommend real quick that walks you through all of what i'm talking about
Z: o'reilly has a web scraping with python that i saw


K: fuck. i'll have to find it later, meeting. best of luck and don't let anyone tell you there is some template for DS/analyst work. i mean, my background was in the social sciences not programming. there are lots of different types of DS but the key commonality is not what skills someone "knows", but the ability to ask the right questions before even starting with the 'analysis' portion. so demonstrate that and you'll be leaps ahead of a lot of people
right now what i see are shitloads of new programmers that have all the coding skills, but are so overeager to apply fancy methods they dont stop and think about what they're even doing and their work, while done well, usually is worthless
Z: thanks. questioning is a strong suit.


K: same here, but my biggest professional blocker was probably not pre-sorting through the questions in my mind and questioning too much. that turns people off a bit.
as i got older i just got too lazy to do that out loud haha
Z: overloads them?


K: yes, people don't like to be questioned. they don't like to think
and often they don't have the time to think through things
so you get one shot
maybe 1 paragraph
Z: hit em up front?


K: i was always giving them pages and pages. too much
K: i wrote many 150-300pg papers in grad school. easy. the 2 page memos were the worst
Z: punch em in the mouth and leave?
K: i wish!
K: oh and if you wanna do business analysis, i'd honestly spend much more time on sql than r
most companies use sas or tableau
and sql is so versatile
if you can write good sql you often don't need many data cleansing tools. specifically window functions and the sql analysis tools (rank, lag, lead, etc)
Z: good advice. I have access to the full MS BI suite through work. Benefits of working for a for profit that has to draw people in.
K: nice! abuse it!
it's good you are taking initiative and learning
Z: either that or i start selling cars
when i started looking at new avenues i had basically 4 criteria. use my business background, same or better money, can see myself involved in 15-20 years, and flexibility to work from home occasionally.

data/analysis seems to be meeting all of those
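
Note: for anyone wondering what the SQL window functions K mentions (rank, lag, lead) look like in practice, here is a minimal sketch. It runs a query against an in-memory SQLite database from Python, and it assumes your Python's bundled SQLite is version 3.25 or newer, which is when window functions were added. The `gifts` table, its columns, and the donor data are hypothetical, invented only to echo K's donor-giving work.

```python
import sqlite3

# Hypothetical donor gifts, invented purely for illustration.
rows = [
    ("alice", "2015-01-10", 100),
    ("alice", "2015-06-02", 250),
    ("alice", "2015-11-20", 175),
    ("bob", "2015-03-15", 50),
    ("bob", "2015-09-01", 400),
]

# RANK orders each donor's gifts by size; LAG and LEAD pull the previous
# and next gift amount for the same donor, in date order.
QUERY = """
SELECT donor,
       gift_date,
       amount,
       RANK()       OVER (PARTITION BY donor ORDER BY amount DESC) AS amount_rank,
       LAG(amount)  OVER (PARTITION BY donor ORDER BY gift_date)   AS prev_amount,
       LEAD(amount) OVER (PARTITION BY donor ORDER BY gift_date)   AS next_amount
FROM gifts
ORDER BY donor, gift_date
"""

def run_window_query(data):
    """Load the rows into an in-memory table and return the query results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE gifts (donor TEXT, gift_date TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO gifts VALUES (?, ?, ?)", data)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()

results = run_window_query(rows)
for row in results:
    print(row)
```

Each output row carries the gift plus its rank and its neighbors, so a question like "did this donor give more or less than last time?" becomes a single subtraction instead of a self-join or a trip through Excel.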

Sunday, January 10, 2016

Learn SQL the Hard Way

Ahhh... the notorious Zed Shaw. Well, maybe not notorious, but definitely a character and a hell of a guy by all accounts. Zed has a boatload of useful information out there for all of us, especially those new to the field and/or learning a new language to do whatever with.

As it relates to Data Science, he has two offerings that are immediately useful and a third that is a close contender. The first two are his Command Line Interface Crash Course and Learn SQL the Hard Way. Both have free versions you can complete online, but he offers a paid version, on the cheap, that puts everything together in a nice clean package. The runner-up is Learn Python the Hard Way. (This link goes to the pay version, a scant $29.95, but if you are strapped for cash there is a link to the free HTML version below the testimonials.) Are you seeing a trend here?

Zed has a different take on teaching. He loves repetition and isn't afraid to say so. He also writes in much the same way, I would imagine, he speaks. To the point, without fluff, and willing to call you out on your BS.

Zed is also one of the masterminds behind Programming Motherfucker, an excellent resource not only for specific coding languages, but also for computer science and coding as a whole. The term used on the site is "Language Agnostic."

Enjoy, buy one of the excellent PMF t-shirts, and keep on keepin' on.

ZK

Friday, January 8, 2016

Letter of Intent

Someone told me I needed to start a blog to document my foray into data science. Not so much to hype up my Data Science skills, which are currently in their infancy at best, but to aggregate the resources I find and use and hopefully help others who are interested in finding out more.

I have never been a good hype man for myself, despite a professional history with more sales jobs than anything else. Despite an occasional flicker of success in that arena, I have never felt comfortable in the role.

One thing that has always been there is my interest in finding out why. The interest in digging into the behaviors and numbers to discover what works. Then convey that to others who are better at the implementation of this new information. I honestly believe this is why I am a better sales manager than salesman. Go figure.

The last year or more has been spent researching different aspects of data, data science, DBA roles, business intelligence (BI), BI technology, and on and on.

My goal is to make this site useful to others in my boat. My plan is to update and post regularly, likely weekly, summarizing what has been happening in the last 7 days of class, study, work, and research. Sometimes it may be a quick note attached to an article or site that I found useful. It might be me losing my mind because of an assignment or technical issue.

In the end I hope it is useful and, first and foremost, reassuring, in that it lets others know that they aren't alone drifting in the DS ocean… there are tons of us adrift just out of sight of each other…

A couple laughs, some random info, and all the other things that inevitably bleed into these types of projects will present themselves along the way.

Thanks and Enjoy,

Zak

Tuesday, January 5, 2016

CRASH COURSE!!!

Last week I enrolled in and completed the Johns Hopkins Crash Course in Data Science class through Coursera.

The introduction says that it will likely take a week to complete, or 2 days if you are able to set aside large blocks of time. I was able to knock it out in 3 days of lunches at work.

Overall it was a very engaging class and it is exactly what it claims to be: a crash course overview of the growing field of data science.

I did the free version. Let's face it: I a) am cheap and b) was just putting my toe in the water, and I wasn't inspired to pay $49 for a certificate of completion.

Would I recommend this to someone else? Yes. If there is even a passing interest in data science, big data, etc., this is definitely worth the time invested.

More soon. - Z