|Data Science News|
|University Data Science News|
Oregon State University professor Mas Subramanian created a new blue pigment while looking for new materials to use in electronics. The pigment is called YInMn blue because it is made of yttrium, indium, and manganese, so we should not be surprised that Crayola is hosting a contest to come up with a better name. Subramanian, true to his engineering background, has filed a patent. Subramanian will not be the first to own a hue of blue, joining artist Yves Klein, who 'owns' International Klein Blue, though his rights are only recognized in France.
Ed Lazowska, professor of computer science at the University of Washington, was named Geek of the Year at the GeekWire awards. Congrats Ed!! Well-deserved. I think. I don't quite know what it means to win Geek of the Year, but it sounds (mostly) good. UW has truly been a leader among the three Moore-Sloan Data Science Environment schools.
Let's get it together when it comes to the positive results publishing bias. Tania Bubela, a professor at the University of Alberta in Canada, found that only 45 percent of completed stem cell clinical trials were ever published in academic journals. Unsurprisingly (but so not OK), "Bubela and her colleagues found that 67.3 percent of the studies reported positive outcomes even though the trials were early-stage, safety-focused studies."
Daniel Katz from the law school at Illinois Institute of Technology used a random forest model to predict US Supreme Court case outcomes with 70 percent accuracy. This is better than what human legal scholars have been able to do, but I know some of you reading this are thinking you can do better with a different model. Right? ;)
UC-Berkeley is bridging a known training gap by offering data science short courses to professors and other instructors.
The University of Pittsburgh has a new School of Computing and Information to which Paul R. Cohen was just named dean. Cohen was previously a program manager at DARPA specializing in AI.
Daniel Larremore (Santa Fe Institute) and co-authors published a paper in Nature last week that proves community detection in networks will never be 'solved' by a universal algorithm. Hallelujah. In a nutshell, their argument is: "any algorithm that’s exceptionally good at finding communities in one type of network must be exceptionally bad at finding communities in another."
The University of Texas system bought a piece of land in Houston for $215m to house a University of Houston data science institute. Instead, the pricey purchase seriously upset Texas lawmakers and UT heavyweights who shut the plan down. It may be resuscitated, this time with Rice, Texas Southern University, Texas A&M University, and UT joining UH. University politics can be dark and twisty, but this one seems to have much larger financial implications than most. Everything is bigger in Texas.
Bad management is rumored to be a problem for Alphabet company Verily, which has just lost Thomas Insel, its lead behavioral scientist and former head of the National Institute of Mental Health. His departure was preceded by those of Vikram Bajaj, Verily's chief scientific officer at the time, and Dr. Mark Lee. STAT news reports that Verily CEO Andrew Conrad frustrates his colleagues by launching projects that aren't feasible and being imperious. Good management is absolutely key to any fledgling endeavor.
Biotia, a Cornell Tech and Weill Cornell Medicine start-up, aims to monitor microbial environments in hospitals by swabbing and genetically sequencing "their high-risk environments, monitoring hygiene, identifying pathogens and tracking antibiotic resistance". The company is ready for a Series A round of funding as soon as 2018. This is an excellent example of the current and near-term explosion of precision medicine applications coming to fruition.
At Stanford, a precision medicine application allows doctors to titrate drug dosages to individuals in real-time. In this case, the individuals are mice.
|Company Data Science News|
Paperspace is a start-up offering cheap machine-learning-as-a-service (MLaaS) in secure shell or web-based Linux instances. Students and postdocs may want to give it a try - it's easier and cheaper than AWS. While some are skeptical that machine learning practitioners are thwarted by getting set up in AWS, Caroline Sinders makes the case that machine learning desperately needs better UX. Her argument is congruent with those calling for more algorithmic transparency and explainability.
Time Warner is trying to make its three-year-old Turner subsidiary the Spotify of TV. They call themselves a "fan engagement company" and intend to fight for our attention by remastering the mobile experience (clickable CNN feeds?), allowing us to create TV playlists, then using our playlists + machine learning to predict/construct live linear feeds of content snippets drawn from across the network.
Speaking of Spotify, Wired tells the story of how a relatively unknown artist, Starley Hope, hit the Billboard top 100 chart thanks to Spotify's human + algorithmic cooperative boost.
Brendan Frey, a University of Toronto professor and founder of Deep Genomics, is pivoting to drug development. Frey will start by focusing on "drugs for Mendelian disorders, inherited diseases that result from a single genetic mutation". I can see a strong case for allowing professors to found start-ups. This is typically frowned upon because it takes professors away from teaching. I imagine many postdocs will not only receive excellent training as a result of working with Frey, but Deep Genomics may be better positioned to offer them full-time, permanent employment than academia.
Entrupy is a start-up that uses computer vision algorithms to detect counterfeit products. Here's a kicker: Yann LeCun is an investor. Cashing in after surviving the bitter years of AI winter. Nice.
Grammarly raised $110m from VCs to instantly correct grammatical errors as users type. (With the lessons we've learned from auto-correct and Microsoft's talking paperclip Clippy, I anticipate a mix of comedy, confusion, irritation, and occasional gratitude.) The company seems to have dropped a serious ad-spend on YouTube (source: my personal viewing experience). As a freshman writing professor, I wonder how much impact dropping $110m on better grammatical instruction would have. I find my students eager - even demanding - when it comes to pumping me for information about the rules of grammar and style. It is possible I have the world's best students.
Microsoft Ventures invested in start-up Agolo out of Columbia University, a company that wants to "defeat information overload". Don't you love how aggressive the language is in tech? A synonym might be: summarize. Or, if you're feeling meditative, which is so hot right now, you may prefer distill.
Facebook is launching a convolutional neural net (CNN) approach to language translation.
Nvidia is opening the Deep Learning Institute from which it aims to train 2,000 developers and data scientists in applied AI techniques. Courses cost only $30! Where academia cannot keep up with STEM educational demands, industry may step in and carry some of the training load.
|Government Data Science News|
The National Health Service in Britain has been attacked with ransomware, impacting at least 25 medical facilities, including hospitals. Doctors cannot access patient records or, presumably, critical digital diagnostic tools. (Where is the sphygmomanometer? Are nurses comfortable taking blood pressure manually?) The same attack is also impacting Spain’s Telefónica and Russia’s MegaFon phone companies. The attack has hit 11 countries so far. According to the New York Times, "the computers all appeared to be hit with the same ransomware, and similar ransom messages demanding about $300 to unlock their data" stemming from "a vulnerability that was discovered and developed by the National Security Agency". The Times covered the NSA breach last August, placing tentative blame on Russian intelligence.
An aptly timed investigative report on the impact of 2bn euro cybersecurity spending bluntly states that the funds "are mostly good for one thing: filling the coffers of the security industry.... Only rarely did a project lead to concrete, sellable technology. Most of them stopped at a prototype, a study, a report, or a wiki page." Priorities may need to be reconsidered.
Closer to home, Sam Biddle at The Intercept reports that highly sensitive documents detailing the hardware design of WindsorGreen, a program likely designed to brute-force crack encrypted data (like passwords), were left on the web, in the clear. No encryption. No password. NYU Tandon looks to be the guilt-stanky partner. The documents were hosted on servers in the Institute for Mathematics and Advanced Supercomputing, though the article stops short of assigning blame. I bet if this had happened at IBM or the NSA, the other two collaborators on the project, someone would have been fired. But this is academia; that will not happen.
Interest rates on new federal student loans will increase quite a bit this year.
The US Census Director, John Thompson, suddenly quit last week, likely due to frustration with significant underfunding of the 2020 census. We will likely undercount groups including immigrants, poor people, and racial/ethnic minorities, which corrupts our ability to even approach ground truth in social science research. As we've seen previously in this newsletter, if we can't rely on the Census, we may be able to approximate with Google image data or satellite data.
New York has a new database for tracking and helping homeless people. Called StreetSmart, it has been somewhat uncomfortably described as "a customer relationship management system for the homeless". (Aside: Why do we see our customer status as a more rightful entitlement to good treatment than, say, our status as citizens, residents, or workers?) The new system makes it easier to identify unsafe shelters and to provide appropriate services to homeless people who may move from one borough/database to another. The data are collected by humans, not by sensors.
London is hiring its first Chief Digital Officer.
|story time in data science land|
Professors like Mariana Mazzucato who work in the UK on EU passports are so fed up with the citizenship application process they are considering personal #Brexit. A colleague of mine just left a great tenure-track job because his partner couldn't get permission to work in the US for who knows how long. Everyone I know thinks of the scientific community as a global community, but we are not immune to political boundaries or political games. This Twitter thread is choice. How do you think Brexit and America's current anti-immigrant climate will impact science?
The Economist has a provocative op-ed asserting that data are now the world's most valuable resource and that current anti-monopoly policies are a terrible match for the agglomeration of knowledge-power wielded by firms like Google, Amazon, and Facebook. Spoiler: the article uses the word "googlet" to refer to a baby Google. The Economist also wrote about how much easier tacit collusion in price setting is when dynamic pricing meets perfect, instantaneous information about competitors' prices.
Seth Stephens-Davidowitz argues that social scientists should use search data, not survey data, to reveal people's true intentions. Why? Because people lie on surveys but reveal their truths in search. He expands on how this would work and what it means for social science in his new book Everybody Lies.
Because Tim Berners-Lee has a unique perspective on all things internet, it's always good to read his interviews about what the web has wrought. He talks about the organizational dynamics that allowed the idea of the internet to gestate into being: "It was all unofficial, zero budget, but Mike allowed my 20 percent time to expand to 100 percent time." Dreamy.
Steve Miller of IBM, writing for KDNuggets, has a comprehensive new report on data science jobs. He chastises academia for focusing on producing data scientists but ignoring "the much larger demand for data-savvy managers (1.5 million new positions)". The report also identifies the fastest growing data science roles, the top cities, and top industries.
If you aren't reading The Pudding, you should be. This week on their elegant website they determine whether the lyrics in pop songs are becoming more repetitive using a dataset of 15,000 songs from 1958-2017.
Will AI eliminate jobs? We have heard this question and prognostications quite a bit. John Horton, Microsoft Research alum and NYU Stern professor, explains why we may need to worry about accountants being displaced more than we need to worry about truck drivers. Not entirely sure I agree with all of his reasoning, but his writing and thinking style are deep, clear, and engaging. At this point - 10:30 on a Friday night - I'm wishing AI might make part of my job redundant.
|Data Visualization of the Week|
|Twitter, Alexⓐnder Grossmann from May 12, 2017|
|Tweet of the Week|
|Twitter, Edward Snowden from May 09, 2017|
|Data Science Fundamentals summit |
Urbana, IL May 16 at NCSA. [free, registration required]
|Moore-Sloan Data Science Lunch Seminar Series |
New York, NY Wednesday, May 17, Matteo Riondato from Two Sigma Investments, 12:30 p.m. at the NYU Center for Data Science, 60 5th Avenue, 7th Floor. Lunch provided. [free]
|JupyterDay Philly |
Philadelphia, PA Thursday, May 18-19 at Bryn Mawr College. [$$]
|Precision Medicine World Conference |
Durham, NC May 24-25 at Duke University [$$$$]
|Big Data in Biomedicine Conference set for May 24-25 |
Stanford, CA May 24-25 at the Stanford University School of Medicine [$$$]
|Winner Takes All: How AI Is Learning About Us By Playing Classic Video Games and Why the Results Are So Shocking (workshop) – Tech2025 |
New York, NY Interactive workshop featuring guest instructor, Julian Togelius (Assoc. Professor, Artificial Intelligence in Games, NYU Tandon School of Engineering), on training algorithms using various types of games. May 25 at location TBA. [$$]
|Cascadia R Conference |
Portland, OR June 3 at OHSU Collaborative Life Science Building [$$]
|Time and Causality in the Sciences 2017 |
Hoboken, NJ The Causality in the Sciences conference series at Stevens Institute of Technology brings together philosophers and scientists to explore various aspects of causality. June 7-9. [$$$]
|Collective Intelligence Conference, June 15-16 |
Brooklyn, NY Collective Intelligence 2017 will emphasize research in service of the public good and projects that address societal problems. The conference will take place at New York University, Tandon School of Engineering. [$$$]
|JuliaCon 2017 - Accepted Talks & Workshops |
Berkeley, CA Tuesday, June 20. Conference runs June 21-23. [$$$]
|The 50 Years of the ACM Turing Award Celebration |
San Francisco, CA Registrations are now open for ACM's celebration of 50 years of the Turing Award. Registration is free of charge for ACM members. Space is limited. June 23 - 24. [registration required]
|Vancouver Sports Analytics Symposium and Hackathon |
Vancouver, BC, Canada July 8-9 at Simon Fraser University, Harbour Centre. [free]
|SciPy 2017 Accepted Talks and Posters |
Austin, TX July 10-16. [$$$]
|KDD 2017 |
Halifax, Nova Scotia, Canada August 13-17 [$$$$]
|Spark Bootcamp |
Atlanta, GA September 1-3. Labs will run on Databricks Community Edition, and will focus on Apache Spark functionality, not Databricks enterprise features. [$$$$]
|ASA Statistics Project Competition for Grades 7-12|
The ASA/NCTM Joint Committee on the Curriculum in Statistics and Probability and the ASA’s Education Department encourage students and their advisers to participate in its annual Project Competition (written report). Deadline is June 1.
|FT Future of Fintech Awards|
The awards recognise and reward companies able to demonstrate innovative ideas capable of creating lasting change in the financial services sector, on a global scale. Deadline for submissions is June 4.
|Face Recognition Prize Challenge (FRPC)|
From the Challenge.gov webpage, participants will be directed to register with the National Institute of Standards and Technology. Registration closes on June 15, 2017.
|The 2017 World Science Festival is now accepting applications for volunteers! |
New York, NY Organizers are looking for science pros and students in NYC to volunteer at World Science Fest, May 30-June 4.
|PyOhio 2017 Call for Proposals|
Columbus, OH PyOhio 2017, the annual Python programming conference for Ohio and the surrounding region, will take place Saturday, July 29th, and Sunday, July 30th, 2017 at The Ohio State University in Columbus, Ohio. Deadline for proposals is May 25.
|Astro Hack Week 2017|
Seattle, WA Monday-Friday, August 28-September 1, at the University of Washington. Deadline to apply is May 31.
|Macfang - Complexity Foundations and Applications of Network Geometry|
Barcelona, Spain The Macfang workshop focuses on the role of space in complex networks. We bring exciting speakers from around the world to foster a leading collaborative view on the emergent field of Network Geometry. Deadline for abstract submissions is June 26.
|Big Data on Human and Social Sciences – History, Issues and Challenges|
Lisbon, Portugal The Instituto de História Contemporânea / Institute for Contemporary History and the History Lab at Columbia University will be hosting an international conference to examine the challenges and impact of ‘Big Data’ in the human and social sciences. Conference is November 6-7. Deadline for paper submissions is July 31.
|Ford-Mozilla Open Web Fellowships|
The Open Web Fellows program – a collaboration between Ford Foundation and Mozilla – is an international leadership initiative that brings together technology talent and civil society organizations to advance and protect the open Web. Deadline for applications is May 21.
|AMIA Informatics Workforce Survey|
AMIA invites all informatics professionals and students to participate in an important and unique opportunity to help shape the future of informatics. Survey closes May 24.
|Digital Humanities Advancement Grants|
Digital Humanities Advancement Grants (DHAG) support digital projects throughout their lifecycles, from early start-up phases through implementation and long-term sustainability. Deadline is June 6.
|NSF Big Data Hub and Spoke program invites spoke applications|
Guidelines are available from NSF with details for Northeast, Midwest, West, and South applications. See what has been funded. Letters of Collaboration are required; deadline 19 June 2017.
|DARPA Broad Agency Announcement- Lifelong Learning Machines (L2M)|
DARPA just released a Broad Agency Announcement on Lifelong Learning Machines (L2M) with a June 21, 2017, response date.
|NYU Center for Data Science News|
|Bloomberg data scientists bring real-world experience to New York City universities|
|Tech at Bloomberg from May 11, 2017|
"Institutions of higher learning often tap experts with real-world experience for part-time teaching positions. This spring, two Bloomberg data scientists in New York City are serving as professors, leading graduate-level courses in machine learning and data science. From the Office of the CTO, David Rosenberg is teaching “Machine Learning & Computational Statistics” at New York University’s Center for Data Science, while Gary Kazantsev, head of the Machine Learning Group in Bloomberg’s Engineering organization, is co-teaching a class at Cornell Tech titled “Data Science in the Wild.”"
|Tools & Resources|
|Releasing the World’s Largest Street-level Imagery Dataset for Teaching Machines to See|
|Mapillary from May 05, 2017|
"We present the Mapillary Vistas Dataset—the world’s largest and most diverse publicly available, pixel-accurately and instance-specifically annotated street-level imagery dataset for empowering autonomous mobility and transport at the global scale."
|Machine Learning Pipelines for R |
|Alex Ioannides, When Localhost Isn't Enough blog from May 08, 2017|
"Building machine learning and statistical models often requires pre- and post-transformation of the input and/or response variables, prior to training (or fitting) the models. For example, a model may require training on the logarithm of the response and input variables. As a consequence, fitting and then generating predictions from these models requires repeated application of transformation and inverse-transformation functions – to go from the domain of the original input variables to the domain of the original output variables (via the model). This is usually quite a laborious and repetitive process that leads to messy code and notebooks."
"The pipeliner package aims to provide an elegant solution to these issues."
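The transform, fit, inverse-transform pattern the post describes can be sketched in a few lines. This is not the pipeliner package's API (pipeliner is an R package); it is a minimal plain-Python illustration, and the names `fit_line` and `LogLogPipeline` are hypothetical. It assumes a single strictly positive input variable and a strictly positive response, so that taking logarithms is valid.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


class LogLogPipeline:
    """Hypothetical helper that bundles transform, model, and inverse transform."""

    def fit(self, xs, ys):
        # Pre-transformation: train on the logarithm of input and response.
        log_x = [math.log(x) for x in xs]
        log_y = [math.log(y) for y in ys]
        self.intercept, self.slope = fit_line(log_x, log_y)
        return self

    def predict(self, xs):
        # Transform new inputs, predict in log space, then apply the
        # inverse transformation (exp) to return to the original scale.
        return [math.exp(self.intercept + self.slope * math.log(x)) for x in xs]


# Synthetic data following y = 3 * x^2, which is linear in log-log space.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** 2 for x in xs]

pipeline = LogLogPipeline().fit(xs, ys)
predictions = pipeline.predict([3.0])
print(predictions)
```

Keeping the transformation and its inverse inside one object means callers only ever see values on the original scale, which is the repetitive bookkeeping the post says tends to clutter code and notebooks.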
|Kubernetes clusters for the hobbyist.|
|GitHub - hobby-kube from April 18, 2017|
This guide answers the question of how to set up and operate a fully functional, secure Kubernetes cluster on a cloud provider such as DigitalOcean or Scaleway. It explains how to overcome the lack of external ingress controllers, fully isolated secure private networking and persistent distributed block storage.
Be aware that the following sections might be opinionated. Kubernetes is an evolving, fast-paced environment, which means this guide will probably be outdated at times, depending on the author's spare time and individual contributions. Due to this fact, contributions are highly appreciated.
|FMA: A Dataset For Music Analysis|
|Michaël Defferrard from April 28, 2017|
"The dataset is a dump of the Free Music Archive, an interactive library of high-quality, legal audio downloads."
|Voyager 2: Augmenting Visual Analysis with Partial View Specifications|
|Kanit Wongsuphasawat, Zening Qu, Dominik Moritz, Riley Chang, Felix Ouk, Anushka Anand, Jock Mackinlay, Bill Howe, Jeffrey Heer from May 08, 2017|
"Visual data analysis involves both open-ended and focused exploration. Manual chart specification tools support question answering, but are often tedious for early-stage exploration where systematic data coverage is needed. Visualization recommenders can encourage broad coverage, but irrelevant suggestions may distract users once they commit to specific questions. We present Voyager 2, a mixed-initiative system that blends manual and automated chart specification to help analysts engage in both open-ended exploration and targeted question answering."
|Tenured and tenure track faculty positions|
|Assistant/Associate professorships in Data Science|
IT University of Copenhagen; Copenhagen, Denmark
|Professorship in Cancer Genomics|
Ecole Polytechnique Fédérale de Lausanne; Lausanne, Switzerland
|George J. Klir Endowed Professor, Systems Science|
Binghamton University; Binghamton, NY
|Full-time, non-tenured academic positions|
University of Washington, School of Law, Tech Policy Lab; Seattle, WA
|Data Visualization Engineer |
University of Colorado, School of Medicine; Aurora, CO
University College London, Department of Security and Crime Science; London, England
|Researcher in Digital Geography|
University of Oxford, Oxford Internet Institute; Oxford, England
|Research Scientist - Applied Statistics/Biometrics|
CSIRO Data61; Canberra, Australia
|Postdoctoral Fellowships - Biomedicine and Bioinformatics - Job Opportunities - University of Cambridge|
University of Cambridge; Cambridge, England
|Postdoctoral researcher in Natural Language Understanding|
University of Amsterdam; Amsterdam, The Netherlands
|Research Fellow - Brain and Mental Health Laboratory|
Monash University, Brain and Mental Health Laboratory; Melbourne, Australia
|Full-time positions outside academia|
|Digital Security Coordinator|
Tactical Technology Collective; Berlin, Germany
|Director of Engineering|
Peritus; Palo Alto, CA
Flatiron School; New York, NY
The Information; San Francisco, CA
|Research Scientist, Core Data Science NYC|
Facebook; New York, NY
|National Climate Assessment Program Coordinator|
ICF International; Fairfax, VA
|SDE II, Decision Services team|
Microsoft; New York, NY
|Remote engineering and design roles|
Kolide; Boulder, CO
|Head of Research|
Spotify; New York, NY
|Senior Full Stack Developer|
Lilt; Munich, Germany
|Visual Information and Production Specialist|
Congressional Budget Office; Washington, DC
|Economist- DEU |
Federal Trade Commission; Washington, DC
|Senior User Experience Researcher|
Bloomberg, Professional Financial Products group; New York, NY
|Internships and other temporary positions|
|MPEDS Project -- Graduate Research Assistants|
University of Toronto; Toronto, ON, Canada
|Software Engineer Internship|
Machine Intelligence Research Institute; Berkeley, CA
|Cyclotron Road Applied Research Fellows |
Oak Ridge National Laboratory and Lawrence Berkeley Laboratory; Berkeley, CA