So… we needed a transactional email system. You happen to notice that your coworker has a jazz album on Spotify pulled up on her desktop screen. So, we developed the features Schema-field consumption statistics, Queries, and Tables commonly joined to address this last mile of discovery. First, we focused on the search ranking algorithm. engineers, data-savvy product managers, etc.). A data scientist with high-intent has a specific set of goals and can likely articulate exactly what they’re looking for. At Spotify, we believe strongly in data-informed decision making. is the most feature-rich Spotify analytics tool, with this site, you can track your … In addition to basic metadata about the schema fields, we included consumption statistics at the schema field level. The typical data scientist at Spotify works with ~25-30 different datasets in a month. While this isn’t the most widely used feature, we’ve seen that it is consistently used by 15% of users who visit a dataset page. It was really nice to see how his taste of … We found there were a few issues with this approach. Spotify Audio Features. So, we introduced a feature in Lexikon that allows you to search for people working in the data and insights space related to a given keyword (i.e. So, we built a feature on a BigQuery table page that allows the user to see tables that are most commonly joined with the given dataset. David Green: In terms of an example, have you got an example of a project where you've used people data or insights from analytics, to help either solve a business challenge at Spotify or maybe help to improve employee experience, or maybe both? an overview of the most used schema fields in the table, and. at Spotify, resulting in more research and insights being produced across the company. We’ve learned a lot since we first launched this product. 
Engineers can easily add data to our analytics pipeline by adding a new message to our log parser and simply logging information to syslog using the correct format. You want to hear more and learn about the artist. So all this sounds… complicated. With the help of a few other engineers, we built a fairly simple system that had the ability to deliver a lot of emails and also provided a way for people to create new email templates and A/B test different versions of an email template. This data is very much still in use today. This gives users the opportunity to see a variety of up-to-date queries that use the dataset, and the ability to search for specific queries on the dataset (e.g. Hey Guys, Yesterday a friend told me, that he got a pretty long email with his personal stats for 2016, including most heard songs (with numbers) and genres. Then, Spotify also offers a data tool called Spotify Analytics, designed specifically for labels that want to track performance of all their artists on Spotify, providing a functionality to Spotify for Artists, but … More than half of them are free, … Exploring the Spotify API with R: A tutorial for beginners, by a beginner, Mia Smith. To enable Spotifiers to make faster, smarter decisions, we’ve developed a suite of internal products to accelerate the production and consumption of insights. You just listened to a track by a new artist on your Discover Weekly and you’re hooked. Make data the most important asset you have because it is the only reliable decision maker that can scale your company. In addition to viewing your podcast analytics in Anchor, you can now also access your podcast's stats directly on Spotify. You’re walking down the street and hear a passing car blasting a great song you haven’t heard in a while. She has become your new genre guide. 
With Spotify’s option to export your personal data, and Google’s free, easy-to-use tool to visualize data called Google Data Studio, we’re going to show you just how to do that. The typical data scientist at Spotify works with ~25-30 different datasets in a month. Whether we’re considering a big shift in our product strategy or we’re making a relatively quick decision about which track to add to one of our editorially-programmed playlists, data provides a foundation for sound decision making. datasets)— as well as discover knowledge generated through past research and analysis. You can query the data, create map/reduce jobs using Hive, and even create mini data pipelines if that’s the kind of thing you’re into. As we know Spotify … to Lexikon to better represent the landscape of insights production. find a dataset related to a particular topic. So, we abandoned the curated example query and instead allow users to search through all recent queries made on the given dataset. The products that you’ll be responsible for, drive much of the reporting and analysis … Following the release of the first version of Lexikon, we found that data scientists continued to talk with each other about datasets in Slack. You can’t get the song out of your head and need to listen to it immediately. So, you open up Spotify, browse some of the mood playlists, and put on the Mood Booster playlist. Sentiment analysis of musical taste: a cross-European comparison, Paul Elvers. The homepage provides users with a number of potentially relevant, algorithmically generated suggestions for datasets including: While we did experiment with more advanced methods for serving recommendations, including using natural language processing and topic modeling on the dataset metadata to provide content-based recommendations, we determined through user feedback that relatively simple heuristics leveraging data consumption statistics worked quite well. 
Since launching the Lexikon Slack Bot, we’ve seen a sustained 25% increase in the number of Lexikon links shared on Slack per week. When it comes to people data we have collected all the relevant components in one team, to make sure all sources and analysis … Andrew Maher is a Product Manager for Spotify’s Insights Platform Product Area. Without big data, Spotify would not have turned out the way it did and with a growing user base only more data will be generated in the future. Shout out to our current team (Ambrish Misra, Bastian Kuberek, Beverly Mah, David Lau, Erik Fox, and Nithya Muralidharan) and others who have contributed to Lexikon (Adam Bly, Aliza Aufrichtig, Colleen McClowry, David Riordan, Edward Lee, Luca Masud, Mark Koh, Molly Simon, Mindy Yuan, Niko Stahl, and Tianyu Wu). So, we built a Lexikon Slack Bot to improve discussions about datasets. In this blog post, we want to share the story of how we iterated on Lexikon to better support data discovery. The Audio Analysis … Matching data is compressed and periodically synced to HDFS. In early 2017, we released Lexikon, a library for data and insights, as the solution to this problem. My experience at Spotify is a perfect example of how simple this is and shows how any engineer can make a meaningful impact. At this time, we also drastically increased our hiring of insights specialists (data scientists, analysts, user researchers, etc.) Within a few weeks we knew which email templates worked best and, more importantly, we could see the impact these email campaigns had on our users. Our belief was that by making these types of entities more explorable, we would open up new pathways for data discovery. 
In addition to using learnings from user surveys, feedback sessions, and exploratory analysis to drive product development, we also conducted research on knowledge management theory to better understand how we might adjust our approach (recommended reading: Knowledge Management in Organizations: a critical introduction by Hislop, Bosua, and Helms). If you have yet to set up your Spotify … Data Warehouse is a more complex system that allows you to access our data-set directly. Since launching these new entity pages, we’ve seen that they’ve proven to be a critical pathway for discovery, with 44% of Lexikon’s monthly active users visiting these types of pages. Once you’ve determined that you’ve found the right dataset, it can be quite daunting to try to understand all of the available fields and determine which ones are actually relevant. … The only reason that’s possible is because Spotify now knows what to create—thanks to data. viewing a dashboard). Similar to artist discovery, one of the most critical steps in data discovery is the final step—starting to use the dataset you’ve discovered. Exploratory Data Analysis is often the most essential step of any Data Science project as it provides a great deal of insight towards building further analytics. Katarina Berg: Yeah.For instance, there're a couple of things that we see with the data… When a user shares a link to a dataset in Lexikon, the Slack bot provides a brief summary of the dataset including: Not only does this provide useful information to users in the moment, but it has also helped raise awareness and increase the adoption of Lexikon. We do our best to base every decision, programmatic and managerial, on data and this extends into the culture. Using data about human behavior, relationships, and traits as the basis for making business decisions. schema field, BigQuery project, person, team, etc.) 
The hypothesis we wanted to test was that sending these emails would have a positive impact on user engagement and help more users to come back to using the app more often. First, we ran into challenges encouraging data producers to share example queries for all datasets. And I assure you, to build a pipeline and infrastructure like we have, it is. recommendations for datasets you haven’t used, but might find useful. For example, as a data scientist with high-intent, I may want to: To better serve the use case of high-intent data discovery, we iterated on the search experience. You had some broad goal to lift your mood and you didn’t have extremely strict requirements on what you wanted to listen to. How fantastic is that? So, we adjusted our search algorithm to weight search results more heavily based on popularity. Since making these improvements to the data discovery experience in Lexikon we see that adoption of Lexikon amongst data scientists has increased from 75% to 95%, putting it in the top 5 tools used by data scientists. started migrating to the Google Cloud Platform, Knowledge Management in Organizations: a critical introduction. Get more. Listen to The Power of Data on Spotify. We were able to see if an email had any effect on your listening habits, your account status and so on. Ek was sharing the detail to highlight the success of Spotify for Artists, the company’s analytics dashboard for musicians, which provides information such as playlist inclusion, streams by … “experts”). For instance, we have dashboards that show us user growth in particular regions, or user engagement, or even the number of emails we deliver. This will give you even more valuable insights into your episode performance, demographics, and more. Sounds robotic, but humans cannot be trusted so it’s cool. popular datasets used widely across the company, datasets used widely by the teams to which you belong, and. You’ve just had a low-intent discovery experience! 
Subscribe and listen to hear insights from business and industry leaders who share a passion for the power of data & analytics. We are a company full of ambitious, highly intelligent, and highly opinionated people and yet as often as possible decisions are made using data. Rather than discourage this discussion, we felt like we could help improve the person-to-person knowledge exchange by providing supplemental information. Since launching this feature, we’ve seen that 25% of users who visit a dataset page use the queries feature. In an effort to make its mountains of data available to musicians and their managers, Spotify just launched the Spotify for Artists app that provides mobile access to analytics—everything … One of these products is Lexikon, a library of data and insights that help employees find and understand the data and knowledge generated by members of our insights community. So, you go to the artist page on Spotify where you can check out the most popular tracks across different albums, read an artist bio, check out playlists where people tend to discover the artist, and explore similar artists. find the top datasets that a team has used because I’m collaborating on a new project with them. Pretty much everything. Imagine you’re starting to explore the genre of jazz. By understanding the user’s intent, enabling knowledge exchange through people, and by helping people get started with a dataset they’ve discovered, we’ve been able to significantly improve the data discovery experience for data scientists at Spotify. Blue Christmas: A data-driven search for the … We see our different data … Listening is everything - Spotify Don’t have enough data? At my previous job, I developed software for Ad Agencies in the Digital Asset Management space, so you can say I was relatively new to “Big Data” as it were. 
Typically data is available in our Data Warehouse and Dashboards within 24 hours, but in some cases data is available within a few hours or even instantly through tools like Storm. The Audio Analysis endpoint provides low-level audio analysis for all of the tracks in the Spotify catalog. Once this data made its way into HDFS, we had all the data we needed to determine the best performing email template for a campaign and we could track the effect a single email had on a user’s experience. We’ve also seen a significant increase in engagement with the average number of sessions per MAU increasing from ~3 to ~9 since our initial launch. The insights community at Spotify was quite excited to have this new tool and it quickly became one of the most widely used tools amongst data scientists, with ~75% of data scientists using it regularly, and ~550 monthly active users. Dataflow, for real-time and historical data analysis… Internally, everyone (not just engineers) has access to three tools: Dashboards, Data Warehouse, and Luigi. We could clearly see that these emails were having a positive effect on user engagement. Most data is user-centric and allows us to provide music recommendations, choose the next song you hear on radio and many other things. For example, as a data scientist, I may want to: In order to satisfy the needs of low-intent data discovery, we revamped the homepage of Lexikon to serve personalized dataset recommendations to users. Following this change, in user feedback sessions, data scientists reported that the search results not only seemed more relevant, but they were also more confident in the datasets they discovered because they were able to see the dataset they found was used widely by others across the company. Data analytics, or the science of learning from raw sets of information, isn’t just framing the works of art … Data scientists are often curious to see how a dataset is actually used in practice. 
At the heart of Spotify lives a massive and growing data-set. We believed that the crux of the problem was that we lacked a centralized catalog of these data and insights resources. Our team decided to focus on this specific issue by iterating on Lexikon, with the goal to improve the data discovery experience for data scientists and ultimately accelerate insights production. We learned through data analysis that although we have tens of thousands of datasets on BigQuery, the majority of consumption occurred on a relatively small share of top datasets. This was especially true for new employees who hadn’t yet built personal connections with members of the insights community. In this role, you will help drive the roadmap and development of Spotify’s Ads ecosystem data and analytics products. find datasets that I might not be using, but I should know about. Dashboards provides an interface similar to Google Analytics and allows users to create their own custom screens containing data they are interested in from our pipeline. However, in reality, while the first iteration of Lexikon reduced the need for person-to-person knowledge exchange in discovery contexts, there were still instances in which people found it useful to connect with others to find the right data. You’ve just had a high-intent discovery! This is how we collect people data and put it to work At Spotify, we take data very seriously and we try to make every decision data-informed. An example of an entirely data-driven decision would be our choice of a music recommendation algorithm that powers Spotify Radio. I also participated in a hackathon where I developed a Spotify App code-named Genderify that tapped into our massive data-set to determine exactly how “manly” a playlist is. In the case of Lexikon, we initially believed that if data producers did a great job describing their datasets there would be little-to-no need for person-to-person knowledge exchange. Analyzing Spotify Dataset. 
Second, of the example queries that were submitted, they often became outdated quickly given the ever-changing landscape of data. find a dataset that contains a specific schema field. For more complex operations, we have Luigi at our disposal, governing a zoo of Python, Pig and other animals which can be made to talk to any storage systems, run machine learning algorithms and even provide daily reports. Although Spotify approaches this process from a variety of angles, the overarching goal is to provide a music-listening experience that is unique to each user, and that will inspire them to continue listening and discovering new music that they will be engaged with we… Data scientists in a high-intent mode of discovery were often looking for one of these top used datasets that met their needs. Within the context of data discovery, a data scientist with low-intent has a broad set of goals and might not be able to identify exactly what it is they’re looking for. Analytics at Spotify May 13, 2013 Published by Jason Palmer At the heart of Spotify lives a massive and growing data-set. Datasets often contain dozens or even hundreds of schema fields. Python is beautifully complemented by Pandas when it comes to data analysis. Once you’ve mastered Spotify’s analytics tool, with the power of data science, our tools can take your streaming analytics game to the next level by expanding your scope to include market-level data. After working at Spotify for only a few months, I was talking about term weighting and signing up for internal courses on the R programming language. Most data is user-centric and allows us to provide music … - Spotify Library to get access to Spotify platform music data - Seaborn and matplotlib for data visualization - Pandas and numpy for data analysis - Sklearn to build the Machine Learning model. An incredible amount of data is created every second of every day with huge potential value for businesses around the world. 
The research and learnings from Spotify’s Insights community help make Spotify the best it can be. Explore our Marketing Campaign Planning Toolkit Campaign of the Week: Spotify use their data analytics in a risky but elegant marketing campaign When it comes to data analytics and … Spotify’s technology leaders point to the particular importance of BigQuery, the Google Cloud data analysis tool, as well as Pub/Sub, for faster software application development. This feature gives Lexikon users a way to sort the list of available fields by usage to easily find the ones that are likely to be the most relevant. However, months after the initial launch, we surveyed the insights community and learned that data scientists still reported data discovery as a major pain point, reporting significant time spent on finding the right dataset. So the conclusion is to rely on data whenever possible. Read writing about Data Analytics in Spotify Insights. For example, a data scientist might be looking for the best dataset to use that contains a track’s URI track_uri. Spotify is all the music you’ll ever need. Compare to last visit See how your personal ranking changes over … More weight is given to actions related to insights production (e.g. It’s rare that a single dataset will contain all of the information for which a data scientist is looking. To kick things off, we spent time conducting user research to learn more about our users, their needs, and their specific pain points regarding data discovery. I took this project on as an opportunity to learn Python. We will share your personal data for activities such as statistical analysis and academic study but only in a pseudonymised format. You strike up a conversation and learn that she is a jazz aficionado. But to make use of it is actually really easy. Spotify is a digital music service that gives you access to millions of songs. 
Lexikon’s user base has organically grown from ~550 to ~870 monthly active users as it has proven to be useful to data consumers in non-insights specialist roles (e.g. These results are powered by summarizing an employee’s insight production and consumption activity related to the given keyword. Datasets lacked clear ownership or documentation making it difficult for data scientists to find them. This perspective assumes that knowledge can take the form of a discrete entity and can be separated from the people who understand and use it. This mode of discovery is particularly important for new employees or for people who are starting on a new project or team. Our People Analytics model is set up for tracking HR data and metrics for getting informed better and faster, for progressive thinking, planning, acting, and leading. If you’re interested in helping us tackle similar problems or you’re a data scientist that’s looking to work at a company where producing impactful insights is becoming easier every day, visit the Join the Band page to view open roles. Most of our recurring data is added to our analytics pipeline by a set of daemons that constantly parse the syslog on production machines looking for messages we have defined along with the associated data for each message. links to view more information in Lexikon, request access, or open directly in BigQuery. Rather than fight this, we decided to embrace the idea by (1) mapping expertise within the insights community and (2) providing supplemental information in collaboration tools. For example, an example query might be out-dated because it included a join to a deprecated table. It was mostly a joke, but utilized listening data to provide an accurate statistical map of a playlist and displayed a result of 0-100, 100 representing an extreme edge case where a person registered as female had never listened to any tracks on your playlist. 
In addition to these encouraging adoption and engagement metrics, we’ve learned from surveying data scientists that after making these improvements data discovery is no longer identified as a primary pain point in insights production. Through user research, we learned that data scientists would often have a lot of questions about how to start using a dataset, which slowed down their ability to start using the dataset they just discovered. For comparison, more people report using Lexikon than BigQuery UI, Python, or Tableau at Spotify.