Season 4 Finale
Speaker 1: This is Catalog and Cocktails, presented by Data.World.
Tim Gasper: Hello everyone, welcome, it's time for Catalog and Cocktails. It's your honest, no BS, non-salesy conversation about enterprise data management with tasty beverage in hand. I'm Tim Gasper, longtime data nerd and product guy at Data.World, joined by Juan Sequeda.
Juan Sequeda: Hey Tim, I'm Juan Sequeda, principal scientist here at Data.World. And as always, it's a pleasure to spend Wednesday, middle of the week, end of the day, to go have our chat about data, our honest, no BS chat. And today is a special, special day.
Tim Gasper: It is.
Juan Sequeda: No guest, because it's the takeaway of the takeaway episodes.
Tim Gasper: It is, it is the best of season four, provided to you by Catalog and Cocktails and presented to you by Data.World.
Juan Sequeda: All right, so there's a lot to go through because we have, what, 15, 16 episodes?
Tim Gasper: Yes.
Juan Sequeda: I don't know, we've been at this since August. This is our 112th episode. And it's been amazing what we've been able to go do. And last week we were live from DGIQ in Washington DC.
Tim Gasper: Yes.
Juan Sequeda: It was so cool to meet so many people, to have that live show experience with everybody. So first of all, thank you to all our guests and thank you to all our listeners, because we're here for you and I'm so happy we get to do this.
Tim Gasper: Yes, we're so happy to have such great guests, such great listeners, loyal listeners, and doing that live show last week, that was really cool. I felt like I was Conan O'Brien or something like that, doing a live show, sitting on the couches and all that. So very excited, and we'll recap that as part of our show today as well.
Juan Sequeda: That's true. So let's kick it off with our warmup question that our producer Carly has given us. So how have you seen your co-host grow the most over the last four seasons of this show, and what's the one thing you'd like to tell him?
Tim Gasper: Well, I'll say that I think Juan has grown his heart three sizes, as he is now a father. Congratulations, Juan. So excited for you, that is such a personal accomplishment for you and for your wife and you, just so excited for you.
Juan Sequeda: Well, thank you very much. It is, it's been a total life-changing amazing thing. I'm super, super happy. And for you, I was thinking you have grown, I don't know how, you've probably grown personally for sure, but professionally...
Tim Gasper: We don't know.
Juan Sequeda: Professionally you have grown into an executive, I don't think we've actually announced this here at the show, but Tim is our chief customer officer on the executive team at Data.World. So we have seen you really grow executively.
Tim Gasper: Thank you so much. Yes, still data nerd and product guy, but now chief customer officer at Data.World.
Juan Sequeda: There you go. All right, well, let's kick this off. We have a couple of themes that we've gone through over the last four months now. And let's kick off with the first one, which is culture. Culture is actually a big topic that came up, and actually the first episode we have this season, which was from Gartner, was with Vip Parmar, who's the global head of data at WPP. And we started off saying, "How do you understand what things are going on with an organization?" You've got to ask a lot of questions. And through the takeaway of the takeaways, there's a lot of what I would call obvious advice, but people are not doing it. Ask why. Ask how. How is this going to help? Keep going, why, why, why until you get to the root of the problem. What is the problem you're trying to go solve? And have grown-up conversations. Be honest, no BS when you're talking to folks. Qualify those wants and those needs. And we're talking like, how do you establish this type of culture? We need the business to become more data literate, but we also need the data folks to be more business literate and come together. This has been a big theme that we've been pushing around. I've been personally just pushing this so much about business literacy, and I'm glad to see this resonate with all our guests. The data translators, yeah, they can be like unicorns, but if you can find them or establish them, then they can be fantastic with the organization. But this is more than just new roles, because they can make more people think into somebody else's jobs. And we need to really embed the responsibilities as well as to, it's not just somebody else's job. We need to really own this. When it comes to team structures, you should have embedded analysts and data scientists in various departments. I think that's a big change that we're starting to go see too. And I think, how do we start enabling this data and business literacy?
A lot of examples talking to folks, and Vip was saying specifically is that they're creating educational courses, they're actually teaming up with universities in the UK. They're teaming up with the Oxford business training to be able to go create data, AI, a bunch of tech training around that, having guest speakers organize special workshops. So I think that is something that we're seeing a lot about. Also, on employee onboarding. So when employees onboard with your organization, you need to empower them with everything they need to know, not just on the data side but also on the business side. So the employee onboarding is a very specific opportunity right there.
Tim Gasper: Yeah, couldn't agree more with that. And really connected to that topic and connected to this idea of workshops and training, we talked to Roel and Valentijn from Vopak. And in that episode we talked a lot about how you can really supercharge what you're doing around enablement and around training. And they talked a lot about this idea of a data academy within the organization that's a cross-functional team, and you train these people to have a common language around data and around your business. And then when those people come out of the data academy, they go out to all the different groups that they're a part of and they bring those learnings and that alignment and those objectives everywhere. And that is a huge impact for an organization like them, it has really paid massive dividends. And they mentioned that even smaller companies can take advantage of training. You can just keep it simple and start small, maybe focus more on onboarding like we mentioned in the talk with Vip, and that's something that can have a big impact. In general Roel mentioned that you need to get out of what you think you know, get out of your comfort zone, and really jump into new topics. So I think that's some good advice around training and enablement.
Juan Sequeda: And then the episode we had with Joe and Matt. Joe Reis and Matt Housley are the authors of the data engineering O'Reilly book. I love how they say we have this shiny object syndrome, this magpie syndrome. We always want to go get into the next cool stuff. And it's resume-driven development. So I think this is a culture that we probably need to start breaking a little bit. There's this lack of emphasis on people and process, because hey, tech is the easy part. And it's getting easier and easier to use. And hey, people and process are people and process, that's complicated. Well, hey, life is complicated. This is the opportunity we have. So that was with Joe and Matt.
Tim Gasper: Yeah. So the next episode that we talked about was with Loris Marini. And one of the big things that we talked about as part of that was knowledge and how to really foster knowledge in your organization. And one thing that we talked about as it relates to culture is, how do you have genuine conversations? And he really recommended, you've got to be curious, you've got to approach it assuming that you will learn and you must learn. The more you know, the more you learn that you don't know, and you have to coach people to all approach it in this way. It's not just going to happen naturally. You have to coach people. And he mentioned that from a cultural standpoint it's a little bit of a pessimistic view, but it's honest, right? Is that humans can be lazy machines. We favor the path of least resistance. And so if you can keep things tight, short, concise, think like Twitter, he said, you can really get people to share their ideas more, consume ideas more, and overall it's okay if things are a little messy. Just getting that knowledge sharing going on and those conversations happening is really the most important thing.
Juan Sequeda: And also chatting with Laura Ellis, I like that she said it's tricky to work with data because you have to break down the business problem into a data problem. So I like this example. You break down a word problem as a child, you have five apples and you have five oranges. So how many fruits do you have? Well, you've got to define the objects, you've got apples or oranges, you've got to know apples and oranges are fruits, and you can add them together. This is knowledge acquisition, this is how we've been doing this type of work for the last 20, 30 years. So thinking about your data problem is really a business problem, breaking it down, I really like that type of analogy over there. And then also we've been having this conversation over and over again about marketing and data marketing, and it's more about enablement of that data. So documentation is a key part of that. If we want the data to be used, we need to go market it, it needs to have great descriptions to document what this is. How do we get people to go use it? People are like, "Go create workshops, create swag around your data work, the logos around that stuff, promote it." I was talking to folks that create happy hours and stuff about the team, that this is how we're going to do stuff with data. We have to be creative about how to go use data. I think another example is have newsletters, not just what the platform team is doing with the data, but what's happening within the ecosystem. Let's go share and celebrate what other people are doing with the data. That's just marketing.
Tim Gasper: Yep, I totally agree. And also around culture, we talked with Gabi Steele and Leah Weiss of Preql, the founders of Preql. And the topic goes around putting the business in charge of their own data. And as part of that we were talking about their previous company, which is still very active right now but focused more around consulting, around data culture, called Data Culture. And what was the process of being able to nurture a great culture around data? And they mentioned a few things. They were mentioning, hey, you should really do some technology teaching. So do things like enable people with SQL, with self-service BI tools like Looker, Tableau, et cetera, even to those that don't have a data background, 'cause they can benefit from some of those skills. But even more importantly or just as important is really building ambassadors in the business, people who can be the reference points, who can be the subject matter experts, bring the different people from across the business, and bring them together. And one thing that they also mentioned was this idea of, try to move quickly. Think about, if you bring business expertise and you bring technical expertise together, what can you build in just a couple of days, in just a few days? Because that can be a big impact in moving the ball forward and being able to show what data culture and what the right data tools and the right data approach can do for your business and for your use cases. Finally, we talked about with Gabi and Leah, find the shadow IT and don't shun them, embrace them, empower them. Because actually maybe that's passion right there. There's people there who want to force cultural evolution, and so we should find the right way to nurture that and put them on the right tracks.
Juan Sequeda: Yeah. Well, I think also on that track about, just to wrap up culture, we have all these data teams and AI teams, all these teams together. What we really need is to make sure that they are aligned and have shared KPIs. And that was one of the points that Theresa Kushner was doing, is have KPIs together. I think that's something that's a very critical thing. That's culture right there. All right, cool. Talked a lot about culture.
Tim Gasper: Culture.
Juan Sequeda: Another thing that we talked a lot about throughout and I think it's the theme of 2022, how it's ending in my way and how it better be next year, is business value. And we've got to follow the money. And so things that Vip said, I remember, don't do anything you are asked to unless you understand how it affects the bottom line. How does it provide value to the organization? You need to understand the value and how to measure it. Make it clear that you aren't trying to be a difficult person. You're asking because you want to help how to make things better. And this is empathy, empathy comes here. Make it clear that you have good intentions around that. As a manager yourself, ask your team what they're working on and why. Why are they working on those things? There is also responsibility for the people asking for things to explain why they need that. If you can't explain why you need the data for some other team to work on something, then why are you even asking it from the first place? This is an important thing. I think this is so much about just why, why, why, why. Everyone needs to know the strategic objectives of the company so you can tie it back to the strategy you're doing. I think this is something I've been talking to a lot of people, is everybody in an organization needs to know what are the operational goals of the company, what are the OKRs in the departments? They need to be aligned to those things. And if not, then something's wrong right here, right? So we need to check to see if the people think that their objectives are aligned with the company's goals. Why, why, why? Align everything to the goals of the company.
Tim Gasper: Yeah, goals and metrics and ROI, as we'll see in a couple other episodes here. So for example, Joe Reis and Matt Housley, when we were talking with them, they mentioned to really understand failure and the why, the reasons why. And so data projects don't fail for technical reasons. It's because the data teams are not aligned with the people they need to serve. So that is really an impactful statement and it's something for us to think about, about how we get the data and the business more aligned, especially to those metrics. And I think Rupal from Penguin Random House actually said it really well in our episode around the struggles around governance, that you need to understand what you're selling to your business. And to do that, you need to understand your business. What is important to them? What are the anecdotes that are going to resonate with other people? You need to really know that or else you're going to struggle to convince other people of what you're understanding about the data and really driving the right momentum forward on initiatives. And she mentioned that ROI is really the key. A lot of data people don't know how to think about or measure ROI, which is funny because they are data people.
Juan Sequeda: So ironic. So ironic.
Tim Gasper: And yet we struggle to communicate about that value. And so one analogy to think about is, think about in America this idea of Shark Tank, pitching your ideas to the investors, or in Canada it's Dragons' Den. And think about, okay, if you were a data person trying to pursue a data project, or maybe in her case a data governance person, think about, those people on the other side, they're sitting in those chairs and you have to convince them. You've got 10 minutes to do it, you'd better have your metrics, you'd better have your business plan, you'd better have your story. And that can be impactful to get good at that.
Juan Sequeda: And then following on understanding the business, another theme is, ask the questions and follow the money. This is something that's come up a lot, is let's understand how the business makes money, and then where do you pour money in and where does that flow? Let's make sure we're aligned about that. And also talking with Laura Ellis about business literacy and data literacy, this should be formalized onboarding within your organizations, to explain to employees, "This is how this company makes money. This is how we generate value for our customers. These are all the most important business concepts. Here's how the products work, here's how they all work together." And by the way, this should probably be in your catalog. So that is wrapping us up around when it comes to business value.
Tim Gasper: Yeah. And then I think the next big topic is, so we talked about culture, talked about business value, it's around people. Who are the right people, the right roles? How do we motivate people, get the right people? And in our episode with Joe and Matt we were talking about data engineers, and how can we really make sure that we bring on the right data engineers and nurture the right skills of data engineers? And as many of you know, Joe and Matt have a wonderful book that they've launched around data engineering, Fundamentals of Data Engineering, which we strongly recommend. So the big skills they mentioned were assessing questions. So the curiosity and the ability to ask the right questions and then take the next steps based on those questions. Assessing technology based on business problems. So not technology for technology's sake, not technology because you went to the conference last year and you thought it was really cool and saw a good talk, because LinkedIn gives good talks, et cetera, et cetera, right? It's the business problem. What is the technology that solves the business problem? And then thirdly, schema, modeling, and cataloging. That today enterprise data engineering tends to focus too much on schemaless, on streaming, on more open-ended warehousing principles, lake principles. That's all really important and those have been huge advances, but we've swung the pendulum very hard in a certain direction. Schema, modeling, cataloging are super, super valuable, they're like lost arts that are really resurging now in a strong way. Get on that train.
Juan Sequeda: Well, talking with Rupal, Rupal's advice here is, ask for opinions and anecdotes. So as a person inside in this data space, go from department to department to keep building up that story. Understand the different business use cases. Basically get out of your governance building type of approach. And I love how they, let's go create some sort of workshops around this, but don't bring everybody together. Because look, let's be honest, people are going to think about it, it's boring. So don't chase people down because it's their responsibility. Start with those who are actually really interested, who have that problem, who have that passion, have that spark, who want to get in. And it's okay if you start small, because it really comes down to that personality of the people. People are really persistent around this stuff. So I think that's really interesting, goes all about finding the right people to come to bring together. And especially, let's be honest, people don't like to be in these meetings. So don't just start something with like, "Oh, here's this big project we're going to go do and we have to go create all these meetings and stuff." No, you've got to start small. Implement the right stuff just in the perfect timing. You get to know with the right people, take it little by little.
Tim Gasper: Mm-hmm, don't boil the ocean.
Tim Gasper: Mm- hmm, don't boil the ocean.
Juan Sequeda: Don't boil the ocean.
Tim Gasper: Loris on the people front really emphasized connections. Follow the connections, relationships between people. There is so much interesting context around how people interact with each other and what their relationships are with each other. Value is based on people, not on machines. One of the things he advocated for was, when you think about investing in and nurturing your people, consider not just the stack and getting skills around the stack. Think about what he said, curriculum driven development, which is really thinking methodically and thoughtfully about, what is the progression that a person should have to mature? Not an implementation focus but a business focus. And he also said maybe we need an organizational psychologist. Because people are messy, people are uncomfortable, people have people issues. But if we can overcome those issues, that can really help us. And an organizational psychologist does more than just help people overcome their people issues. It should help to really diagnose and provide therapy for the organization as a whole. Where are the problems? If we just had curiosity and we were exploring within the organization, what would we find? What questions would we discover that our normal business operations don't? So I think that's a very interesting concept.
Juan Sequeda: So Laura Ellis, who's a VP of engineering at Rapid7, we were talking about how to address these data user experiences. So it was very fascinating to know that they had this internal user researcher who was actually going off and talking to all the different analysts and all the different business people. And that made their data team so connected with their user base. And I find it fascinating. "Wait, how did you find this user researcher, did you hire?" Well, no, actually they had somebody in IT who was just passionate about this topic, about user research. And that person took that initiative and was reading about how to go do user research and focused that on, "How do we understand the people within the organization? How can we make their lives better within the company?" And this person did it as almost a pet project and it became their official role, and it's providing so much value to understand more people within the organization. So I think that's a fascinating idea. And this was an example of, we talk about this data therapist and stuff. It was like, yeah, actually happening. People are really doing this. So I think if you're thinking about it, just go do it. I'm sure there's somebody within the organization who's thinking about it. And the other example was, again, it's all about curiosity. So hey, if you're seeing a presentation and you're seeing these numbers, a report, go ask the person, "How did you do that? Was it easy? Was it hard? What was easy? What was hard? How do we make this accessible? Who else can make those types of reports like that?" Let's go solve things if it's hard. So I think it's just so much about talking to people, that's really it.
Tim Gasper: Mm-hmm. So much of the data, what seemingly is a hard skill domain is actually led by those with the strongest supposed "soft skills." I air quote, for those that are listening, because we all know that it's underemphasizing something that's so, so important.
Juan Sequeda: Definitely.
Tim Gasper: Finally, on the people front, let's move up the corporate ladder a little bit here. Let's talk about the chief data officer. So with Theresa Kushner we talked about the role of the CDO, but also the growing and emerging role of a chief data and analytics officer, or anybody who has the dual role of data leadership and analytics leadership. And one of the things that she mentioned is that as a trend CDOs have tended to be, they trend a little bit more IT, they're a little more technical, not necessarily business people. There are obviously exceptions, but there's a trend towards more technical people. And yes, a good CDO and a good data leader needs to understand the value of data, needs to understand the value of technology. But that doesn't mean that you can only be an expert on data infrastructure and data tooling. That is not enough. That is not enough to really make an impact. And that this trend towards a chief data and analytics officer is a really good thing, and that's going to help all the people in the organization because it's going to align them more with value. And it's also because she said managing data isn't always a sexy job. Not everyone wants to get into just the data side of things, but analytics, everybody wants to get into analytics because it drives the cost savings and the revenue production and the new products. So maybe that's the new sexy job of the 2020s, is going to be the chief data officer. To cap things off, Theresa said data teams want to go deep and understand. Encourage them to broaden out, understand the big picture, the entire landscape of things. Empathy, curiosity, the big picture, important things here for your data team and your people.
Juan Sequeda: Yeah. Well, another topic that came up a couple of times was real time.
Tim Gasper: Real time, real time. And we got to discuss with a few folks about, what does real time mean? What is the impact of real time, how fast is real time? And so one great conversation was with John Kutay of Striim. And we talked a lot about in that episode what is streaming, what is the power of streaming? He mentioned that streaming is about collecting data as it's new and processing it in a sequential manner, capturing it and leveraging it in an event driven way. And so that was a nice overarching definition to capture what is the streaming approach. And I think what was really great about that episode is, John was very honest about, where is streaming a good use case and where is it not really a good use case? You could do it, but do you need to do it? And so when is streaming not the right thing? Well, if you're using a data warehouse and you're servicing a report in a BI tool, maybe that's sufficient. Because streaming, maybe you could get a little faster, maybe instead of every 30 minutes or an hour, that report can come in five minutes or three minutes or 30 seconds. But if getting to that required you to pay 10 times as much, is that really worth the ROI? Maybe that report in three hours was just fine. So I think that's an interesting way to think about, when is streaming important, when is it relevant? And we talked about, man, wouldn't it be nice to have the knob? I know you were very excited about that. You want the knob where you could say, "Hey, let's go streaming. Hey, let's go batch." And I think there's some different technologies that are, whether it's streaming warehouses, whether it's some of the new architectures around hybrid batch, hybrid streaming that are starting to think about, how do we expand the window and start to do batch versus tight-knit and do more real time? So that was a very interesting conversation with John. And then also Fraser, we chatted, Fraser Harris, who's the head of product over at Fivetran.
And he mentioned that as soon as things go sub 30 seconds you're imposing a 10X cost and a 10X complexity. Do you really need that?
Juan Sequeda: That's the honest, no BS right there when it comes to streaming and batch. But I remember a couple months ago I attended the Kafka Summit.
Tim Gasper: That's right.
Juan Sequeda: And it was great to see, I've acknowledged that streaming isn't something I'm familiar with and I was really eager to go attend the Kafka Summit to learn more about it. And it's really interesting how you see everybody saying, "If I start from scratch, I would start from streaming." Because it's like, yeah, I'm eventually going to need it, or for some use cases, let me just go start with that right now.
Speaker 4: I'm not sure I understand.
Juan Sequeda: But one of the things right there, I don't know...
Tim Gasper: Siri doesn't understand.
Juan Sequeda: But one of the things that I was a little bit afraid of is that there's so much wheel reinvention when it comes to all these types of schemas and constraints. They're basically reinventing databases. I'm like, "Oh man, that's going to be a big pain." So again, we do a couple steps forward and we do many steps back. I guess we reinvent the wheel a lot.
Tim Gasper: I'm going to interject real quick and say I find it funny how we get so excited and infatuated with certain things. I'm going to be a little bit sarcastic here, and it's like, oh, data mesh comes out and all of a sudden it's like, "Oh, everything should be data mesh." Or streaming comes out. And I was like, "Oh, everything should be streaming." It's like, everything? Should everything be that? I don't know, it's just interesting how we get so excited about these trends. And there's a lot to be excited about, but we also have to hold ourselves back a little bit.
Juan Sequeda: Yeah. Well, another talk about hypes and all this stuff is AI. And I think this is a topic that we haven't hit that much throughout the entire podcast in the last, I don't know, two years. But we actually had two great episodes about AI particularly. One, it was super great to go meet with Patrick Bangert, who's a VP of AI at Samsung. So we were like, "Hey, so where is the focus of AI right now?" And he's like, "Well, obviously where the money is," and the money is in autonomous vehicles. This has been the focus of AI, but it's actually a very solved problem right now. Now everything about AI in autonomous vehicles is just polishing the final things. It's manufacturing, it's the legal, it's the compliance part, it's the non-AI stuff. So where do we go next from there? Well, we've got to keep following the money. Remember, follow the money, follow the money. Well, one of those is going to be military, for better or for worse. It's a prime candidate. But the other one is healthcare. And right now it is, how can we improve healthcare? So when it comes to healthcare, what are going to be the drivers within the healthcare world? It's the device manufacturer. They will be at the forefront, because it's either the consumer wearable devices, the watch that you have, devices inside of the hospital. I think all of these are generating a bunch of data that we're going to be using for AI. And at some point, the average doctor has, what, 70% accuracy on images? But with AI models you get up to like 98%. So if AI gives you that feedback it's instantaneous, while the humans actually it can take a little bit longer and it's not as accurate. So actually if you're going to go get an x-ray, make sure the AI tells you what's in it instead of the machine. Instead of the human, I mean. But today the focus so much on the AI is on the algorithm, on the whole math behind everything, how many layers and the transformers and all that stuff.
But we've now been seeing the shift to what's called the data centric AI. The problem is not with the algorithm any more, the problem is with the data. And I think this is something that resonates great with Patrick, is, we need to insert more knowledge inside of our data. And it's not just about more labeling. And this is where the knowledge graphs and ontologies come around. The knowledge is a step into changing this AI for this everyday sense, our common sense that we need to go have. And we got into this really interesting discussion about, wait, but don't you need all this knowledge also to be able to go do autonomous vehicles? And it was fascinating Patrick's answer, saying, no, the amount of knowledge to pass your driver's test is minimal at most. You really need to know a small number of rules. And actually if you look at it, if you're going to look at the textbook of driving, it's pretty small compared to a textbook on any other areas of knowledge. So it's more about skills. So we think about autonomous vehicles as the most amazing thing, but he said himself it's the bottom of the barrel, it's just a skill. And that really opened up my mind. So we were like, "Oh, amazing, autonomous vehicles." He was like, "Eh, it's actually easy."
Tim Gasper: It actually makes me a little sad, because I really want my Rosie Robot who can help me in my house. And I feel like if it's easier to drive a car than it is to wash dishes, then I'm sad.
Juan Sequeda: We're far away from that. Well, so another question is, where is this funding coming? Where is AI going to go next when it comes from funding? The issue is that a lot of the VCs, he said himself they're focusing on companies but not on real businesses. So they're focusing on just entertainment, on this funny thing, but there's no really groundbreaking AI that we need to focus on healthcare. Because hey, that's not where they think they can make the money, which is kind of sad. It was very specific. It's like, we focus on all these things, which is not where we need to go focus. We need to go focus more on these problems like healthcare and stuff. Unfortunately, they're a more specific vertical and probably there's less money or less of a ... there. But honest, no BS episode right there as always. But that was a very special one.
Tim Gasper: One of my favorite episodes. I'm very fascinated and interested in AI and he's one of the foremost experts in the field. Along that line of vertical AI versus horizontal AI, I thought that the episode with Andrew Eye, who's the CEO at ClosedLoop.ai, provided a lot of great insights. And his episode was very focused on this idea of, no one wants your data model, or nobody wants your model, your AI model. And there's this idea in the past that the model that was trained on the most data will win. And so I trained my model on 50 million cat videos, was the example he gave. And he was like, "If you've got cat videos and you pass it in, then I will tell you if there's a cat in it really well." And it's like, okay, well you trained it on lots of the data and then the input is very similar, it's exactly the same, it's very consistent, and then you get the output. And yeah, in a situation like that, just taking a model off the shelf or if it's open source or something like that, that can work great. But the problem is that most use cases, most situations and most data is not consistent in that way. Your data and my data do not look the same. And so therefore the idea of just taking a model off the shelf or using a tool that just has a built-in model that you can just hit go on doesn't mean that you're going to get good results. And so one of his big points is that in a lot of use cases, particularly around verticalized use cases, you need to build your own model. And so that was one of his big takeaways that he provided. He looked at a few different verticals. We talked a little bit about healthcare and how in healthcare there is a little bit more consistency with some of the records and things like that. So there are some use cases where maybe the models can be taken more off the shelf, but also a lot of use cases where it doesn't work. 
Maybe a little bit in retail you get some consistency because of things, but now you start to lose some of that repeatability and then from there things get more and more complicated. So knowledge of the data he said is also really important. You need to understand deeply, what are the semantics of the data, what's the context of the data? Because if you don't, your AI is going to be unpredictable, you're not going to get good results. Knowledge needs to be dynamic. Overall he liked the idea of vertical AI and built AI over horizontal AI and talked a little bit about the state of AI today. AI today is really involved in a lot of ads, and so you get a lot of martech value from AI.
Juan Sequeda: I like what he was saying, you have all these really smart people who are focusing all this amazing brain power so users can click on more ads, instead of that brain power to be used to go solve cancer and so many healthcare problems.
Tim Gasper: Yeah. So maybe this value is here, but what if we were putting that towards solving cancer? So I think one of the big takeaways of the episode is, how do we as a society actually try to move the value and the focus and the energy towards, how do we make that sexy? How do we apply all these people who are working on self- driving cars and they're looking for maybe new jobs or a new focus, how do we get them all to go to the healthcare industry?
Juan Sequeda: Yeah, and I think also a good call-out for, we need more education for people with diverse backgrounds. How do we deal with the biases and stuff? We need more diversity, I think that's a very, very key point. And then also chatting with Theresa Kushner talking about AI was more about the AI teams. And I think we also think about the data teams, but there's AI teams. And hey, she's honestly saying the AI teams are going faster and the data teams are just struggling to keep up with the AI teams. So we need to have more synergies between this, because guess what? The AI teams end up then doing their own prepping and cleaning of the data, but that's what the data team is supposed to be doing. And then the AI teams start creating their own feature sets, which themselves are additional data sets. And then there's no governance around that, which is why people don't know how to go use it and so forth. And then you have these different teams, are they on the same standards? Are they using the same processes? What are the approaches of how to get them together? So think about AI, even within your ML teams, your ML ops and ML engineers and you have your data engineers. We need to synchronize these two teams. So that's another good point to start thinking about. Then there's another topic, which is, I call it catalog and governance and semantics and modeling. There's all this metadata, which is the heart and the essence of what we are here at Catalog and Cocktails.
Tim Gasper: The context, the glue.
Juan Sequeda: Everything it is. Super, super thrilled that I was able to go spend time with Ole, who is the author of the upcoming O'Reilly book on enterprise data catalog. This was an episode that actually we did face to face when we were in Paris, which was pretty cool. And we started out as, " Hey, what should a catalog do?" He said very specifically, " Make sure it can effectively search and discover all your data, structured and even unstructured," which is an interesting point, how he brought in these two things. A catalog should have a very strong focus on search. But if you're going to focus on search, you also need to understand how you're going to go organize the metadata around that so you can have better search experience. And then there's a search in the data or you're searching for the data, and the catalog is about searching for the data. You're not searching actually inside of the things. The analogy is, I'm going to the library, I'm looking for the book, I'm not looking for what's inside the book. I always have this analogy about the shopping experience. The way you should look for data is like the shopping experience. And honest, no BS style, Ole disagreed. It's like, no, it shouldn't just be the shopping experience. Because shopping experience is, you know what you want, and maybe you don't know but you get recommendations. But at some point you need to search for stuff that contradicts with that shopping experience. You want to find stuff that's complicated to find. It has all these different paths, all these different rules, very complicated almost queries to go do that, which is not the shopping experience that you would have. So I think it's complementary to these approaches. And it's interesting that you want to go for regulatory purposes. I need to go find data that has been used in this time by these people and hasn't been done in this thing and so forth.
Tim Gasper: In that sense he mentioned it's not really like Google either. It's not like Amazon, it's not really like Google. 'Cause Google is lowest common denominator, whereas it actually can get kind of complicated depending on what you're trying to find.
Juan Sequeda: Yeah. And one of the things I enjoyed talking so much with Ole is having this spectrum of search, all the way from simple keyword search all the way to something that he calls an information retrieval query language. But effectively it's almost like a query language over all your data. And if that's what you want to go search for within your catalog, you need to be able to go organize that metadata, and that's where the metadata modeling comes into place. Now, in addition to cataloging data, you want to be able to have the life cycle around the data. And I think this is something that we don't talk as much, and even us, not putting my salesy hat on, but just as a vendor or salesman in the catalog space, we don't hear that much. We don't talk much about life cycle.
Tim Gasper: No, people talk much more about the beginning, not the ongoing.
Juan Sequeda: Yeah. So how long should you keep the data? We're going to go catalog stuff in our data catalog. How long should that be there? Forever? When should it deprecate, when should we hide it? It depends on so many different factors. So the data life cycle needs to be managed within a data catalog. And that was a very interesting aha moment for me, 'cause I'm realizing the market is not talking about that. There are different approaches. POSMAD: plan, obtain, store and share, maintain, apply, and dispose. Look that up, POSMAD, that's something really interesting to go dig into. Another one to think about is DIKAR, D-I-K-A-R. So data becomes information, becomes knowledge, becomes actions, and ultimately becomes results. And I think this is one of the things that we need to start thinking more about.
Tim Gasper: Mm-hmm. So next, in Frazier's episode from Fivetran, we also talked about how Fivetran is getting a lot more into metadata and really the value of metadata as it connects to data transformation and data integration. And one of the things that he mentioned is that the fundamental challenge around data cataloging is, how do you get people to describe their business processes, how decisions get made, how a business functions and works? You hope that the data represents that as best as possible, usually via things like data modeling. But in order to really get it there and in order to capture all that context, you need a lot of cross-functional collaboration and you need to bring a lot of business expertise to bear. So I thought that's an interesting insight, that it's not a magic bullet. There's some work that needs to be done. And maybe we can gamify or we can make that work a little easier, but you have to do that work somehow. And then data contracts he mentioned, and that this is really all about interfaces. And so I'm technical so I think of an API. An API, it communicates certain things and it expects certain things. That's really what these data contracts are about. There are two parties, what's the interface between them? As long as the provider meets the expectations of the interface, then things will work. So I thought that was a very simple way to look at data contracts in the context of broader metadata management. That was very simple and very clear.
Juan Sequeda: Yep.
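The data-contract-as-interface idea Tim describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something shown on the episode: the names `FieldSpec`, `ORDERS_CONTRACT`, and `violations`, and the example fields, are all hypothetical. The contract lists what the consumer expects, and the check reports where a provider's record falls short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One expectation in the contract (hypothetical structure)."""
    name: str
    dtype: type
    required: bool = True


# Hypothetical contract between a provider and a consumer of order records.
ORDERS_CONTRACT = [
    FieldSpec("order_id", str),
    FieldSpec("amount", float),
    FieldSpec("coupon", str, required=False),
]


def violations(contract, record):
    """Return a list of ways the record fails to meet the contract."""
    problems = []
    for spec in contract:
        value = record.get(spec.name)
        if value is None:
            # Absent (or null) fields only matter when the contract requires them.
            if spec.required:
                problems.append(f"missing required field: {spec.name}")
            continue
        if not isinstance(value, spec.dtype):
            problems.append(
                f"{spec.name}: expected {spec.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


good = violations(ORDERS_CONTRACT, {"order_id": "A1", "amount": 9.5})
bad = violations(ORDERS_CONTRACT, {"order_id": 7})
```

As long as the provider's records produce an empty list, the consumer's expectations are met, which is the "interface between two parties" framing from the conversation.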
Tim Gasper: Another great set of takeaways around metadata and around cataloging, semantics, et cetera, was when we were talking with Joe and Matt, where we talked a lot about data modeling. And they mentioned that data engineers need to be aware of data modeling and its concepts and maybe even the fact that it exists. Anecdotally Joe said maybe 20% really know it decently today. That number needs to go way higher. So we need to make a dent in that. There's a lot that you can learn from classical modeling approaches. So pick up that Kimball book and start familiarizing yourself with these different concepts. And whether or not they're applicable or not, you're going to learn a lot and you're going to start to see when you run into problems where you need to optimize, you need to model in a certain way and you start to realize, " Oh wait, wow, these are important tools to have in my toolkit." And what's next for data modeling? How does streaming data or graph data fit into data modeling paradigms? So today very SQL oriented, very procedural, what does it look like to actually do things more in streaming or graph context? And maybe allow all those things to work more interoperably with each other.
Juan Sequeda: Yep. So now we are back to my favorite topic of semantics. And this was an episode we had with Dan Bennett, who is the CDO of S&P Global Commodities. And talking about semantics and he's like, "Hey, you have a number. It's not just a float, it's not just that data type, it's actually a unit of measure. And by the way, that number in there is not just a unit of measure, it's actually the barrels of oil per day." So this context sits above the data. And that's his whole point, is that we need to be able to go make semantics first class citizens. Now, the question is, how do we scale the creation of these semantics? Companies like S&P, they have the incentives because they want to make the data usable as fast as possible. So when we think about semantics, it's really an extension of data governance, and an extension of data governance about how we can use the data. So if we think about data governance not just from the protective approach, but also let's make sure we're using the data, semantics is going to make the data much more usable. That's why we need to make it a first class citizen. Now, how do we go do this? We were talking about, maybe the solutions are just so simple. We look at data dictionaries, they're all very standardized. Is there a way that we can just extend the data dictionary to say, "Hey, give me another column in there where I can just give you a pointer to a place that can give me all that very specific machine readable semantics around that"? We've been talking about this for so long, we just need these kind of techniques embedded within the vendors, and maybe it's not that hard. So I think the point here is that we need to have a network effect. I really like this as a conversation, just like the web. The goal is, one plus one is greater than two. Because traditional approaches, one plus one equals two, are actually one plus one equals less than two. And that's when we actually start generating debt. 
So how can we do something where one plus one is greater than two?
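The "one more column in the data dictionary" idea from the Dan Bennett discussion could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the `DictionaryEntry` fields, the `SEMANTICS` lookup table, and the `example.org` URI are hypothetical, and a real setup would more likely point at an ontology or knowledge graph than at an in-memory dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DictionaryEntry:
    """A data dictionary row with one extra 'pointer' column (hypothetical)."""
    column: str
    data_type: str
    description: str
    semantics_uri: Optional[str] = None  # pointer to machine-readable semantics


# Hypothetical stand-in for wherever the machine-readable semantics live.
SEMANTICS = {
    "https://example.org/semantics/barrels-of-oil-per-day": {
        "quantity": "oil production rate",
        "unit": "barrels per day",
    },
}


def describe(entry):
    """Resolve the pointer, if any, so the float carries its meaning with it."""
    meaning = SEMANTICS.get(entry.semantics_uri)
    if meaning is None:
        return f"{entry.column}: {entry.data_type} (no semantics attached)"
    return f"{entry.column}: {entry.data_type}, {meaning['quantity']} in {meaning['unit']}"


with_semantics = describe(
    DictionaryEntry(
        "output",
        "float",
        "Daily output",
        "https://example.org/semantics/barrels-of-oil-per-day",
    )
)
without_semantics = describe(DictionaryEntry("output", "float", "Daily output"))
```

The existing dictionary columns stay untouched; the only change is the optional pointer, which is what would make this cheap for a vendor to adopt.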
Tim Gasper: As we invest in the network, it is exponentially providing value.
Juan Sequeda: Exactly. Another episode, fascinating discussion with Allison Segraves, and what I loved about the discussion was because we kept this analogy of the pool going for a long time.
Tim Gasper: Yes. For those who want a really long analogy, please watch our wonderful episode with the wonderful Allison.
Juan Sequeda: Yes. So the whole episode was like, " We need to get into the pool and just jump into the pool. We need to be able to get in there and just swim with the data." The pool as being that data. The thing is that, hey, there are a lot of lifeguards. So what type of lifeguards do we have? There's a lot of lanes, in which lane? There's a deep end. Do I get dumped into the deep end or I get dumped into the shallow end? And there's so many lifeguards, there's the privacy lifeguard, the security lifeguard, the governance lifeguard. We have to be able to understand how to be able to get into this pool, but we can't make it so complicated. And we need to be able to enable people to say, " Yeah, just jump in, get in there. I am here to let you swim and go practice." And actually, you want to get good at data. How do you get good at data? It's using the freaking data. You can't learn to swim outside of a pool. There's so much theory you can do outside of a pool.
Tim Gasper: At some point you've got to jump in the pool.
Juan Sequeda: Jump in the pool and go do that. So that was a fascinating analogy that we were having with Allison that whole time. Also acknowledging that the industry just has gotten so complex. We make this thing so complicated when we don't need to. And I'm happy to hear this constantly from the senior leadership folks that we talk to. This technology is too complicated, it shouldn't be. And I would argue sometimes that the younger folks are the ones who are making it more complicated. And that's something that we need to really change there. We need to go really understand the history, understand the past. And what needs to change here, following this swimming pool analogy, is that we need to feel comfortable in our swimming suit. We need to be able to put on our swimming suit and jump in the pool, and we should not be judging other people, how they look in their swimming suit, how they look, how they're actually swimming.
Tim Gasper: Yeah, wear what you want, right?
Juan Sequeda: Yeah, exactly. So have more freedom. Screw it up, try again. That's what it is. You have lifeguards that are going to let you swim. They're not going to let you drown, you're not going to die. They're there to let you swim and go learn.
Tim Gasper: Yeah. And we can have fun, swim well, learn a lot, and be safe. No dying.
Juan Sequeda: Exactly.
Tim Gasper: So great episode, definitely recommend that very highly. And then finally, in this area around semantics and really around governance, we were at the DGIQ conference last week by Dataversity in Washington DC, and we did a live Catalog and Cocktails episode on the last day there to really cap it off. And by the way, anybody who wasn't there, you missed out on free old fashioneds. So make sure you come to the next one. Anthony Alderman, who is the convergence platform lead over at AbbVie, and then Shannon Moore from Dardy joined us as guests. And we talked about governance, some of the challenges there and some of the opportunities there, what the outlook is around 2023. And one of the big takeaways was that organizations don't change, the people in them do. And so it comes back to that topic of people again and how important they are to really driving metadata, driving governance, and driving value for your organization. How do we get people to care more? Communication is key and you have to understand your audience. So say things, do things that help them relate and learn. Learn about the business so that you can communicate effectively. ROI has to all be around the business value of governance, not just governance for governance sake. And so tie it to business initiatives. Think of it as an ongoing function. They mentioned you should think of it like HR or accounting, not like this special project. No, can you just be like, " Oh, HR is here for six months and then we're done with HR"? No, you always need HR. Similarly, you always need governance and we have to switch our way of thinking of it. Maybe even we should change the name of it. Towards the end we were like, " Is it governance or should we call it data enablement?" Because maybe governance sounds a little too much defensive and like a project, when really we should be thinking of this as an ongoing thing. It's a function that we do no matter what. 
Execute the fundamentals, do the table stakes right, change about how you talk about it, think about how you talk about adoption, how you talk about use cases, talk about the quick wins, the success stories. Quick wins, speed to use. So important. Offense versus defense. There's offensive use cases around governance, which are more like, how do we use governance to create new data products, create new opportunity, create empowerment with data to make better decisions, create more differentiation, et cetera? And then there's defensive use cases around data, which are much more around privacy, security, compliance, quality. And do you have to choose between one or the other? Well, you probably have to specialize, but really you should understand both, you should invest in both, and hopefully we can move that needle a little bit. I asked the lightning round question to them of, " Hey, right now maybe it's like 10% offensive, 90% defensive."
Juan Sequeda: Is it going to be 50/50?
Tim Gasper: Yeah, in the next five years will we get to 50/50? And Anthony was like, " Hell, no."
Juan Sequeda: Going to get less.
Tim Gasper: My heart kind of dropped a little bit. I was like, " Oh no, but five years is a long time." But he made a good point, which is that the environment is changing. The privacy laws are getting harder and things like that. And it's a table stake. You have to do the defensive stuff, but if you can do the defense and the offense, then you're really differentiating. In 2023, lead by example. We need to measure, quantify, and understand data quality and use that to teach others, and then communicate outside of just your technical audience bubble.
Juan Sequeda: That's it.
Tim Gasper: That's it, we ran out.
Juan Sequeda: We ran out, that's it. Culture, business value, people, real time, AI, and then the catalog, governance, semantics, metadata space. We did a lot.
Tim Gasper: We did a lot. And that was an awesome season. We hit a lot of great topics, amazing. Both practitioners, leaders, partners, customers, thought leaders. It was really a great season.
Juan Sequeda: So going through all these takeaways, I found some of my favorite interesting quotes here. I'm going through this because I'll never forget my favorite quote from last season, which is Sanjeev's. " I never met a data I never liked." All right, that was so cute. I love Sanjeev.
Tim Gasper: It's one to take home.
Juan Sequeda: Thank you, Sanjeev. All right, I'll go first. One of my favorite quotes is from Vip, and he said, " The goal should be to make the CDO an irrelevant role because data is so embedded it becomes a part of everyone's job." That's a strong statement right there.
Tim Gasper: Yes, that is very wise.
Juan Sequeda: So I wonder, if you're a CDO right now, where are you going to be in the next two, three, five years? We'll see.
Tim Gasper: It's an interesting question. So everyone knows the phrase, "Oh, that's a legacy system," or something like that. It's an IT swear word, right? Well, Joe Reis had a very funny comment in our chat, in which he said, "Legacy is a condescending way to refer to something that makes a lot of money."
Juan Sequeda: So spot on. All right, another one. Rupal, we love you, Rupal. This was just such a great quote, I have to go say this. People could take this wrong, but no. " Data governance has a bad rap. I blame banking, finance, and GDPR."
Tim Gasper: After she said that she was like, " I regret saying that."
Juan Sequeda: I think we're going to make some t-shirts with that, right? All right.
Tim Gasper: And then finally, Frazier Harris from Fivetran said, " All problems are people problems."
Juan Sequeda: That's it. I think that's it. I don't know, is that a good thing or are we screwed?
Tim Gasper: I don't know. Go hide in your house.
Juan Sequeda: All right, Tim, let's wrap up. I don't know, let's talk about predictions or what we've learned. What do you think about next year?
Tim Gasper: Yeah. Well, I think that one of the things that's a huge theme this year, and I'm curious about how it evolves into next year, is just this idea of curiosity and empathy and how important these people skills, these supposed soft skills, how important they are. And how important they are not just to the business side, how important they are to the technical people, the technical data governance people, the technical data engineers, the data scientists, the analysts, how you should ask questions. And that that is more important perhaps than being a super skilled or technically capable person. Obviously you have to have certain skills, but this is the thing that differentiates you. So I'm curious to see how that evolves and how we as an industry nurture that. Because I don't feel like we have a strong road map right now on how we nurture that. Today we talk about culture, but it's still very fuzzy. We have to get more concrete about how we make that happen.
Juan Sequeda: Yeah. We said a lot of why, why, why. I think that's something we need, which is empathy. And it goes back, credit where credit's due, that episode we had with Ergas, I think it was last season. We asked him, " What should data engineers be focusing on?"
Tim Gasper: Yeah, what's the number one skill or whatever they should focus on? Yeah.
Juan Sequeda: He said empathy and curiosity, which totally caught me off guard.
Tim Gasper: Mm-hmm. And it was a foreboding of a trend to come, right?
Juan Sequeda: There it is. Yeah.
Tim Gasper: And what about you? As you look forward and as you look back, what are you thinking?
Juan Sequeda: Well, this is not my prediction. This is more my desire, my hope. Because if I'm going to predict, I don't think it's going to happen, sadly. We'll see. Is, follow the money. If you are not able to very clearly, in a crisp and succinct way, explain how your data work is providing, making money, saving money, it is directly connected to the strategic goals of organization, you should be preparing a really great resume because you're probably going to get laid off, period. That's it. You understand how the business works, you understand the money. And for those folks having discussions, I'll be very direct with you. If you think that, " Oh, I'm data, we enable everybody. We don't have to go provide ROI," I'm like, " I'm sorry, too bad, you're not going to be a leader. You're going to be in the back of the pack and you're probably going to lose your job next year." So don't be that person, not just keep your job, but empower yourself to get better jobs and promote yourself, is knowing where the money is and how you are contributing to that. That's it.
Tim Gasper: Honest, no BS.
Juan Sequeda: Honest, no BS. Tim, as always, pleasure.
Tim Gasper: Always a pleasure.
Juan Sequeda: Two and a half years doing this.
Tim Gasper: Yep.
Juan Sequeda: Looking forward to season five. We're kicking it off January 11th with Bill Inmon, the father of the data warehouse.
Tim Gasper: Bill Inmon.
Juan Sequeda: And we're going to put a survey out, so check our socials. We really want to get your input about what are the topics we should be listening to.
Tim Gasper: Yeah. So look at our social, look at all our different profiles. We're going to put out a survey, we want to get your thoughts. It'll be fast, it'll be a fast survey, but please fill it out. You all are the heart of everything that we do, so please help us get better. Thank you to data.world for making this all possible. Thank you to Carly, she is our producer who works in the background and makes these awesome image popups happen and coordinates all this stuff. So thank you so much, Carly, and thank you to all our listeners. You all have really surprised us and delighted us. We never knew when we started this a little less than three years ago what this would become. So thank you, thank you, thank you. And we really look forward to 2023 with you all.
Juan Sequeda: Cheers, everybody.
Tim Gasper: Cheers.
Speaker 1: This is Catalog and Cocktails. A special thanks to data.world for supporting the show, Carly Bergoff for producing, John Loyans and Brian Jacob for the show music, and thank you to the entire Catalog and Cocktails fan base. Don't forget, subscribe, rate, and review wherever you listen to your podcasts.
Well, another incredible season of Catalog & Cocktails concludes this week with hosts Tim Gasper and Juan Sequeda.
Join in for the ultimate takeaways of the takeaways as Tim and Juan recap best moments, favorite hot takes, and the most controversial opinions over the last season.
Listeners, please submit your feedback: https://forms.gle/FdjMfarUaVnJ3SzB9