Takeaways with Joe Reis and Matt Housley of Ternary Data
Intro Announcer: This is Catalog and Cocktails. Presented by data.world.
Juan: Let's move to the lightning round, which is presented by data.world, the data catalog for your successful cloud migration. I'll kick it off. So you mentioned Inmon, Kimball, they don't apply perfectly to the modern analytics landscape. Is there a new modeling paradigm that will emerge?
Matt: I'll say there will be that. So yes.
Matt: I think in the near future we are going to be-
Juan: Right, but George said first, "TBD." Matt said first, yes.
Tim: Matt says yes.
Joe: I prefer-
Matt: I think it's coming.
Joe: Yeah. I mean, I'm working on something right now, so we'll-
Joe: ...we'll talk about it more when it's a bit more ready, yeah.
Tim: I like that you'll have to share more as that keeps evolving. I liked your comment about like, "Well, how does streaming and some of this other stuff fit in?" So I think it's interesting to think about the big picture.
Joe: Oh yeah, this is a clue. Yes.
Tim: Yeah, interesting. So second question, you both kind of mentioned about data engineering like, as the tooling gets easier, as the technology gets more advanced, the data engineering maybe is actually going to disappear potentially as a title or as a role. Curious about, we're seeing this rise of analytics engineering, and curious to see, do you see this of moving to analytics BI, the business questions, a little bit more of that analytics engineering flavor? Do you see that as being a likely successor, or likely shift here? Maybe Matt, if you want to start.
Matt: What I'll say is that I see kind of a fragmentation of the data engineering role happening, and so maybe that's where the title is going to go away. I think analytics engineers are likely to take over a lot of work done by data engineers right now, especially making sure data is flowing appropriately into the business, into various teams. And then other parts of data engineering will probably either move under ML engineering, or get some new title that's like something about ML oriented data engineering that's a bit more specific. And then you're still always going to have these engineers that work on the guts of systems at Snowflake and Google and such, and so maybe we'll find a new title for them because their job is really quite different from what most data engineers are doing. Like Joe was saying, most data engineers have evolved out of doing that, and yet if you have these products then someone has to work on them behind the scenes. You need hyper specialists who are working on these systems and there's got to be some good title for that.
Tim: I mean, I don't see analytics engineering as being anything new. It's been around for decades.
Juan: Yep. What comes around goes around.
Matt: True enough.
Juan: Those who don't read our history are doomed to repeat it.
George: Oh, you just come with different names for stuff, right? Data engineering is the same way, but yeah.
Tim: Marketing is fun.
Joe: It is fun.
Juan: This is why I say we got to be critical, or you got to be critical.
Matt: Yeah, yeah, for sure.
Juan: Right. And the next question. Is the best way for a data engineer to learn data modeling, is it a hands on experience, or can reading your book, or a book do justice?
George: I don't think reading our book's going to teach you data modeling. It'll expose you to the concepts. But as we point out, there's a lot of books out there that are like 500, 600 pages long, so do the hard work. I would say read the books but also do it, practice it, right?
Matt: Yeah, I'm going to agree with Joe on this one and say you have to go read the classics and then synthesize them into something that you own through a combination of thinking and doing. And hopefully that story will improve over time where there's more of a guided journey so you don't have to go off on your own quite so much.
George: I mean, it's like reading a book on dating. At some point, you know, you can read books on dating, or you can go on dates and so...
Juan: Oh, my god. This is perfect, right? Okay, learning data modeling is like learning how to go date. You can go read the theory, but you have to go practice it.
George: Yeah, I mean, and please don't read The Game, or some stupid book like that.
Matt: You heard it here first, don't read The Game, apparently.
George: I know guys who have read that book. I just sit there just cringing, I'm like, "Oh, man, this is pretty bad."
Tim: It's all the secrets.
George: Still single after 12 years.
Tim: All right, last lightning round question here. So the 2010s kind of saw the rise, especially towards the end of the decade, of the data scientist being this sexy, awesome, critical job. Maybe data engineering longer term is going to disappear, or something like that. But in the shorter term, is that really actually that sexy, critical job of the early 2020s? Is it data engineering?
Matt: I would say it has been for the last two to three years. And I think the open question is around economic transformations that we might or might not be going through right now. I think we've seen a huge talent shortage since maybe at least 2017, 2018. Maybe going back further if you include big data engineers. And the question is... I don't know if we go through a recession, or something, does that change the conversation? Maybe it does.
George: Well, I mean the conversations I'm having with people where cost management comes up a lot, FinOps, I would say any data professional, engineering or otherwise that understands cost management, you're going to stay employed hopefully, unless your company implodes, which could also happen.
Tim: That's a good skill to add to our list here.
Juan: Yeah, I just added-
Tim: Cost management is a huge one. I would say cost engineering like that, the next wave of startups I think that in a data space you're going to be at cost management for cloud-
Matt: Yep, yeah.
Tim: ...tools is so opaque.
Juan: Yep, can definitely see that.
Matt: The problem is that there's no proper training for this, and if you came from the previous generation of data engineers, then you were taught performance management. So it's like how do I optimize more Oracle systems, and optimize queries, not cost optimization, it's a different problem.
Juan: And that's when you have to think about people and money and more things, right?
Juan: And that thing changes, this is a-
Tim: What's the ROI, right?
Matt: Yeah, exactly.
Juan: Yeah, excellent point. All right, T-T-Tim, take us away, your takeaways. Go first.
Tim: All right. So Matt, you kind of kicked us off with mentioning shiny object syndrome, or magpie syndrome, when we talked about where data engineering or data engineers might become very... either distracted, or very invested from a technology angle around tools. And I think y'all kind of brought up why did we get into engineering in the first place? It was being able to do cool stuff with cool technology, and so therefore we're technologists at heart. We're interested in this discipline of applying technology and as technology evolves, that's exciting and we want to jump in on the new stuff. And also y'all mentioned about resume-driven development and how as this new tech comes out we want to take advantage of it. Putting it on your resume, whether it's for your own benefit, or because employers are looking at that kind of stuff, that it becomes a focus. And as we adopt new technologies, as companies, as enterprises, we want to, "Oh, we want to implement Airflow." Or, something like that. It's, "Okay, well let's hire somebody who knows Airflow." And sometimes it becomes easy to kind of go in that direction, or you're wearing a Coursera shirt, right? We can take some courses and we can pick up some new skills, so it is nice how easy it is to do that now. And there's a little bit of this lack of emphasis on people and process, and tech is becoming more and more the easy part. And that leaves open-ended some of the hard stuff, which is more the people and process stuff. We talked a little bit about history, looking back to especially the big data Hadoop kind of phase of things, and that was a great example of where technology was a big part of the conversation, and as we've moved past that phase, now we can look back with open eyes, 20/20, on what it really was, and it was valuable but there was a bubble that happened. And I think now those who know their history want to see that not happen again in the future.
And we talked about what skills data engineers are really focused on and can get a lot of value from. And some of the ones that we wrote down were assessing questions, like really being able to look at questions and answer them, and figure out how to answer them. Assessing technologies based on business problems, so not just technology for technology's sake, but the applicability of technology given the kind of problem that you're trying to solve. And this mention of enterprise data engineering. So a lot of this activity around modeling, around cataloging, and governance, schema, there was a mention about Maslow's hierarchy. Maybe some of the basic blocking and tackling now is being made a lot easier, we're addressing a lot of that stuff and now we're being able to handle these and focus on some of these things that are a little higher up the hierarchy. So I think that's a good thing. Yeah. Juan, what about you? Takeaways?
Juan: Well really, yeah, let me continue on the skills one, we talked a lot about data modeling. This is something top of mind for all of us right now. Like you said, you're guesstimating that only around 20% of data engineers know what data modeling is, right? And I would agree with you on this. We really need to update the classic techniques to the modern world of analytics right now. We were talking about how it's a science and an art, and there's the stuff that we need to go figure out given the state of today. We need to learn how to go talk to people, communication is key. Like, where is this happening? Where are you actually teaching this? Like, if you get a computer science degree... I did computer science for a long time, no communications, but is this happening in MIS, or stuff? I think this is a key thing about communication. And one that just came up was cost management. The focus before was performance management, cost management is next. We talked about, what's next for data modeling? And hey, Joe just put a hint there. Is it something about streaming, graphs, or whatever? So what are the new paradigms on data modeling? Something you said that I fully agree with, data projects don't fail for technical reasons, it's because the data teams are not aligned with the people who they go serve, so they fail for the people and the process, not for the technology. The whole life cycle of data, in the early days it was well understood, we had to go bring data, we transformed it, and we go use it. But I think we've given so much to technology right now and it's really distracted us and we need to go focus on self-actualization of data engineers. And now we have cataloging, and governance, and modeling. So how does this fit in the life cycle of data? We have all these roles. Will there be a consolidation of these roles in the future? Yeah, looks like, I think we agree that there will be.
And then finally we talked about the tabular unstructured data, and it's so hard to pick up the subtleties of tabular data. And personally, I think there's a big channel to future opportunity there, and how much data actually needs to be tabular. One random thing that you said earlier on, I love this: "Legacy is a condescending way to refer to something that makes a lot of money." I love that. I'm going to close with that. Matt, Joe, how did we do? Anything we missed?
Joe: That's good.
Matt: This was a great chat, thank you for having us on.
Joe: Yeah, it's been fun.
Juan: All right, throw it back to you guys. Three questions. What's your advice about data, about life? Who should we invite next? And third, what resources do you follow? People, blogs, newsletters, books, obviously go get your book, but what else?
Joe: Matt, do you want to give the advice part?
Matt: Yeah, I'll just give advice for aspiring data engineers, and it goes back to an internal conversation that we had at Ternary Data this morning. We were talking about lifelong learning, and how you really have to be a self learner and a lifelong learner to succeed in data engineering. And so going back to the conversation that we were just having about people and process, I feel like if you want to be a successful data engineer, you start bundling the people and process stuff, which hopefully you can learn from our book. Now this is what I'll tell you, our book will not teach you data engineering, that's truly bizarre for a book that's about the fundamentals. Rather, it's meant to give you foundations so you can start that lifelong learning journey and get into the profession. So learn about people and process, learn the big picture, and then embark on the journey of actually learning the technology and learning the practices to be successful. If that's what you'd like to do.
Joe: As far as who you'd invite on, I'm going to recommend Bill Inmon. He's working on some really cool stuff with text right now, and he's a very good friend of mine, always inspired by him. I can only hope that when I'm his age I'll be contributing a fraction of what he is right now on a daily basis to the data world. I really feel like he's still at the top of his game, which is really cool.
Juan: I would be truly honored to meet him, and yes, look forward to connecting with him through you and have him on the show for sure.
Matt: That'd be awesome.
Juan: Thank you.
Juan: And finally, what resources do you all follow?
Joe: Let me see, lots of stand-up comedians...
Matt: I'll give you a short list. Yeah, yeah. So Benn Stancil in the data space. I think Benn is awesome.
Joe: He's good.
Matt: And he's very focused on the fundamentals. I'm going to give you two other names that you're probably familiar with that are not technically in data. One is Kelsey Hightower. I think Kelsey Hightower mostly worries about containers and other technologies, but he's super, super pragmatic and so I think he has a lot of insights that impact data as well. One of my all-time favorite Kelsey Hightower talks is a talk he gave about AWS Lambda when he worked at Google, which is an Amazon competitor of course. I always feel like I learned something from his talks about data, even if he's not focused on data. One more name on the FinOps side is Corey Quinn, you guys know who Corey Quinn is probably. So totally focused on cloud cost management, very entertaining, one of my favorite all-time data YouTube videos is his happy birthday to Larry Ellison video, but be warned it's not safe for work.
Matt: You know which one I'm talking about too.
Joe: Yeah, I know which one you're talking about. People I'd recommend following, and there's a lot. I think in the LinkedIn filter bubble we're all in... Or, actually, you guys are two I'd recommend, and I think it's awesome. Ethan Aaron, I like a lot of the stuff he's coming up with these days. I don't know, there's a lot of people I think of. So yeah, I would say follow all of us and then you'll be exposed for better or for worse to some great data people.
Juan: All right, then finally, go get your book. I am-
Joe: Get the damn book, yeah.
Juan: Literally, I've just been opening it up just random places and I'm like, "Oh, wow." Just I'm very impressed, really excited about it.
Joe: It's a gold mine. I mean, I hate to be shameless about our book, but I mean a lot of people have read it at this point, and I think it's universally gotten really good recommendations. I think the only fault of it is, like somebody wrote on Reddit, and it's Reddit so take it for what it's worth, but it's like, "Oh, I already knew all this stuff in the book, so I didn't really get anything out of it." And I was like, "That's more of I think an admission of how awesome you are." But that's not a knock against the book, right? It didn't make you a worse person as a result.
Juan: I just opened up this right now to this page 196 on storage, you have a magnetic disk.
Joe: Oh, that's... yeah.
Juan: Right, my dad didn't finish his PhD, went off to IBM in 1970s and worked on this a lot. His PhD was all applied... So this is so cool, you guys even go into hard disk drives and stuff, so that's awesome. All right, well next week as we said, we're going to have Ollie Olsen, he's the author of the upcoming O'Reilly data catalog book, and I will be live with him.
Juan: I'm in Europe next week and I'm going to be with him. That's going to be fun because we're probably going to be 11:00 PM live while we're drinking some wine. And Tim will be 4:00 PM over here, so it'll be a fun coffee with that.
Joe: Good, [inaudible].
Juan: With that, thanks as always to our sponsors data.world, we get to do this because data.world supports us, so the enterprise data catalog. Thank you data.world. And thank you Joe. Thank you, Matt. This was awesome [inaudible].
Joe: Of course, anytime.
Juan: And also go follow you, follow your podcast and everything, we love it.
Joe: Thank you.
Joe: Yeah, Monday Morning Data Chat, cool.
Joe: All right thanks guys. Thank you.