Keeping it 100 about metadata; the data stack glue w/ Fraser Harris, VP of Product, Fivetran

This is a podcast episode titled, Keeping it 100 about metadata; the data stack glue w/ Fraser Harris, VP of Product, Fivetran. The summary for this episode is:

Answering critical business questions relies on integrating data from a variety of systems. But it takes a lot of work to understand what the disparate data means and how it all fits together. How do we make data as reliable as electricity?

Join Tim Gasper, Juan Sequeda and Fraser Harris, VP of Product at Fivetran, as they celebrate the 100th live episode of Catalog & Cocktails and discuss how #metadata, #datacatalogs, and #dataintegration act as the power source for your connected enterprise.

Key Takeaways:
  • [02:02 - 03:49] Cheers to 100th episode, good health, children, and sky miles
  • [04:10 - 05:11] Keeping it 100, Millennial and Gen Z slang
  • [05:12 - 07:05] What metadata means to Fraser, the data about the data
  • [07:07 - 10:00] Fivetran's new metadata API
  • [11:32 - 13:33] Action, enforcement, and results in understanding data management
  • [13:52 - 17:51] Data contracts and the interface
  • [17:53 - 20:07] Upstream notifications and transforming data
  • [20:55 - 24:01] Perspectives on having a system of record owner for data contracts
  • [24:31 - 30:38] Representing business process change in contract evolutions
  • [30:40 - 31:58] Cultures around data at newer companies
  • [32:15 - 34:34] The two main use cases of Fivetran's data and the impact analysis
  • [34:34 - 36:25] Two dimensions to data proactivity, data maturity and company size
  • [36:26 - 41:26] Steering data complexity to simplicity, business value behavior and technology costs
  • [41:28 - 43:47] Reliability and data pipeline
  • [43:52 - 45:10] What Fraser wants to see happen around metadata
  • [45:10 - 47:15] The process of migrating to the cloud and adopting new data policies
  • [48:47 - 55:25] Lightning round
  • [55:28 - 01:00:43] Tim & Juan's takeaways
  • [01:00:57 - 01:02:49] Three questions for Fraser
  • [01:04:43 - 01:05:14] Next week's guest, Rupal Sumaria from Penguin Random House
Cheers to 100th episode, good health, children, and sky miles
01:47 MIN
Keeping it 100, Millennial and Gen Z slang
01:00 MIN
What metadata means to Fraser, the data about the data
01:53 MIN
Fivetran's new metadata API
02:52 MIN
Action, enforcement, and results in understanding data management
02:00 MIN
Data contracts and the interface
03:59 MIN
Upstream notifications and transforming data
02:14 MIN
Perspectives on having a system of record owner for data contracts
03:05 MIN
Representing business process change in contract evolutions
06:06 MIN
Cultures around data at newer companies
01:18 MIN
The two main use cases of Fivetran's data and the impact analysis
02:18 MIN
Two dimensions to data proactivity, data maturity and company size
01:50 MIN
Steering data complexity to simplicity, business value behavior and technology costs
05:00 MIN
Reliability and data pipeline
02:18 MIN
What Fraser wants to see happen around metadata
01:17 MIN
The process of migrating to the cloud and adopting new data policies
02:05 MIN
Lightning round
06:37 MIN
Tim & Juan's takeaways
05:15 MIN
Three questions for Fraser
01:52 MIN
Next week's guest, Rupal Sumaria from Penguin Random House
00:31 MIN

Announcer: This is Catalog & Cocktails, presented by data.world.

Tim Gasper: Hello, everyone. Welcome to Catalog & Cocktails, presented by data.world, the data catalog for leveraging agile data governance to give power to people and data. We're coming to you live from London. It's an honest, no-BS, non-salesy conversation about enterprise data management, with a tasty beverage in hand, and lots of Big Data London going on. I'm Tim Gasper, a longtime data nerd and product guy at data.world, and this is Juan.

Juan Sequeda: Hey, Tim. I'm Juan Sequeda, Principal Scientist at data.world, and we are here. I am finally traveling with you...

Tim Gasper: We're both traveling.

Juan Sequeda: Yes, last week I was in Paris. I can't believe that now I'm in London with you, and this is so freaking exciting. There are so many reasons to be excited. Number one, our guest. Our guest today is the VP of Product at Fivetran, Fraser Harris. Fraser, how are you doing?

Fraser Harris: I'm rocking, rocking to my own tune.

Juan Sequeda: Awesome. Well, there's so much stuff that we're excited about. One, Fraser's here. Fivetran is one of the super awesome companies who've really revolutionized data integration. You guys, because you're such a center of the modern data stack, you recently launched your whole metadata API, which we really want to talk about, because I think that is a game changer right now. We are in London, at Big Data London, and it is our 100th live episode, and I cannot believe that we've been doing this for over two and a half years. This is such a big, freaking amazing moment. So glad, Fraser, you can accompany us on this special day here today, and cheers, I'm just super excited to kick this off.

Fraser Harris: Yeah, absolutely.

Tim Gasper: Well, congratulations gentlemen, that's a lot of perseverance, 100 episodes.

Juan Sequeda: So in that spirit, we're like, okay, so tell and toast, so what are we drinking today and what are we toasting for? You go first.

Fraser Harris: Well, I'm drinking coffee because it's two o'clock on the Pacific coast here. And I have very small kids, so I need all the coffee I can get.

Tim Gasper: Nice.

Juan Sequeda: And what are you toasting for today?

Fraser Harris: Good health. Let's just keep this good health going, yeah.

Tim Gasper: Love it.

Juan Sequeda: All right. So Tim, what are we drinking here today?

Tim Gasper: Well, I'm going to drink to traveling in-person with you, and being able to do some fun networking and learning and presenting. We gave a talk today, and we're just hanging out with all these great folks in London around Big Data. So I'm going to cheers to that, and I'm going to cheers to 100 episodes.

Juan Sequeda: 100 episodes. And we're both drinking, we're sharing a Guinness right now. So again, we're at their hotel in London, at the Hilton Olympia, and we're trying to get a drink downstairs, and they said, "No, you'll have to order at the table." It's just too much drama, so anyways.

Tim Gasper: So mine looks like an Irish coffee, yours is a Guinness.

Juan Sequeda: I got half of it.

Tim Gasper: Yeah.

Juan Sequeda: And I'm going to cheers, not only to our 100th episode, but on my way over here, I flew over my one million mile mark, on United.

Tim Gasper: Wow.

Juan Sequeda: I am officially now a one million miler. It took me 16 years, even though I didn't travel for two years during the pandemic, so I made a million miles in approximately 14 years, and that was on my way over here, so it's a million miles plus a 100th episode, a lot of ones and zeros going on.

Tim Gasper: Mm-hmm.

Juan Sequeda: So let's cheers to that.

Tim Gasper: That's awesome. To miles, to 100 episodes, to being in-person and traveling, and also to good health.

Juan Sequeda: To good health, cheers, and to children.

Fraser Harris: I'll cheers to that. I don't know about a million miles, though. It's kind of an anti-measure in some ways, that's a lot of time on a plane.

Tim Gasper: At some point it's a bad thing, right?

Juan Sequeda: On my way over here, the captain let me go into the cockpit and he made an announcement, and everybody was congratulating me for a million miles, and I'm like, "Well yeah, this is really cool, but was this a good thing?" I don't know, but anyways, that's a topic for another day. All right, we got our warmup question today. We learned from our producer, Karli, that keep it at 100 is slang for being truthful and honest. Honest, no BS here. So what's another millennial or Gen Z slang that you've learned, loved, and what does it mean to you?

Fraser Harris: Mine's actually the peach emoji. I just love that. It's like, "Yeah, that is a little cute bum right there, I guess."

Tim Gasper: That is a great one.

Juan Sequeda: Okay. You win. You know what? The funny thing is that we just told Fraser this question, literally two minutes before we went live, and I had no idea that he was going to come with an answer, and I can't follow up on that. The peach emoji and the eggplant together, right?

Tim Gasper: Oh, my goodness. Yeah.

Fraser Harris: This is a work event. Let's not talk about the eggplant, but the peach emoji.

Juan Sequeda: All right, all right.

Tim Gasper: No eggplants, no eggplants.

Juan Sequeda: All right, let's dig in. Keep it simple. Fraser, honest, no BS, what does metadata mean to you?

Fraser Harris: Yeah, it's the data about the data, or really in our case, the data about what's happening. And so it helps you understand where data's coming from, where it's going, how it's changing, and who's responsible for those changes. I'm sure there's much more metadata you could talk about in terms of data at rest and who has access to it, et cetera. But for the data movement piece that we're responsible for, it's all about that: who's changing it and why?

Juan Sequeda: So I love that you just gave it a very small, but actually very important twist right there, because people traditionally say, "Yeah, metadata is data about data," but it's like, what's actually happening? And I think this is something that when we think about it, it's not just the static world of, I got data here in one place and tell me things about it. It's that data got here in some different ways, and we need to understand how it moved around to get here. And to that you said, who's responsible? It's something that we bring up a lot, it's not just about, again, the data itself, but it's the people, the processes behind that stuff and explaining the why. So I think that's where the metadata is key here, I mean, I call this a lot the glue that puts it all together.

Tim Gasper: Yeah. The glue, the context.

Fraser Harris: Yeah. And one major use case is around data discovery, which you folks are intimately familiar with, and then the second major use case is about compliance and governance, be that legal compliance, like SOX for public companies, or when you're getting financial controls, and then down to the data quality level of, is this data correct, can we trust it? It has an obvious interplay with data discovery. There's three data sets, they all look kind of the same, which one is everyone using and why should I use it?

Tim Gasper: Yeah, which one's the good one, right?

Fraser Harris: Yeah.

Tim Gasper: And you all recently launched a metadata API, and that was some pretty big news in the data world this week. Do you want to mention just quickly about what that is?

Fraser Harris: Yeah. Sure. After coming from product, after an enormous amount of research, we realized just the value of powering an ecosystem of tools like data.world, in terms of what we understand about the data, what we can comprehend. We know where it's coming from, where it's going, we can do column-level lineage, and then we can tell you about the changes to the data. So this really unlocks, for the first time, the ability to do data governance of data in-flight. So when PII is coming through the pipeline, before it arrives at the destination, you can say, "Hey, wait a minute. What are the policies that should apply here, and should we allow this?" And that's really groundbreaking for organizations, enterprises. I'll give you a concrete example of that. Citibank was fined $400 million and the text from the regulator was a failure of data governance compliance. And so they thought they were doing data governance by cataloging their data, but that's only step one. It's like, "Oh, great, now you're aware of where it is." But how do you actually ensure that those policies that you're creating are being enforced on the ground level? That's where it all breaks down. And traditionally you build up these big programs involving data stewards. So you usually have legal and your security teams generating policy documents. And then these data stewards are supposed to go through each data set manually and be like, "How should we apply these policies?" And it can be pretty dang complex. It's like, which policy applies? Where is the data coming from? What is in the data? Who has access to the data, based on where it is at rest? And ultimately these programs often failed. So you would end up with small pockets within an enterprise that are effectively doing data governance. And then other people are just like, "I don't have time for this." And so what we're hoping to ignite with the metadata API is this idea of automated data governance.
How can we represent the data or the metadata as structured data that tools can consume? And then you can take action on those, you can look at that data and say, "These are the policies that should apply." And then using the API in the reverse direction, you can inform us as the data movement layer, "This is what needs to be applied to this data." So the key there is that Fivetran, we're not a data governance tool, but we're a key place of enforcement. And we ultimately see all the tools in the ecosystem adopting this. So we're really driving towards an open standard that everyone can get behind to create these customer outcomes that are so freaking painful.
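To make the idea of automated governance on in-flight data concrete, here is a minimal sketch in Python. It is not Fivetran's metadata API: `ColumnMetadata`, `Policy`, `decide`, and `plan_enforcement` are hypothetical names invented for illustration, and the only assumption is that column metadata arrives as structured records carrying classification tags such as "pii".

```python
from dataclasses import dataclass


# Hypothetical structures: none of these names come from Fivetran's API.
@dataclass(frozen=True)
class ColumnMetadata:
    source: str        # system the data comes from, e.g. "crm"
    table: str
    column: str
    tags: frozenset    # classification tags, e.g. frozenset({"pii"})


@dataclass(frozen=True)
class Policy:
    name: str
    tag: str           # the policy applies to columns carrying this tag
    action: str        # what the movement layer should do: "block", "hash", "allow"


def decide(column, policies):
    """Return the action of the first matching policy, defaulting to "allow"."""
    for policy in policies:
        if policy.tag in column.tags:
            return policy.action
    return "allow"


def plan_enforcement(columns, policies):
    """Map each column to the action the data movement layer should apply."""
    return {f"{c.source}.{c.table}.{c.column}": decide(c, policies) for c in columns}


columns = [
    ColumnMetadata("crm", "contacts", "email", frozenset({"pii"})),
    ColumnMetadata("crm", "contacts", "plan_tier", frozenset()),
]
policies = [Policy("hash-pii-in-flight", "pii", "hash")]
plan = plan_enforcement(columns, policies)
```

In this sketch the governance tool owns the policies and the decision, and the resulting `plan` is what would be sent back to the movement layer for enforcement, which mirrors the split of responsibilities described in the conversation.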

Tim Gasper: Yeah, absolutely. And data.world is very, very excited to be launching our Fivetran integration as part of all of this, with integration to that metadata API that you just mentioned, focused a lot around this sort of use case around discovery, and compliance to some degree too. And I think what's exciting about all of this, especially, is that a lot of times people think of cataloging more traditionally, from a static and more of a warehouse and data lake perspective. And I think what we find these days is that a lot of the most interesting transformation logic and actual changes to the data, they're happening in the integration layer, they're happening at the modeling layer, they're happening in the transformation layer. And so places like Fivetran are where a lot of that really interesting metadata actually resides, and things like dbt have a lot of really interesting metadata. So I think there's definitely a shift here, in addition to just the static kind of places where data lives, to also these places where there's a lot of data movement going on.

Juan Sequeda: Yeah. And I want to unpack a couple of things that you said there. Specifically, you talked about action, and then you brought up enforcing these contracts. So I want to get to these two things. One, actually, in our episode last week with Ole Olesen-Bagneux, the author of the upcoming O'Reilly book The Enterprise Data Catalog, we were talking about, let me get this right, data, knowledge, action, and then results. And all I kind of need are results. And I think at the end of the day, you're tracking all this metadata because you need to go do something with that. That's the action, but there needs to be a final result of what that is. And that result can be, "Oh, we're complying with regulations and stuff like that." But I think we need to be very clear about what these results are. And I feel that sometimes we keep it really kind of like, "Oh, the result is a notification. Oh, my result is, you get this thing in JIRA or Slack." That's still a means to an end. So I'm very curious to ask you, what is the next thing after that action? What are the results that we're expecting to have with the metadata? Things like Fivetran are moving the data, you're releasing all this metadata so that catalogs like us can go do things with it. What are the results that organizations are expecting to have with this?

Fraser Harris: Well, at the very highest level, the result is that your CIO or your compliance officer can sleep easy at night. They can say with confidence to the CEO, "We have protections on our data and we are not exposed to this risk." A breach is not going to result in, what was it that Home Depot had, a billion credit cards, like 300 million credit cards, some insane number. And they ended up paying, what's that compliance? PCI. PCI charges you a dollar per credit card that is exposed. So that was a $300 million fine for that. And that's real money we're talking about. And then you've got the reputational damage as well on top of that. So people are kept up at night. Data is absolutely an asset, but you also have to think about it as a liability until you have the controls in place and you really have control. You feel confident that you're not taking unneeded risks.

Juan Sequeda: All right. I like it. It's kind of very, very simple, but very powerful. And I bring this up, I always ask this question: what keeps you up at night? And these are the things that can definitely keep executives up at night, because there's a very big dollar sign associated with that. And so, connected to the next part on the contracts, this is a topic that we're seeing more and more, data contracts. So first of all, what is your definition of a data contract, and how do you see the data contract? Where does it fit within the entire ecosystem? Because I can ask, is it within a data integration tool like Fivetran? I mean, the contract itself is metadata, so it should probably live in a catalog tool, but how does this all look within your view of the data ecosystem?

Fraser Harris: Data contracts, it's a new word for something that's pretty old. In software engineering, you would call this an interface. So you have two different parties, what's the interface between them? And that's an agreement from the provider: as long as you program against this interface, your work is guaranteed to work in a certain way. And so in the case of data, it's like, well, if you're running SQL or some kind of workload on top of this dataset, I'm guaranteeing that it's going to work this particular way. It's a very important concept. It gets at the heart of what's difficult about data integration in any reasonably complex organization, which is that the person who's creating the data has no responsibilities with respect to the person who's consuming or working off of the data. And what I mean by that is, the most important data is usually living in some kind of system of record or production database, and the work you're doing there in terms of the changes you're making. We all know that for our production database, your engineers have these crazy testing environments and CI/CD, and changes are rolled out extremely slowly, and we have SRE around that to make sure that any failures are rolled back, et cetera. But then all those things that are happening are not being communicated at all to the data team, who's way off somewhere else, sometimes in a different building, or like in today's age, it's a different Slack channel. But you're not communicating at all, and as a data team, you're just on the receiving side of these changes. And this is the hardest problem in data. We can say, "Ah, data contracts." It's like, "Well, it's not that simple," because fundamentally it's a people problem and a who's-responsible problem. And so, there's some ideas here of like, "Well, if you put some intermediary technology like Kafka in the middle and you register a schema on that data, now that's your contract."
And it's like, "Okay, you have achieved an interface now, but who's actually responsible for fixing that?" So the engineering team deploying to the production database is now responsible for going and fixing things in Kafka. A different pattern we've also seen is, well, you land the data in what we very commonly call a raw or a bronze schema, and then you transform that, or expose it via a view, or transform the data and then expose the transformed dataset. And then that's the interface that your analysts are working off of. So you can separate these processes physically, I don't recommend it, or you can separate them logically within a data warehouse or a data lake. It's much simpler to do it that way. But again, it's really about how do you get the people making changes to be responsible for downstream? And it's just really hard. What we do internally, because we have this problem, or everyone has this problem. What we do is, we run tests through dbt on data, and then a failure is a Slack notification. So if different datasets fail, we actually notify the person who's responsible upstream. So there's a person from engineering who's the responsible party, who's always Slacked when there's errors between the two, and then the person downstream on the data team. So that's our solution to it. It's definitely a ripe area for us to focus on as an industry.
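The "contract as interface" idea above can be sketched as a schema check. This is an illustrative Python example under stated assumptions, not a Kafka schema registry or a dbt feature: `CONTRACT`, `contract_violations`, and the column names are hypothetical, and schemas are simplified to a mapping of column name to type name.

```python
# Hypothetical contract: the columns and types the provider promises to expose.
CONTRACT = {"order_id": "int", "amount": "float", "currency": "str"}


def contract_violations(contract, actual_schema):
    """Compare an observed schema to the contract and list every mismatch.

    Extra columns in the observed schema are tolerated: additive changes
    do not break a consumer that only reads the contracted columns.
    """
    problems = []
    for column, expected_type in contract.items():
        if column not in actual_schema:
            problems.append(f"missing column: {column}")
        elif actual_schema[column] != expected_type:
            problems.append(
                f"type changed: {column} expected {expected_type}, "
                f"got {actual_schema[column]}"
            )
    return problems


# An upstream change retyped `amount`, dropped `currency`, and added `region`.
observed = {"order_id": "int", "amount": "str", "region": "str"}
violations = contract_violations(CONTRACT, observed)
```

The check only detects that the interface was broken. As the discussion points out, it does not settle who is responsible for fixing it, which is the harder, organizational part of the problem.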

Tim Gasper: That's interesting. So can you say that one more time? So Fivetran, you're addressing this by slacking the person upstream, and how else does that work?

Fraser Harris: So when the test fails, all the tests are associated with specific people. And so when the test fails, it pings a Slack channel with information about what failed, and it tags the people who are upstream responsible and downstream responsible, and then they can start a conversation right there: "What did you change? This is no longer working. What expectations should be here?" And the reason we went with that approach is it wasn't feasible to integrate all of our downstream data work into the engineering CI/CD process. And I suspect that almost every company would make a similar decision. Coupling those two things is still a valid approach, but it causes a lot of extra work.
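The ownership pattern described here, where each test is tied to an upstream and a downstream person, can be sketched in Python as follows. This is an assumption-laden illustration, not Fivetran's internal tooling: `OwnedTest`, `failure_message`, `notify_failures`, and the owner handles are hypothetical, and `send` stands in for whatever posts to a chat channel so that no real Slack API is invoked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnedTest:
    name: str
    upstream_owner: str     # engineer responsible for the source system
    downstream_owner: str   # data team member who relies on the dataset


def failure_message(test, detail):
    """Build the channel message that tags both responsible parties."""
    return (
        f"Test `{test.name}` failed: {detail}\n"
        f"Upstream: @{test.upstream_owner} | Downstream: @{test.downstream_owner}"
    )


def notify_failures(results, tests, send):
    """Call `send` once per failed test and return how many messages were sent."""
    sent = 0
    for name, (passed, detail) in results.items():
        if not passed:
            send(failure_message(tests[name], detail))
            sent += 1
    return sent


tests = {
    "orders_not_null": OwnedTest("orders_not_null", "eng_alice", "data_bob"),
    "orders_unique": OwnedTest("orders_unique", "eng_alice", "data_carol"),
}
results = {
    "orders_not_null": (False, "12 rows with NULL order_id"),
    "orders_unique": (True, ""),
}
outbox = []  # stand-in for a chat channel; a real sender would post the message
count = notify_failures(results, tests, outbox.append)
```

Keeping the sender as a plain callable reflects the loose coupling described above: the data tests stay outside the engineering CI/CD process, and the only shared surface is the notification.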

Tim Gasper: Yeah. That makes sense. And we have somebody in the comments here saying that if you mention data ops or data governance in my office, you have to take a shot, and that is not a fair game, we're not going to play that game.

Juan Sequeda: Michael's the one that made that comment, data ops and data governance. So cheers.

Tim Gasper: Why you doing that to me, Juan?

Juan Sequeda: Yeah. All right.

Fraser Harris: Do you solve similar problems for

Tim Gasper: Yeah. We end up having to solve a lot of similar problems, and a lot of what you're talking about right now really strikes me as being about lineage. The upstream, downstream has a lot to do with the lineage of, how is the data being transformed? How does it actually get derived and moved through the data pipeline, whether it's into a bronze zone, then into some other sort of a transformed zone or a silver, a gold zone or whatever it might be, into ultimately either the normalized model that you're trying to expose or, we're finding these days, a lot of folks are actually going back to wide, more analytics-ready types of tables to do a lot of analysis. And this kind of ties to a question that I want to bring back to you, which is that, interestingly, you talked about responsibility around the person who wrote the tests, and then there's sort of people upstream and downstream. And I'm kind of curious, whether it's your own experiences at Fivetran, or if you think about some of your customers and things like that, who should the contracts apply to? Who's responsible for those contracts? And when you say upstream or downstream, who is that? Are these sort of the owners of those tables in more of a governance sense, or do you think of it differently?

Fraser Harris: Yeah. Well, I'm thinking of it in a world where upstream is some kind of production system or system of record, and downstream being some data destination via a data warehouse or a data lake. And that's just the constant tension in an organization.

Juan Sequeda: But I think we also need to consider that sometimes, well, where do these contracts go? So you can start to find them in different places across the entire pipeline of things. Now, if you are the actual owner, creator, of that system of record, then it's in your power to have those contracts as close as possible to where the data's actually generated. But at some points, depending on how things are being separated and who's taking that responsibility, taking that ownership, that's where that cultural aspect says, "No, no, it means this thing here, but it means this other thing here." So this is where the semantics and the meaning of that contract can be different. So this is a mix of cultural aspects: who takes responsibility for that, and where is that going to be implemented, in what technology, and so forth. And I think this is something where I'm seeing there's no clear answer, that's one thing. And the other thing that I'm observing too, and I'd like to get your insights on this, is, when do I start doing this? "I'm too small a company, I don't need to do this, I'll do this later." And you hear all these back and forth conversations about it. "No, setting up these contracts like that, this is slowing me down right now. I just need to go move faster right now," and so forth. What's your perspective on this?

Fraser Harris: Yeah. Lots of perspectives. Well, first of all, having the system of record owner or that business unit own the contract, I think is the ultimate. If you can get that, great, but usually as a data organization, you don't have that influence. It's just the reality that they're like, "I have bigger problems, I'm doing these schema updates to create these changes for this business process, and you just have to accept that I'm changing that process." That's often just the reality. I'm thinking of a major insurance company we work with that everyone knows the name of, and they have 50 different business units, and the business units are just like, "We accept that we have to expose our data to you, but whatever you do with it is up to you, and we're just making changes." And that's the tension in the business. And I'm not actually sure that's a bad thing, they are effectuating the business outcomes that we want. They are the ones creating revenue or doing whatever. And ultimately data is extremely important, but we shouldn't think that we're more important than that overall outcome of the business.

Tim Gasper: Yeah. That seems a little risky. This idea of when you said that, the first thing that came into my mind was, you can use this data, but at your own risk, good luck. And that seems like maybe in certain cases that can work, but a lot of cases that can't work. And so I wonder how we deal with that as an organization. How do we deal with the fact that, well, no, actually sometimes people do really need to rely on this data, it needs to be a utility that they can depend on.

Fraser Harris: Yeah.

Juan Sequeda: Another thing is, up to now, I think the tone of our conversation has been more on the protective side: let's make sure that all the data comes out in a way that we're not going to get fined and these contracts are there, so everything is good. Let's talk about some of the offensive opportunities with metadata. I got a lot to say here, but I'm curious to see, what's your perspective?

Fraser Harris: Before you move on, something really important is that with the data contract, we're talking about the underlying system evolving, and that means often the contract has to evolve regardless. And so the contract has to represent what the actual underlying business process change is. I guess this is the key question: how valuable is it setting up a very static contract that you're going to end up updating anyways, versus just being more reactive to when the contracts change, or when the assumptions about the business are changing, but just making sure you really understand when that happens. I guess that's something we think about a lot.

Juan Sequeda: So before we switch the topic to opportunities to do more offensive stuff with metadata, you said something which is really important on the business processes. And I think we touched on this last time we chatted: how do you see metadata here being connected with business processes? Because we need to keep track of that too, and how is that being tracked? Because a process changes, and that happens before it actually gets reflected within the data ecosystem and then shows up in the metadata. What is your perspective on keeping track of, cataloging, and getting the metadata of the business processes themselves?

Fraser Harris: Yeah. Extremely hard problems, is the short answer. And this is the fundamental challenge of data cataloging: how do you get people to actually describe the business processes, and you do your best to make the data representative of that business process in a really reasonable way. And ultimately that's the modeling of the data, that's the job of either the data engineer or now the analytics engineer, which requires a lot of cross-functional collaboration and building up of domain expertise so that you can effectively do that.

Juan Sequeda: But I mean, if we look at what you just mentioned, like the analytics engineer, I think my impression, and please correct me here if you've seen this differently, is that they're disconnected from how the business works, the business processes around things. They understand there's this question and they need to go understand what that question means and go deliver data for that, but what they ideally should be doing, and I believe that they're not, and this is the problem, is understanding the context around this and saying, "Okay, this business unit, who's asking for this data to answer these questions, well, is that really what they need? Because this data's coming from this other process, that's coming from this other system, then it's going somewhere else." And they lack this context. So I ask, who really understands business processes, and are they actually being entered and cataloged in data catalogs today? I would say, no, that's not happening. And this is a very bad thing. And I think this is the opportunity to go improve, but this is how I'm seeing it, because we talk to all our prospects and customers, and I tell them all the time, "Start by cataloging your business questions, what are the questions that keep you up at night, and who are they?" And they're like, "Oh yeah, I never thought about that. I thought a data catalog was just about understanding what my data tables and columns were and the lineage of the column." I was like, "Yeah, but let's get the business context around it." Like, "Oh yeah, I never thought." Come on, then why are we just so focused on this technical stuff? So I feel that there's this big disconnect, which is a big source of the problem of why we just stay in our technical bubble. Anyways, I started to rant, I'll shut up.

Fraser Harris: Well, it's getting at the heart of it. In my onboarding of new Fivetranners, I have this slide, it's a quote from, I think, Gartner: 80% of BI projects fail. I've got this flaming forest, it's a great visual. And it's like, "Wait a minute, people are spending billions and billions and billions of dollars on this, everyone's doing BI, and 80% of projects fail?" And I was like, "What? How does that make sense?" And it really comes down to traditional ETL where, well, really the data warehouse used to be this very constrained resource. And so you would do as much transformation in-flight as you could. And so what that meant was data engineers with a crystal ball, trying to forecast what you want to do with the data. And then they're understanding the business processes and writing all the transformations. The data arrives, and the analyst starts working with it, going, "Wait a minute, this doesn't make sense. This isn't how other people describe these business processes to me." And it ended up being this really iterative, extremely slow process of getting to the point where you actually have data you can work with. And so that was the key driving force behind ELT: we're just going to make a complete replica of that underlying data. And then you can very quickly iterate on that as your understanding of the business and the business processes evolves; you're just updating SQL very quickly and rerunning it. And so that was the first step of getting us into a much better place in terms of that comprehension. But you can't be a data analyst or an analytics engineer and operate in a silo and not be collaborating with the business units. You've got to be social; you're not going to learn about how the business works without getting out of your own room.

Juan Sequeda: Okay. We're 100 percent in agreement with that. Is this happening in practice?

Fraser Harris: It happens at Fivetran.

Juan Sequeda: Good. Great. I mean, I love this. But I do worry, I mean, let's be honest, no BS. We're seeing people say, "Oh, I'll change my title to analyst, analytics engineer," and stuff like that, because you're doing a bunch of SQL and stuff. But wait, are you going out? Are you talking to the business? Do you understand this? I'm definitely not seeing that as much as I wish I would be seeing it.

Tim Gasper: Well, and maybe part of it also is, I feel like companies like Fivetran, companies like data.world, we, I think, are newer companies. And so I think when you're a newer company with newer people and things like that, it's sort of a different data culture. Maybe it's actually a little easier to say, "Hey, data engineers, analytics engineers, you need to get out of the room, you need to be talking with the other parts of the business," and things like that. When you're a company that's been around for 100 years, and the average tenure of people in the data organization is 15 years, 20 years, maybe the dynamics change a little bit, and now you're asking for something a little harder. It's still very important, but a lot harder.

Fraser Harris: Yeah, definitely. Well, I like to say that all problems are actually people problems, when you boil it down. And so a lot of the failure that happens in analytics is because of the lack of collaboration. And we now have the tools, so it's harder to blame a lack of tooling for not getting the outcomes that you want, but it's hard.

Juan Sequeda: This is a very good quote, I love this: "All problems are people problems." It's a lack of collaboration right there. So I do want to get into thinking about being more offensive, being more proactive with metadata.

Tim Gasper: And just before you dive into that, just a real quick comment: data.world, the data catalog for successful data migration. With data.world, you can ensure business continuity and visibility at every stage of the migration process. Thanks, data.world, for presenting our episode today.

Juan Sequeda: Yeah, 100 episodes. Cheers. Yeah. So I've been asking people lately is, when it comes to, we get all your metadata, we get your lineage, obviously what are the two main use cases? It's impact analysis, if I change this column, what's going to affect that? Yeah, totally get it, we need that. Perfect. And the other one is, where does this come from? This dashboard, I don't trust this number, this number looks weird, or where did that come from? So those are the two kind of basic traditional use cases. What else is out there when it comes to gathering all that metadata, looking at the lineage and all that stuff. What do you think? And what are you seeing, besides those two traditional use cases, what are the opportunities?
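
As a rough illustration of the two traditional use cases Juan describes here, both reduce to walking a lineage graph: downstream for impact analysis, upstream for "where did this number come from?" The sketch below is a minimal one under stated assumptions. The edge list, the asset names, and the helper functions `impact_of` and `sources_of` are all hypothetical and do not come from any product's API.

```python
from collections import deque

# Hypothetical lineage edges as (upstream, downstream) pairs.
# All asset names are made up for illustration.
LINEAGE = [
    ("salesforce.account", "warehouse.stg_account"),
    ("warehouse.stg_account", "warehouse.dim_customer"),
    ("stripe.charge", "warehouse.fct_revenue"),
    ("warehouse.dim_customer", "dashboard.revenue_by_customer"),
    ("warehouse.fct_revenue", "dashboard.revenue_by_customer"),
]


def _reachable(start, edges):
    """Breadth-first walk: every node reachable from `start` along `edges`."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact_of(asset):
    """Impact analysis: everything downstream of `asset`."""
    return _reachable(asset, LINEAGE)


def sources_of(asset):
    """Provenance: everything upstream of `asset` (walk the reversed edges)."""
    return _reachable(asset, [(dst, src) for src, dst in LINEAGE])
```

Calling `impact_of("salesforce.account")` answers "if I change this, what is affected?", while `sources_of("dashboard.revenue_by_customer")` answers "where does this dashboard's data come from?".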

Fraser Harris: Yeah. Well, what we're working on internally, and it's on the product roadmap, is really profiling the data and doing PII detection, so that you get a much richer understanding of the data before you have to decide what policies should apply to it. And this is all about being able to be proactive about enforcement of your policies before the data lands in a destination. So traditionally, a lot of tools are scanning the data at rest and then saying, "Wait a minute, there's PII here," and that's a very reactive approach. It can also end up being expensive: if you're saying as a company it would be terrible if data's exposed for more than five minutes, that means on every single table you're running all of these profiling queries every five minutes, and those compute expenses can become non-trivial. And so our vision is, well, let's do it in the pipeline, give you that control. And also on the fly: you could have a dataset where you're doing data movement, and then email addresses start appearing in a column where they didn't exist before. And so that flags, like, wait a minute, someone should revisit this, because it's now out of compliance with the policy that you'd applied here. But the profile of the data's extremely important. Just to give you a concrete example, this could be email addresses, but if they're all internal company email addresses, that's not nearly as important as, here's a bunch of customer email addresses. So you really have to get down to the layer of what is this data? And really think it through.
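
To make the in-pipeline idea concrete, here is a minimal sketch of profiling one column batch in flight, assuming a simple regular expression for email-shaped values and a caller-supplied set of internal domains. The function names, the regex, the threshold, and the notion of a per-column "policy allows PII" flag are assumptions for illustration only; this is not how Fivetran implements PII detection.

```python
import re

# Deliberately simple email shape check; real detectors are more thorough.
EMAIL_RE = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")


def profile_emails(values, internal_domains):
    """Count email-shaped values in a batch, split into internal vs external."""
    profile = {"total": 0, "internal_emails": 0, "external_emails": 0}
    for value in values:
        profile["total"] += 1
        match = EMAIL_RE.match(str(value).strip())
        if not match:
            continue
        domain = match.group(1).lower()
        key = "internal_emails" if domain in internal_domains else "external_emails"
        profile[key] += 1
    return profile


def needs_review(profile, policy_allows_pii, threshold=0.1):
    """Flag a column whose policy forbids PII but whose batch contains
    external (for example customer) email addresses above `threshold`."""
    if policy_allows_pii or profile["total"] == 0:
        return False
    return profile["external_emails"] / profile["total"] >= threshold


# Hypothetical batch from a column that was not expected to hold emails.
batch = ["alice@example.com", "bob@corp.example", "n/a", "carol@example.com"]
profile = profile_emails(batch, internal_domains={"corp.example"})
```

Separating internal from external domains mirrors the distinction Fraser draws: a column of employee addresses and a column of customer addresses may both match the same pattern but carry very different risk.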

Juan Sequeda: And I think this brings up the whole notion of knowledge of what does this actually mean, and what is a dangerous email that can get exposed and stuff like that. So it seems like being proactive with metadata is also thinking about when we're going back to our contracts, it's like, let's actually go off and catalog what this stuff actually means, what it's supposed to mean, what are the expectations around these things, and what are good or bad things that can happen or how risky it is. And I think this is something that, how are you seeing this within your customer base? Is everybody starting to go do this, or just much more the mature forward thinking companies are thinking like this? How are they seeing being proactive with their metadata and data contracts?

Fraser Harris: Yeah. There's kind of two dimensions to that, and one is, we could call that your data maturity, but I think the other one really is just your company size. And I'll talk to that one first. If you're a small company, you've got a five-person data team or something, I wouldn't be worrying about this, I would be worrying about just empowering the business to make yourself a larger company. That's just the reality of it. Focus on the problems at hand. But as you get larger, you start running into more of these compliance requirements that apply to you. And when you become a public company, or you're on the march to being a public company, then you're really starting to think hard about it: you've got a general counsel or some kind of chief legal officer and their compliance needs. You've got your security group, your CISO, and their needs. So that doesn't usually happen until you're above a thousand, above 2,000 or 3,000 people.

Juan Sequeda: Now, one of the other things that I've personally been looking into is that metadata should be able to help you understand how complex things are. And so if you go catalog and you bring in all the metadata, we now understand what people are doing within a tool like Fivetran or any other ETL tool and stuff, and it's like, wow, this is really complex. One, I don't think it should be this complex, and the second thing is, it looks like there's a lot of repeated work being done because it's been done in silos. How are you seeing this within, again, your customer and prospect base? Are they doing things that are getting way too out of hand and too complex, that should be simpler? Or is it really, I mean, life is complex, the world is complex, we've just got to live with this complexity, and too bad?

Fraser Harris: As with anything, there's multiple answers. I think, one, people love playing with new technology, and often that's a path of, "It was really fun for you, but did you create any business value going and doing that?" And that happens a surprising amount of the time. All I can say is, yesterday I was chatting with one of our customers, one of the top five banks in the U.S., and I mentioned that Kafka is the most complex way you could build an ETL pipeline, and that got some roaring laughter out of their team, because it's just very true. People are like, "Oh, we're just going to put it in Kafka, and then we can use some KSQL and do some stuff." And it's like, you're imposing all these streaming restrictions on yourself, making it much harder to deal with the data, just because you don't want to run a query on data at rest. There's much easier ways to get this outcome. Do you really need real-time? And what does real-time mean at your business? And it's like, "Oh, well, faster than our once-a-day batch loads." And it's like, "Oh, God, that is not real-time, you just need 15-minute updates." As soon as you go sub-30 seconds, you're imposing these technology costs on yourself that bring about a 10X cost in terms of the technology you need for that, and then the team to maintain that. And there's a lot of, "We're bringing real-time to our companies." If you're a leader listening to this, be very suspicious of that and really dig into, is there business value behind it?

Tim Gasper: I love this. Yeah.

Juan Sequeda: No, bravo on this. And I want to mark this as the 38-minute mark; it's something that we need leaders to listen to, what you just said. I'll be, again, honest, no BS. We've had a lot of conversations when it comes to streaming, with folks from streaming, and I'm like, I get it, but then, what you just said, do you really need that real-time? And then we'll come up with use cases that you need, and it's like, but is this the truly business-critical thing that you go do? And then we kind of start over-engineering around this stuff. We really need to be very critical about this, and I'm really glad that you're bringing this up, because we just make life complicated when it doesn't have to be complicated.

Fraser Harris: Yeah. And to be clear, you could do all of that with Fivetran. We have customers doing sub-one-second: a billion-dollar trading house running sub-one-second latencies, and that's actually powering their stock trading. We have people doing that, but oh my God, there's a cost and complexity challenge around that, so be sure you really need it before you go down that path.

Juan Sequeda: And I think in the conversations we've had on this topic, it's some sort of a dial, it's like, well, is it more batch? Is it more real time? And at this moment, it kind of seems like there's two categories to industry or two types of companies doing that one, when eventually the end user doesn't really care. I just want to be able to go," I want this batch, or I just want this real- time," and eventually the technology, the companies will do that.

Tim Gasper: And as an industry, I think we've flirted with, how do we actually have these two things kind of coexist? There was the Lambda architecture, and there were a lot of different permutations of like," Oh, can we have streaming and batch kind of happen at the same time?" And I feel like none of those really have taken. And I think to come back to what you're saying, Fraser, it's starting to come back to just use cases, it's like," Well, what are you trying to optimize for? And what are you trying to do with the data?" And you might do a little bit of one for certain use case, and a lot of the other for the other use case.

Fraser Harris: Yeah. And again, with Fivetran, you can dial between just once a day, 24 hours, and dial it all the way down to one minute for certain applications. And we're just adding more and more streaming capabilities, especially as the destinations add them; now there's Snowflake streaming. There's more and more capabilities to do that and just automatically figure out, without you having to tell us, "Okay, we can just be appending this data." And you're getting real-time for free because it appends, or practically free when you're ingesting.

Tim Gasper: Yeah, you're getting more event-driven when you do that.

Fraser Harris: Yeah.

Juan Sequeda: So one thing I want to bounce off you, this is actually an idea that I've been working on, and I'm really interested in your perspective, kind of on the topic of being proactive with metadata. My background has always been on graphs and semantics. And for me, metadata is just, I view it as a graph. It's all about how these things are connected, and what I find fascinating about lineage is, it's just more of a graph, like, "Oh, this thing was derived from this thing," and so forth. One of the things I want to be able to see more, and this is what we do too, is that if your metadata is really a graph problem, then we start applying much more graph analytics, graph types of algorithms, over that. So even things like node degree or community detection or finding bottlenecks and stuff like that, to be able to say, "Hey, we've cataloged all your data warehouse, your data lakes and your ETL, and all your dbt transforms, and we see this as a graph," and I'm like, "Wow, here are the bottlenecks. Even node degree: there's a bunch of stuff that goes into this node and a bunch of stuff that goes out of this node, that's some job process, who's responsible for that?" Or, "Hey, there's a bunch of communities around here, there seems to be a lot of work going on on one side." Or, "Hey, there's a bunch of orphan things not connected to anything." This is something I've been working on and thinking a lot about. I'm curious, I'm just throwing you ideas and things that I'm working on, trying to be innovative here. What are your thoughts about what I just said, is this interesting, or am I smoking dope here and going nowhere?
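
As a small illustration of two of the graph measures Juan mentions (node degree and orphan detection), the sketch below computes in-degree and out-degree over a toy lineage graph and reports orphans and high-traffic "hub" nodes. The node names, the `degree_report` function, and the hub threshold are hypothetical choices for this example. Community detection is left out because it needs a dedicated graph library.

```python
from collections import Counter


def degree_report(nodes, edges, hub_threshold=2):
    """Return orphans (no edges at all) and hubs (many inputs AND many outputs)."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    orphans = sorted(n for n in nodes if in_deg[n] == 0 and out_deg[n] == 0)
    hubs = sorted(
        n for n in nodes
        if in_deg[n] >= hub_threshold and out_deg[n] >= hub_threshold
    )
    return {"orphans": orphans, "hubs": hubs}


# Hypothetical catalog contents: two sources feed one job that feeds two marts,
# plus one table that nothing reads from or writes to.
nodes = [
    "crm.contacts", "erp.orders", "job.nightly_merge",
    "mart.sales", "mart.finance", "tmp.old_export",
]
edges = [
    ("crm.contacts", "job.nightly_merge"),
    ("erp.orders", "job.nightly_merge"),
    ("job.nightly_merge", "mart.sales"),
    ("job.nightly_merge", "mart.finance"),
]
report = degree_report(nodes, edges)
```

Here `job.nightly_merge` is the kind of node Juan describes, with "a bunch of stuff" going in and out, which makes it a natural place to ask who owns it.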

Fraser Harris: Well, I think the one part we're really focused on is the reliability of the data pipeline. And some of those workflows are really mission critical and some aren't. And I don't think that distinction is often very obvious.

Juan Sequeda: Okay, it's a good point.

Fraser Harris: Yeah. A lot of people use the term "data domain," but it tends to be not very granular. It's like, this is all the marketing data domain, so they have this SLA. But there's definitely something really interesting around that. When we talk about impact analysis, it's like, "Okay, well, there is this problem, we kind of expect it to resolve itself, but is this actually going to violate any of those downstream freshness SLAs or latency expectations?" Maybe that's too practical an answer; you were getting at something.
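
The question Fraser poses here, whether an incident that is expected to resolve itself will still breach a downstream freshness SLA, can be sketched as a simple comparison. The asset names, timestamps, SLA values, and the `sla_violations` helper below are all hypothetical; a real system would also need the lineage to know which assets are downstream.

```python
from datetime import datetime, timedelta


def sla_violations(last_loaded, expected_recovery, freshness_slas):
    """Return the assets whose data would be staler than their SLA
    by the time the upstream problem is expected to be resolved."""
    staleness = expected_recovery - last_loaded
    return sorted(
        asset for asset, limit in freshness_slas.items() if staleness > limit
    )


# Hypothetical incident: last good load at 08:00, recovery expected at 11:00.
last_loaded = datetime(2022, 9, 21, 8, 0)
expected_recovery = datetime(2022, 9, 21, 11, 0)

# Hypothetical per-asset freshness SLAs, more granular than one SLA per domain.
freshness_slas = {
    "dashboard.exec_kpis": timedelta(hours=1),
    "mart.marketing_weekly": timedelta(hours=24),
}
at_risk = sla_violations(last_loaded, expected_recovery, freshness_slas)
```

Keeping the SLA per asset rather than per domain reflects the point above that a single "marketing data domain" SLA is often too coarse to tell which workflows are actually mission critical.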

Juan Sequeda: No. No, no, no. I mean, again, I'm taking the opportunity here, we're talking about metadata and stuff, what you can go do with metadata. And I'm thinking about, what's next? I mean, yes, it's table stakes now, the two main traditional use cases, impact analysis and stuff. Having a visualization, that's table stakes now. So what else, what's coming next? And I like how you talked about the data profiling PII detection and stuff in metadata, and the scientist in me is like, respect.

Tim Gasper: So maybe put another way, Fraser, you've got your crystal ball here. What do you think needs to happen, or you want to see happen around metadata?

Fraser Harris: It's funny, so prior to Fivetran, I ran my own company for six years, like a startup, and I was always like," Oh, we need to add this feature and this feature," and the longer I've spent in tech, the more I've realized, it's just like, we just need to get one thing right, and then once we get that one thing really right, then we need to just do two things really right. My ambitions have gotten much more practical, and so I'm kind of stuck here being like," No, we really just need to get that data governance piece right so that people can sleep at night." We can do that, we've made people's lives materially better. I'm glad you're thinking about all of these ideas.

Juan Sequeda: At the end of the day, we're seeing a lot of companies who have a lot of legacy, kind of monolithic data infrastructures, and they are not just migrating to the cloud, they want to have complete cloud adoption. They want to go do things like move into Fivetran and Snowflake and use things like data.world, for example. But how do they go do that process? It's not a lift and shift, where we take all the garbage we have in one place and move it into another. This is where the lineage would come in to understand this big mess, and how can we take that as an opportunity to take that very complex thing and simplify it before we go push it into something like Fivetran and Snowflake and data.world, for example.

Tim Gasper: Well, maybe the way to connect the dots here is that, kind of what you said, Fraser, we need to get this data governance thing right. And if metadata accessibility, if leveraging graphs to actually represent that metadata in a richer way and analyze it and do automations around it, if that can make us finally get data governance right, then maybe that's what kind of ties this whole thing together.

Fraser Harris: Yeah. Something we're looking at is bringing descriptions of the columns from the source system and propagating them through as well. And partly you can just do this out of the box; like, Salesforce has very good descriptions of everything, and it is self-describing data. In some cases, people are really good about using comment fields in databases to create those self-describing schemas, but in a lot of cases they're not. And, hey, I always love anything under version control, this is phenomenal, let's do this. And anything that's self-describing is better than having descriptions that are removed and out of date. And so as an industry, if we could make that really work, going back to the conversation about data contracts, if we can at least have a description coming through, and we can tell the person this changed at this time, that's actually going a long way to resolving these problems versus trying to enforce a static contract and then dealing with those changes.
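
One way to picture "a description coming through, and we can tell the person this changed" is a diff between two snapshots of column descriptions taken from the source. The sketch below assumes each snapshot is a plain mapping of column name to description; the column names, descriptions, and `diff_descriptions` helper are hypothetical and do not reflect any vendor's metadata format.

```python
def diff_descriptions(previous, current):
    """Compare two {column: description} snapshots and list what changed."""
    changes = []
    for column in sorted(set(previous) | set(current)):
        old, new = previous.get(column), current.get(column)
        if old == new:
            continue
        if old is None:
            kind = "added"
        elif new is None:
            kind = "removed"
        else:
            kind = "changed"
        changes.append((column, kind))
    return changes


# Hypothetical snapshots of source-provided column descriptions.
yesterday = {
    "amount": "Order total in USD",
    "status": "Order status code",
}
today = {
    "amount": "Order total in the customer's local currency",
    "status": "Order status code",
    "email": "Customer contact email",
}
changes = diff_descriptions(yesterday, today)
```

A change such as `amount` switching from USD to local currency is exactly the kind of silent semantic drift that a notification could surface to downstream consumers, without requiring a rigid static contract.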

Juan Sequeda: Yeah. I mean, today here at the Big Data London conference, Tim and I gave our talk on our data product ABCDE framework: accountability, boundaries, contracts and expectations, downstream consumers, and explicit knowledge. And on the explicit knowledge we say, what's the documentation, just give some examples. And I very rarely see comments in the SQL DDL on what this stuff means, and that's a bad thing, and that's actually kind of a low-hanging fruit for how we can start providing some knowledge around this stuff. Let's just give a description around it. So the bar is kind of low, let's start working on this stuff. And again, as you said, a lot of these problems are people problems.

Fraser Harris: Yeah, no one likes documenting, period, except for tech writers.

Juan Sequeda: So we're going to have the new analytics engineers being married with some sort of, getting trained tech writers. This is an interesting thing of what is the ideal, I don't know, I guess unicorn or whatever of a role, who can really help make data governance be more effective.

Fraser Harris: Yeah.

Juan Sequeda: I think that's a topic for another podcast here.

Tim Gasper: That's a good question. Yeah.

Juan Sequeda: But all right, look Fraser, we can keep talking, I mean we've already been at this time for almost 50 minutes. I was very much looking forward to having more conversations with you and looking forward to the next conference that we'll be able to go do this. But it's time, let's go move to our lightning round.

Tim Gasper: Let's do it.

Juan Sequeda: This episode, and the lightning round, is brought to you by data.world, the data catalog to support your data mesh. And I'm going to kick it off. First question. So one of the things that we talked about is kind of shift left. So imagine you have your data stack: on the right is BI dashboards, reporting, analytics, ML and all that stuff. As you move left, you have the data warehouse, your data lake, you have data modeling, you have data integration. Then you have the source systems. Is the burden of good, accessible metadata shifting left?

Fraser Harris: Yes.

Juan Sequeda: Yeah. So provide some quick context.

Fraser Harris: If we're generating the metadata and we're on the left, then yes, the burden's moving left. That's just an implication.

Tim Gasper: A quick follow-up question to that is, not just Fivetran, but also think about all the sources that are to the left of you, all your sources. Is there a burden, like you said, Salesforce has pretty good descriptions, but in general, do they need to do a better job on the left side?

Fraser Harris: It's such a hard answer. In an ideal world, yes. Some APIs are great, like Google Ads and Facebook Ads, they actually do a phenomenal job of describing their data. It's just that it is a cost and it's an amount of effort, and so who is committing to that effort? It's hard to impose it on the business units, as I said.

Juan Sequeda: Well, there have to be some incentives around that. So the question is, why do Google Ads and the like have really great metadata on that stuff? I mean, there's some incentives, well, they're different incentives: they want people to go use and understand it, so they're making money on this.

Fraser Harris: They've got about 30 billion in quarterly incentives right there.

Tim Gasper: Money talks.

Juan Sequeda: Money talks. At the end of the day, this is all about how much money we make, how much money are you saving?

Tim Gasper: Yeah. Right.

Juan Sequeda: Tim you go next.

Tim Gasper: The second question. So do you think that there will be a metadata standard that emerges in the next five years, like a popular standard?

Fraser Harris: Yeah. And we want to be a part of that. In building our metadata API, we kind of looked at the landscape. Apache Atlas was one that is pretty commonly adopted in the Hadoop world, but we really dug into that and it was pretty over-engineered for what we're talking about. And so we're also talking, I think we're going to be joining OpenMetadata and really trying to push that forward. And I am hoping that as an industry, we can standardize not only around metadata; the next-level problem is how do you describe policies programmatically so that they can be enforced? And there's nothing today. What we're seeing today is Google Docs or Word docs, and it's just descriptions of, "This is what this type of policy is, and this is how you apply it."
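
Since Fraser notes that no standard exists for describing policies programmatically, the following is only a sketch of what "policy as data" could look like: each policy maps a column tag to an enforcement action, and a small function applies the policies to a row before it lands in a destination. The `ColumnPolicy` class, the tag names, and the three actions are invented for illustration and are not drawn from any existing standard or product.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnPolicy:
    tag: str     # classification tag a column may carry, e.g. "pii.email"
    action: str  # one of "allow", "hash", or "block" (hypothetical vocabulary)


def enforce(row, column_tags, policies):
    """Apply policies to one row: hash, drop, or pass through each column."""
    action_by_tag = {p.tag: p.action for p in policies}
    result = {}
    for column, value in row.items():
        # Untagged columns, or tags with no policy, default to "allow".
        action = action_by_tag.get(column_tags.get(column), "allow")
        if action == "block":
            continue
        if action == "hash":
            value = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        result[column] = value
    return result


# Hypothetical tags, policies, and input row.
column_tags = {"email": "pii.email", "ssn": "pii.national_id"}
policies = [
    ColumnPolicy(tag="pii.email", action="hash"),
    ColumnPolicy(tag="pii.national_id", action="block"),
]
row = {"id": 1, "email": "a@example.com", "ssn": "000-00-0000"}
safe_row = enforce(row, column_tags, policies)
```

The contrast with a policy written in a Word document is that a structure like this can be versioned, reviewed, and executed by the pipeline itself.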

Juan Sequeda: And what I find interesting there too is that you have these policies, but then you have things like the folks at Great Expectations, and there's some sort of quality that gets combined. There's this thin line that kind of goes... At the end of the day, everything's metadata here, and it's all connected. All right, next question. So you all mentioned that one of the missions of Fivetran is to make data as reliable as electricity. Are the data fabric or the data mesh trends helping accelerate that mission, or is it tangential, or is it a distraction?

Fraser Harris: Oh, boy.

Juan Sequeda: Lightning round, lightning round.

Fraser Harris: Lightning round. Yeah, it's like if our PR person is here, they're like," Data fabric is important." I think ultimately people are just trying to get their jobs done, and these are useful concepts to keep in mind in terms of just framing of what you're trying to do, but I don't overly index on them. There's jobs to be done, you got to move data, you have to transform data, you have to analyze data, those are the really important jobs to be done.

Juan Sequeda: That's an honest, no BS answer right there. I always say, if you just zoom out and you look at the principles around things, you got to move data from different places, you got to do storage and compute on that data. You got to do some analytics, you got to use the data. Those are those principles that have been around for the last, I don't know, three decades, and we're still doing that stuff. So all these trends and all these fancy words around that continues around all these principles.

Fraser Harris: We're definitely getting better, and we're applying technology in much better ways.

Juan Sequeda: 100 percent, the way we're dealing with data today is way freaking better than how we were doing it 10 years ago, 20 years ago, and 30 years ago, 100 percent. But I think the principles behind that continue to be the same. And that's important because we're not reinventing the wheel here or anything, even though a lot of people are reinventing the wheel. Anyways, I digress. Please go, the last, final lightning round question.

Tim Gasper: Opportunity for another episode on that comment there. So the fourth and final lightning round question, is cloud data adoption the biggest driver of modern data integration tools, like Fivetran really becoming popularized, or are you finding that it's another use case or a related use case?

Fraser Harris: There's two big drivers. I joined as number six at Fivetran; this was six and a half years ago now. And our customer base was tech startups in the Bay Area, that was it. And then we expanded to tech startups in New York. And then over time that was tech startups everywhere. And then we started adding in industry verticals, and now every industry under the sun is a customer of Fivetran. And so what I'm trying to get at is two things. For all of those other industries, it's about a digital transformation. They're moving to the cloud, they're revisiting their technology stack, and they're looking at tools like Snowflake or Databricks and what kind of transformational things they can do with data that they never could before. And it's really exciting stuff. For all those tech startups, it's just greenfield: we're building a net-new data stack, how fast can we get to insights that are actually providing value to the business?

Tim Gasper: That's interesting. That distinction is very interesting, and it's exciting to hear about the success you all have been having.

Juan Sequeda: All right, Tim, take us away with takeaways, we had a lot here.

Tim Gasper: This has been great, there's been a ton of great takeaways from this conversation today. And so I'll summarize and then I'll kind of pass it off to Juan. So first of all, you really talked about what is metadata? Well, it's data about the data, but more than that, in Fivetran's case, it's the data about what's happening and who is responsible for what's happening. And you mentioned that there are two core use cases that in general you all have been thinking about around metadata, especially with the metadata API that you launched and that data.world has integrated with. One is around discovery, and the other one is around compliance. And an example you gave that I thought was very astute, and we see a little too much of it in the news these days, is things like Citibank getting fined a huge amount: cataloging wasn't enough, and they weren't applying all the policies that they needed to at that ground level. So we started to talk a little bit about governance and the role of governance, automated data governance: how can metadata be shown as structured information that catalogs and other tools can consume? And in the reverse direction, how can it inform those other systems so you can actually operationalize that data? I think the term these days that a lot of people are talking about, we didn't explicitly mention it in our podcast today, is sort of activating your metadata, or active metadata. There's definitely a lot of interesting things that are possible around there that can help to automate data governance. We asked what's the result we're driving for here, and you're like, "Hey, if the CIO or the head of compliance can sleep well at night," that could be a really good outcome. No triple-digit million fines, that's probably pretty good. Being able to have controls around all of this. 
And then data contracts came up as a really important mechanism that we could be leveraging to really try to make metadata useful, but also to really create these expectations and dependencies and manage them better within our enterprises around data. And I think you gave a good description of a contract. You said that it's an interface: there are two parties, and what is the interface between them? And there's an expectation that the provider is going to meet the expectations of that interface. And you walked through a lot of different ways that you can do that: around testing and reliability, around people's responsibility, around, if a test fails, can you notify the right people? And that's something you all are doing on the Fivetran side. And you talked a little bit about static contracts versus dynamic contracts. And I think that's a really important topic because, Fraser, you and I are both product guys, and so we know that you've got to manage expectations. And a lot of times you make these commitments around roadmap and things like that, but occasionally the business situation changes or the landscape changes, and it can't only be static, we have to be dynamic, we have to be agile. And that's just the way that we can be successful as organizations. So those are some of my big takeaways. Juan, what about you?

Juan Sequeda: I got several here. So first of all, the fundamental challenge of data cataloging is, how do you get people to describe the business process? We kind of already do the easy things, which is look at the metadata. But the really hard thing, which is the opportunity that I feel we're missing out on a lot, is people describing their business processes. You hope that the data represents that as best as possible, usually via the data modeling and lots of cross-functional collaboration and business expertise, but we really need to get to the cataloging of those business processes and business questions. We talked about the 80/20 rule, again, 80% of BI projects fail, and how is that even possible? The past model's always been the data architects trying to have this crystal ball where they're trying to figure out what is the model that would be most useful. And then the ELT model allows the data to be replicated so you can do the modeling next, and this is definitely a good first step, a big change and very valuable. But the second thing that we need to go focus on is to get out of the room and go meet with the business, go meet with the people and understand them. I think this is the big shift, and this is why it goes back to one of the things that you brought up: all problems are people problems, the lack of collaboration. We had a discussion about being proactive on metadata; in addition to just the protection side, you guys are really focusing on data profiling and PII detection. And I think the different angles to look at are, what's your company size and what's your data maturity? If you're a smaller company, make sure you're growing your company, but at some point, once you get bigger, you need to really start getting into that next space of how we're going to mature our own things. Not surprisingly, we brought up a little bit about streaming. 
I love what you said: as soon as things go sub-30 seconds, you're imposing a 10X cost and a 10X complexity. Do you really need that? Ask yourself that. And one of the things that you're really focused on is the reliability of the data. So some pipelines are very mission critical, some are less so, so how do we identify which are those mission-critical pipelines and keep track of them? And what's next for metadata? We really need to get this data governance thing right. All right, how did we do, anything we missed?

Fraser Harris: You nailed it, gentlemen. I'm impressed that you had that many notes.

Juan Sequeda: And I'm on my phone, so my phone we're doing, so.

Tim Gasper: We've done this a hundred times, we have too much practice now, it's ridiculous.

Juan Sequeda: While we're drinking and I'm jet lagged, and it's 11:00 PM now over here. All right, Fraser, back to you. Three questions, one, what's your advice about data, about life, whatever. Second, who should we invite next? And third, what are the resources that you follow that you should suggest other people to follow too?

Fraser Harris: Well I'll start with the first. Keep it simple. Almost everything is better when you just simplify it down. Yeah, keep it simple. I read a lot of Hacker News, follow some folks on Twitter, and vary. The data intelligence is very loud and talkative, and it's interesting to keep an eye on, but just focus on the outcomes that you're driving, so you're creating value. Because ultimately that's why we're all here, is creating value for the folks that are dependent on us. And then finally, who you should have next. I recently had a great conversation with Sahir, I can't remember his last name, the CPO of MongoDB. I don't know how relevant that is to you, but had a phenomenal conversation with him. Who else? In data, George, actually our CEO. If you want to ever talk about databases of all forms and types, George is one of the most knowledgeable people, and I always enjoy talking with George. So I recommend having George here.

Tim Gasper: Interesting. We should have George on at some point. I think the last time we had a deep talk on databases, it was with the CEO and founder of Neo4j.

Juan Sequeda: Yeah, that was a good one, because we did a whole kind of gamut on not just graph databases, but what is the database kind of market?

Tim Gasper: We might be due for another round of that.

Juan Sequeda: Yeah, that would be a good one.

Fraser Harris: Actually a little secret, but in those early days of Fivetran, we didn't believe that data integration was a very big market. And so every year we're like, "Okay, we're going to do data integration for another year, and then we're going to have to figure out what the next thing is." And George was always running in his spare time, a Presto cluster, and just messing around with Presto. And he was like, "We should really launch our own data warehouse." And I was always like, "George, stop it, we need to focus on this data integration thing for just one more year." And then each year it was like, "The market's 10 times bigger than we thought." And then it just kept getting bigger, and now it's like, "Oh, my God, this is a really big, hairy problem, and we're just going to be doing this forever."

Tim Gasper: Well I'm glad you stuck with it.

Juan Sequeda: The joke I always say, repeat in talks is like, "We can take a rocket to space, we can bring it back to earth, it can land on a platform in the middle of the ocean, but you still can't say if these two spreadsheets match, and integrating this data is so hard, so wait, is data integration harder than rocket science?" And actually rocket science is a natural science, I don't have to go deal with people. And then data integration, you said there's a lot of people stuff and hey, people, humans, we're complex.

Fraser Harris: Well, it was just a little secret for you, but these things overlap. We replicate data from the space station.

Tim Gasper: Oh, it all comes full circle.

Juan Sequeda: Okay. So tomorrow we're giving a talk, well, we're not giving a talk, one of our colleagues, Emily Pick, is giving a talk together with OneWeb, one of our customers, and the title of the talk is, data from space to users in minutes.

Fraser Harris: That's a great talk.

Tim Gasper: Anything space and data together is good.

Juan Sequeda: All right. We love this. So we got to wrap up. Next week we have Rupal Sumaria, who is head of data governance at Penguin Random House. So I think after following this conversation, we'll get into some data governance conversations next week. And we'll both be live together from Austin back home.

Tim Gasper: Yeah. Exactly. And tomorrow, September 22nd, is the data.world summit. So I don't know exactly when this episode's going to drop, but for those that are watching online right now, definitely register for that. And if you end up missing it, the live summit, definitely check out the recording of it at data.world.

Juan Sequeda: And we're finishing live, but we will be again, live tomorrow and then it'll hit the podcast, just giving our takeaways, takeaways of all the things that we looked at, Big Data London. So the same time tomorrow, Wednesday, Thursday, I don't even know what day it is. Thursday 4:00 PM Central, or something like that. Anyways, so much stuff. 100th episode. Fraser, thank you so, so much. Thanks to data.world, who's letting us do this for 100 episodes. And we're so happy to be working with you, Fivetran, and all the metadata, the goodness that we're all doing together. So cheers to metadata and cheers to 100 episodes, and cheers to Fivetran and data.world.

Tim Gasper: Cheers, Fraser.

Juan Sequeda: Cheers.

Fraser Harris: Well Juan, one good luck with your first child in a month, and I hope you take a long paternity leave and don't do the podcast.

Juan Sequeda: I think there's a two-week break.

Tim Gasper: Okay, you've timed it.

Juan Sequeda: I timed it. We'll see, I'll be able to escape an hour. We'll see.

Fraser Harris: That's pretty short.

Juan Sequeda: I'm taking over a month, for sure.

Fraser Harris: Okay, okay.

Juan Sequeda: And one hour out of the third and fourth week, just to go do this.

Fraser Harris: All right. Well, good luck.

Juan Sequeda: All right. Thanks.

Announcer: This is Catalog & Cocktails. A special thanks to data.world for supporting the show. Karli Burghoff for producing. John Williams and Brian Jacob for the show music. And thank you to the entire Catalog & Cocktails [inaudible]. Don't forget to subscribe, rate and review, wherever you listen to your podcasts.

