Data in Motion with guest Jason Schick
Data expert Jason Schick discusses how data has evolved from static enterprise data to dynamic data-in-motion, and how open source software tools like Kafka are making it possible to access needed data in near real time to enhance enterprise processes.
Webinar Transcript
Jim: [00:00:04] Good afternoon, and welcome to the Smart Future Webcast podcast. This is our seventh session, and today we have the honor of hosting Jason Schick from Confluent. Welcome, Jason.
Jason: [00:00:18] Thanks, James. Good to be here.
Jim: [00:00:20] First, I know you and I worked together at IBM several years ago, so I’m glad that you could make it. Maybe give a little background on yourself and we can start from there.
Jason: [00:00:31] Sure. Yeah. So I’ve been working in the data space for twenty-five years at this point, which is kind of hard to believe. I’ve spent a lot of that time worrying about how to enable analytics and how to go get the data necessary to feed those analytics. And so this chapter of my career at Confluent fits very naturally into that. It’s a problem space that continues to get attention, rightfully so, because the ability to provide meaningful, relevant information to impact the missions and better support our warfighters continues to be a challenge. And where we are here at Confluent is pretty exciting, because we do a couple of things that are different from what’s been done in the past, and I’m happy to talk a little bit about that in this conversation.
Jim: [00:01:27] So I think we’re going to call this session Data in Motion. Can you explain where this fits in? Again, there are several DOD use cases, but where do you see Confluent, and specifically your product, Kafka, or your platform, fitting in?
Jason: [00:01:46] Sure, yeah. Confluent is based on Apache Kafka, and there are probably a number of folks listening here who are familiar with Kafka. It’s one of the most successful open source projects out there in the world. But the thing that’s most exciting about Confluent, and what I think is most significant, is that we’ve reimagined the way data should be treated. Historically, we’ve thought of data as a passive asset, and we’ve pivoted that. Confluent imagines data as an active asset, and that’s got really profound implications for any organization that needs to be highly responsive to changing environments. In the past, data gets generated or collected, and then it’s stored somewhere, and it stays there until someone or something finds it, understands it, and then retrieves it. A lot of times that involves some complex integration, or they’re retrieving it from a batch file, and there tends to be a lot of organizational friction associated with doing that. And so every enterprise that has been around for a while has ended up building lots of systems to store copies of data for their own purposes.
Jason: [00:03:06] And that works. But then it starts to be really hard to keep data in sync across systems, and that makes it hard to know sometimes what data to trust and what not to trust. So with Confluent, like I said, built on top of Apache Kafka, we put data in motion. A good analogy is the central nervous system of a body, so that any properly credentialed system or user can get that data, tapped from the source, any time there’s an event of interest to them that adds to their operational picture. Flipping that script like we have removes a lot of the consistency and data currency challenges that we see when we’re doing things like counting inventory of parts or supplies or personnel, where a lot of the time systems will disagree with one another, even on fundamental facts like how much we’ve got. So at a macro level, that’s the most interesting thing about Confluent and about Kafka. Now, if we want to get into some use cases here, I’m happy to do that, or if you want to poke at me on points I just made.
Jim: [00:04:25] Yeah. I think where I see it, the old mindset was: I’m building a command and control application or a DOD application, and I usually have an API to connect to a database, and then I constantly have to manage that API. Five years later, either the database changes, or the API becomes stale, or certain things don’t support it, and suddenly it’s broken. So can you maybe explain how it can help in that category?
Jason: [00:05:00] Yeah, for sure. And that’s really putting your finger on the challenge. When you have to do an API integration, that requires a pretty deep understanding of your source system. You’ve got to know a lot about the schemas and the way the API calls work. And then if that changes, you’re in trouble with your downstream systems. With Kafka, what you get is drawn from some of the better characteristics of a publish-subscribe paradigm, where that source can simply publish the data up into a universal data pipeline. So you decouple your source and your consumer, and that means that when you make changes to your source system, your consumers can continue right along without being brought down, without worrying about breaking API integrations. And probably more importantly, you start to get reuse of that data source. A lot of times you’ve got a lot of different systems that do use, or would like to use, data from a particular source, whether it’s sensor data or just inventory and supplies data. You don’t want the organizational friction of having to manage integrations with a whole bunch of downstream systems. It’s a lot better for the people that manage that one source to be able to publish, and then whoever is credentialed to go get that data can simply subscribe. What that does is start to create a network effect over time. It makes it easier to field new capabilities and new systems, and it also makes it easier to retire old ones.
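As a rough illustration of the decoupling Jason describes, the following Python sketch models a topic as an in-memory append-only log in which each subscriber keeps its own read position. It is not Kafka’s or Confluent’s API; the `Topic` class, the subscriber names, and the sample inventory events are all assumptions made up for this example.

```python
from collections import defaultdict


class Topic:
    """Append-only log; each subscriber tracks its own read position."""

    def __init__(self):
        self.log = []                     # published events, in arrival order
        self.offsets = defaultdict(int)   # subscriber name -> next index to read

    def publish(self, event):
        # The publisher only appends; it knows nothing about its consumers.
        self.log.append(event)

    def poll(self, subscriber):
        """Return the events this subscriber has not seen yet."""
        start = self.offsets[subscriber]
        self.offsets[subscriber] = len(self.log)
        return self.log[start:]


parts = Topic()
parts.publish({"part": "rotor", "qty": 4})
parts.publish({"part": "filter", "qty": 12})

maintenance_view = parts.poll("maintenance")     # the first two events
parts.publish({"part": "rotor", "qty": 3})
logistics_view = parts.poll("logistics")         # a new subscriber reads all three
maintenance_update = parts.poll("maintenance")   # only the event it has not seen
```

Because the publisher never references a subscriber, a new consumer such as the hypothetical `logistics` reader can be added, or an old one retired, without touching the source.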
Marv: [00:06:48] Jason, are you suggesting that you could bypass the API architecture paradigm and move into this pub sub?
Jason: [00:06:57] Yeah, you can. I mean, the API has its place, Marv, and we see a lot of API usage, especially when one is dealing with external entities. APIs bring a certain degree of security, and you can certainly find other advantages to using the API at times, but it requires a lot more work technically. With Kafka, it’s more of a messaging-layer way of publishing and subscribing to data that’s of interest. So a lot of times you’ll see an organization use the two of them together, but a lot of our customers that have started to use Confluent have recognized that, because it does reduce that friction, they start to use the Confluent platform a lot more broadly, especially when they’re dealing with trusted sources and partners.
Marv: [00:07:56] So would you want to publish to a data lake and then access the data lake from any application that needs to access that data?
Jason: [00:08:06] Yeah, that’s a pretty common pattern. Where it gets even more interesting is that you want to be able to move data to where it’s needed most, in the moment that it’s needed. When you’re collecting data as it’s being generated, you probably want to be able to move that up into a data lake or some sort of analytics platform to do machine learning and AI and derive other insights. And so a lot of customers are publishing data using Confluent. A good example is in a lot of cyber modernization use cases, where they’re using Confluent to collect the data off the sensors at the edge and then bring it up to a data lake. But at the same time, there are actors that might need that data in real time. And so Confluent allows you to move the data intelligently to where it’s needed most without having to build a couple of different parallel integration paths.
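The pattern Jason outlines, one stream feeding both a data lake and a real-time consumer, can be sketched in a few lines of Python. This is an in-memory illustration only and does not use any Kafka or Confluent API; the `Stream` class, the handler functions, and the event fields (`type`, `host`) are hypothetical names chosen for the example.

```python
class Stream:
    """Delivers every published event to all registered handlers."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, event):
        # One publish call fans out to every consumer of the stream.
        for handler in self.handlers:
            handler(event)


data_lake = []   # stand-in for long-term analytics storage
alerts = []      # stand-in for a consumer that needs the data immediately


def archive(event):
    data_lake.append(event)


def alert_on_failed_login(event):
    if event["type"] == "failed_login":
        alerts.append(event["host"])


sensor_events = Stream()
sensor_events.subscribe(archive)
sensor_events.subscribe(alert_on_failed_login)

for event in [
    {"type": "dns_query", "host": "edge-1"},
    {"type": "failed_login", "host": "edge-2"},
]:
    sensor_events.publish(event)
```

The edge sensor publishes once; the archive path and the alerting path are separate subscribers rather than separate integrations.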
Jim: [00:09:08] I think where we are now is the world of real time. People can’t wait, right? It used to be batch processing. In the world of cyber, we want to grab that PCAP data or those service logs. We found, like when I was at Elastic, that among the underlying connectors, Kafka was one of the major tools to collect the data and create that real-time process, so you’re not waiting around.
Jason: [00:09:37] Yeah, that’s right. We see Elastic and Confluent show up in a lot of cyber modernization architectures for that reason.
Marv: [00:09:48] How hard is it to take a legacy sensor, for example, and move into a publishing mode with Kafka?
Jason: [00:09:59] It’s a pretty straightforward process, Marv. As a software development platform, we’ve got a Connect framework that allows any skilled software developer to access the interfacing standards that ship with the product, using languages such as Java or C++. So it’s a messaging-level integration that people with the most basic software development skills generally possess.
Jim: [00:10:36] The whole buzzword now, obviously, is DevSecOps: Platform One, Black Pearl in the Navy, the Forge with the Aegis combat system, and the Air Force has several software factories, whiskey camp and surf camp. What is your experience? Are you seeing that people understand Kafka, or is there education required? Can you comment a little bit on that?
Jason: [00:11:02] Yeah. So Kafka has been a successful open source project, and what we find is that in a lot of cases customers are pretty familiar with it, especially software-development-oriented people. They’re pretty familiar with Kafka and its core capabilities, especially the DevSecOps folks that have embraced Kafka and Confluent. You can think of us as providing the data layer for a DevSecOps framework. Because you get that publish-and-subscribe underpinning paradigm, we make it easy for them to access data and really put it to use as they’re designing and testing applications and then iterating on them. One of the challenges that bedevils people who are building analytics environments is how you access data, and you need a reusable, modular approach to data access. The DevSecOps folks generally understand that. In particular, we’ve spent a lot of time working with the Iron Bank team within Platform One, so they have a certified platform in Iron Bank. We’ve built a hardened distribution of Confluent Platform, a hardened distribution of Kafka, that’s out there and available to users of Iron Bank.
Jim: [00:12:42] And it’s gone through the continuous ATO, or plans to go through the ATO process?
Jason: [00:12:48] That’s correct, yes. So we’re in there. People that have access to Iron Bank can go and pull us down right now.
Jim: [00:12:55] And I’m aware that even the Overmatch Software Armory is connected to Iron Bank through the temporary repository, for both Navy and Air Force right now.
Marv: [00:13:08] It’s pretty interesting, because Platform One was trying to run completely on open source without licensed products, so your subscription model apparently flies in the face of that. Is that true?
Jason: [00:13:22] Well, that’s probably better addressed by others, but I think what’s relevant for me to say on that, Marv, is that Kafka by itself has a lot of virtues to it, but some of the things that we add around it are essential for use in the mission space. We provide the enterprise-grade security that you’ve got to have in mission-critical workloads. We provide more than 200 different connectors to different data sources and targets. We provide a Kubernetes operator, which is important to be able to have the administrative and operational controls that you want in an environment like this. So I don’t want to try to comment on how they arrived at deciding that a subscription software product is compatible with their world view. But we’ve had success working with them.
Jim: [00:14:23] And I would add, Marv, I found there are kind of two parts to it. One is all the extra capabilities: an engineer could develop them, but it would take two or three man-years, and then the government has to support those components. Or you could just pay the license fees, which are minimal. It’s a subscription model, so it’s always supported. And the other part is they also provide support. One of the RMF requirements for any commercial tool is you’ve got to have a support story, right? So I saw that most of the time they bought the licensed version because they wanted that support and they had to check the box from a risk management point of view.
Marv: [00:15:06] Well, to your point, Jim, I’m always amused by the dichotomy: our S.A. organizations like to use completely open source to develop capabilities and then throw it over the fence to the acquisition folks for the programs of record. But programs of record in general can’t get approval to use a pure open source product. They really want supported open source. Hence the creation of Red Hat and the success they’ve had within the department.
Jim: [00:15:36] Yeah, and I will say, based on the acquisition discussions we’ve had, they’ll fund the services side of it and the people they need to write the code, but they don’t budget for the actual SaaS licenses, and that takes a while to get. The good news is that sometimes, after a while, they figure it out, and the cost is minimal compared to, you know, 10 man-years to do this.
Jason: [00:16:02] I think that’s consistent with our experience, too. There are plenty of cases where we encounter developers that have been working with Kafka for a long time, and there’s a lot of benefit to that. They learn how the software works and they decide where it fits. But then when it becomes time to field capability, the conversation is going to be different. What you find when you look into the Kafka ecosystem is that the vast majority of the commits to the open source code base come from Confluent employees. The origin story of Kafka is that at LinkedIn 10 years ago, our founders developed it because they tried to buy various COTS offerings that would help solve a problem they had, and they couldn’t find something that worked. So they built something, and that became Kafka. And as is common with a lot of these open source projects, there’s a company that is the clear expert in that particular open source project. And for Kafka, that’s
Marv: [00:17:13] That’s good to know. I didn’t know that.
Jim: [00:17:15] Yeah. Which I think is the future, right? You’re starting to see H2O, the machine learning platform that’s based on open source. Elastic is the search engine based on open source. Obviously, Red Hat with Linux started that back in the day. So does D2iQ with Mesos and Kubernetes. Yeah. There’s a DOD strategy memo out of OSD. My experience is that the world of data is still old legacy. Could you maybe comment: do they get this data-in-motion concept, that you don’t have to wait nine hours for a batch process to get the answer? You can get it in real time, just like Wal-Mart knows their inventory or Best Buy knows their inventory in real time, as opposed to reports based on last night’s batch run.
Jason: [00:18:07] Yeah, if you look at the memo that came out from Hicks, it’s designed to be pretty broad, right, and not necessarily too prescriptive. But there are a couple of important concepts that I read into it: there’s a need to make data discoverable, and there’s a need to make data actionable. I’m looking for a couple of their other talking points as I scroll through my laptop. But the challenge the DOD faces is how you get the data into the hands of people that can do something with it. This is the ability to make data available and accessible and then operational. It’s one thing to collect it and do interesting things with it in an offline capacity, but then how do you take it and actually apply it to the mission? A lot of what I’m reading in memos like that sets the stage for how you take it and make it operational. And so when we at Confluent look at that, we say, OK, we can help with the accessibility. Data needs to be discoverable. It needs to be shareable. It needs to be reusable. So publish-subscribe naturally aligns to that. But just as important, once you’ve discovered it and started to understand it, how are you going to actually bring it to the people that need it? Because they’re downrange. They’re in an environment that’s hot and changing fast. We think we fit into what those objectives lay out, but we also set the stage to really put that data in motion to help the people who need it most.
Jim: [00:20:05] Yeah, there was a Joint All-Domain Command and Control session this week, and there was a great Army officer who made a comment that the biggest problem with what we’re trying to solve is that there are too many enclaves and too many different data formats. So in the publish-subscribe model, do you help come up with a common schema, or at least push for data to come to some common format, maybe JSON or whatever?
Jason: [00:20:38] Yeah. So we take a different tack on that. We allow for a degree of schema independence. As long as your source system can publish and your consumer can read, then we’re going to let that data move as it needs to go. I think one of the things that’s been a struggle for the DOD, and for any large enterprise, is that if you try to really flatten everything and drive to a single standard, you never get there. So you have to allow for a degree of heterogeneity in the way data is produced and in the way it’s consumed. We’re not really trying to create a single monolith at all. What we’re really after is how you just make that data available. There’s always going to be work that needs to be done for your consuming application to interpret the data and put it in context so that it’s meaningful to whoever uses it. Where we help, though, is with all these staging environments that got stood up because you needed to access a copy of the data that you could actually get at without relying on some other program that had other priorities. Those staging environments have a chance to be removed when it makes sense and when you’ve eliminated the disruption that taking them out would cause. So over time you can have a library of your data sources, and that’s what you go to when you have a mission
Jim: [00:22:29] Or you just complement that data lake. Like I know in the case of NOBLE, right, which is the next replacement, Marv, in the Navy, they’re using Kafka as that central nervous system. And the data lake is, I think, TBD with some recent things that are going on.
Marv: [00:22:54] And so, Jason, can you relate how the AI/ML needs of all of these new projects like Overmatch can relate to Confluent?
Jason: [00:23:04] Sure. Yeah. That’s an important one. First off, we’re not really an analytics platform. What we do is get data and move it to where it’s needed. In any of those AI/ML scenarios, data acquisition is important so you can build, test, and refine the models. So we help with data acquisition. Then when you build those models, they have to be operationalized. In a lot of cases, you want to be running those models on the data in real time, as the data is streaming off your sensors, as it’s coming in from your command and control and your ISR systems. You want to be able to apply those algorithms for maximum advantage in the moment that you’re collecting that data. And so how do you do that? Well, we’re that data-in-motion platform, we’re that tactical data mesh or tactical data fabric. You take those algorithms that you’ve developed and hardened and refined in those analytics environments, and then you operationalize them running as part of those data streams that Confluent actually handles for you.
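To make the idea of running a model on data as it streams in a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `score` is a hypothetical stand-in for a trained model (a simple threshold rule), the readings are invented, and a plain iterator plays the role of the event stream rather than any real Kafka consumer.

```python
def score(reading):
    """Hypothetical stand-in for a trained model: flags large deviations."""
    return abs(reading["value"] - reading["baseline"]) > 10.0


def process(stream, model):
    """Apply the model to each event as it arrives, yielding flagged sensors."""
    for reading in stream:
        if model(reading):
            yield reading["sensor"]


# An iterator stands in for events arriving one at a time from sensors.
readings = iter([
    {"sensor": "a", "value": 21.0, "baseline": 20.0},
    {"sensor": "b", "value": 45.0, "baseline": 20.0},
    {"sensor": "c", "value": 8.0, "baseline": 20.0},
])

flagged = list(process(readings, score))
```

The model is developed elsewhere and passed in as a function, so swapping in a refined version does not change the stream-processing loop.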
Jim: [00:24:33] I think we’re getting close to our time, so any closing comments, Marv or Jason? Obviously we’ll put the recording and links up on the YouTube channel for anything you want to share, Jason, as far as how to get more information or upcoming workshops or anything like that. But why don’t we do a closing comment from you?
Jason: [00:24:56] Yeah, sounds great. So first of all, I appreciate the time, Jim and Marv. We touched on use cases sort of indirectly, but a couple of things just to leave with listeners here. Handling sensor-to-sensor, in-theater data transmission, like a tactical data mesh, is a workload that we’ve seen a number of times. Handling telemetry data and ESADE data and synchronizing that across ground stations is another use case we see a lot. Keeping inventory of things like supplies and parts synchronized around the globe, for better maintenance and readiness of the fleet, is another use case that we see ourselves serving, both for the DOD and for a lot of commercial organizations too. So we’re a data platform and we support a whole lot of use cases. Jim, you mentioned cyber modernization before. I would love to talk more about any of those with anybody that’s interested. In the middle of September, we have a summit coming up, Kafka Summit. That’s the annual global user event. And it’s not a marketing boondoggle at all. It’s actually a great chance to hear from a lot of different customers how they’re using Kafka. And there’s a government track. We’re going to have a number of different agencies talking about what they’ve done and the problems they’ve solved. The dates are September 14th and 15th. Anybody that’s interested can go online and find that. And Jim, it sounds like I’ll get a chance to drop a link in the attachment to this. So I hope this reaches somebody that wants to attend and they get a chance to join us.
Jim: [00:26:51] Oh, definitely. Thank you. Marv, any last comments?
Marv: [00:26:54] That was very, very informative. Thank you very much. Much appreciated. And I look forward to learning more about Confluent. So thanks again.
Jim: [00:27:02] Thanks, Jason.
Jason: [00:27:03] Yeah, thanks. Thanks to both of you guys. Really appreciate it.