Intelligent Video Today - Vivoh's Erik Herz Takes Long View on the Evolution of AI-Infused Video Technologies
For the inaugural episode of Intelligent Video Today, we were grateful for the chance to visit with Vivoh CEO Erik Herz. While Vivoh focuses on enabling network distribution solutions for video content, Herz has been involved in the video streaming business for more than a quarter century and has some keen insight to share on the massive changes taking place in the video technology segment.
Among the issues addressed by Herz in this 12-minute interview are:
- How the drive to incorporate intelligence into video is not new – but is certainly evolving
- Identifying the steps that organizations will need to take to make AI-infused video applications more relevant to the enterprise
- Determining ways to leverage AI to make content relevant to work audiences across the globe
Below, please find the transcript of the discussion from this episode of Intelligent Video Today.
Steve Vonder Haar, IntelliVid Research: Hello and welcome to this edition of Intelligent Video Today. I’m your host, Steve Vonder Haar from IntelliVid Research. Joining us on the show today is Erik Herz, the CEO of Vivoh. Welcome, Erik.
Erik Herz, CEO Vivoh: Thank you, Steve. Great to be here.
VonderHaar: Well, it’s the inaugural issue or the inaugural episode of Intelligent Video Today, and I couldn’t think of anybody better than you to have on board here.
You’re now at Vivoh. Tell me a little bit about that and we’ll talk a little bit about your history in the enterprise video market space. But what does Vivoh do?
Herz: Sure, yeah. Vivoh’s been in business over four years. We’ve built software for enterprise video delivery and that includes a multicast solution used by customers for hundreds of sites. We include video caching and a Zoom DVR for the time-shifting of video conferences.
VonderHaar: So you certainly help the video get from one place to another on the corporate network. And certainly that’s a big issue for all enterprise streaming users these days. But we have bigger fish to fry on Intelligent Video Today, not only the networking aspects, but thinking about how evolving technologies, particularly artificial intelligence, are impacting the way we’re going to be using video in the business realm.
And we’ve been talking a little bit about this concept of Intelligent Video 2.0 – or the next phase. What’s going to be happening in enterprise video? Tell me a little bit about your thoughts about where we’re heading in this marketplace.
Herz: Yeah, I started streaming video at RealNetworks back in 1997 and then joined a company, Virage, that focused on what we would now all call “artificial intelligence.” Now, back then we called it media analysis. So I call that kind of “Enterprise Video Intelligence 1.0.” And that was really focused on extracting information from the video: speech, text, object recognition, optical character recognition and facial recognition.
VonderHaar: So, this is our artificial intelligence. A rose by any other name, right?
Herz: Yeah, right now we call it that. Back then it was called media analysis, you know, video processing, computer vision, speech-to-text. But it was this use case for enterprise video, call it “Intelligent Enterprise Video 1.0,” and it really focused on search. So, if you had a large repository of video, how do you make that easier to find? With a keyword search that goes across not just spoken words, but maybe the word Vivoh in the background or text on a PowerPoint slide, or maybe an object that was recognized by a media analysis plug-in that can do object recognition.
VonderHaar: So that was sort of the 1.0 use case for A.I. in enterprise video. So you have 1.0 now being search. When we look at the next generation – Intelligent Video 2.0 – what does that mean to you?
Herz: We’re seeing things like meeting summaries. So Zoom and Microsoft Teams now have partners and their own capabilities to provide an intelligent summary of those meetings.
So that’s taking the video, extracting text, and generating either a short transcript or even a short video summary of those meetings. So that’s kind of a generative approach. Something where you’re actually not just searching, but you’re actually creating new content based on that. And it’s still often around the use cases of trying to find information, but pretty soon it’s going to be evolving toward a place where we’re actually creating new content for new target audiences based upon these repositories.
VonderHaar: So are there some aspects on what really makes enterprise or intelligent video work? Do you think that there need to be some guardrails in place?
Herz: Yeah, guardrails is one big aspect of this. It’s controversial. You know, some people would say some of those guardrails are politically correct, but some of the guardrails are just technical. You know, does this code compile? Is it appropriate SQL code?
So there’s guardrails around the technical feasibility of the output of, say, ChatGPT. And, then, there’s guardrails around policies – your policies around language or terms or competitive information. So yeah, guardrails is a big part of it. And right now, you know, unfortunately we’re kind of going hard back to the other side.
So some of our customers are now prohibiting any use of any generative AI or ChatGPT. You have Zscaler – blocking and locking down all access to that content. I think what we’re going to see is something that’s going to evolve toward the middle, and that’s really exciting. That’s sort of AI processing on the edge with guardrails and some other functionality that I love to talk about that brings AI into enterprise video and really compelling new use cases.
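As an illustration of the two kinds of guardrails Herz describes, here is a minimal Python sketch, not anything Vivoh or the tools named above are known to ship. The technical checks ask whether generated Python parses and whether a generated SQL statement can be prepared against a schema; the policy check scans text for blocked terms. The function names, the `BLOCKED_TERMS` entries and the use of an in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3

# Hypothetical policy list; a real deployment would load this from enterprise policy.
BLOCKED_TERMS = {"project falcon", "acme corp"}


def python_compiles(source: str) -> bool:
    """Technical guardrail: does the generated Python at least parse?"""
    try:
        # compile() only parses and builds bytecode; it does not run the code.
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


def sql_is_valid(statement: str, schema: str) -> bool:
    """Technical guardrail: can SQLite prepare the statement against the schema?"""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN makes SQLite compile the statement without executing it,
        # so syntax errors and unknown tables surface as sqlite3.Error.
        conn.execute("EXPLAIN " + statement)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


def policy_violations(text: str, blocked_terms=BLOCKED_TERMS) -> list[str]:
    """Policy guardrail: return the blocked terms that appear in the text."""
    lowered = text.lower()
    return sorted(term for term in blocked_terms if term in lowered)
```

A caller would run generated output through these checks before showing it to an employee, and reject or regenerate anything that fails. Real policy guardrails are far richer than a term list; the point here is only the split between "is the output technically valid" and "is the output allowed."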
VonderHaar: So what are some of those other aspects of AI-infused capabilities or considerations that we should be thinking about as we’re entering this new age of intelligent video?
Herz: Guardrails is a big part of it. And then there are a couple of things. So there’s redaction, there’s embedding and there’s tuning.
VonderHaar: So tell us about that.
Herz: So specifically, what we’re going to see is that OpenAI has really changed a lot of this because the large language model is huge, and it’s just too expensive for anyone else to create those and process those. And having those be open source and being able to bring those on-prem allows you to have a really great starting place.
But you’re going to need a few things. You’re going to need tuning. So tuning says, let’s take your enterprise knowledge set and tune the large language model so it has enhancements, and those enhancements bring in data that’s relevant for your specific enterprise. And those are going to be proprietary. Those are some of the things that the enterprises own. They’re not necessarily going to want to put it in the cloud. They’re certainly not going to share it with customers, but there’s going to be proprietary enterprise tuning for these models.
The second one is embedding. Embedding is a specific term that enables semantic search. So, if you’re searching for fruit and you want that search to come back with bananas and apples and oranges, embedding is the thing that links those fruit items to the query “What is fruit?” So it’s semantic search.
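The fruit example can be sketched in a few lines of Python. This is a toy under stated assumptions: the three-dimensional vectors below are invented by hand so the example is self-contained, whereas real embeddings are produced by a model and have hundreds or thousands of dimensions. The names `EMBEDDINGS`, `QUERY_FRUIT`, `cosine` and `semantic_search` are hypothetical.

```python
import math

# Hand-made toy vectors (assumption): fruit terms point one way,
# networking terms another. A real system would get these from a model.
EMBEDDINGS = {
    "banana": [0.9, 0.1, 0.0],
    "apple": [0.8, 0.2, 0.1],
    "orange": [0.85, 0.15, 0.05],
    "router": [0.0, 0.1, 0.9],
    "multicast": [0.1, 0.0, 0.95],
}

# Stand-in for the embedding of the query "What is fruit?" (also invented).
QUERY_FRUIT = [1.0, 0.1, 0.0]


def cosine(a, b):
    """Cosine similarity: near 1.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, index, top_k=3):
    """Return the top_k terms whose vectors are closest to the query vector."""
    ranked = sorted(index, key=lambda term: cosine(query_vec, index[term]), reverse=True)
    return ranked[:top_k]


print(semantic_search(QUERY_FRUIT, EMBEDDINGS))
```

None of the fruit terms contains the word "fruit," yet they rank above the networking terms because their vectors sit close to the query vector. That is the link between query and items that Herz attributes to embeddings.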
VonderHaar: You’re figuring out what exactly you want to search for, right?
Herz: That’s right. And so imagine a scenario where you’re a drug company and when someone searches for drugs, you’re going to want to have all kinds of internal proprietary information that are the embeddings for your enterprise to the language models that are used internally.
And again, I think that’s going to be either on the edge or internal, not yet shared or exposed to the Internet. Maybe not even available, certainly not available more broadly, but could be on the edge, for example.
And then redaction is important, too. So this is another new AI video kind of capability where you can go in and say, you know, here’s a picture of Steve. Steve has left the company, sadly. So what we need to do is we need to go through every video in our repository where Steve is there and we need to redact his image.
And similarly with data sovereignty, you’re going to see this stuff happening in real time. And again, this is why we’re going to need massive processing on the edge. And that opens up a whole new set of capabilities from vendors like Lexmark and Videon. These are companies that are thinking about how to make enterprise video on the edge work with really hardcore processing engines. You know, this Lexmark engine, it’s 384 Nvidia cores on a little edge appliance. Why? Because let’s say the data is flowing from one country to another and that country has data sovereignty laws. You’re going to need to redact that video in real time and prevent that employee information from being leaked even internally within the enterprise.
So you see guardrails, embedding, tuning, redaction. These are all some of the key things that are going to enable intelligent enterprise video to be appropriate in the Enterprise Video 2.0 world. We’re not locking it all down. We’re going to start using AI, but it’s going to be proprietary. It’s going to be protected. It’s going to be valuable, and we’re going to make sure that it’s, you know, it’s the right content.
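The redaction step itself is simple once something has located the person in the frame; finding them is the hard, compute-heavy part. The sketch below covers only the simple part and rests on several assumptions: a frame is a grayscale grid of pixel values stored as a list of rows, and the bounding box `(top, left, bottom, right)` comes from a hypothetical face-recognition step that is not shown. The function name `redact_region` is made up for the example.

```python
def redact_region(frame, box, fill=0):
    """Return a copy of the frame with the pixels inside the box overwritten.

    frame: list of rows of grayscale pixel values (assumed representation).
    box:   (top, left, bottom, right), bottom and right exclusive, as it
           might be reported by a hypothetical detection step.
    """
    top, left, bottom, right = box
    redacted = [row[:] for row in frame]  # copy so the source frame is untouched
    # Clamp the box to the frame so an oversized box cannot index out of range.
    for y in range(max(top, 0), min(bottom, len(frame))):
        for x in range(max(left, 0), min(right, len(frame[y]))):
            redacted[y][x] = fill
    return redacted


# A 4x4 all-white frame with a 2x2 region blacked out in the middle.
frame = [[255] * 4 for _ in range(4)]
for row in redact_region(frame, (1, 1, 3, 3)):
    print(row)
```

Doing this in real time means running detection and the overwrite on every frame of every stream, which is the workload behind the edge-processing argument above.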
VonderHaar: And that creates more opportunities for companies like Vivoh, right? If you have more people using AI-infused video, that creates more opportunity for networking providers like Vivoh?
Herz: Yeah, I mean the network is kind of a way to get into this. So we just released an open source video cache, a high-end video cache. We’re going to be putting this on these high-end edge appliances. This allows us to sort of be in the middle of the flow of this, protect the networks, help on the networking side. But it also is going to open the door or provide hooks for an enterprise that might want to apply some AI processing to this.
You know, if you want computer vision at the edge so that way things are redacted or things are blocked or things are enhanced, you want to protect your information, make sure it doesn’t flow out of the network. So we think there’s a lot of opportunities around enterprise video and especially with edge and edge appliances to provide a lot of value for enterprises. They’re not going to shut it all down. They’re not going to block it entirely. They’re going to figure out how to have it be protected, proprietary and super valuable, because employees without those tools are not going to be competitive.
VonderHaar: I love your vision for where intelligent video is today and – always – Erik Herz always has a great crystal ball. Take us a little bit to version 3.0 of the intelligent video space. Look around the corner for me. Gazing at your crystal ball, tell me what we should be looking for. Not what’s happening today, but what’s going to be coming around the corner.
Herz: So, enterprise video combined with Generative AI, I think is super exciting. It’s probably very frightening for people. It’s got to be well managed. We’re going to need guardrails, we’re going to need the tools to make sure this is done well.
But, for example, if you have a large video repository and it’s all in English and then you have a lot of learners from another country with other languages, why can’t you use generative AI to make it be a great experience that’s in Japanese or a great experience that’s in French instead of just English? Instead of just captions, let’s actually make them speak French or speak Japanese. So that’s kind of the early, easy part.
But then you’re going to take this very quickly to the point where you’re going to say, you know, I want a video that is 15 minutes on this particular topic, in this language with these considerations and these data points: Go. And you’ll be able to generate a great 15-minute video on a subject that’s really important to you. Never existed before. But with the guardrails and the tuning and the embeddings of your enterprise video knowledge, you’re going to be able to generate compelling content that allows your employees to get the information they need in their language. Condensed to the point they need – at their fingertips.
VonderHaar: Wow, Erik. It is a great vision. My show here is called Intelligent Video Today, and I think that we’ll have plenty of “todays” to talk about for several years to come based on that vision. So I really appreciate your sharing that insight and thanks for joining us today on the show.
Herz: Yeah, thank you, Steve. Really looking forward to the show and can’t wait to subscribe to it. And I’ll be watching every episode that comes out.
VonderHaar: We’re going to have you back down the line again because of the great insight that you shared with us today and, as Erik said, please subscribe to Intelligent Video Today.
We’re going to be doing a lot of these over the coming months and years as part of our research efforts here at IntelliVid Research. And we appreciate you joining us today. Be sure to look for more episodes of Intelligent Video Today down the line. For IntelliVid Research, I’m Steve Vonder Haar. Thanks for your time.