Episode 5  |  57 Min  |  March 13

Exploring reinforcement learning with MIT Professor Vivek Farias

Engaging topics at a glance

  • 00:16:50
    What is Reinforcement Learning?
  • 00:20:10
    Reinforcement Learning for LLMs
  • 00:24:00
    How do you reward your model?
  • 00:33:00
    Revealed preferences vs. just a few individuals doing that
  • 00:36:00
    AI models training AI in the future?
  • 00:40:18
    Methodologies other than Reinforcement Learning
  • 00:43:10
    Considerations in the Reinforcement Learning from Human Feedback (RLHF) phases
  • 00:48:10
    About Cimulate

In “Exploring Reinforcement Learning,” guest Vivek Farias, Professor at MIT, discusses the role reinforcement learning has to play in the world of artificial intelligence.

Human learning systems date back almost 5,000 years, and they are what allowed us to progress as a society. Being able to teach other people what we know and to share knowledge has been a foundational pillar of our evolution and civilization. Interestingly, these learning systems are not unique to humans. Animals have them too: higher-order intelligent animals such as orcas and dolphins spend time training and teaching their young. In the last 50 to 60 years, we have been teaching not just humans but also machines how to learn, and the field of artificial intelligence has benefited from our understanding of these learning systems.

Reinforcement learning is: the agent interacts with the world, and the world does something to the agent.

– Vivek Farias

The guest began by highlighting the importance of acknowledging uncertainty and of balancing exploitation of what is already known against exploration to learn more about the environment. This trade-off is known as the “multi-armed bandit problem” and is considered fundamental in reinforcement learning, where the goal is to choose actions that optimize outcomes in an environment.
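The episode stays at the conceptual level, but the exploration-versus-exploitation trade-off can be made concrete with a small simulation. This is a minimal sketch using epsilon-greedy, one common bandit strategy (not necessarily one the guest discussed), on hypothetical slot-machine "arms" that pay out 1 with a fixed probability. The function name, payout probabilities, and parameters are illustrative assumptions.

```python
import random


def epsilon_greedy_bandit(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy.

    true_probs: hypothetical payout probability of each arm; the agent never
    sees these directly and must learn them from rewards.
    Returns (estimated value per arm, pull count per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # how many times each arm has been pulled
    values = [0.0] * n_arms  # running mean reward observed for each arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to learn more about the environment.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far
            # (ties go to the lowest-numbered arm).
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # Incrementally update the running mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return values, counts, total_reward


if __name__ == "__main__":
    # Hypothetical arms: the third arm (index 2) is the best one.
    values, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.7])
    print("estimates:", [round(v, 2) for v in values])
    print("pulls:", counts, "total reward:", total)
```

Setting `epsilon=0.0` switches exploration off. Because every estimate starts at zero and ties go to the first arm, the agent then pulls arm 0 forever and never discovers that arm 2 pays more: a concrete version of the trap of never learning when uncertainty is ignored.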

Turning specifically to Large Language Models (LLMs), Reinforcement Learning (RL) has played a central role in building general-purpose chatbots based on LLMs. A model trained only on data might not give you the refined output you expect from it, and RL is what refines that behavior.

The idea is that, listen, there are so many uncertain things in my environment. If I don’t acknowledge uncertainty altogether, I may get into this trap where I never learn.

– Vivek Farias

In the discussion of rewards and losses in the reinforcement learning phase, it emerged that the way we structure rewards and penalties for AI models greatly influences their reliability, how they interact with the public, and their accountability.
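The episode keeps the mechanics of rewarding a model at a high level. One common approach in RLHF pipelines is to turn human feedback into a reward by training a reward model on pairwise comparisons (which of two responses a person preferred), using a Bradley-Terry loss. The sketch below shows only that loss in plain Python; the scores are hypothetical numbers standing in for a reward model's outputs, and the function names are illustrative. It is one widely used technique, not necessarily the method discussed in the episode.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry probability that a rater prefers the 'chosen' response.

    Equals sigmoid(reward_chosen - reward_rejected), computed stably.
    """
    return math.exp(-softplus(-(reward_chosen - reward_rejected)))


def pairwise_reward_loss(pairs):
    """Mean of -log P(chosen preferred over rejected) over (chosen, rejected) score pairs.

    Lowering this loss teaches a reward model to score the response humans
    preferred above the one they rejected.
    """
    if not pairs:
        raise ValueError("need at least one comparison")
    return sum(softplus(-(r_chosen - r_rejected)) for r_chosen, r_rejected in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward-model scores for two human comparisons.
    print(pairwise_reward_loss([(2.0, 0.5), (1.0, 1.0)]))
```

The loss is lowest when preferred responses get clearly higher scores than rejected ones; for the two hypothetical comparisons above it comes to roughly 0.447, and a tie between scores costs log 2. How those scores are then used, and what penalties are added, is exactly the kind of reward design choice the discussion says shapes a model's behavior.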

Overall, deploying AI involves a balance. Back-end deployment offers some level of predictability, while front-end deployment is uncertain. Successful businesses must experiment and capitalize on both.

Production Team
Arvind Ravishunkar, Ankit Pandey, Rinat Sergeev, Chandan Jha, Nikhil Sood, Dipika Prasad

Latest podcasts

Trailer  |  01 Min  |  March 13

Unpacked with Arvind Ravishunkar

In this series, Unpacked, I explore and unpack the most important concepts that business leaders need to know about emerging technologies. I connect with reputed scholars, industry experts, and leaders through conversations. I also do short 5-minute episodes on key concepts. I am your host, Arvind Ravishunkar, and this is Season 1: Generative AI.

Top trending insights

Episode 3  |  48 Min  |  March 13

Leading the AI transformation of your company with Prof. Gregory LaBlanc

Engaging topics at a glance

  • 00:13:40
    What is transformation? What constitutes it?
  • 00:15:29
    Have you seen unpredictable organizational behavior before?
  • 00:16:30
    Learnings that enterprise leaders should pay attention to
  • 00:17:30
    How do organizations overcome fear to adapt?
  • 00:18:55
    Do you foresee AI running parts of companies?
  • 00:21:28
    Is data accessibility a key challenge for AI?
  • 00:23:29
    Are algorithms or data the true competitive edge?
  • 00:25:17
    Will companies without data become irrelevant?
  • 00:30:28
    What is your vision for the future of work?
  • 00:36:53
    Will AI drive higher-order thinking?

"AI Transformation – the new paradigm" with UC Berkeley Professor and AI startup expert Greg LaBlanc. Get ready to dive into the future of AI!

For some people, transformation is exciting and challenging. Curiosity and excitement about learning drew Greg into the field of strategy and transformation, and into the other topics he has taught throughout his career.

Every time you learn something, you are displacing or changing some previous notion of how the world works. For some people, this is disturbing. But for others, it is a thrill and really exciting. How you approach transformation is the beginning of how you deal with it, and curiosity is such a powerful human trait.

Some people emphasize what they call long-term trends, while others are more inclined to say everything's new. Similarly, with the digital and AI transformations taking place, you can say that everything's new, that everything has to change, and that this is something we've never seen before. Or you can say this is not that different from the sorts of things we have seen happen in the past.

As humans, we are in the entropy reduction business. We are trying to create order. We're trying to make sense of our world. We're trying to put in place practices that we can automate. We're trying to create routines and subroutines, and indeed, this is how efficiency happens. Efficiency happens when you start to recognize patterns and engage in repetitive action.

The problem is that circumstances and the environment change, so the routines you've established need to be changed at some point, and that requires a bit of work. There are a few different ways we can respond. One is to say, okay, the world's changed, so we've got to change the way we're doing things. Another is to say, well, let's try to change the world so that we don't have to change. That often means trying to shape the behavior of your customers or your employees, or using regulation or market power to hold off the onslaught of change.

The third way is to say, let's change. 

Too much flexibility means that nothing ever gels; too little flexibility means that you get stuck. So you need to figure out the optimal amount of flexibility, and then figure out a way to routinize change. That sounds paradoxical. It means creating systems that are intentionally designed to respond to the changing environment. If you can routinize change, you can routinize curiosity. If you can create a standard operating procedure for discovery, then in some ways you can have your cake and eat it too. And that’s what all really good dynamic businesses are trying to do.

Every time there's a new discovery in the world of artificial intelligence, people say, now's the time, this is AI. Back in 2015, with neural nets, everyone was saying, yes, AI at last. But each of these punctuated discoveries is a continuation of a series of discoveries that has been happening in artificial intelligence for the last couple of decades.

The technology diffuses rapidly. What doesn't diffuse as rapidly are managerial techniques and organizational and architectural innovations. That is also why older companies have a tough time adapting: they resist change and the kinds of transformations they would need to undertake in order to enable new technologies.

There is the immune system of the organization, but also the immune system of all the individuals within it. The natural propensity for many people is to fight new ideas when they encounter them as individuals. And if you combine that into a big organization, you can often end up with an organization where every individual is open to new ideas but the organization is not, because it has its own logic.

Fear plays a role, but it's not the complete story. It's not always that people are afraid; often they feel fairly confident that they can keep change at bay. This is why leadership is so critical. You need carrots and sticks, but you also need your vision and your messaging.

Even before generative AI, the forms of machine learning that were easiest to adopt were the more primitive ones that perform relatively narrow tasks. Suppose you are in HR and you're doing hiring, and someone comes up with a product that helps you process more applications more quickly. You can see how that is going to save you money. Or suppose you are in marketing and someone comes along and says, I've got this great tool that will help you figure out who you should be targeting. You will think, I am a revenue center, and I've just boosted my revenue. So all of those specific applications are relatively unproblematic.

Setting AI aside for a second, look at the automotive industry. A company like Ford or GM has tier-one suppliers, tier-two suppliers, tier-three suppliers, and so on. If there is an innovation in the steering column, the tier-one supplier that makes steering columns will figure it out and start selling it. The challenge comes when you want to figure out a way to connect those components.

The current supply chain architecture makes that very difficult, because you need to adjust the design elements of the brakes to coordinate better with the design elements of the steering column, and when everything is set up in these separate tiers, that becomes tough. Whereas at Tesla, which has a much more integrated production and design process, it is super easy to make those kinds of shifts. The car companies are struggling because they've tried to incorporate many of these new technological innovations into their pre-existing business, supply chain, and value chain architecture, which was optimized for the internal combustion engine. That is why someone like Tesla can just leapfrog them.

Your competitive advantage is always going to come from the data. It is never going to come from your analytics tools. 

If I have access to unique data, then I can take cutting-edge algorithms and train them on that data, and that can give me a competitive edge.

There will be some companies that can live without a solid data strategy, but for the vast majority of companies, if you do not have a data strategy, you're toast.

There are two major takeaways. The first is that in this transformation, your organizational structure is super important: how you organize your company so that data is democratised. The second is having high-quality, unique data. It is not just the quality of the data but its uniqueness that is going to differentiate you going forward, at least in the next couple of years.

Balancing flexibility and order is also going to be an important skill for all leaders. All our education systems have to teach flexibility, adaptability, and how to learn, and how to learn fast.

With artificial intelligence in all of our jobs, we have to develop higher-order thinking skills.

Production Team
Arvind Ravishunkar, Ankit Pandey, Rinat Sergeev, Chandan Jha, Nikhil Sood, Dipika Prasad

Co-create for collective wisdom

This is your invitation to become an integral part of our Think Tank community. Co-create with us to bring diverse perspectives and enrich our pool of collective wisdom. Your insights could be the spark that ignites transformative conversations.