Let’s take a break and think about the state of AI in 2022.
In this episode I summarize the long report from the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
Enjoy!
If you want a new interactive experience, I am scheduling hands-on sessions on Twitch.
Feel free to drop by when there is a live session, and interact with me. I’ll see you there!
References
https://spectrum.ieee.org/artificial-intelligence-index
https://www.twitch.tv/datascienceathome
Transcript
1
00:00:00,610 –> 00:00:03,982
This episode is supported by Tonic AI.
2
00:00:04,126 –> 00:00:12,898
Creating quality test data for data scientists is a complex, never-ending chore that eats into valuable technical resources.
3
00:00:13,054 –> 00:00:20,802
Random data doesn’t do it, and production data is not safe or legal for data scientists or developers to use.
4
00:00:20,996 –> 00:00:31,470
What if you could mimic your entire production database to create a realistic data set with zero sensitive data? Tonic AI does exactly that.
5
00:00:31,580 –> 00:00:38,074
With Tonic, you can generate fake data that looks, acts and behaves like production.
6
00:00:38,182 –> 00:00:40,438
Because it’s made from production.
7
00:00:40,594 –> 00:00:51,090
Tonic integrates seamlessly into your existing pipelines and allows you to shape and size your data to the scale, realism, and degree of privacy that you need.
8
00:00:51,260 –> 00:00:59,240
Your newly mimicked datasets are safe to share with developers, QA, data scientists, and even distributed teams around the world.
9
00:00:59,750 –> 00:01:09,438
Shorten development cycles, eliminate the need for cumbersome data pipeline work, and mathematically guarantee the privacy of your data.
10
00:01:09,584 –> 00:01:15,874
With Tonic AI, this is the sound of turning ideas into software.
11
00:01:16,042 –> 00:01:18,990
This is the sound of engineering and passion.
12
00:01:19,790 –> 00:01:20,550
Work.
13
00:01:20,720 –> 00:01:22,798
Work more, work harder.
14
00:01:22,954 –> 00:01:24,022
Experiment.
15
00:01:24,166 –> 00:01:27,342
Build, break, and build again.
16
00:01:27,536 –> 00:01:29,958
Write code, improve it.
17
00:01:30,104 –> 00:01:31,160
Job done.
18
00:01:31,610 –> 00:01:32,670
Celebrate.
19
00:01:33,290 –> 00:01:34,086
Insurance.
20
00:01:34,268 –> 00:01:35,950
Finance, retail.
21
00:01:36,070 –> 00:01:36,838
Defense.
22
00:01:36,994 –> 00:01:37,906
Robotics.
23
00:01:38,038 –> 00:01:38,720
Energy.
24
00:01:39,290 –> 00:01:44,110
Welcome back to another episode of the Data Science at Home podcast.
25
00:01:44,170 –> 00:01:50,470
I’m Francesco, podcasting from the regular office of Amethix Technologies, based in Belgium.
26
00:01:50,650 –> 00:01:55,386
Today I’m speaking about the state of AI in 2022.
27
00:01:55,508 –> 00:02:04,630
This data comes from the usual annual report produced by the Stanford Institute for Human-Centered Artificial Intelligence,
28
00:02:04,810 –> 00:02:21,414
HAI. They essentially put together the AI Index, which is a very large report of data, graphs, and statistics about what’s going on in the field of AI and, of course, what we should be expecting in the coming years.
29
00:02:21,572 –> 00:02:29,010
Now, the link to the official report will be in the show notes of this episode, as always, at datascienceathome.com.
30
00:02:29,060 –> 00:02:38,120
But of course, you can also jump on our official Discord channel, where you can speak with the folks of the Data Science at Home podcast and with myself.
31
00:02:38,990 –> 00:02:57,726
So in this episode I would like to go through a summary of the report, because I believe it’s quite important. There is some news that we probably did not expect, and some that we have definitely seen happening over the last few years.
32
00:02:57,788 –> 00:03:10,698
So nothing super novel there, but it’s always good to take stock of how things are: a snapshot of where we are and where we are going.
33
00:03:10,844 –> 00:03:16,750
So the very first thing that I would like to discuss is about investment.
34
00:03:16,810 –> 00:03:27,694
So, if you look at the trend from 2013 to 2021, we have seen massive global corporate investment in AI.
35
00:03:27,802 –> 00:03:30,150
And no big news here.
36
00:03:30,200 –> 00:03:35,540
We have seen this happening, we have heard it many times, and we have read it many times in the news.
37
00:03:35,930 –> 00:04:00,834
But just to give you some numbers: when it comes to private investment there has literally been an explosion, from about $9 billion in 2015 to roughly $46 billion in 2020, and then a massive $93 billion in 2021.
38
00:04:00,872 –> 00:04:04,558
So it almost doubled in a single year. And with a very similar trend,
39
00:04:04,594 –> 00:04:11,310
we have also seen mergers and acquisitions worth, of course, billions of dollars over the years.
40
00:04:11,420 –> 00:04:13,566
Very small numbers in 2015 and 2016.
41
00:04:13,568 –> 00:04:30,570
And then all of a sudden it starts growing: $22 billion in 2017, $20 billion in 2019, then again $21 billion in 2020, and $72 billion in 2021.
42
00:04:30,620 –> 00:04:39,620
So again, two big trends here: a lot of private investment in the last year, and a lot of acquisitions and merged companies.
43
00:04:40,190 –> 00:04:49,170
So the story here, if you want to give an interpretation to these numbers, is that it is probably just as good to join a startup as to start one.
44
00:04:49,340 –> 00:04:52,458
That’s at least what they conclude out of these numbers.
45
00:04:52,544 –> 00:05:11,962
It’s probably not going to be the general case, but it’s a very good indication that we are looking at, I would say, a rich-get-richer model in which established corporations keep merging and keep acquiring technology from small startups.
46
00:05:11,986 –> 00:05:18,486
We have seen that trend going up already from 2017, and it doesn’t seem to be stopping anytime soon.
47
00:05:18,548 –> 00:05:21,320
At least that’s what the numbers are telling us.
48
00:05:21,710 –> 00:05:30,750
The second point is about the US and China, and cross-country collaborations in AI publications.
49
00:05:31,250 –> 00:05:44,998
And there is a massive trend here that tells a very clear story: the United States and China are, in fact, collaborating on AI publications.
50
00:05:45,154 –> 00:05:50,362
There is an ever-growing increase in cross-country collaborations between these two countries.
51
00:05:50,506 –> 00:05:57,142
And the difference with even the second-largest pairing, UK and China, is just a different planet.
52
00:05:57,286 –> 00:06:04,242
The number of AI publications, in thousands, is more than 110 in 2020.
53
00:06:04,256 –> 00:06:13,086
And the second best, which is UK-China, is only 3,000 or a bit more.
54
00:06:13,208 –> 00:06:18,282
So different orders of magnitude there.
55
00:06:18,416 –> 00:06:19,578
We have seen that.
56
00:06:19,724 –> 00:06:35,790
Honestly, I didn’t expect that, because we have heard about a lot of geopolitical tension between the US and China, and I personally did not expect to see such a completely different story in terms of AI collaboration between these two countries.
57
00:06:36,350 –> 00:06:37,650
So that’s a good thing.
58
00:06:37,700 –> 00:06:48,210
In fact, I like it, and I believe that good research also comes from a bit of competition, sometimes combined with collaboration.
59
00:06:48,890 –> 00:06:57,406
There’s always that kind of sweet spot between competing and collaborating that keeps things spicy and definitely improves research globally.
60
00:06:57,538 –> 00:07:11,106
When it comes to patents, however, the story is quite different, because China holds, let’s say, the record for patent applications, but not the record for patents granted.
61
00:07:11,168 –> 00:07:18,930
So they apply for many more patents than the United States, but those patents essentially don’t get granted.
62
00:07:19,430 –> 00:07:21,946
So there are two different curves.
63
00:07:22,078 –> 00:07:34,830
There is the application curve, where China is basically dominating the rest of the world; but when it comes to granted patents, they are even below the European Union and the United Kingdom.
64
00:07:35,150 –> 00:07:38,202
And again, the United States tells a different story there.
65
00:07:38,336 –> 00:08:06,546
They are several percentage points higher when it comes to granted patents, and when it comes to patent applications they have a much lower curve than China, which tells us that in the United States there is much more reasoning about whether to apply for a patent, and a much higher success rate.
66
00:08:06,728 –> 00:08:10,998
This means that the papers, the research produced by US
67
00:08:11,084 –> 00:08:18,320
researchers, are more credible in a way. At least, this is what I believe these numbers are telling me.
68
00:08:18,890 –> 00:08:20,420
Patents cost money.
69
00:08:20,930 –> 00:08:25,700
You also have to defend them from abuse, and that also costs money.
70
00:08:26,030 –> 00:08:33,726
So when you file for a patent, it means that you’re really sure there is something tangible behind it.
71
00:08:33,848 –> 00:08:41,518
There is research that can indeed have a massive impact for an application in a particular sector.
72
00:08:41,614 –> 00:08:45,620
Otherwise you wouldn’t spend that much money protecting that idea.
73
00:08:46,130 –> 00:09:02,382
And this means that in the United States, the whole practice of registering a patent, protecting it, and of course selecting credible research, is much more evolved, much more sophisticated, than in the rest of the world.
74
00:09:02,516 –> 00:09:14,610
Now, when it comes to technology, we have seen two big factors in AI, two fields of research in particular: computer vision and NLP.
75
00:09:15,050 –> 00:09:42,262
Over the years, we have seen these two fields going kind of hand in hand when it comes to research outcomes, accuracy, and the power of these methods whenever they are applied in the real world, right? And when it comes to computer vision, we have seen big improvements in, for example, recognizing images, but also in facial recognition and object classification.
76
00:09:42,406 –> 00:09:54,610
We know that the current technology, which is already three, four, probably five years old, has kind of reached a plateau in terms of accuracy.
77
00:09:54,790 –> 00:09:57,082
We are good at recognizing objects.
78
00:09:57,226 –> 00:10:04,306
We know that there are several limitations when it comes to, for example, reasoning about the recognized objects.
79
00:10:04,378 –> 00:10:33,990
And there is a very classic example now about recognizing a scene in a picture and trying to understand what’s going on from another perspective, not just recognizing a ball or a person or a dog, right? We also want to understand what’s going on in the scene, like what’s happening and why, for example, the dog is not jumping or whatever is described by those images.
80
00:10:34,430 –> 00:10:42,534
Now, that’s where the problem essentially shows up, because artificial intelligence, and in particular deep learning technology, as we have seen many times,
81
00:10:42,572 –> 00:10:44,900
and as we have said many times on this show,
82
00:10:45,350 –> 00:10:50,550
is simply not sufficient for reasoning about images.
83
00:10:50,870 –> 00:10:54,810
And a very similar trend has happened in NLP.
84
00:10:55,370 –> 00:10:57,214
Natural language processing.
85
00:10:57,382 –> 00:11:13,150
We are good at, for example, recognizing or classifying the semantics of a paragraph, or even summarizing short texts or relatively long documents into paragraphs.
86
00:11:13,330 –> 00:11:17,720
But reasoning about what’s written is a different story.
87
00:11:18,170 –> 00:11:31,640
And we are also very familiar with the fact that natural language processing, and language models in particular, are probably among the most complex tasks that an AI can be asked to work on.
88
00:11:32,090 –> 00:11:43,880
Now, regardless of what they say about GPT-2, GPT-3, or whatever other version is out there, NLP hasn’t been mastered as a field.
89
00:11:44,870 –> 00:11:49,090
And this is exactly where all these limitations are essentially shown.
90
00:11:49,150 –> 00:11:53,974
Even in the report, reasoning is still a frontier of AI.
91
00:11:54,142 –> 00:12:05,622
We haven’t solved that problem, and I’m pretty sure, and I stress “pretty”, that with the current technology and the current algorithms we cannot solve that problem.
92
00:12:05,816 –> 00:12:19,830
I personally believe that there is, I’m not saying a plateau in deep learning techniques, but definitely a need for different methods to think about and to solve the reasoning problem,
93
00:12:20,000 –> 00:12:34,762
when it comes to images and, of course, when it comes to natural language processing. Now, the biggest change with respect to past years has been in the ethics sector.
94
00:12:34,906 –> 00:12:40,902
So ethical AI has essentially monopolized research.
95
00:12:40,976 –> 00:12:46,158
In the last two to three years, we have literally seen an explosion in academia
96
00:12:46,244 –> 00:12:48,046
when it comes to the number of papers.
97
00:12:48,118 –> 00:13:09,642
In 2018, we had something like 63 papers related to ethical AI and ethics; that number more than doubled in 2019, to 139 papers, then 200 papers in 2020, and 227 in 2021.
98
00:13:09,836 –> 00:13:10,580
Now.
99
00:13:10,970 –> 00:13:12,970
This was kind of expected.
100
00:13:13,090 –> 00:13:22,750
And we have had several guests on this show with whom we spoke about ethical AI, and about ethics in general when it comes to machine learning and algorithms,
101
00:13:22,870 –> 00:13:29,962
whenever these are applied to all those critical applications where there is a lot of human involvement.
102
00:13:30,046 –> 00:13:31,450
Think about healthcare.
103
00:13:31,510 –> 00:13:32,686
Pharmaceuticals.
104
00:13:32,818 –> 00:13:42,210
Autonomous vehicles that have to decide whether to stop, or to hit the dog or the person, in extremely critical situations.
105
00:13:42,590 –> 00:13:44,850
Or think about defense and military.
106
00:13:45,470 –> 00:13:57,594
Whenever you deploy artificial intelligence in all these critical sectors that have to do with humans, well, then we should be thinking about ethical algorithms, that’s for sure.
107
00:13:57,632 –> 00:14:00,620
So we have seen this happening.
108
00:14:01,550 –> 00:14:08,110
One thing, however, that I personally didn’t expect is that industry is following a very similar trend.
109
00:14:08,230 –> 00:14:15,238
And that rarely happens, to be honest: that academia and industry go hand in hand on certain matters.
110
00:14:15,334 –> 00:14:28,590
And we have seen that from 2018 to 2021 there has been pretty much the same growth in the number of papers on ethics, even in industry.
111
00:14:29,150 –> 00:14:43,014
So this tells a good story, because the idea of thinking about ethical algorithms is probably something that people from industry are also starting to consider, as a source of problems if left unaddressed.
112
00:14:43,112 –> 00:14:59,038
We had better deal with these issues now, because algorithms are penetrating people’s daily lives more and more, and many other fields out there are already pretty much permeated by artificial intelligence.
113
00:14:59,074 –> 00:15:05,142
So we had better start thinking about ethical AI on a much more serious level.
114
00:15:05,336 –> 00:15:17,058
Now, when it comes to the people involved in AI, the people who do AI, well, there’s a different story, a pretty depressing one, I must say.
115
00:15:17,204 –> 00:15:23,840
There has been a gain of only a couple of percentage points when it comes to women in AI since 2010.
116
00:15:24,290 –> 00:15:35,938
So ten or twelve years ago, we had something like 16, probably 17, percent female computer science PhDs.
117
00:15:36,034 –> 00:15:40,270
That’s the percentage represented in North America.
118
00:15:40,450 –> 00:15:43,962
And in 2020, we are only a couple of percentage points higher.
119
00:15:44,036 –> 00:15:45,574
That’s really depressing.
120
00:15:45,622 –> 00:15:48,222
Like only a couple of percentage points.
121
00:15:48,356 –> 00:15:51,834
AI definitely needs women and this is not happening.
122
00:15:51,932 –> 00:15:55,280
So we better insist on this.
123
00:15:55,970 –> 00:15:57,414
I don’t accept this.
124
00:15:57,512 –> 00:16:16,618
I honestly don’t accept this, also because at the same time, in the same demographic in North America, we have seen an increase in the number of computer science undergraduates and graduates at doctoral institutions, and that has been a massive increase.
125
00:16:16,774 –> 00:16:24,500
So if you combine these two numbers and these two trends here, the story is pretty clear.
126
00:16:24,890 –> 00:16:30,150
Even the new computer science undergraduates and graduates are mostly males.
127
00:16:30,470 –> 00:16:33,870
And that’s also something that is hard to swallow.
128
00:16:34,610 –> 00:16:35,982
I don’t like that.
129
00:16:36,176 –> 00:16:38,562
I think I said that a number of times already.
130
00:16:38,756 –> 00:16:40,890
We need more women in AI.
131
00:16:41,390 –> 00:16:47,934
Not only that, we also need different ethnic backgrounds to be represented in a more balanced way.
132
00:16:48,032 –> 00:16:59,190
Personally, I find it unacceptable that almost 60% of CS PhDs and undergraduates in AI are white, non-Hispanic.
133
00:17:00,110 –> 00:17:12,426
The Asian population is represented at only 25%, while Hispanic and Black or African American people are represented at merely 3.5%.
134
00:17:12,488 –> 00:17:14,842
So that’s also unacceptable.
135
00:17:14,986 –> 00:17:23,360
And the reason I’m saying that is because there might be bias in the data, as we have proven many times, but there is also bias in research.
136
00:17:23,690 –> 00:17:39,214
And so if only a particular ethnic background is represented in the AI community, I believe that its research directions will also be biased.
137
00:17:39,322 –> 00:17:42,630
And that’s also something that doesn’t sound right to me.
138
00:17:42,800 –> 00:17:52,414
Now, last but not least, and this is also something that keeps me awake at night: climate change.
139
00:17:52,572 –> 00:18:00,770
We have claimed climate change to be the next big sector in which we should have seen, or would have liked to see, AI in action.
140
00:18:00,890 –> 00:18:02,770
And that’s not happening.
141
00:18:02,820 –> 00:18:10,200
Now, of course, 2022 is not finished yet, but the trend is quite depressing as well.
142
00:18:11,970 –> 00:18:23,930
So I have a list of the topics in which AI-related papers, and policy papers in particular, are most represented.
143
00:18:24,370 –> 00:18:27,974
And we have at the top of the list, of course, privacy, safety and security.
144
00:18:28,132 –> 00:18:29,774
We have innovation and technology.
145
00:18:29,932 –> 00:18:40,746
We have ethics, as we have already said, and then all the others further down the list, like industry and regulation, workforce and labor, education and skills, et cetera.
146
00:18:40,878 –> 00:18:48,846
And again, I will put the link to this chart in the show notes of this episode.
147
00:18:48,978 –> 00:18:58,206
At the end of this list, we have social and behavioral sciences, health and biological sciences, energy and environment, and humanities.
148
00:18:58,278 –> 00:19:01,878
So those are the last two in the list: energy and environment, and humanities.
149
00:19:01,974 –> 00:19:03,950
So I definitely did not
150
00:19:04,060 –> 00:19:09,794
want to see energy and environment sitting second to last in that list.
151
00:19:09,952 –> 00:19:10,634
Now.
152
00:19:10,792 –> 00:19:11,186
Again.
153
00:19:11,248 –> 00:19:12,746
Here is my speculation,
154
00:19:12,808 –> 00:19:18,650
because I don’t have the detailed breakdown of these findings:
155
00:19:19,090 –> 00:19:30,302
I expect that some of the innovation related to energy and the environment is probably grouped under the innovation and technology topic,
156
00:19:30,436 –> 00:19:32,150
which is second in the list.
157
00:19:32,320 –> 00:19:33,650
But I’m not sure about that.
158
00:19:33,700 –> 00:19:36,054
That’s why we need more detailed results.
159
00:19:36,102 –> 00:19:42,100
But no matter what, we should definitely think about this.
160
00:19:42,490 –> 00:20:02,546
Energy and environment is probably the second most important topic that we should be covering, together with healthcare and, of course, safety in general. It is an area where I believe artificial intelligence can play a fundamental role when it comes to energy, renewable energy, and of course the environment in general.
161
00:20:02,728 –> 00:20:03,710
That’s it for today.
162
00:20:03,760 –> 00:20:05,126
I hope you enjoyed the show.
163
00:20:05,248 –> 00:20:06,760
Speak with you next time.
164
00:20:12,230 –> 00:20:15,262
You’ve been listening to the Data Science at Home podcast.
165
00:20:15,346 –> 00:20:19,930
Be sure to subscribe on iTunes, Stitcher, or Podbean to get fresh new episodes.
166
00:20:19,990 –> 00:20:25,980
For more, please follow us on Instagram, Twitter, and Facebook, or visit our website at datascienceathome.com.