Welcome to insideBIGDATA’s “Heard on the Street” round-up column! In this regular feature, we highlight thought-leadership commentaries from members of the big data ecosystem. Each edition covers the trends of the day with compelling perspectives that can provide important insights to give you a competitive advantage in the marketplace. We invite submissions with a focus on our favored technology topic areas: big data, data science, machine learning, AI and deep learning. Enjoy!
The Importance of Human Touch in AI Innovation. Commentary by Igor Bergman, VP & GM of Cloud and Software at Lenovo
Today many businesses rely too heavily on technology to reach their goals. But automation alone cannot deliver the best customer experience; it requires a human touch. Combining humans and technology allows businesses to improve efficiency and create tailored, engaging experiences for customers. For example, smart AI that combines voice recognition with an AI back end can help people communicate in virtual meetings more easily without having to make adjustments manually. Relying on AI to learn what the optimal meeting environment is and then adjust accordingly is a great use of AI: it expands human capabilities and lets people focus on other aspects of the meeting. If the last few years have taught us anything, it’s that we no longer think about work, play and home as separate, but as blended. This is where intuitive AI, equipped with human elements working in collaboration, can really excel and help users take their day-to-day experiences to new levels. Solutions and applications should be built to improve the user’s experience across the board, from the application itself, to how the device is set up, to access and security, whether via the cloud or a corporate VPN. AI enables us to do this more effectively. With device diagnostics powered by AI and user data, we are able to better understand how users use their devices and what’s important to them, and to proactively work to resolve issues. IT administrators, gamers, students and individual device owners can have not only an optimized and secure device, with AI automating much of the manual labor, but also a personalized experience, because it’s the human who chooses the best options for their own device. This is another example of how the software on the device assists the human to amplify their device experience.
Tapping into data analytics for more productive hybrid meetings. Commentary by Brian Goodman, Director of Product at Poll Everywhere
Meeting norms have undoubtedly changed amid hybrid work. As a result, companies are conducting most, if not all, of their team interactions online. This reality opens a very beneficial door for business leaders, as they’re able to reap the benefits of data analytics stemming from meetings. With meetings now largely facilitated on platforms like Zoom, MS Teams, and Webex, data can be collected during live presentations and meetings through participant feedback such as polling, Q&As, open-ended and multiple-choice questions, comments and reactions. From there, vast amounts of functional data that reveal employee sentiment and engagement levels can help leaders thrive in this new era of work, where every interaction serves as an opportunity for better listening. By employing their data insights, leaders can improve online engagements and overall decision-making – once again revolutionizing the way we conduct work.
The potential downside of going serverless. Commentary by Alexey Baikov, CTO and Co-founder of Zesty
For some organizations, the upside to serverless is clear, enabling them to move quickly without the need to address the underlying infrastructure. For many others, however, making the switch to serverless may not only prove unnecessary but also be challenging from a cost and performance perspective. Limited customization capabilities in serverless PaaS may drastically hinder companies’ ability to meet certain KPIs efficiently, and because the platform is closed source, limited visibility and monitoring make it harder to debug, prevent outages, and perform root cause analysis. If managed incorrectly, this can lead to increased costs. In addition, going serverless tends to become very expensive when the time comes to scale. The combined runtimes of the many functions required as a company scales tend to be roughly 45% more expensive than running on a traditional on-demand virtual machine. The benefits of serverless computing are certainly vast, but companies must also consider the downsides to ensure they’re approaching it intelligently.
Open-source large language model, BLOOM, released. Commentary by Marshall Choy, SVP of Product at SambaNova Systems
Large language models (LLMs) are state of the art but the sheer size of them is a big obstacle for academic researchers. BLOOM, backed by a significant grant and in partnership with over a thousand volunteers, is finally making LLMs accessible to academia, allowing researchers to further advance these models. LLMs such as BLOOM represent a new class of technology that fundamentally shifts the needle for AI. These foundational models – AI models which are not designed to be task-specific and are trained on a broad set of unlabelled data – have multiple applications across industries. We still have so much to learn about the mathematics of LLMs, and BLOOM presents an opportunity for academics to improve the algorithms. But two years on from the release of GPT-3, the biggest challenge for LLMs is still applying the models to enterprises in real-world scenarios. Bringing together foundational models such as BLOOM with domain-specific knowledge is a game-changer for enterprises.
Data Quality is Paramount to Data Ops and Data Empowerment. Commentary by Heath Thompson, President and GM of Information Systems Management, Quest Software
Data quality has overtaken data security as the top driver of data governance initiatives — with 41% of IT decision makers noting that their business decision-making relies fundamentally on trustworthy, quality data. The problem? As businesses deal with a massive influx of data available in today’s modern world, so much of that data is not being used strategically. In fact 42% of ITDMs said that at least half their data is currently unused, unmanageable and unfindable. This massive influx of dark data and a lack of data visibility can lead to downstream bottlenecks, impeding the accuracy and effectiveness of operational data. Recent research shows that focusing on DataOps is overwhelmingly agreed to be a major key in empowering employees to use data confidently. In fact, 9 in 10 ITDMs believe that strengthening DataOps improves data quality, visibility and access issues across their business. Businesses should look to improve DataOps accuracy and efficiency by investing in automated technologies and deployment of time-saving tools, such as metadata management. Currently, only 37% of respondents describe their DataOps processes as automated, and a similarly small proportion report having automated data cataloging and mapping today (36% and 35% respectively). That number will need to increase significantly in order to fully maximize data use for both IT and line-of-business needs.
On Snowflake Summit and Databricks. Commentary by Lior Gavish, CTO and co-founder of Monte Carlo
This conference season, one thing was clear: it’s all about collaboration. During Snowflake Summit 2022 in Las Vegas, the Data Cloud provider announced new features to make it easier for developers to build and monetize data applications on top of Snowflake, while Databricks announced their own data marketplace, a new platform for exchanging data products, notebooks, and machine learning models across teams and even companies. As these cloud behemoths continue to roll out new products and services that make it easier for customers to decentralize and share data, we expect the onus on data quality and trust will grow even bigger.
Amid record global heat, utilities turn to satellite and AI tech to prevent wildfires, outages. Commentary by Jeff Pauska, Digital Product Director, Hitachi Energy
From Australia and Europe to North America, record droughts and abrupt changes in climate have created profound operating environment risks for power utilities, increasing their likelihood of sparking wildfires and initiating damaging, widespread outages. Utilities need to proactively manage vegetation growth around infrastructure even more carefully this time of year – despite constrained budgets – to avoid a disaster like the Dixie Fire, the second-largest wildfire in California’s history, which was sparked when power lines came into contact with a tree. As utilities take on this critical task of vegetation management, they are turning to new technology for support. Using deep AI visual analysis and satellite imagery, utilities can automatically analyze vegetation around their overhead lines and take proactive steps toward wildfire and outage prevention. This AI technology, using algorithms trained on thousands of miles of utility asset data, satellite imagery, and validated by point-cloud field captured datasets, automatically identifies vegetation infringements against business action thresholds, and predicts tree growth and off right-of-way hazards – a major risk factor in utility-caused wildfires. By automatically identifying trees and other vegetation at risk of contacting power lines, utilities can prevent wildfires and protect their customers from catastrophe and widespread outages.
How AI is becoming easier and more accessible to everyone. Commentary by Erin LeDell, Chief Machine Learning Scientist, H2O.ai
With more businesses moving towards incorporating AI in their day-to-day operations, one of the biggest challenges to its advancement is that often, organizations don’t have the internal resources or expertise to develop and carry through projects that use AI. This is particularly the case with businesses outside of the technology industry. With demand for AI at an all-time high and these challenges in mind, the biggest scale-related trend is the acceleration of democratizing AI – making it not only available to everyone, but also easy and fast to use, so all companies can get in on the action. This is where open source frameworks and the ability to use low-code and pre-canned proprietary apps are growing in popularity, as they make it easier for any kind of enterprise to build and operate AI-based services in common areas like fraud prevention, anomaly detection and customer churn prediction.
Climate Resilience Analytics Emerges as the Latest Means to Evaluate Threats. Commentary by Toby Kraft, CEO, Teren
The infrastructure industry is no stranger to geospatial data. However, it’s often viewed as a source for specialists rather than decision-makers. The size and complexity of geospatial data have limited its use to GIS and data analysts rather than the enterprise. But that’s all changing rapidly as decision-makers need access to data that’s not only spatially accurate but timely. Across all infrastructure industries, including oil and gas, renewables, electric, telecommunications, roads and railways, decision-makers need to shift from a focus on risk management to strategic resilience. This requires them to understand not only their assets’ risk, but also the relevant external and environmental threats and how they change over time. The result is an emerging market: climate resilience analytics. Climate resilience analytics goes beyond climate risk modeling to inform physical risk mitigation and strengthen resilience. It pinpoints where climate risks threaten assets, prioritizes threats, reveals how site conditions can be modified to fortify assets, and monitors and measures progress toward physical resilience through time.
Navigating VC deal flows, looming recession. Commentary by Ray Zhao, Affinity’s co-founder and co-CEO
It is clear that the venture investing world has changed, with the public market slowdown and the closing of the IPO window directly impacting startup valuations. However, the volume of investments has not yet slowed as much as feared, though we expect it to slow further in the second half. We see that shift in volume on our platform, which shows that VCs are adding new deals to their pipeline at a 23% slower rate than in 2021 – pointing to a much stricter set of criteria being applied to potential investments. Given that the amount of money available to be called down by VCs is actually increasing, we do not expect this situation to simply lead to better deals for VCs in the second half, but rather to increased competition between VCs for great investments, as they are all applying the same selection criteria. The VC firms that put effort into founder relationships, understanding the reality of their investment criteria, and great deal management are going to be best positioned to win that competition.
Adapting to new demands for data quality through sustainable data operationalization and end-to-end testing. Commentary by Michael White, Sr. Product Marketing Manager at Tricentis
There is a crisis of information trust, observes distinguished Gartner analyst Ted Friedman, and poor data quality is a major factor at the root of it. With pressure on organizations to adapt to new demands for digital transformation, seamless operation of increasingly complex data pipelines, and more robust data compliance expectations, the risks of bad data are more prevalent now than they were a few years ago. Add to that a renewed emphasis on data governance and data ownership, and organizations are increasingly seeking automation and AI-powered solutions to minimize the risks of tedious scripts, convoluted SQL queries, and manual spreadsheet-based data exercises. After all, a shocking 24% of Enron’s spreadsheets contained errors. Furthermore, Gartner suggests that the lack of a sustainable data and analytics operationalization framework may delay key organizational initiatives for up to two years – an eon in today’s digital economy! While the focus on avoiding past mistakes is good (e.g. designing user interfaces and onboarding data sources), a solid framework for testing the complete data layer – including the API and UI – is critical for a sustainable data framework and an operationalized analytics value chain. End-to-end data testing is a more effective approach for ensuring costly data issues are captured at the source(s). In this manner, data discrepancies can be identified, reconciled, and remediated before they rear their proverbial ugly heads downstream, causing breakdowns or, sometimes worse, leaking into BI reports and ML models, which can result in unwittingly poor business decisions and bad predictions for months.
Machine Learning Automates Relevancy at Scale. Commentary by Katie Boschele, Senior Product Manager, Lucidworks
The quality of any digital experience depends on how relevant it is to the person using it. There may be thousands, even hundreds of thousands, of people using a site at any given moment, each expecting a relevant experience that meets their unique needs. It’s impossible to build that manually. Machine learning and other advanced technologies automate relevancy to connect people to the information, products, and services that they need. One of the best examples of this automation in action is semantic vector search, which can enhance the queries typed into the search bar. Let’s say a contractor is looking for a very specific piece of connective pipe. They type in the product number, but that product is no longer being manufactured. Instead of returning a “No Results” message, semantic vector search understands what they are looking for and relates the query to the newer version of the same product, with no manual updating required by the merchandisers on the other end. Machine learning automation saves time and the sale for merchandisers and valued customers alike.
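The contractor scenario can be sketched in a few lines of Python. This is only an illustration, not Lucidworks’ implementation: the product IDs and three-dimensional embedding vectors below are hypothetical, hand-written stand-ins for what a trained embedding model would produce, and cosine similarity is assumed as the relevance measure.

```python
import math

# Hypothetical catalog: product id -> (embedding vector, still manufactured?).
# Real embeddings would come from a trained model and have many more dimensions.
CATALOG = {
    "PIPE-100": ([0.90, 0.10, 0.30], False),   # discontinued connector
    "PIPE-100B": ([0.88, 0.12, 0.31], True),   # newer version of the same part
    "VALVE-7": ([0.10, 0.95, 0.20], True),     # unrelated product
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_available(query_vec, catalog=CATALOG):
    """Return the id of the in-production product closest to the query vector."""
    available = {pid: vec for pid, (vec, in_production) in catalog.items() if in_production}
    return max(available, key=lambda pid: cosine(query_vec, available[pid]))


def search_by_product_number(product_id, catalog=CATALOG):
    """Look up a product number; fall back to its nearest available neighbor."""
    vec, in_production = catalog[product_id]
    return product_id if in_production else nearest_available(vec, catalog)


print(search_by_product_number("PIPE-100"))
```

Because the discontinued part and its successor sit close together in the embedding space, the lookup resolves to the newer product without anyone maintaining a manual replacement table.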
Google announces Q2 2022 earnings. Commentary by Amit Sharma, CEO and co-founder, CData Software
Alphabet’s Q2 earnings show that the tech giant isn’t immune to the challenges the market currently faces. Continued transitions to, and reliance on, the cloud have enabled the provider to remain competitive as businesses modernize their tech stacks. Organizations are increasingly shifting to cloud databases to make their data more easily accessible and agile. As organizations prioritize data connectivity amid remote and hybrid work changes, we can expect Google to further hone its capabilities to keep organizations efficient. Data is optimized when it’s secure and also readily accessible across all systems in real time – that’s how businesses can uncover the true value of their data.
Insights for its hybrid cloud and AI strategy. Commentary by Qin Li, Solutions Manager at Tamr
What should businesses focus on to adopt hybrid AI successfully and ensure they’re using the right tech? Focus on the business results. Start the project with value in mind and select solutions that can get you to value quickly and easily. Maintain some sort of lineage. Make sure the workflow keeps track of which decisions were made by the machine and which were overwritten by humans, so that the machine can learn from them continuously. Nowadays, any machine learning or deep learning project will involve tons of data, so solid cloud storage and compute infrastructure are essential. On top of that, distributed systems such as Spark can help with parallel processing of the data, speeding up the process. I would recommend open source technologies and high interoperability with other technology components.
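One way to picture the lineage advice is a small log that stores each machine decision alongside any human override. The sketch below is a minimal illustration under stated assumptions: the class names, record IDs, and labels are all hypothetical and do not describe Tamr’s product or any particular library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Decision:
    """One record's lineage: what the machine said and what a human changed."""
    record_id: str
    machine_label: str
    human_label: Optional[str] = None  # set only when a reviewer steps in

    @property
    def overridden(self) -> bool:
        return self.human_label is not None and self.human_label != self.machine_label

    @property
    def final_label(self) -> str:
        return self.human_label if self.human_label is not None else self.machine_label


class LineageLog:
    """Hypothetical in-memory log; a real system would persist this."""

    def __init__(self) -> None:
        self._decisions: Dict[str, Decision] = {}

    def record_machine(self, record_id: str, label: str) -> None:
        self._decisions[record_id] = Decision(record_id, label)

    def record_override(self, record_id: str, label: str) -> None:
        self._decisions[record_id].human_label = label

    def final_labels(self) -> Dict[str, str]:
        return {rid: d.final_label for rid, d in self._decisions.items()}

    def corrections(self) -> List[Tuple[str, str, str]]:
        """(record_id, machine_label, human_label) rows to feed back into training."""
        return [
            (d.record_id, d.machine_label, d.human_label)
            for d in self._decisions.values()
            if d.overridden
        ]


log = LineageLog()
log.record_machine("r1", "duplicate")
log.record_machine("r2", "unique")
log.record_override("r2", "duplicate")  # reviewer disagrees with the model
print(log.corrections())
```

Keeping the machine’s original label next to the override, rather than overwriting it, is what lets the corrections be replayed as training feedback later.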
Embracing BI through a vendor-agnostic, open lakehouse approach. Commentary by Billy Bosworth, CEO of Dremio
Anyone familiar with the potential of big data analytics in the cloud is likely a fan of data lakehouses. Lakehouses are shaping the future because they combine the scalability of data lakes and the quality of data warehouses. However, ensuring an open architecture is key to unlocking value in data and enabling organizations to effectively deliver insights using SQL and best-of-breed tools for years to come. Without an open approach, developers are locked into vendor-specific approaches, often with costly contracts that add time and complexity. In comparison, an open model enables reliable insights from a single source of truth, and provides maximum flexibility for future technology decisions. Not only does an open data architecture allow for easy access and the ability to analyze data without moving or duplicating it, but it’s vendor-agnostic. This enables enterprises to future-proof their data architecture and choose leading technologies as they see fit. Done correctly, cost savings are also realized due to the elimination of data copies and expensive data movement.