Recent news about generative artificial intelligence is creating a great deal of justified excitement around new possibilities for human-computer interaction. However, as with previous AI and machine learning hype cycles, reaping material benefits will be elusive if generative AI is treated like a magic wand, particularly in the context of enterprise-grade automation.
While generative AI has already spawned a lot of experimentation, most demos focus on the technology’s ability to generate output—text, music, art, and even code templates—guided by human input. These results, while impressive in their own right, are generally understood to be first drafts, often with varying degrees of acceptable errors and omissions, such as an entry-level assistant might produce.
However, early generative AI automation prototypes tend to demonstrate simple if-this-then-else capabilities without reference to security, governance, and compliance concerns. Complex enterprise and industry systems are held to a much higher standard for automating processes safely: intricate use-case requirements, organizational policies, and industry regulations demand in-depth, domain-specific knowledge and rules.
While AI and automation will continue to converge for ever-smarter solutions, the need for both unmodeled, probabilistic analytics and modeled, deterministic transactional capabilities will endure.
Generative AI and domain models: better together
Although some of the early hype around deep learning suggested that modeling was outdated and no longer necessary, the limitations of generative AI and its brute-force Bayesian statistical analysis are already apparent. Noted artificial intelligence expert Andrew Ng, founder and CEO of Landing AI Inc., recently acknowledged this in an article promoting better data over more data for generative AI.
What is “better data”? It is data that comes from a modeled domain: “tagged” or “prepared.” These two worlds are not at odds. It makes sense that good facts would ground an informed guess. Likewise, the modeling world can harness the power of generative AI to “jump-start” domain modeling, which was already a semi-automated endeavor. It is a sign of maturity that we are moving past the false-binary stage and are now thinking about how the old and new approaches are better together.
Get better data
The DIKW pyramid (data, information, knowledge, wisdom) is familiar to any self-respecting data analyst. In effect, it is a maturity model for data modeling. Each step up the pyramid supports higher-level reasoning about the data and its use (see the sketch after this list):
- A defined domain with classes and entities elevates data to information: the formalization and validation of data.
- Top-level domain knowledge expressed in the form of graph concepts, types, and policies (i.e., an ontology) elevates information into knowledge: relationships support analysis and insights.
- When knowledge is applied to optimize action, we achieve wisdom, the top of the DIKW pyramid: understanding the activity in its domain context.
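To make the pyramid concrete, here is a minimal Python sketch (not from the article; all class, field, and relationship names are hypothetical) showing raw records being formalized into a domain class, linked by a tiny ontology, and finally applied to a decision:

```python
# Illustrative sketch of elevating data up the DIKW pyramid; names are hypothetical.
from dataclasses import dataclass

# Data: untyped, unvalidated records as they might land in a lake.
raw_rows = [
    {"id": "C-1", "name": "Acme", "credit_limit": "50000", "region": "EMEA"},
    {"id": "C-2", "name": "Globex", "credit_limit": "-10", "region": "AMER"},
]

# Information: a defined domain class formalizes and validates the data.
@dataclass(frozen=True)
class Customer:
    id: str
    name: str
    credit_limit: float
    region: str

    def __post_init__(self):
        if self.credit_limit < 0:
            raise ValueError(f"invalid credit_limit for {self.id}")

def to_information(rows):
    customers = []
    for row in rows:
        try:
            customers.append(
                Customer(row["id"], row["name"], float(row["credit_limit"]), row["region"])
            )
        except ValueError as err:
            print(f"rejected record: {err}")
    return customers

# Knowledge: relationships (a tiny ontology) connect entities for analysis.
knowledge_graph = {
    ("C-1", "operates_in"): "EMEA",
    ("EMEA", "governed_by"): "GDPR",
}

# Wisdom: knowledge applied in context to optimize an action.
def approve_order(customer: Customer, amount: float) -> bool:
    region = knowledge_graph.get((customer.id, "operates_in"))
    policy = knowledge_graph.get((region, "governed_by"))
    # A real policy engine would evaluate `policy`; here we simply gate on credit.
    return amount <= customer.credit_limit

customers = to_information(raw_rows)
print(approve_order(customers[0], 25000.0))  # True: within the credit limit
```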
The effort to rigorously model domains creates a clear, consistent, and coherent understanding of the domain, not only for humans but also for machines. This is why “better data” improves generative AI results, as well as AI/ML and analytics.
Context is expensive
It makes sense that the closer the data is to a specific application domain, the more accurate its meaning in that domain’s vernacular and the greater its utility. Conversely, when data is aggregated outside of its domain context, it loses meaning, making it less useful. For application developers, this is the conceptual foundation of domain-driven design, the “database per service” pattern of microservices, and federated domain data meshes.
Data architects dismiss these app-centric approaches as silos, which is fair, but the logic of AppDev is sound. Developers definitely want data, but they need fast, efficient, and secure access to relevant data that they can apply in the context of a specific application. They work tactically within technical limitations and latency budgets (that is, how much I/O, data processing, parsing, and transformation they can practically perform in the span of a single one- to two-second human interaction, or a usually subsecond system-to-system interaction). There is also a real financial cost to compute- and I/O-intensive applications in terms of network bandwidth and resource consumption.
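To illustrate the kind of back-of-the-envelope budgeting developers do, here is a small Python sketch (the step names and millisecond costs are illustrative assumptions, not measurements) that checks whether a set of data-access steps fits inside a one-second interaction budget:

```python
# Illustrative latency-budget check; all step names and costs are assumed, not measured.
BUDGET_MS = 1000  # a single one-second human interaction

steps_ms = {
    "network_round_trip": 120,
    "query_remote_data_lake": 450,
    "parse_and_transform": 180,
    "apply_domain_rules": 60,
    "render_response": 90,
}

total = sum(steps_ms.values())
print(f"total: {total} ms of {BUDGET_MS} ms budget")
for step, cost in steps_ms.items():
    print(f"  {step}: {cost} ms ({cost / BUDGET_MS:.0%} of budget)")

if total > BUDGET_MS:
    print("over budget: move data closer to the app or precompute aggregates")
else:
    print(f"headroom: {BUDGET_MS - total} ms")
```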
So context is expensive. It’s worth noting that the centralized solutions historically promoted by data architects, such as master data management, data marts, data warehouses, big data, and even data lakes, haven’t exactly been runaway successes. Typically, the central data is historical (non-operational), processing is batch (not real-time), and extracting relevant information is difficult and time-consuming (inefficient). From a practical perspective, developers typically throttle data processing to fit latency budgets and instead focus on local, application-specific parameters.
The tension between the perspectives of data architects and software developers is clear, but data and code are two sides of the same coin. The barrier to more data-driven applications is technical, not ideological. The focus should be on reducing the “cost” of real-time, contextual data for application developers and making it easier for them to quickly build applications based on real-time data.
Beyond the data lake
The combination of generative AI, knowledge graphs, and analytics can help prevent data lakes from becoming data swamps. Together, they can provide useful abstractions that make it easier for developers to navigate semantically and take advantage of centralized data lakes. In effect, the knowledge graph brings domain knowledge (“better data”) to the large language model, allowing for more accurate analysis of the data lake.
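As a rough illustration of that grounding pattern, here is a minimal Python sketch (the graph content, prompt wording, and `ask_llm` function are hypothetical, not any specific vendor’s API) in which domain facts from a knowledge graph are injected into an LLM prompt before it generates SQL against a data lake:

```python
# Hypothetical sketch of grounding an LLM query with knowledge-graph context.
# `ask_llm` stands in for whatever model API is actually used.
knowledge_graph = {
    "Order": {"stored_in": "LAKE.SALES.ORDERS", "key": "ORDER_ID",
              "relates_to": ("Customer", "CUSTOMER_ID")},
    "Customer": {"stored_in": "LAKE.CRM.CUSTOMERS", "key": "CUSTOMER_ID",
                 "policy": "PII: mask EMAIL column"},
}

def graph_context(entities):
    """Serialize the relevant slice of the graph as plain-text facts for the prompt."""
    lines = []
    for name in entities:
        for predicate, value in knowledge_graph[name].items():
            lines.append(f"{name} {predicate} {value}")
    return "\n".join(lines)

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("call your model provider here")

question = "Total order value per customer region last quarter"
prompt = (
    "Using only these domain facts:\n"
    f"{graph_context(['Order', 'Customer'])}\n\n"
    f"Write SQL for: {question}\n"
    "Respect any stated policies (e.g., masking)."
)
# sql = ask_llm(prompt)  # the graph keeps table names, joins, and policies accurate
```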
One company addressing the technical challenge of the data value chain is Snowflake Inc., which provides a data lake and is now building value-added capabilities on top of it, including LLMs, knowledge graphs, and a variety of analytics tools, exposed as discrete offerings on its marketplace. In addition, Snowflake’s Snowpark development environment supports DataFrame-style programming, allowing developers to add their own user-defined functions in the language of their choice.
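For a flavor of that DataFrame-and-UDF style, here is a minimal Snowpark-for-Python sketch; the connection parameters, table, column names, and business rule are placeholders, so consult Snowflake’s documentation for the exact setup:

```python
# Minimal Snowpark-for-Python sketch; names and credentials are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, udf
from snowflake.snowpark.types import FloatType

connection_parameters = {"account": "...", "user": "...", "password": "...",
                         "warehouse": "...", "database": "...", "schema": "..."}
session = Session.builder.configs(connection_parameters).create()

# A user-defined function registered in Snowflake and executed next to the data.
@udf(return_type=FloatType(), input_types=[FloatType()], name="net_of_tax", replace=True)
def net_of_tax(amount: float) -> float:
    return amount / 1.2  # placeholder business rule

orders = session.table("ORDERS")  # DataFrame-style access to lake data
result = (orders
          .select(col("ORDER_ID"), net_of_tax(col("GROSS_AMOUNT")).alias("NET_AMOUNT"))
          .filter(col("NET_AMOUNT") > 1000))
result.show()
```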
As Dave Vellante and George Gilbert reported during “Data Week” last week, this will be a great help to developers, who can use Snowflake’s Snowpark containers to package pre-built functions and algorithms that run locally on historical data and stream aggregated operational data into Snowflake. These “data applications” benefit from reasoning directly over the data lake, with common security and governance. In essence, Snowflake is reviving old database techniques (e.g., stored procedures, user-defined functions, and create, read, update, and delete, or CRUD, events) in its data cloud to drive local processing and increase its usefulness as a data platform.
While all this cloud-based data processing won’t come cheap, it represents a leap in the reach, ease, and performance of traditional online analytical processing, making it possible to extract deep insights in real time, which can then drive recommendations and dashboards as well as trigger notifications and actions to external people and systems.
The industry has come full circle. Enterprise data, once centralized on the mainframe, then fragmented by distributed systems, the internet, and now the edge, is being aggregated and disambiguated in the cloud, with near-infinite compute and storage capacity and high-speed networking. This introduces a powerful new data backplane for all applications.
Snowflake, along with its competitors, is now looking to the application layer, which is the strategic terrain because it is where data can optimize business automation and user experiences.
Up to the application layer and beyond!
It is still up to developers to exploit business intelligence in web and mobile applications, as well as in their more complex business applications and operational processes. Snowflake’s data app store and containers are interesting, but they are not a programming model or development platform for software engineers.
Instead of having to manually discover and integrate ad hoc data apps from an app store, it would be ideal if developers could programmatically configure a data lake provider’s native data services (e.g., LLM, AI/ML, graph, time-series, cost-based, and location-based analytics) directly from their application domain, so they can fine-tune queries, functions, algorithms, and analytics for their use cases.
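To make the idea concrete, here is a purely hypothetical Python sketch: no vendor offers exactly this interface today, and every class, field, and function name is invented for illustration of what domain-driven configuration of lake-native services might look like:

```python
# Purely hypothetical illustration: no data-lake vendor exposes exactly this API today.
from dataclasses import dataclass, field

@dataclass
class DomainServiceConfig:
    """Declarative request for lake-native services, expressed in domain terms."""
    domain_entity: str
    llm: dict = field(default_factory=dict)
    graph: dict = field(default_factory=dict)
    time_series: dict = field(default_factory=dict)

config = DomainServiceConfig(
    domain_entity="Order",
    llm={"task": "summarize_exceptions", "grounding": "order_ontology"},
    graph={"traverse": ["Order", "Customer", "Region"], "max_hops": 2},
    time_series={"metric": "order_value", "window": "90d", "rollup": "week"},
)

def provision(cfg: DomainServiceConfig) -> dict:
    """Stand-in for a provider call that would bind these services to the app domain."""
    return {"entity": cfg.domain_entity,
            "services": [name for name in ("llm", "graph", "time_series")
                         if getattr(cfg, name)]}

print(provision(config))  # {'entity': 'Order', 'services': ['llm', 'graph', 'time_series']}
```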
This is the best of both worlds! It gives developers convenient access to a one-stop shop for aggregated business data (historical, real-time, batch, streaming, structured, semi-structured, and unstructured) that reduces the “cost” of context, letting them efficiently exploit data for smarter, insight-driven solutions.
This approach brings applications closer to data while maintaining separation of concerns and avoiding tight coupling. Here, a knowledge graph representing relevant domain knowledge provides an abstraction for composing data and behavior in real-time, data-driven applications.
Dave Duggal is founder and CEO of EnterpriseWeb LLC. The company offers a no-code integration and automation platform built on a graph knowledge base. The graph provides shared domain semantics, metadata, and state information to support workflow discovery, composition, integration, orchestration, configuration, and automation. Duggal wrote this article for SiliconANGLE.
Image: Geralt/Pixabay