Our Experience with a Large Language Model for Data Transformation

10.10.2025

Relevance and Effectiveness of LLMs

For the past three decades, we have been automatically converting human-readable engineering drawings and other documentation into structured, machine-readable data for our telecom and utility customers. This data can then be migrated into modern network information systems, which offer advanced data intelligence and automation. At its core, this work involves extracting entities such as nodes (e.g., manholes, cabinets, or devices), links (e.g., trenches, ducts, or fiber cables), and their annotations to build a high-quality physical inventory of the network.

Traditionally, extracting this data from complex engineering drawings, such as single-line diagrams, has relied on a combination of well-established approaches. Rule-based applications, tailored to customer-specific requirements, have been used to structure vector-native formats such as AutoCAD or MS Visio files. Meanwhile, Convolutional Neural Networks (CNNs) have been applied to extract physical network entities from raster images automatically.

With the recent emergence of Large Language Models (LLMs), a new approach has come under consideration. To evaluate their relevance and effectiveness for our core task, automating data extraction from engineering drawings, we initiated an exploratory project. We applied LLMs to a similar challenge where the input was not engineering drawings but text-based documents from the telecom domain. Instead of extracting physical network nodes and links, the goal was to identify and extract technologies, products, and the relationships between them from the text. By analogy, entities such as technologies and products can be considered nodes, while the relationships between them can be seen as links.
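To make the target of this extraction concrete, here is a minimal sketch, in Python, of how such nodes and links might be represented and parsed from a model's structured output. The Node and Link classes, the field names, and the example JSON response are illustrative assumptions for this post, not the schema we actually used.

```python
import json
from dataclasses import dataclass

# Hypothetical target structures: "nodes" are technologies or products,
# "links" are the relationships the model is asked to extract between them.
@dataclass
class Node:
    name: str
    kind: str  # e.g. "technology" or "product"

@dataclass
class Link:
    source: str
    target: str
    relation: str  # e.g. "is_based_on", "is_part_of"

def parse_extraction(llm_response: str) -> tuple[list[Node], list[Link]]:
    """Parse the JSON the model is instructed to return into typed records."""
    data = json.loads(llm_response)
    nodes = [Node(n["name"], n["kind"]) for n in data.get("nodes", [])]
    links = [Link(l["source"], l["target"], l["relation"]) for l in data.get("links", [])]
    return nodes, links

# Example of the kind of response the extraction prompt asks the model to produce.
example = '{"nodes": [{"name": "GPON", "kind": "technology"}], "links": []}'
nodes, links = parse_extraction(example)
```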

This initial project highlighted both their potential and their limitations, leading to the following four key observations:

#1: “I Do Not Know” Is Not an Option

The Lesson: We quickly learned a fundamental aspect of this technology: it is designed to always provide an answer, even when it lacks the correct one.

Our Experience: Knowing our source data was imperfect, as is almost always the case with human-produced documentation, we asked the LLM what happens when it is provided with messy data. The answer was direct and thought-provoking: "Garbage in, the most plausible-sounding garbage out." This phrase prompted us to investigate the LLM’s nature further before proceeding. It was revealing to learn that, in the LLM's own words, it can never say, "I do not know." This stands in stark contrast to one of the most valuable human traits: the honesty to admit when we do not have an answer.

#2: When the AI Checks Out Early

The Lesson: Our next insight was that an LLM will not necessarily be thorough, even when a task demands it. Unless given explicit and detailed instructions, it may only complete part of an assignment.

Our Experience: One of our tasks was to have the LLM extract specific relationships between predefined entities from a large set of documents. Almost by accident, we noticed that there were likely more relationships in the source text than the LLM had extracted. When we queried the LLM about this, it offered a surprisingly candid response, stating its approach was sometimes "a lazy one." We learned the underlying technical reason is that the model seeks the most efficient path to a plausible answer. If it can generate a satisfactory-looking result by analyzing only the initial parts of a document, it may stop there. This taught us that drafting extremely specific prompts is non-negotiable.
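As an illustration of what "extremely specific" can look like in practice, the sketch below shows one mitigation we find plausible: splitting a long document into chunks and running an explicit, exhaustive prompt over each one, so no single call can quietly skip the tail of the text. The prompt wording, the chunk size, and the call_llm callable are assumptions for the example, not our production pipeline.

```python
from typing import Callable

PROMPT = (
    "Extract EVERY relationship between the predefined entities listed below.\n"
    "Process the text to its very end; do not stop after the first matches.\n"
    "Return a JSON list of relationships, or an empty list if there are none.\n\n"
    "Entities: {entities}\n\nText:\n{chunk}"
)

def chunk_text(text: str, size: int = 4000) -> list[str]:
    """Split a long document so no single call covers more than `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def extract_all(text: str, entities: list[str],
                call_llm: Callable[[str], str]) -> list[str]:
    """Run the explicit prompt over every chunk and collect the raw answers.

    `call_llm` stands in for whichever model client is actually in use.
    """
    entity_list = ", ".join(entities)
    return [call_llm(PROMPT.format(entities=entity_list, chunk=chunk))
            for chunk in chunk_text(text)]
```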

#3: What the LLM Is Not Telling You (Until You Push)

The Lesson: We also learned that an LLM’s first answer is not always its best. It often provides a "good enough" solution and will only produce more effective ideas when specifically challenged.

Our Experience: We asked the LLM to help design a processing pipeline for converting our source data into a structured format for an information system. The model delivered what seemed like a complete plan. When we asked it to evaluate its own plan, it rated it as high-quality. As skeptics with a focus on data quality, we questioned this self-evaluation. We asked, "You have rated this workflow with a top grade. Still, is there anything missing or that could be significantly improved?" Only then did the model reveal several significant improvements, despite having given its own plan an "A+" just moments before. The superior solutions were available to the LLM all along, but it settled for "good enough" until prodded.
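In practice, this pushed us toward a simple two-pass pattern: first request the plan, then explicitly challenge it. The sketch below only illustrates that pattern; the challenge wording and the call_llm placeholder are assumptions, not our actual tooling.

```python
from typing import Callable

CHALLENGE = (
    "You rated the plan below as high quality. Still, is there anything missing "
    "or that could be significantly improved? List every such point:\n\n{plan}"
)

def draft_and_challenge(task: str, call_llm: Callable[[str], str]) -> tuple[str, str]:
    """Ask for a plan first, then explicitly push back on the model's own answer."""
    plan = call_llm(task)  # first pass: the "good enough" draft
    critique = call_llm(CHALLENGE.format(plan=plan))  # second pass: forced self-review
    return plan, critique
```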

#4: Can You Trust a Confident Answer?

The Lesson: Perhaps our most critical discovery was that an LLM, despite being built on probability, is not designed to estimate the correctness of its own answers.

Our Experience: One case stood out. We asked the LLM for the best data format to migrate our information into a graph database. "JSON," it declared with certainty. We spent time building our process around this advice, only to discover during implementation that the CSV format was a much better choice for our specific use case. As we learned, the model's confidence was a performance. We then explicitly asked the LLM if it could provide a confidence score for its answers on a percentage scale (1%-100%). It responded that it could not provide such a score, as it has no internal mechanism to validate its own statements. This brought the common disclaimer, "LLMs can make mistakes, so double-check," into sharp focus. It also presents a practical challenge: which answers should one verify? Verifying nothing is risky, while verifying everything is prohibitively time-consuming.
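For context, the output we ended up producing looks roughly like the sketch below: plain node and edge CSV files of the kind many graph databases accept for bulk import. The column names and sample records are illustrative assumptions, not any specific product's required format.

```python
import csv

# Illustrative records only: a generic node table and edge table.
nodes = [
    {"id": "MH-001", "label": "Manhole"},
    {"id": "CAB-17", "label": "Cabinet"},
]
edges = [
    {"source": "MH-001", "target": "CAB-17", "type": "DUCT"},
]

with open("nodes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "label"])
    writer.writeheader()
    writer.writerows(nodes)

with open("edges.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["source", "target", "type"])
    writer.writeheader()
    writer.writerows(edges)
```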

Conclusion: Trust in a Probabilistic World

This experience brings a core issue into focus: trust. We trust a calculator because it operates on well-defined mathematical rules; we do not need to double-check its results. But can we place the same trust in a probabilistic system?

This question brings to mind a principle we learned from engineers at a large telecommunications operator years ago. We were converting their physical network inventory from legacy documents into a modern information system. In cases where information for an attribute, such as a duct's material or a fiber cable's type, was missing or conflicting, their instruction was clear: do not guess. If we were unsure, we were to ask them. If they also did not know, we were to leave the property value empty. Their reasoning was simple and profound: “It is much better to have no data than to have wrong data.” These engineers, working in troubleshooting and provisioning, understood the high operational cost of acting on incorrect information.

That principle, born from years of professional experience, clashes directly with the nature of an LLM, which is designed to always provide an answer. As we have come to understand, this technology was trained on human-produced material, so it is logical that its responses reflect our collective knowledge, but also our biases and errors.

Our intent is not to criticize this promising technology but to share our initial experience. The technology can seem deceptively simple to use, yet it presents subtle traps for the unwary, particularly those of us who are not experts in its nuances. Perhaps, like any complex product, LLMs should come with clear instructions to help users avoid these common pitfalls.
