Working with Swiss Archive Documents: Human + AI in the Trenches

When I first tried using ChatGPT for historical research, I called it a complete waste of time. It invented sources, botched facts, and turned one MI5 officer into a Frankenstein composite of random facts. I closed the tabs and went back to sifting through archives.

Two years later, the Swiss Archives dropped a mountain of documents in my lap—German, French, Italian, and completely impenetrable. I needed help, and the only “assistant” available was the one I’d already sworn off.

The Locked Treasure Trove

It really was a treasure trove: hundreds of WWII-era pages written in a trio of languages. Some were typewritten, some handwritten. Some were faded, some were clear. I had worked with such documents before, but the sheer scale of this project stymied me.

My old method was laborious. I would transcribe each foreign-language document by hand, reading it while simultaneously typing it into Word. Then I’d run the text through Google Translate or DeepL Translate in chunks and paste the results under the original text. It was an extremely time-consuming process.

But I gave it a go with the Swiss Archives, and quickly realized it was an impossible project. It would take me months, maybe years, to finish even one. On top of that, some of the documents were handwritten and they were beyond me. I reluctantly shelved the treasure trove under “Far in the Future” and walked away. It felt Herculean: so much history locked behind a wall of language I couldn’t climb.

By then, I’d already been using ChatGPT for smaller tasks—drafting social media captions, summarizing notes, wrangling research outlines—and it had gotten noticeably sharper. The new models claimed they could even process PDFs into readable text. Was it worth another shot? Why not? I purchased a paid version and got to work.

The False Start

I explained the scale of the project to ChatGPT and it assured me that it could process a PDF into readable text, no problem. I uploaded the first document and waited with bated breath. It quickly spat out a transcribed document. It looked perfect. It had dealt with typewritten Italian and cursive French without even a hiccup. I was giddy with excitement. I had finally cracked the code to accessing this treasure trove!

Or not.

My delight turned to dismay when I compared the transcript with the original PDF document. They weren’t even close. The first page or two might be fine, but ChatGPT happily hallucinated the remainder. Perhaps the PDF had been too large? I tried a smaller one. Same problem. I became frustrated. I cursed and swore at ChatGPT for promising the stars and delivering mud. It apologized, but nothing changed. I wrestled with it and demanded a straight answer. Why was this not working?

It eventually admitted that it was struggling with the OCR (Optical Character Recognition) embedded within the PDF. Which actually made sense—it’s one thing to do OCR on a clean page of typewritten text. It’s quite another to do it with faded typewritten copies that are blotted with official stamps and complicated by letterhead, document codes, and marginal notes. It was time to change the input.

Guardrails and Growing Pains

We batted ideas back and forth and decided to work with JPG images of each page. It could “read” a document image far better than it could “read” a muddled OCR PDF. I found a way to convert a PDF to JPG—not as simple as it sounds—and uploaded the first image. It worked. ChatGPT was able to read the text and transcribe it with 98-99% accuracy. I then asked it to translate the transcribed Italian/German/French text into English. Compared with everything that had gone before, that was the easy part.

After a brief happy dance, I uploaded 10 document images and awaited the results. They were not good. The first page or two would be fine and then ChatGPT would once again slide into efficiency mode and extrapolate (hallucinate) the remaining pages. More cursing.

The solution, as it turned out, was simple: I uploaded one image, ChatGPT transcribed and translated it, I copied the results into Word. I uploaded the next document image… rinse and repeat.
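For anyone who would rather script that rinse-and-repeat loop than click through it, here is a minimal Python sketch. It is an illustration under stated assumptions, not a record of any particular tool: `transcribe_page` is a hypothetical stand-in for whatever sends a single image to the model and returns its text, and the `.jpg` and `.txt` naming is likewise assumed. The point it shows is the discipline itself: one image per request, with each result saved before the next page is touched.

```python
from pathlib import Path
from typing import Callable


def process_pages(
    image_dir: Path,
    transcribe_page: Callable[[Path], str],  # hypothetical: one image in, text out
    out_dir: Path,
) -> list[Path]:
    """Handle exactly one page image per request and save each result at once."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.glob("*.jpg")):
        target = out_dir / (image.stem + ".txt")
        if target.exists():
            continue  # finished in an earlier session, so skip it
        text = transcribe_page(image)  # never more than one page at a time
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Because finished pages are skipped, the loop can be stopped and restarted across sessions without redoing work. It does nothing to check accuracy; every saved page still has to be compared with the original image.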

But even then… it would stray. It would guess at a faded word. It would embellish or invent content. It was trying to be helpful. It was not.

I needed to create pretty firm guardrails, and reiterate them at regular intervals. “The transcription needs to be faithful to the original text. Treat it like a legal document. Don’t invent. Don’t guess. Don’t embellish. Don’t leave things out. Any questions? Ask me!”
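Since the guardrails have to be repeated so often, it can help to keep them in one place and attach them to every request rather than retyping them. The sketch below assumes a scripted setup; the function name `build_request` and its parameters are hypothetical, and the rules are simply the ones quoted above.

```python
# The guardrails quoted above, kept in one place so none is forgotten.
GUARDRAILS = (
    "The transcription needs to be faithful to the original text.",
    "Treat it like a legal document.",
    "Don't invent.",
    "Don't guess.",
    "Don't embellish.",
    "Don't leave things out.",
    "Any questions? Ask me!",
)


def build_request(page_label: str, source_language: str) -> str:
    """Return the instruction text for one page, guardrails included every time."""
    rules = "\n".join(f"- {rule}" for rule in GUARDRAILS)
    return (
        f"Transcribe page {page_label} ({source_language}).\n"
        f"Rules:\n{rules}"
    )


# Hypothetical page label, for illustration only.
print(build_request("example-page-001", "Italian"))
```

Restating the rules with every page costs nothing in a script, and it removes the temptation to assume the model still remembers them from earlier in the session.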

Slowly, we hammered out a method for working together. Every time we started a new session, I had to restate the guardrails. I also needed to check each transcription against the original document image—it was the only way to know if ChatGPT was toeing the line. I’d already learned that while ChatGPT might promise the world, it can’t always deliver. And even if it delivers once, there’s no guarantee it will deliver twice. It’s a tool that dulls quickly and needs constant sharpening and tending.

Those checks weren’t optional. A single small slip could change meaning entirely. In one case, ChatGPT transcribed a newspaper article that said a group of gentlemen gathered to play “Stat.” I thought it an odd game, but let it ride. Later, while reviewing the translation, ChatGPT noted that “Stat” wasn’t a real game. Back I went to the original scan, and sure enough—it wasn’t Stat, it was Skat. The Fraktur k and t can fool even a human eye, let alone a machine. That little correction summed up the whole process: AI can help, but it still needs watching.
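A small aid for this kind of checking, offered as a suggestion rather than as part of the method described here: if you have two transcriptions of the same page (say, from two separate passes), Python's standard `difflib` module can list the words where they disagree, which tells you where to look first on the scan. The function name and the sample sentence are invented for illustration.

```python
import difflib


def word_differences(first: str, second: str) -> list[tuple[str, str]]:
    """List the word runs that differ between two transcriptions of one page."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return differences


# Invented sample sentence; only the last word differs between the two passes.
print(word_differences("the gentlemen played Stat", "the gentlemen played Skat"))
# prints [('Stat', 'Skat')]
```

Two passes that agree can still both be wrong, so this narrows the search without replacing the comparison against the original image.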

[Image: side-by-side comparison of the lowercase Fraktur letters “t” and “k,” showing how their similar shapes can be easily confused when reading old German documents or blurry digital scans.]
Caption: German Fraktur “t” and “k” are easy to confuse, especially if the digital scan is even the tiniest bit fuzzy.

Even with all its flaws, it worked. The process was still exceedingly laborious. I purchased a version of ABBYY FineReader to convert PDFs to JPGs en masse. I had to organize the document images precisely and label them. I had to upload them one at a time, copy each transcription and translation into Word, and do a quick review while the next image was being uploaded and transcribed/translated. It took months… but the result was worth it.
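On the labelling: one scheme that keeps pages in order is to zero-pad the page number, so that an alphabetical sort matches the page sequence. The pattern below is a hypothetical example, not the naming actually used for these files, and `document_id` is a placeholder for whatever identifier you choose.

```python
def page_filename(document_id: str, page_number: int, width: int = 3) -> str:
    """Build a file name whose alphabetical order matches the page order."""
    return f"{document_id}_p{page_number:0{width}d}.jpg"


# Hypothetical document identifier, for illustration only.
names = [page_filename("doc-a", n) for n in (10, 2, 1)]
print(sorted(names))
# prints ['doc-a_p001.jpg', 'doc-a_p002.jpg', 'doc-a_p010.jpg']
```

Without the padding, page 10 would sort ahead of page 2, which is an easy way to paste a translation under the wrong original.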

[Image: a 19th-century handwritten page with brown ink showing heavy bleed-through from the reverse side, making the text blurred and overlapping. The number “200/97 25” is written in blue pencil at the top.]
Caption: Example of severe ink bleed-through in a faded 19th-century handwritten document, one of the pages impossible for OCR or AI transcription to interpret accurately.

Some handwritten documents, though, were absolutely impossible. ChatGPT tried, but it was clearly guessing. We experimented with different approaches, but nothing worked. These pages needed a human eye that could read late-1800s French cursive with bleed-through. That person was not me… and I left them.

For the first time, though, the archive had started to speak.

The Payoff

For all its caveats, it was an amazing thing—to read the content of these once-locked documents. It felt miraculous in a quiet, methodical way, even if I still needed to keep a hand on the steering wheel. ChatGPT is powerful but not perfect. Every now and then it slipped: 1943 became 1945, or a phrase bent out of shape in translation. But even with its quirks, the pages finally opened, and the stories inside began to breathe again.

What had once been a sealed chest was suddenly overflowing. I was surrounded by treasures—letters, reports, forms, fragments—all glinting with detail, all demanding attention. It was exhilarating and overwhelming at once. Each document was a shard of a much larger story, and it took months of sifting, sorting, and cross-referencing to begin shaping them into narrative form.

In the end, that discipline paid off. Those once-silent files in the Swiss archives have fleshed out the story of the Sommerfeld family—their lives scattered by war, pieced back together through the paper trail they left behind. I was especially drawn to Yvonne Sommerfeld and her tangential connection to Josef Jakobs and Werner Goldstein. The next post returns to Josef and Werner, tracing their story through the Swiss newspaper archives. After that, the following four posts will dive deep into the Sommerfeld family and what the Swiss documents revealed.

Header Image – generated by Gemini AI
