As Chicken fast-tracks development, we’ve been testing and refining prompts across a wide range of PDFs to prove what’s possible. We upload a PDF to the module, which then starts the import process in the background.
Why this matters
We can build as many import pipelines as needed, each with its own custom AI prompt. This is useful for handling different types of PDF content or layout.
We’ve also cracked the pagination challenge. Early versions mirrored PDFs page by page, causing awkward breaks mid-paragraph or mid-list. Now the importer processes the entire document at once and, with the right AI prompt, inserts page breaks at logical, user-friendly points such as topic changes or new sections.
The result is the HTML representation of the PDF content, which is then saved into a Drupal Publication. We can then review and publish the Publication.
Understanding the workflow
The Web and Digital team at Southwark Council, along with our partners at Chicken, is building an AI-powered PDF importer for the LocalGov Drupal Publication Module. Together, we’re unlocking a faster, more accessible, and more collaborative future for publishing.
If you’re interested in supporting or scaling this project, contact Angie Forson – Angie.Forson@southwark.gov.uk. Let’s change the game together.
I’m excited about the impact this product will have — not just for our users, but also in transforming how we design, build, and create content internally. We’re shaping a future where services start with HTML-first thinking.
How the technology works
Manual PDF conversion can take hours – sometimes days. With our importer, it happens in minutes – often under one minute. Multiply that across thousands of PDFs, and the time savings are game-changing.
- Extract: A PDF parser pulls content from the PDF. The default is the smalot PDF parser.
- Transform: AI converts the parsed content to properly tagged HTML with logical pagination. Currently the module uses Claude Sonnet.
- Save: Clean HTML pages are saved, ready to publish in Drupal.
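The three steps above can be pictured as a small chain of functions. What follows is a minimal Python sketch for illustration only, not the module's code (the importer itself is a Drupal module). `ImportPipeline`, `fake_extract`, `fake_transform` and `fake_save` are hypothetical names, and the transform step is a simple stand-in that wraps lines in `<p>` tags where the real pipeline would call an AI model.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, List


@dataclass
class ImportPipeline:
    """Hypothetical three-step pipeline: extract -> transform -> save."""
    extract: Callable[[bytes], str]
    transform: Callable[[str], List[str]]
    save: Callable[[List[str]], int]

    def run(self, pdf_bytes: bytes) -> int:
        text = self.extract(pdf_bytes)   # Extract: raw file -> plain text
        pages = self.transform(text)     # Transform: text -> list of HTML pages
        return self.save(pages)          # Save: store pages, return how many


def fake_extract(pdf_bytes: bytes) -> str:
    # Stand-in for a PDF parser: treats the input as UTF-8 text.
    return pdf_bytes.decode("utf-8")


def fake_transform(text: str) -> List[str]:
    # Stand-in for the AI step: one HTML page, one <p> per non-empty line.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return ["".join(f"<p>{escape(line)}</p>" for line in lines)]


saved_pages: List[str] = []


def fake_save(pages: List[str]) -> int:
    # Stand-in for saving into a Publication: keeps pages in memory.
    saved_pages.extend(pages)
    return len(pages)


pipeline = ImportPipeline(fake_extract, fake_transform, fake_save)
count = pipeline.run(b"Bin collections\nCollections are weekly.")
```

Because each step is just a callable with a fixed input and output, any one of them can be replaced without touching the other two.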
Built for flexibility
The pipeline uses a plugin architecture in which each step can be swapped out. Councils can use different extractors, AI models, or output to different Drupal content types to suit their needs.
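To make the idea of swappable steps concrete, here is a rough Python sketch of a plugin registry. It is an assumption-laden illustration rather than how the module is implemented: the `PLUGINS` registry, the `register` decorator, the plugin ids and `build_pipeline` are all hypothetical, and the plugins themselves are trivial stand-ins.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: step name -> plugin id -> callable.
PLUGINS: Dict[str, Dict[str, Callable[..., Any]]] = {
    "extract": {},
    "transform": {},
    "save": {},
}


def register(step: str, plugin_id: str) -> Callable:
    """Decorator that files a function under a step and plugin id."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        PLUGINS[step][plugin_id] = func
        return func
    return decorator


@register("extract", "utf8")
def extract_utf8(data: bytes) -> str:
    return data.decode("utf-8")


@register("transform", "paragraph")
def to_paragraph(text: str) -> str:
    return f"<p>{text}</p>"


@register("transform", "heading")
def to_heading(text: str) -> str:
    return f"<h2>{text}</h2>"


@register("save", "publication")
def save_publication(html: str) -> Dict[str, str]:
    return {"type": "publication", "body": html}


@register("save", "page")
def save_page(html: str) -> Dict[str, str]:
    return {"type": "page", "body": html}


def build_pipeline(config: Dict[str, str]) -> Callable[[bytes], Any]:
    """Pick one plugin per step from the config and chain them."""
    steps = [PLUGINS[step][config[step]] for step in ("extract", "transform", "save")]

    def run(data: bytes) -> Any:
        result: Any = data
        for step in steps:
            result = step(result)
        return result

    return run


run_default = build_pipeline(
    {"extract": "utf8", "transform": "paragraph", "save": "publication"}
)
run_custom = build_pipeline(
    {"extract": "utf8", "transform": "heading", "save": "page"}
)
```

Changing the configuration swaps a step without editing the others, which is the property that lets one council choose a different extractor or output type from another.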
Together, we’re shaping a scalable, open-source tool that other councils can adopt, adapt, and improve.
Each import process is logged so that any errors can be reviewed and fixed.
Angie Forson, Web and Digital Programme Lead
Agile, user-centred delivery
Each PDF goes through a three-step ETL (extract, transform, load) process, called an “import pipeline” in the module.
We’re delivering this project the way we deliver our best work – agile and user-centred by design.
We have adapted our delivery to meet the challenges of innovation design. Our team has had to continuously refine requirements and acceptance criteria to ensure the tool meets real user needs and delivers meaningful outcomes.
Giorgi Bujiashvili, Delivery Manager
What we’ve achieved so far
Guest blog post by Angie Forson, Web and Digital Programme Lead, Southwark Council.
- import images, URLs and linked text
- rebuild tables with correct HTML tags
- apply accurate heading hierarchies (H1, H2, H3)
- remove unwanted hard returns from PDF text
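As an illustration of the last item, the sketch below shows what removing hard returns means in practice. The importer relies on the AI prompt for this cleanup; the rule-based Python function here (`remove_hard_returns`, a hypothetical name) is a simplified stand-in that assumes a blank line marks a real paragraph break and any other line break is an unwanted hard return.

```python
import re


def remove_hard_returns(text: str) -> str:
    """Join lines inside each paragraph; keep blank lines as paragraph breaks."""
    # Split on one or more blank lines (assumed to be true paragraph breaks).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for paragraph in paragraphs:
        # Rejoin the hard-wrapped lines of one paragraph with single spaces.
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        cleaned.append(" ".join(lines))
    return "\n\n".join(cleaned)


sample = "Bins are collected\nevery Tuesday.\n\nRecycling is\ncollected weekly."
result = remove_hard_returns(sample)
```

A rule like this fails on PDFs that do not separate paragraphs with blank lines, which is one reason a prompt-driven approach can be more forgiving.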
This project has been co-designed with content designers, developers, and the LocalGov Drupal community.
Built with (and for) the community
Working on this AI product is an incredible experience — each day comes with new challenges, unexpected turns, and fresh opportunities to innovate. The pace of change made the whole process an absolute adrenaline rush.
This project is a great example of AI working alongside and empowering content creators, and Drupal as a platform supports this really well.
The AI PDF Importer isn’t just a tool – it’s a step change in accessible, open-source publishing for local government. Following this release, it will be open and shareable with the LocalGov Drupal community for other councils to adopt and iterate.
A leap forward in accessible publishing
Evelyn Francourt, User Experience Lead
Farez Rahman, Drupal Developer