Drupal AI Initiative: From hours to minutes: Building an AI-powered PDF importer for LocalGov Drupal

As Chicken fast-tracks development, we’ve been testing and refining prompts across a wide range of PDFs to prove what’s possible.

We upload a PDF to the module, which then kicks off the import process in the background.

Why this matters 

We can build as many import pipelines as needed, each with its own custom AI prompt. This is useful for handling different types of PDF content or layouts.
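To make the idea of several pipelines with their own prompts concrete, here is a minimal Python sketch. It is not the module's configuration format: the document types, plugin IDs and prompt wording are all hypothetical, chosen only to show how each pipeline pairs a set of plugins with a prompt suited to one kind of PDF.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PipelineConfig:
    """One import pipeline: which plugins to use and which prompt to send."""
    name: str
    extractor: str  # hypothetical extractor plugin ID
    model: str      # hypothetical AI model plugin ID
    prompt: str     # custom AI prompt for this kind of PDF


# Hypothetical pipelines for two kinds of council PDF.
PIPELINES: Dict[str, PipelineConfig] = {
    "committee_report": PipelineConfig(
        name="committee_report",
        extractor="smalot",
        model="claude_sonnet",
        prompt=(
            "Convert this committee report to semantic HTML. "
            "Start a new page at each numbered agenda item."
        ),
    ),
    "leaflet": PipelineConfig(
        name="leaflet",
        extractor="smalot",
        model="claude_sonnet",
        prompt=(
            "Convert this leaflet to semantic HTML. "
            "Keep it on one page and rebuild any tables with proper tags."
        ),
    ),
}


def pick_pipeline(document_type: str) -> PipelineConfig:
    """Return the pipeline for a document type, falling back to 'leaflet'."""
    return PIPELINES.get(document_type, PIPELINES["leaflet"])
```

Keeping the prompt next to the plugin choices means a new kind of document needs only a new entry, not new code.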

We’ve also cracked the pagination challenge. Early versions mirrored PDFs page by page, causing awkward breaks mid-paragraph or mid-list. Now the importer processes the entire document at once and, with the right AI prompt, inserts page breaks at logical, user-friendly points such as topic changes or new sections.
The result is an HTML representation of the PDF content, which is saved as a LocalGov Drupal Publication that we can then review and publish.
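One way to picture the pagination step is as splitting the AI's HTML output wherever the model has marked a break. The minimal Python sketch below assumes a hypothetical `<!-- pagebreak -->` marker that the prompt asks the model to emit, and titles each page from its first heading; the module's real marker and page-titling rules may well differ.

```python
import re
from typing import Dict, List

# Hypothetical marker the AI prompt asks the model to insert between pages.
PAGE_BREAK = "<!-- pagebreak -->"
HEADING = re.compile(r"<h[1-6][^>]*>(.*?)</h[1-6]>", re.IGNORECASE | re.DOTALL)


def split_into_pages(html: str) -> List[Dict[str, str]]:
    """Split AI-produced HTML into pages, titling each from its first heading."""
    pages: List[Dict[str, str]] = []
    for chunk in html.split(PAGE_BREAK):
        body = chunk.strip()
        if not body:
            continue  # skip empty chunks, e.g. a marker at the very end
        match = HEADING.search(body)
        title = match.group(1).strip() if match else f"Page {len(pages) + 1}"
        pages.append({"title": title, "body": body})
    return pages


html = (
    "<h2>Introduction</h2><p>Who this guide is for.</p>"
    + PAGE_BREAK
    + "<h2>Eligibility</h2><ul><li>You live in the borough.</li></ul>"
)
for page in split_into_pages(html):
    print(page["title"])
```

Run as-is, this prints `Introduction` and then `Eligibility`: each break lands at a topic change chosen by the model, not at a PDF page boundary.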

Understanding the workflow 

The Web and Digital team at Southwark Council, along with our partners at Chicken, is building an AI-powered PDF importer for the LocalGov Drupal Publication Module. Together, we’re unlocking a faster, more accessible, and more collaborative future for publishing. 
If you’re interested in supporting or scaling this project, contact Angie Forson – Angie.Forson@southwark.gov.uk. Let’s change the game together.
I’m excited about the impact this product will have — not just for our users, but also in transforming how we design, build, and create content internally. We’re shaping a future where services start with HTML-first thinking.

How the technology works 

Manual PDF conversion can take hours – sometimes days. With our importer, it happens in minutes – often under one minute. Multiply that across thousands of PDFs, and the time savings are game-changing. 

  1. Extract: A PDF parser pulls the content out of the PDF. The default is the smalot PDF parser.
  2. Transform: An AI model converts the parsed content into properly tagged HTML with logical pagination. The module currently uses Claude Sonnet.
  3. Save: The clean HTML pages are saved in Drupal, ready to publish.
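The three steps above can be sketched as a small pipeline in which each step is an interchangeable object. This is a hypothetical Python illustration, not the module's PHP code: the class and method names are assumptions, and trivial stand-ins replace the smalot parser, the Claude Sonnet call and the Drupal save so the example runs offline. Failures are logged instead of crashing the run, so a failed import can be reviewed afterwards.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Protocol

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pdf_import")


class Extractor(Protocol):
    def extract(self, pdf_bytes: bytes) -> str: ...


class Transformer(Protocol):
    def transform(self, text: str, prompt: str) -> str: ...


class Saver(Protocol):
    def save(self, html: str) -> int: ...


@dataclass
class ImportPipeline:
    """Extract -> Transform -> Save, with each step supplied as a plugin."""
    extractor: Extractor
    transformer: Transformer
    saver: Saver
    prompt: str

    def run(self, pdf_bytes: bytes) -> Optional[int]:
        """Run all three steps; return the saved item's ID, or None on failure."""
        try:
            text = self.extractor.extract(pdf_bytes)              # 1. Extract
            html = self.transformer.transform(text, self.prompt)  # 2. Transform
            item_id = self.saver.save(html)                       # 3. Save
        except Exception:
            # Record the error with its traceback so the import can be reviewed.
            logger.exception("PDF import failed")
            return None
        logger.info("PDF import saved as item %d", item_id)
        return item_id


# Stand-ins so the sketch runs offline; real plugins would parse a PDF,
# call an AI model and create Drupal content.
class TextExtractor:
    def extract(self, pdf_bytes: bytes) -> str:
        return pdf_bytes.decode("utf-8")


class ParagraphTransformer:
    def transform(self, text: str, prompt: str) -> str:
        return f"<p>{text}</p>"


class MemorySaver:
    def __init__(self) -> None:
        self.saved: List[str] = []

    def save(self, html: str) -> int:
        self.saved.append(html)
        return len(self.saved)


saver = MemorySaver()
pipeline = ImportPipeline(
    TextExtractor(), ParagraphTransformer(), saver, prompt="Convert to HTML."
)
print(pipeline.run(b"Council tax guide"))  # 1
print(pipeline.run(b"\xff"))  # None: invalid UTF-8 is logged as a failed import
```

Because the pipeline only depends on the three small interfaces, any step can be replaced without touching the other two.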

Built for flexibility 

The pipeline uses a plugin architecture in which each step can be swapped out. Councils can use different extractors and AI models, or save the output to different Drupal content types, to suit their needs.
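Swapping steps through configuration usually comes down to looking plugins up by ID. Drupal's own Plugin API does this in PHP with plugin managers and discovery; the Python registry below is only a hypothetical sketch of the idea, and its plugin IDs and stand-in extractors are invented for illustration.

```python
from typing import Callable, Dict, Generic, List, Type, TypeVar

T = TypeVar("T")


class PluginRegistry(Generic[T]):
    """Maps plugin IDs to classes so a pipeline step can be chosen by config."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._plugins: Dict[str, Callable[[], T]] = {}

    def register(self, plugin_id: str) -> Callable[[Type[T]], Type[T]]:
        """Class decorator that records a plugin under the given ID."""
        def decorator(cls: Type[T]) -> Type[T]:
            self._plugins[plugin_id] = cls
            return cls
        return decorator

    def create(self, plugin_id: str) -> T:
        """Instantiate the plugin registered under plugin_id."""
        if plugin_id not in self._plugins:
            raise ValueError(f"Unknown {self.kind} plugin: {plugin_id!r}")
        return self._plugins[plugin_id]()

    def ids(self) -> List[str]:
        return sorted(self._plugins)


extractors: PluginRegistry = PluginRegistry("extractor")


@extractors.register("utf8_text")
class Utf8TextExtractor:
    """Hypothetical stand-in: treats the input bytes as UTF-8 text."""
    def extract(self, pdf_bytes: bytes) -> str:
        return pdf_bytes.decode("utf-8", errors="replace")


@extractors.register("latin1_text")
class Latin1TextExtractor:
    """Hypothetical alternative, to show swapping one extractor for another."""
    def extract(self, pdf_bytes: bytes) -> str:
        return pdf_bytes.decode("latin-1")


# A council could switch extractor by changing one configuration value.
config = {"extractor": "latin1_text"}
extractor = extractors.create(config["extractor"])
print(type(extractor).__name__)  # Latin1TextExtractor
```

The same registry type could hold AI model or output plugins, so each of the three steps is chosen the same way.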
Together, we’re shaping a scalable, open-source tool that other councils can adopt, adapt, and improve.

Each import process is logged so that any errors can be reviewed and fixed. 
Angie Forson, Web and Digital Programme Lead 

Each PDF goes through a three-step ETL (extract, transform, save) process, called an “import pipeline” in the module.

Agile, user-centred delivery

We’re delivering this project the way we deliver our best work – agile and user-centred by design.  
 
We have adapted our delivery to meet the challenges of innovation design. Our team has had to continuously refine requirements and acceptance criteria to ensure the tool meets real user needs and delivers meaningful outcomes.  
Giorgi Bujiashvili, Delivery Manager

Guest blog post by Angie Forson, Web and Digital Programme Lead, Southwark Council.

What we’ve achieved so far

The importer can now:

  • import images, URLs and linked text 
  • rebuild tables with correct HTML tags 
  • apply accurate heading hierarchies (H1, H2, H3) 
  • remove unwanted hard returns from PDF text

This project has been co-designed with content designers, developers, and the LocalGov Drupal community.

Built with (and for) the community 

Working on this AI product is an incredible experience — each day comes with new challenges, unexpected turns, and fresh opportunities to innovate. The pace of change made the whole process an absolute adrenaline rush.

This project is a great example of AI working alongside and empowering content creators, and Drupal as a platform supports this really well.
The AI PDF Importer isn’t just a tool – it’s a step change in accessible, open-source publishing for local government. Following this release, it will be open and shareable with the LocalGov Drupal community for other councils to adopt and iterate. 

A leap forward in accessible publishing 

Evelyn Francourt, User Experience Lead 
Farez Rahman, Drupal Developer 
