Project Overview
This project demonstrates the engineering of a specialized web
application designed to automate the extraction and structuring
of data from PDF
documents concerning German reforms. It processes documents
written in either German or English, consistently delivering
structured insights in English. The platform integrates AI for
sophisticated text analysis, employs asynchronous processing for
performance, and utilizes modern DevOps practices for reliable
deployment, showcasing the ability to build robust, data-centric
solutions.
Key Technologies
- Backend: ASP.NET Core, C#, Quartz.NET, SignalR, FluentValidation
- Frontend: React, TypeScript, React Query, React Router
- AI Integration: Google Gemini 2.0 Flash API
- Database: PostgreSQL with Entity Framework Core (EF Core)
- Infrastructure & DevOps: Docker, Kubernetes (K8s), GitHub Actions (CI/CD)
Backend Architecture and Implementation
The backend uses ASP.NET Core Minimal APIs to create
efficient, lightweight endpoints
suitable for a focused application. A pragmatic
Layered Architecture was implemented to maintain clear
separation between presentation, application logic, and
infrastructure concerns, balancing structure with development
velocity for this specific use case.
A core challenge was handling the potentially time-consuming AI
analysis. This was addressed using Quartz.NET to manage
asynchronous background jobs. Upon PDF upload, tasks are
queued, decoupling heavy processing from the user request and
ensuring UI responsiveness. Quartz.NET provides reliability for
job execution.
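The decoupling described above can be illustrated with a minimal in-memory sketch. It is written in TypeScript for brevity and is not the project's Quartz.NET code; `JobQueue`, `enqueue`, `getStatus`, and the status values are hypothetical names chosen for illustration. The point it shows is that the caller gets a job id back immediately while the slow work runs afterwards.
```typescript
// Hypothetical sketch: an in-memory queue that decouples request handling
// from slow work. The real backend uses Quartz.NET; all names are assumptions.
type JobStatus = "queued" | "running" | "done" | "failed";

interface Job<T> {
  id: number;
  status: JobStatus;
  result?: T;
  error?: string;
}

class JobQueue<T> {
  private nextId = 1;
  private jobs = new Map<number, Job<T>>();
  private pending: Array<{ job: Job<T>; work: () => Promise<T> }> = [];
  private draining = false;

  // Registers the work and returns a job id without waiting for the work.
  enqueue(work: () => Promise<T>): number {
    const job: Job<T> = { id: this.nextId++, status: "queued" };
    this.jobs.set(job.id, job);
    this.pending.push({ job, work });
    if (!this.draining) {
      this.draining = true;
      // Defer processing to a later microtask so the caller is not blocked.
      void Promise.resolve().then(() => this.drain());
    }
    return job.id;
  }

  // Lets a caller (for example a status endpoint) look up a job.
  getStatus(id: number): Job<T> | undefined {
    return this.jobs.get(id);
  }

  // Runs queued jobs one at a time until the queue is empty.
  private async drain(): Promise<void> {
    let next = this.pending.shift();
    while (next) {
      const { job, work } = next;
      job.status = "running";
      try {
        job.result = await work();
        job.status = "done";
      } catch (e) {
        job.error = e instanceof Error ? e.message : String(e);
        job.status = "failed";
      }
      next = this.pending.shift();
    }
    this.draining = false;
  }
}
```
In this sketch an upload handler would call `enqueue` with the analysis function and return the id to the client right away. Unlike Quartz.NET, this toy queue keeps everything in memory, so jobs are lost when the process exits.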
To perform its core analytical task, the platform integrates with
Google Gemini 2.0 Flash. To accurately extract
approximately 20 distinct data fields from the document text, I
developed a strategy that runs multiple specialized prompts in
parallel. I also implemented a retry mechanism that handles
AI responses that are not formatted as expected,
improving reliability. Once the analysis is complete,
SignalR delivers real-time feedback directly to the user's
browser, eliminating the need for manual page refreshes.
Server-side validation using FluentValidation ensures that
invariants remain intact after a document is submitted.
Persistent data storage is handled by PostgreSQL, accessed
efficiently through Entity Framework Core (EF Core).
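The strategy of parallel, specialized prompts with a retry on badly formatted responses, described above, can be sketched as follows. This is an illustration in TypeScript, not the project's C# implementation. `ModelCall` is a hypothetical stand-in for a real Gemini client call, and the one-field-per-prompt layout, the JSON reply format, and the default of three attempts are all assumptions.
```typescript
// Hypothetical sketch of "parallel specialized prompts with retry".
// `ModelCall` stands in for a real model client; it is an assumption.
type ModelCall = (prompt: string) => Promise<string>;

interface PromptSpec {
  field: string;  // name of the data field this prompt extracts (assumed 1:1)
  prompt: string; // specialized prompt text
}

// Calls the model and parses the reply as JSON, retrying only when the
// reply is not valid JSON. Errors thrown by `call` itself are not retried.
async function askWithRetry(
  call: ModelCall,
  prompt: string,
  maxAttempts = 3,
): Promise<unknown> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await call(prompt);
    try {
      return JSON.parse(raw);
    } catch (e) {
      lastError = e; // malformed reply: ask again
    }
  }
  throw new Error(`No valid JSON after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Starts every specialized prompt concurrently and merges the answers
// into one record keyed by field name.
async function extractFields(
  call: ModelCall,
  specs: PromptSpec[],
  documentText: string,
): Promise<Record<string, unknown>> {
  const entries = await Promise.all(
    specs.map(async (spec): Promise<[string, unknown]> => {
      const value = await askWithRetry(call, `${spec.prompt}\n\n${documentText}`);
      return [spec.field, value];
    }),
  );
  const result: Record<string, unknown> = {};
  for (const [field, value] of entries) {
    result[field] = value;
  }
  return result;
}
```
Because `Promise.all` rejects as soon as one prompt exhausts its retries, this sketch fails the whole extraction on a single bad field. Collecting per-field errors instead would be a reasonable alternative if partial results are acceptable.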
Frontend User Interface
The UI is constructed using React and TypeScript, providing a
type-safe and component-based structure. A key element is the use
of React Query for managing server state. This
library significantly simplifies data fetching, caching, and
synchronization related to the backend processing status and
results. It handles loading/error states gracefully and
efficiently updates the UI when new data arrives via the SignalR
connection.
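One way to connect pushed results to cached server state is to write each incoming message straight into the query cache. The sketch below is an assumption about how that could look, not the project's code. The two small interfaces are modeled on the `on` method of a SignalR `HubConnection` and the `setQueryData` method of a React Query `QueryClient`, so that the example stays self-contained. The event name `AnalysisCompleted`, the `["document", id]` query key, and the `AnalysisResult` shape are hypothetical.
```typescript
// Hypothetical sketch: push server events into a client-side query cache.
// The interfaces below are modeled on HubConnection.on (SignalR) and
// QueryClient.setQueryData (React Query) and cover only what is used here.
interface HubConnectionLike {
  on(eventName: string, handler: (...args: any[]) => void): void;
}

interface QueryCacheLike {
  setQueryData(queryKey: readonly unknown[], data: unknown): void;
}

// Assumed payload shape; the real message format is not given in this document.
interface AnalysisResult {
  documentId: string;
  fields: Record<string, unknown>;
}

// Assumed event name; the real hub method name is not given in this document.
const ANALYSIS_COMPLETED = "AnalysisCompleted";

function bindAnalysisUpdates(
  connection: HubConnectionLike,
  cache: QueryCacheLike,
): void {
  connection.on(ANALYSIS_COMPLETED, (result: AnalysisResult) => {
    // Writing under the document's key updates whatever reads that key,
    // so no extra fetch is needed once the result has been pushed.
    cache.setQueryData(["document", result.documentId], result);
  });
}
```
An alternative design is to invalidate the query when the event arrives and let it refetch, which costs one extra request but keeps the server response as the single source of truth.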
The user experience is designed for simplicity: upload a PDF,
receive immediate feedback that processing has started, and see
structured results appear dynamically. React Router manages
navigation. Client-side validation provides quick feedback,
complementing the backend rules. The interface ensures user
context (the document being processed) is maintained even if the
page is reloaded.
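The client-side validation mentioned above might look like the following sketch. The project's actual rules are not listed here, so the PDF-only check, the empty-file check, and the 20 MiB limit are assumptions, and `validateUpload` and `UploadCandidate` are hypothetical names. A browser `File` object has the same three properties, so it fits the `UploadCandidate` shape.
```typescript
// Hypothetical client-side checks; the limits and messages are assumptions.
interface UploadCandidate {
  name: string;
  size: number; // bytes
  type: string; // MIME type as reported by the browser
}

const MAX_SIZE_BYTES = 20 * 1024 * 1024; // assumed 20 MiB limit

// Returns a list of human-readable problems; an empty list means "valid".
function validateUpload(file: UploadCandidate): string[] {
  const errors: string[] = [];
  const looksLikePdf =
    file.type === "application/pdf" || file.name.toLowerCase().endsWith(".pdf");
  if (!looksLikePdf) {
    errors.push("Only PDF files are accepted.");
  }
  if (file.size === 0) {
    errors.push("The file is empty.");
  }
  if (file.size > MAX_SIZE_BYTES) {
    errors.push("The file exceeds the size limit.");
  }
  return errors;
}
```
Checks like these only give quick feedback. The file name and MIME type are supplied by the client, so the server-side rules remain the authoritative ones.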
Deployment Automation and Infrastructure
Modern DevOps practices ensure automated and consistent
deployments. A CI/CD pipeline configured in GitHub Actions
automates the entire workflow: building the backend and frontend,
packaging them into Docker containers, and deploying these
containers to a Kubernetes (K8s) cluster. This
containerization ensures environmental consistency, while K8s
manages application scaling, deployment rollouts, and resilience.
Conclusion
This Reform Proposal Hub project demonstrates practical expertise
in building specialized, data-focused web applications. It
effectively integrates AI for complex data extraction,
utilizes asynchronous patterns (Quartz.NET) and
real-time communication (SignalR) for a responsive user
experience, and employs modern frontend state management
techniques (React Query). The implementation showcases strong
skills across the stack (ASP.NET Core, React, TypeScript,
PostgreSQL) and proficiency in contemporary DevOps workflows
(Docker, Kubernetes, CI/CD), representing the capability to
engineer and deploy robust, automated solutions.