35th ESPU Joint Meeting in Vienna, Austria

S19: ARTIFICIAL INTELLIGENCE

Parallel Meeting on Thursday, 4 September 2025, 14:00 - 14:50


14:00 - 14:03
S19-1 (OP)

CAN DEEP LEARNING-BASED SEGMENTATION AND CLASSIFICATION IMPROVE THE DETECTION OF RENAL CORTICAL ABNORMALITIES?

Abdus SALAM 1, Tariq Osman ABBAS 2, Mansura NAZNINE 1 and Muhammad CHOWDHURY 3
1) Rajshahi University of Engineering & Technology, Computer Science and Engineering, Rajshahi, BANGLADESH - 2) Sidra Medicine, Urology, Doha, QATAR - 3) Qatar University, Department of Electrical Engineering, Doha, QATAR

PURPOSE

To develop a fully automated deep learning pipeline to enhance the detection and classification of renal cortical abnormalities in pediatric patients. Accurate identification of kidney scarring is critical for diagnosing and managing renal conditions, yet manual analysis of nuclear renal imaging is prone to significant inter-observer variability. By leveraging advanced deep learning techniques, this study aims to provide a more precise, efficient, and reliable method for analyzing renal scans, reducing human error and improving diagnostic accuracy.

MATERIAL AND METHODS

We developed a model using a dataset of 613 renal nuclear images, including 193 from patients diagnosed with kidney scarring. For segmentation, we created a novel DenseNet121_Self-ONN_FPN model that integrates DenseNet121 with Self-organized Operational Neural Network (Self-ONN) layers in a Feature Pyramid Network (FPN). A modified DenseNet205 architecture was employed for classification. Preprocessing techniques such as Contrast Limited Adaptive Histogram Equalization (CLAHE) and gamma correction were applied. Performance was evaluated using accuracy, precision, recall, F1-score, Intersection over Union (IoU), Dice Similarity Coefficient (DSC), False Negative Rate (FNR), and False Positive Rate (FPR). ScoreCAM was used for explainability.

RESULTS

The segmentation model achieved an accuracy of 98.74%, IoU of 86.47%, DSC of 92.74%, precision of 92.61%, recall of 92.88%, F1-score of 99.29%, FNR of 7.12%, and FPR of 0.71%. The classification model demonstrated an accuracy of 96.91%, precision of 96.98%, sensitivity of 96.91%, F1-score of 96.86%, and specificity of 95.87%, surpassing state-of-the-art methods.
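As a rough illustration of how the segmentation metrics listed above relate to one another, the sketch below applies the standard identities to the reported percentages. It assumes pooled (micro-averaged) metrics, which the abstract does not state; the function names are illustrative only. Under that assumption, the harmonic mean of the reported precision and recall is about 92.74%, while 100% minus the reported FPR is 99.29%.

```python
def f1_score(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, all values in percent."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


def dice_from_iou(iou_pct: float) -> float:
    """Dice = 2J / (1 + J) for a single pooled IoU J, in percent."""
    j = iou_pct / 100
    return 100 * 2 * j / (1 + j)


def complement(rate_pct: float) -> float:
    """100% minus a rate: recall from FNR, or specificity from FPR."""
    return 100 - rate_pct


# Values taken from the abstract; pooled averaging is an assumption.
print(round(f1_score(92.61, 92.88), 2))   # 92.74
print(round(dice_from_iou(86.47), 2))     # 92.74
print(round(complement(7.12), 2))         # 92.88 (recall from FNR)
print(round(complement(0.71), 2))         # 99.29 (specificity from FPR)
```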

CONCLUSIONS

This fully automated deep learning pipeline outperforms manual analysis in detecting and classifying renal cortical anomalies, offering a reliable, efficient, and transparent solution. It sets a new standard for clinical renal imaging and improves diagnostic precision in pediatric urology.


14:03 - 14:06
S19-2 (OP)

INTEGRATION OF AN AUTOMATED ARTIFICIAL INTELLIGENCE MODEL FOR MEASURING HYDRONEPHROSIS IN CHILDREN WITH URETEROPELVIC JUNCTION OBSTRUCTION

Brian CHUN 1, Hasem ZAMANIAN 2, Christine DO 1, Joan KO 1, Stephan ERBERICH 2 and Evalynn VASQUEZ 1
1) Children's Hospital Los Angeles, Urology, Los Angeles, USA - 2) Children's Hospital Los Angeles, Radiology, Los Angeles, USA

PURPOSE

The Society for Fetal Urology (SFU) grading system is commonly used to stratify ureteropelvic junction (UPJ) obstruction but can be prone to inter-rater variability. Hydronephrosis index (HI), calculated as the ratio of the area of renal parenchyma (total kidney area minus the area of dilated pelvis and calices) to the total kidney area, is a quantitative measure that can trend hydronephrosis over serial studies. However, calculating HI requires manual image segmentation, which is labor-intensive. Thus, our first aim was to pilot an automated artificial intelligence workflow within our Picture Archiving and Communication System (PACS) that segments and measures hydronephrotic areas on renal ultrasound imaging.
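The HI definition above can be written as a short function. This is a minimal sketch: the function name, the argument names, and the example areas are hypothetical, and the inputs are assumed to be areas in the same units measured on the same image.

```python
def hydronephrosis_index(total_kidney_area: float, dilated_area: float) -> float:
    """HI = (total kidney area - dilated pelvis/calices area) / total kidney area."""
    if total_kidney_area <= 0:
        raise ValueError("total kidney area must be positive")
    if not 0 <= dilated_area <= total_kidney_area:
        raise ValueError("dilated area must lie between 0 and the total kidney area")
    parenchymal_area = total_kidney_area - dilated_area
    return parenchymal_area / total_kidney_area


# Hypothetical example: 20 cm^2 kidney outline, 5 cm^2 dilated collecting system.
print(hydronephrosis_index(20.0, 5.0))  # 0.75
```

A lower value corresponds to a larger share of the kidney outline being occupied by the dilated collecting system.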

MATERIAL AND METHODS

We included patients up to 5 years of age with a confirmed diagnosis of UPJ obstruction. Raw ultrasound images were converted to grayscale and denoised using median filtering. Segmentation masks were generated to identify tissue structures and refined through filtering based on cluster size, intensity, spatial characteristics, center of mass, and proximity to image edges. The final output included visualizations of the segmented regions and overlays onto the original images. Model segmentation was cross-validated against manual segmentation of the same renal unit.
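Two of the steps described above, median filtering and cluster-based mask refinement, can be sketched in plain Python. This is not the authors' implementation: the 3x3 window, 4-connectivity, the minimum cluster size, and the rule of discarding clusters that touch the image border are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import deque
from statistics import median


def median_filter3(img):
    """3x3 median filter on a 2-D list of gray values (border pixels use the neighbours that exist)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = median(window)
    return out


def keep_clusters(mask, min_size, drop_edge_touching=True):
    """Keep 4-connected clusters of True pixels that are large enough and, optionally, clear of the border."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected cluster.
            queue, cluster = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            touches_edge = any(y in (0, h - 1) or x in (0, w - 1) for y, x in cluster)
            if len(cluster) >= min_size and not (drop_edge_touching and touches_edge):
                for y, x in cluster:
                    out[y][x] = True
    return out
```

In practice these operations would more likely come from an image-processing library; the pure-Python version only makes the logic explicit.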

RESULTS

Using a preliminary model training set of 40 renal ultrasound studies, we developed a pilot artificial intelligence workflow within our institution’s PACS infrastructure that automatically segments and measures hydronephrotic regions and writes the results into the study report. This is the first step in developing an automated tool for calculating hydronephrosis index.

CONCLUSIONS

We show the feasibility of integrating artificial intelligence image processing models within an institution’s PACS infrastructure. Our next step is to expand this model to measure hydronephrosis index.


14:06 - 14:09
S19-3 (OP)

A NOVEL ARTIFICIAL INTELLIGENCE MODEL TO IDENTIFY PATIENTS WITH UNILATERAL HYDRONEPHROSIS WHO REQUIRE PYELOPLASTY FOR URETEROPELVIC JUNCTION OBSTRUCTION

Tayfun OKTAR 1, İsmail SELVİ 2, Yusuf YEŞİL 3, M. İrfan DÖNMEZ 1, Orhan ZİYLAN 2, Şule SEÇKİN 4 and Canan KÜÇÜKGERGİN 4
1) Istanbul University, Istanbul Faculty of Medicine, Department of Urology, Division of Pediatric Urology, İstanbul, TÜRKIYE - 2) İstanbul University, İstanbul Faculty of Medicine, Department of Urology, Division of Pediatric Urology, Istanbul, TÜRKIYE - 3) Istanbul University, Istanbul Faculty of Medicine, Department of Biochemistry, İstanbul, TÜRKIYE - 4) İstanbul University, İstanbul Faculty of Medicine, Department of Biochemistry, İstanbul, TÜRKIYE

PURPOSE

Identifying hydronephrosis that requires surgical intervention for ureteropelvic junction obstruction (UPJO) remains a clinical challenge in pediatric urology. In recent years, a growing number of clinical studies have examined urinary biomarkers that may distinguish obstructive hydronephrosis caused by UPJO from non-obstructive dilatation. Meanwhile, artificial intelligence (AI) and machine learning are becoming increasingly popular and reliable in all areas of medicine. We therefore aimed to build an AI model to investigate whether clinical parameters and urinary biomarkers can predict the need for surgery in children with isolated unilateral hydronephrosis.

MATERIAL AND METHODS

Thirty-nine children with UPJO who underwent pyeloplasty, 40 patients with non-obstructive dilatation (NOD), and 39 healthy children (control group) were included in this case-control study, a retrospective analysis of prospectively collected data. Urinary IP-10, KIM-1, CA19-9, NGAL, and MCP-1 levels were analysed by ELISA. The patients' demographic and clinical data [anteroposterior diameter (APD) on postnatal ultrasonography (US), renal parenchymal thickness on US, and split renal function on MAG-3] were also recorded. The XGBoost classification algorithm was used to create a prediction model to identify patients who would require surgery.

RESULTS

Among the 79 children with unilateral hydronephrosis included in this study (58 boys, 73.4%), antenatal hydronephrosis was present in 82.2%, and all of them had a postnatal APD ≥15 mm. All five urinary biomarkers were significantly higher in the obstruction group than in the other groups (p<0.001). The prediction model identified the need for pyeloplasty in children with hydronephrosis with an overall accuracy of 94.44% and a ROC AUC of 0.98. According to the model, the most important factors in predicting the need for surgery were renal parenchymal thickness (<5.25 mm), split renal function on MAG-3 (<37.5%), APD on postnatal US (>25.2 mm), urinary IP-10 level (>101.1 pg/mg Cr), and SFU hydronephrosis grade.

CONCLUSIONS

Based on our preliminary results, the AI model appears to have high reliability and accuracy in identifying patients who need surgery for UPJO.


14:09 - 14:21
Discussion
 

14:21 - 14:24
S19-4 (OP)

AI-BASED ASSESSMENT OF ANATOMIC SUITABILITY FOR NEWBORN CLAMP CIRCUMCISION: A PROOF-OF-CONCEPT STUDY

Luca A MORGANTINI 1, Laura CRUCIANI 2, Alexa WEINBERG 1, James T RAGUE 1, Elena DE MOMI 3 and Emilie K JOHNSON 1
1) Lurie Children's Hospital of Chicago, Urology, Chicago, USA - 2) Politecnico di Milano, NEARLab, Milano, ITALY - 3) Politecnico di Milano, Electronic Information and Bioengineering Department, Milano, ITALY

PURPOSE

Approximately 50% of newborn boys in the U.S. undergo circumcision, frequently by non-urologists. Yet, a standardized method to guide anatomic suitability assessment for clamp circumcision is lacking. Circumcision on unsuitable anatomy can result in complications; unnecessary specialist referrals of boys with suitable anatomy increase healthcare burdens. We aim to evaluate the feasibility of an AI-based tool to assess anatomic suitability for clamp circumcision.

MATERIAL AND METHODS

We developed a database of standardized newborn penile anatomy images (dorsal, ventral, top, and side views), taken with parental consent. A pediatric urologist categorized images as suitable or unsuitable for clamp circumcision. A Convolutional Neural Network model, YOLACT++, was trained. Performance was assessed by accuracy, sensitivity, specificity, area under the curve (AUC), and inter-rater agreement with the pediatric urologist.

RESULTS

A total of 82 images from 20 patients were analyzed. When each image was analyzed individually, model accuracy, sensitivity, specificity, and AUC were 71%, 81%, 61%, and 81%, respectively. When all images of a patient were analyzed together, sensitivity increased to 100%, although specificity decreased to 50%. While the model demonstrated robust sensitivity (identifying anatomic unsuitability), it tended to overclassify cases as unsuitable (lower precision). The AUC indicates good model performance in distinguishing suitable from unsuitable anatomy. Agreement with the expert yielded a Cohen's kappa of 0.82 (almost perfect agreement).
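The per-image and per-patient figures above can be illustrated with a short sketch of sensitivity, specificity, Cohen's kappa, and one possible patient-level aggregation rule. The abstract does not say how images were combined per patient; the "any image flagged unsuitable" rule below is an assumption, chosen because it is one rule that would raise sensitivity and lower specificity in the way reported. Labels use 1 for unsuitable (the positive class) and 0 for suitable, and all names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = unsuitable."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(k) / n) * (rater_b.count(k) / n) for k in labels)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def patient_level(predictions_by_patient):
    """Assumed rule: a patient is flagged unsuitable if any of their images is."""
    return {pid: int(any(preds)) for pid, preds in predictions_by_patient.items()}


# Toy data, not from the study.
print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))              # 0.5
print(patient_level({"a": [0, 1, 0], "b": [0, 0]}))          # {'a': 1, 'b': 0}
```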

CONCLUSIONS

AI-assisted assessment of penile anatomy shows promise for clinical implementation. High model sensitivity suggests utility in identifying unsuitable cases, though lower precision highlights the need for refinement to reduce unnecessary referrals. Future work will focus on optimizing model precision and integrating the tool into mobile applications for clinical adoption.


14:24 - 14:27
S19-5 (OP)

PRECISION IN CHORDEE ASSESSMENT: AUTONOMOUS MEASUREMENT OF PENILE CURVATURE EVALUATION THROUGH MACHINE LEARNING

Irfan WAHYUDI 1, Chandra Prasetyo UTOMO 2, Samsuridjal DJAUZI 3, Muhamad FATHURAHMAN 2, Gerhard Reinaldi SITUMORANG 1, Arry RODJANI 1, Putu Angga Risky RAHARJA 1, Kevin YONATHAN 1 and Marco RADITYA 1
1) Faculty of Medicine, Universitas Indonesia - Cipto Mangunkusumo Hospital, Department of Urology, Jakarta Pusat, INDONESIA - 2) Faculty of Information Technology, YARSI University, Jakarta, Indonesia, YARSI E-Health Research Center, Jakarta Pusat, INDONESIA - 3) Faculty of Medicine, Universitas Indonesia - Cipto Mangunkusumo Hospital, Department of Internal Medicine, Jakarta Pusat, INDONESIA

PURPOSE

Chordee is characterized by abnormal penile curvature. The degree of curvature is essential for determining the appropriate surgical approach. Currently, intraoperative evaluation is conducted through artificial erection testing on a degloved penis. However, traditional methods such as visual estimation and goniometer measurements are prone to significant interobserver variability, potentially leading to inconsistent clinical decisions. This study aims to develop an AI-based tool for objective and accurate penile curvature quantification using lateral-view digital images.

MATERIAL AND METHODS

Degloved penile images with artificial erection testing were collected and preprocessed, excluding duplicates. The dataset was divided into training (70%), validation (15%), and testing (15%) subsets. The AI model employed a two-step pipeline: (1) penile segmentation to identify penile structures and (2) chordee angle regression to measure curvature from the previously segmented area. Ethical protocols and informed consent were strictly followed to ensure data privacy.
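A 70/15/15 split like the one described above could be produced as follows. This is a sketch only: the abstract does not report the exact subset sizes, the random seed, or whether images were grouped by patient, so the rounding rule and the seed here are assumptions, and `split_dataset` is a hypothetical name.

```python
import random


def split_dataset(items, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle items reproducibly and cut them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))  # fractions are rounded down
    n_val = int(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # the remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(76))
print(len(train), len(val), len(test))  # 53 11 12
```

If several images come from the same patient, splitting by patient rather than by image avoids leakage between subsets.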

RESULTS

A total of 76 penile images with curvature ranging from 0 to 88 degrees were analyzed. The AI model achieved a segmentation accuracy of 96.5%, with an Intersection over Union (IoU) score of 91.3% and a Dice Similarity Coefficient (DSC) of 96.2%. The curvature angle estimation achieved a mean absolute error (MAE) of 10.5 degrees compared to goniometer measurements.
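The overlap and error metrics reported above can be computed as in the following sketch, which represents each mask as a set of (row, column) pixel coordinates. The representation and function names are illustrative assumptions, and the example values are toy data rather than study data.

```python
def iou(pred: set, truth: set) -> float:
    """Intersection over Union of two pixel sets."""
    union = pred | truth
    return len(pred & truth) / len(union) if union else 1.0


def dice(pred: set, truth: set) -> float:
    """Dice Similarity Coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(pred) + len(truth)
    return 2 * len(pred & truth) / total if total else 1.0


def mean_absolute_error(predicted, reference) -> float:
    """Mean absolute difference, e.g. model angles vs goniometer angles in degrees."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


pred_mask = {(0, 0), (0, 1), (1, 0)}
true_mask = {(0, 1), (1, 0), (1, 1)}
print(iou(pred_mask, true_mask))                      # 0.5
print(round(dice(pred_mask, true_mask), 4))           # 0.6667
print(mean_absolute_error([10.0, 40.0], [20.0, 35.0]))  # 7.5
```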

CONCLUSIONS

This study evaluates an AI-based automated approach for penile curvature measurement in chordee patients, demonstrating its accuracy and reproducibility. By reducing subjectivity, this method enhances surgical decision-making. However, further validation with larger datasets is required to confirm its clinical applicability.


14:27 - 14:35
Discussion
 

14:35 - 14:38
S19-6 (OP)

ARTIFICIAL INTELLIGENCE CHATBOTS VS. YPUC PEDIATRIC UROLOGISTS: PERFORMANCE ON A CAMPBELL WALSH UROLOGY HYPOSPADIOLOGY QUESTIONNAIRE

Sebastien FARAJ 1, Pauline CLERMIDI 1, Sabine IRTAN 1 and François-Xavier MADEC 2
1) Sorbonne University, AP-HP, Armand Trousseau Hospital, Department of Pediatric Surgery and Urology, Paris, FRANCE - 2) Foch Hospital, Department of Urology, Suresnes, FRANCE

PURPOSE

The increasing role of artificial intelligence (AI) in medical decision-making raises concerns about its reliability in specialized fields. This study aimed to compare AI-generated answers with those of pediatric urologists from the Young Pediatric Urologist Committee (YPUC) of the European Society for Paediatric Urology (ESPU) on a structured questionnaire derived from the 13th edition of Campbell Walsh Urology, covering all aspects of hypospadiology (theoretical knowledge, evaluation and management).

MATERIAL AND METHODS

A 31-question multiple-choice questionnaire was distributed to 77 members of the YPUC, of whom 23 (29.9%) responded on a voluntary basis. The questionnaire was also answered by 5 AI models (ChatGPT 3.5, ChatGPT 4o, Gemini, Copilot and Doubao). Subgroup analysis included FEAPU-certified (Fellow of the European Academy of Paediatric Urology) participants (n=15/23), surgeons over 35 years old (n=16/23), and self-declared hypospadiology experts (n=12/23). Responses were compared to assess accuracy.

RESULTS

Human participants had a mean score of 61.2% [45.2% to 77.4%], while AI models reached an average score of 63.2% [48.4% to 71.0%]. The highest-scoring human achieved 77.4%, outperforming the best AI score of 71.0% (Copilot). The FEAPU-certified subgroup performed best (65.6% [54.8% to 77.4%]) compared with the other subgroups (surgeons over 35 years old: 63.3%; self-declared hypospadiology experts: 64.5%).

CONCLUSIONS

AI systems demonstrated performance comparable to human experts but did not surpass top-tier professionals. While AI provided robustness in answering domain-specific questionnaires, human expertise remains mandatory for interpreting nuances and, most importantly, bridging the gap between theoretical knowledge and practical application.


14:38 - 14:41
S19-7 (OP)

ANALYZING THE READABILITY OF WEB-BASED PATIENT EDUCATIONAL MATERIALS FOR PEDIATRIC UROLOGIC CONDITIONS

Geneva PANTOJA 1, Alisha PAZ 1, Nora BROADWELL 2, Albert LEE 2 and Andrea BALTHAZAR 2
1) Baylor College of Medicine, Urology, Houston, USA - 2) Texas Children's Hospital, Pediatric Urology, Houston, USA

PURPOSE

The average American adult reads at an eighth-grade level. The National Institutes of Health (NIH) and American Medical Association (AMA) recommend patient materials be written at a sixth-grade level to maximize understanding. The Flesch-Kincaid (FK) formula calculates a text's reading level based on syllables per word and words per sentence. Health literacy is critical, affecting health outcomes. We aim to evaluate the readability of online information for four pediatric urologic conditions.
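The Flesch-Kincaid grade level mentioned above is conventionally computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below implements that formula; the syllable counter is a crude vowel-group heuristic and an assumption of this example, so its results will differ somewhat from the tools used in readability studies.

```python
import re


def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Heuristic (assumption): one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade_text(text: str) -> float:
    """Apply the formula to plain text using simple regex tokenization."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)


# 100 words, 5 sentences, 150 syllables: 0.39*20 + 11.8*1.5 - 15.59
print(round(fk_grade(100, 5, 150), 2))  # 9.91
```

Longer sentences and longer words both raise the grade level, which is why clinical terminology tends to push patient materials above the recommended sixth-grade target.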

MATERIAL AND METHODS

We analyzed 80 online patient materials for four common pediatric urologic conditions using the FK formula. Conditions included "undescended testis/testicle (UDT)," "vesicoureteral reflux (VUR)," "hydronephrosis," and "hypospadias." A Google search was performed for each condition, and the top 10 most visited websites were assessed for readability. A second analysis evaluated 10 of the top pediatric urologic programs ranked by U.S. News & World Report.

RESULTS

Using the FK formula, the overall grade reading level (GRL) of the 80 websites was 10.13. The average reading level of the top 10 websites and top 10 programs was 10.1 and 10.2, respectively. The mean GRLs for the top 10 most visited websites were: UDT 10.11±2.66, VUR 9.0±1.53, hydronephrosis 9.97±2.12, and hypospadias 11.21±2.35. For the top 10 program websites, the mean GRLs were: UDT 9.33±2.69, VUR 9.3±2.05, hydronephrosis 10.74±1.84, and hypospadias 11.45±2.92. Hypospadias websites had the highest GRL at 11.33±2.01, while VUR had the lowest at 9.15±1.77. None of the hypospadias material on the top 10 program websites was at an appropriate level. Overall, 73.8% of materials were written above the eighth-grade reading level.

CONCLUSIONS

The most accessed online materials for common pediatric urologic conditions exceed the limits set by the NIH and AMA, surpassing the reading level of most U.S. adults. This highlights the need to improve the readability of patient materials.


14:41 - 14:50
Discussion