<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Openpyxl on Servicios Rogeliowar</title><link>https://blog.serviciosrogeliowar.com/en/tags/openpyxl/</link><description>Recent content in Openpyxl on Servicios Rogeliowar</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Rogelio Guerra Riverón</copyright><lastBuildDate>Mon, 04 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.serviciosrogeliowar.com/en/tags/openpyxl/index.xml" rel="self" type="application/rss+xml"/><item><title>Smart Excel Import in Python: Flexible Column Detection and Heterogeneous Data Cleaning</title><link>https://blog.serviciosrogeliowar.com/en/posts/importacion-inteligente-de-excel-en-python-deteccion-flexible-de-columnas-y-limpieza-de-datos-heterogeneos/</link><pubDate>Mon, 04 May 2026 00:00:00 +0000</pubDate><guid>https://blog.serviciosrogeliowar.com/en/posts/importacion-inteligente-de-excel-en-python-deteccion-flexible-de-columnas-y-limpieza-de-datos-heterogeneos/</guid><description>&lt;p&gt;I&amp;rsquo;ve been working with Excel spreadsheets that arrive from different departments. Each one uses different column names, the data is dirty (phone numbers with notes, tax IDs mixed with text), and codes have inconsistent formats. Here I document the solution I built.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The real problem
 &lt;div id="the-real-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-real-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I was receiving Excel files where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The same column was called &amp;ldquo;NIF&amp;rdquo; in one file, &amp;ldquo;CIF&amp;rdquo; in another, and &amp;ldquo;Identification&amp;rdquo; in a third&lt;/li&gt;
&lt;li&gt;Phone numbers came with notes attached, like &amp;ldquo;123-456-7890 (ext 5)&amp;rdquo; or &amp;ldquo;9876543210 - unavailable&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Tax IDs with dashes, spaces and varied letters&lt;/li&gt;
&lt;li&gt;Product codes with inconsistent prefixes&lt;/li&gt;
&lt;/ul&gt;
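&lt;p&gt;To make the first two items concrete, here is a minimal sketch of what tolerant handling could look like. It is an illustration under stated assumptions, not necessarily the solution described in this post: the names &lt;code&gt;COLUMN_ALIASES&lt;/code&gt;, &lt;code&gt;detect_columns&lt;/code&gt; and &lt;code&gt;clean_phone&lt;/code&gt; are hypothetical, and the alias set only covers the three headers listed above.&lt;/p&gt;

```python
import re

# Hypothetical alias table (an assumption for this sketch): one canonical
# field name mapped to the header spellings mentioned in the list above.
COLUMN_ALIASES = {
    "tax_id": {"nif", "cif", "identification"},
}


def detect_columns(headers):
    """Map each canonical field to the index of the first matching header."""
    found = {}
    for index, header in enumerate(headers):
        # Normalize so that " CIF " and "cif" compare equal.
        key = str(header).strip().lower()
        for field, aliases in COLUMN_ALIASES.items():
            if key in aliases and field not in found:
                found[field] = index
    return found


def clean_phone(raw):
    """Keep the first phone-like run of digits and drop any trailing note."""
    # A digit, then at least 7 digits or separators, then a closing digit.
    match = re.search(r"\d[\d\s.-]{7,}\d", str(raw))
    if match is None:
        return None
    # Strip every non-digit separator from the matched run.
    return re.sub(r"\D", "", match.group())


print(detect_columns(["Name", " CIF ", "Phone"]))
print(clean_phone("123-456-7890 (ext 5)"))
print(clean_phone("9876543210 - unavailable"))
```

&lt;p&gt;The header row can come from any reader, for example the first row of an openpyxl worksheet. Keeping the aliases in a plain dictionary means a new department spelling is a one-line change rather than new parsing logic.&lt;/p&gt;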
&lt;p&gt;I couldn&amp;rsquo;t expect every department to format its data the same way, so I needed an import system flexible enough to absorb these variations.&lt;/p&gt;</description></item></channel></rss>