I am a data architect. I had a challenge this week. A small business wanted a client list, but not just any client list. They wanted an ethical client list. This would have to be a list of potential clients that would want to be contacted. So I went to work.
First things first. I had to know what their business did. They help small businesses with IT work, with a major focus on SEO and website development. They also do some business analytics and IT consulting. Fair enough. Their target clients are small to medium-sized businesses, with the occasional larger business, in the Milwaukee, Wisconsin area. Where do small businesses hide their contact information? Well, the ones that want to be contacted are in small business associations and, well, the yellow pages. Time to source some data.
Small Business Milwaukee at smallbizmke.com has a directory, and lucky me, each entry is already divided up by category, but there is no downloadable CSV file. That means a web scraper with a double FOR loop. While I could write this down in actual Python, I’m going to keep this more open to everyone.
for category in categories:
    for company in category:
        scrape company information
        data_row = Company Name, Category, Company Website, Contact Information
        add row to Client_List
(The list is easier to work with in a database. We can connect it to more tools.)
upload Client_List to database
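For readers who do want the Python, here is a minimal sketch of the double loop and the database upload. It is not the code I ran. The directory's page markup isn't covered in this post, so the actual scraping is passed in as a function. The `fake_scrape_category` stand-in, the `clients` table, its column names, and the example companies are all made up for illustration, and SQLite is only a convenient assumption for the database.

```python
import sqlite3
from typing import Callable, Iterable

# One scraped directory entry: (name, website, contact).
Company = tuple[str, str, str]


def build_client_list(
    categories: Iterable[str],
    scrape_category: Callable[[str], Iterable[Company]],
) -> list[tuple[str, str, str, str]]:
    """Double loop: every category, then every company in that category."""
    client_list = []
    for category in categories:
        for name, website, contact in scrape_category(category):
            client_list.append((name, category, website, contact))
    return client_list


def upload_client_list(conn: sqlite3.Connection, client_list) -> None:
    """Create the (assumed) clients table if needed and insert every row."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS clients (
               company_name TEXT,
               category TEXT,
               company_website TEXT,
               contact_information TEXT
           )"""
    )
    conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)", client_list)
    conn.commit()


def fake_scrape_category(category: str) -> list[Company]:
    """Stand-in for the real scraper; returns made-up entries."""
    sample = {
        "Bakeries": [("Example Bakery", "https://bakery.example", "info@bakery.example")],
        "Plumbers": [("Example Plumbing", "https://plumbing.example", "555-0100")],
    }
    return sample.get(category, [])


conn = sqlite3.connect(":memory:")
rows = build_client_list(["Bakeries", "Plumbers"], fake_scrape_category)
upload_client_list(conn, rows)
```

Passing the scraper in as a function keeps the loop the same no matter how the directory pages are actually parsed, and it means the whole thing can be tried without touching the live site.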
Now we have a potential client list. Small Business Milwaukee provided nearly 250 potential clients. How many of those potential clients are really potential clients? This is Data Mining.

In case you weren’t paying attention, the name of this site is “I Explain Data.” This is the point where, yes, we have just collected the data. Now we are going to process the data. Processing the data never, and I mean never…okay fine…it is rarely the shortest process of data science. Pre-processing data in code is the best option. If you are loading all of your data directly into Power BI, and I know some large companies that do, then you are running the risk of your reports being incredibly slow. I mean agonizingly slow, dragging through data that could have been filtered or summarized first. So let’s go.
We have a slew of websites. Let’s run all of them through a program that analyzes each website, searching for missing alt text, missing headings, missing meta descriptions, missing meta keywords, and the general placement of the website in Google search results. Then we put that into its own audit table in the relational database we started above.
Without being too technical, this has its own difficulties. I tried a few Python libraries. They didn’t really provide what I was looking for. I asked Google Gemini (the Google A.I. assistant) for some help with ideas for the code. Google Gemini put together some scraping tools utilizing Beautiful Soup. I looked it over. Then I adjusted it to grab more. Then I adjusted it to run as a module that returns results automatically. Another FOR loop and we run every single company website through the program and update the database.
for website in websites:
    analyze website for SEO problems
    data set = Company Name, Company Website, SEO results
    update the database
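To make the audit step a little more concrete, here is a rough sketch of the on-page checks. My version was built on Beautiful Soup. This sketch swaps in Python's built-in `html.parser` so it runs with no installs. The names `SeoAuditParser` and `audit_html`, and the sample page, are made up for illustration. It works on HTML that has already been downloaded, and it leaves out the search placement check, which needs an outside data source.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SeoAuditParser(HTMLParser):
    """Collects the on-page signals: alt text, headings, meta tags."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = 0
        self.heading_count = 0
        self.has_meta_description = False
        self.has_meta_keywords = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            # Only a completely absent alt attribute counts as missing;
            # an empty alt="" is valid for decorative images.
            if "alt" not in attributes:
                self.images_missing_alt += 1
        elif tag in HEADING_TAGS:
            self.heading_count += 1
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            has_content = bool((attributes.get("content") or "").strip())
            if name == "description" and has_content:
                self.has_meta_description = True
            elif name == "keywords" and has_content:
                self.has_meta_keywords = True


def audit_html(html: str) -> dict:
    """Run the parser over one page and return the findings as a dict."""
    parser = SeoAuditParser()
    parser.feed(html)
    return {
        "images_missing_alt": parser.images_missing_alt,
        "has_headings": parser.heading_count > 0,
        "has_meta_description": parser.has_meta_description,
        "has_meta_keywords": parser.has_meta_keywords,
    }


# Made-up page standing in for a downloaded company website.
sample_page = """
<html><head><title>Example</title>
<meta name="description" content="We bake bread."></head>
<body><h1>Example Bakery</h1>
<img src="loaf.jpg"><img src="roll.jpg" alt="A dinner roll"></body></html>
"""

pages = {"Example Bakery": sample_page}
audit_results = {}
for company_name, html in pages.items():
    audit_results[company_name] = audit_html(html)
```

Each result is a small dictionary per company, which maps cleanly onto one row of an audit table.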
Now we can go into the database and filter the data to find the best clients.
SELECT Company_Name, Contact_Information
FROM the_database_table
WHERE placement_value > 10
  AND meta_description IS NULL
  AND meta_keywords IS NULL
  AND image_alt_text IS NULL;
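Here is a small sketch of running that filter from Python, assuming the audit results live in SQLite. The `seo_audit` table, its layout, and the two example rows are invented for illustration. A missing value is stored as NULL, which is why the filter tests for NULL instead of comparing against a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE seo_audit (
           company_name TEXT,
           contact_information TEXT,
           placement_value INTEGER,
           meta_description TEXT,
           meta_keywords TEXT,
           image_alt_text TEXT
       )"""
)
# Two made-up audit rows: one healthy site and one with nothing filled in.
conn.executemany(
    "INSERT INTO seo_audit VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Example Bakery", "info@bakery.example", 3, "We bake bread.", "bread, bakery", "A loaf"),
        ("Example Plumbing", "555-0100", 27, None, None, None),
    ],
)

query = """
    SELECT company_name, contact_information
    FROM seo_audit
    WHERE placement_value > ?
      AND meta_description IS NULL
      AND meta_keywords IS NULL
      AND image_alt_text IS NULL
"""
# The placement threshold is passed as a parameter so it is easy to adjust.
leads = conn.execute(query, (10,)).fetchall()
print(leads)  # [('Example Plumbing', '555-0100')]
```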
This is going to give us a list of clients with bad SEO scores where it is easy to see what hurt them. Sometimes a website gets a good score because it is old and well established. Sometimes it is a lack of competition. New businesses are going to have to get everything right if they are going to compete on SEO. New websites need help.

Now I can add a program module that will email these clients automatically, and I probably will. I could add an A.I. calling agent as well. I probably won’t. The sad fact is that though A.I. can do a lot, it can’t build relationships. Sending an email in advance isn’t a bad idea though. It gives the prospective client information and something to reference during a sales call.
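I haven't built the email module yet, so this is only a sketch of what drafting one message might look like with Python's standard `email` library. The function name, the sender address, and the wording are all placeholders. It builds the message but does not send anything.

```python
from email.message import EmailMessage


def draft_outreach_email(company_name, to_address, findings,
                         sender="hello@agency.example"):
    """Build (but do not send) an email that lists the audit findings."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = to_address
    message["Subject"] = f"A quick look at the {company_name} website"
    bullet_points = "\n".join(f"- {finding}" for finding in findings)
    message.set_content(
        "Hello,\n\n"
        "We ran a short review of your website and noticed:\n"
        f"{bullet_points}\n\n"
        "We would be glad to walk through these on a call.\n"
    )
    return message


draft = draft_outreach_email(
    "Example Plumbing",
    "owner@plumbing.example",
    ["No meta description", "Images without alt text"],
)
```

Keeping the drafting separate from the sending means a person can read every email before it goes out.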
That is what I do. I mine for appropriate data. I clean the data and process it. I load the data into a database. I use that data to automate tasks, inform people, or make decisions.