Techo-nerds, can you help? A student of mine downloaded about thousand spreadsheets from a public site using the “Excel file” option that saved themselves as .xls files and will open in Excel but are REALLY HTML files and, as such, cannot be imported or even parsed by Stata. Any ideas for automating the file translation? We estimate that opening each file in Excel and saving it as an Excel file at 30 seconds each will take 35 hours. Hoping for a programming solution.
My university’s class rosters ALSO download with .xls extensions but are really html files. Hmmm.
Edit: I think I can crack this. I’ve learned that I can read each file into Stata as lines of text this way:
import delimited “census_Tract101.xls”, delimiter(“^”) varnames(nonames) clear
from there, I’m pretty sure I can fairly easily extract the information needed with string functions, as all the files have identical formats. This may be more elegantly done in R or a programming langauge, but I think I can do it in Stata faster. We’ll see. Thanks for the fast responses.
Edit #2: That did it. My Stata-fu is strong and once I could get each file into Stata as a long string per row, I was good to go as regards writing the code to parse each file inside loops and combine all them into one big file. If you happen to want clues on how to do this kind of arcane task, let me know.