sigmaleph:

  1. there’s a file called whatever_YYYYMMDD.csv with data generated every day
  2. every day, we process the day’s data and also reprocess a few days back
  3. the way this is done involves copying over the various whatever_YYYYMMDD.csv files for the last few days from one folder to another
  4. the way this is implemented is via a script that takes the earliest date and latest date we’re processing today, converts them to a number in YYYYMMDD format, and loops between them
  5. not a date. a number.
  6. if one of those numbers is 20240612 and the other is 20240615, this is fine. it just hits every number in between them, which matches up to every existing whatever_YYYYMMDD.csv file in between those dates
  7. if one of those numbers is 20240630 and the other is 20240703, this is slightly less fine. It’s going to check a bunch of numbers that do not correspond to any dates, like 20240646. so it runs dozens more times than necessary, trying to move files that don’t exist. fortunately computers are pretty fast and this adds very little time.
  8. if one of those numbers is 20241230 and the other is 20250102, then instead of checking for seventy or so nonexistent files, it checks for nearly nine thousand. each one of those is pretty fast. less than a second. but not, like, much less than a second.
  9. several hours’ worth of delay in a process that should take seconds, which has been happening longer than i’ve been working here, every year, for a few days around new year’s. which, it seems, nobody noticed, for a number of reasons mostly adding up to ‘several hours’ worth of delay is invisible unless you’re specifically looking for it, and nobody had a reason to look for it until now’
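a minimal sketch in Python of the difference between looping over numbers and looping over dates, assuming nothing about the real script beyond what's described above. the function names are made up for illustration, and the actual file-copying step is left out; each function just lists the whatever_YYYYMMDD.csv names it would try to move.

```python
from datetime import date, timedelta


def buggy_candidates(start: date, end: date) -> list[str]:
    """Hypothetical reproduction of the bug: treat YYYYMMDD as a plain integer and count up by 1."""
    lo = int(start.strftime("%Y%m%d"))
    hi = int(end.strftime("%Y%m%d"))
    # every integer in between, including ones like 20240646 that aren't dates
    return [f"whatever_{n}.csv" for n in range(lo, hi + 1)]


def date_candidates(start: date, end: date) -> list[str]:
    """Hypothetical fix: step one calendar day at a time, so only real dates come out."""
    names = []
    day = start
    while day <= end:
        names.append(f"whatever_{day:%Y%m%d}.csv")
        day += timedelta(days=1)
    return names


start, end = date(2024, 12, 30), date(2025, 1, 2)
print(len(buggy_candidates(start, end)))  # 8873
print(len(date_candidates(start, end)))   # 4
```

the integer version only matches the date version when both ends fall in the same month; across a month boundary it overshoots by up to 70 or so, and across a year boundary by nearly nine thousand.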

while “January 2nd, 2025” might sound like some far-off date in the distant future, it was in fact last Thursday