def fetch_html(self, url):
    domain = urllib.parse.urlparse(url).netloc
    if domain not in self.robot_parsers:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f'https://{domain}/robots.txt')
        rp.read()
        self.robot_parsers[domain] = rp
    rp = self.robot_parsers[domain]
    if not rp.can_fetch(self.user_agent, url):
        print(f"Fetching not allowed by robots.txt: {url}")
        return None
    if self.last_fetch_time:
        time_since_last_fetch = time.time() - self.last_fetch_time
        if time_since_last_fetch < self.delay:
            time.sleep(self.delay - time_since_last_fetch)
    headers = {'User-Agent': self.user_agent}
    response = requests.get(url, headers=headers)
    self.last_fetch_time = time.time()
    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to fetch {url}: {response.status_code}")
        return None
Randomly selected something from a project I’m working on that’s simple and just works. Show me less than 300 lines of .NET to do the same, and I would be somewhat surprised.
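(For context: the method relies on instance state set up elsewhere in the class. Here is a minimal, hypothetical constructor sketch showing what that state would look like; the class name and default values are made up, only the attribute names come from the snippet above.)

```python
import time
import urllib.parse
import urllib.robotparser


class PoliteFetcher:
    # Hypothetical class name; the fetch_html method above is assumed
    # to be defined on a class shaped roughly like this one.
    def __init__(self, user_agent="MyCrawler/1.0", delay=1.0):
        self.user_agent = user_agent    # sent as the User-Agent header and checked against robots.txt
        self.delay = delay              # minimum seconds between successive fetches
        self.last_fetch_time = None     # None until the first fetch completes
        self.robot_parsers = {}         # per-domain RobotFileParser cache
```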
Well, for starters, it’s a greentext, so who knows how genuine it is, right? Most of the points listed are either subjective or citation-needed fodder. However, maybe there’s one fact I can bring to the table:

ASP.NET’s benchmark performance ranked 16th in Round 22 of the TechEmpower Web Framework Benchmarks, placing below solutions written in Rust, C, Java, and JS. C# trades that relative loss in performance for advantages over each of those languages and frameworks: Rust and C are much lower level; Java’s syntax is generally considered to lag behind C#'s at this point; JS’s disadvantages could fill a whole post of their own. C# and .NET have their own disadvantages (such as relatively fewer libraries available), as you’ve pointed out elsewhere in this thread, but when you take into account its relatively high performance as a strongly-typed, higher-level language with plenty of nice QoL features, you might see why it could be attractive to a specific slice of professionals.
That’s exactly my point, though: part of my assertion of a big weakness in C# is that more mainstream languages (Python or Node) have massive libraries you can draw on, with existing code for simple stuff like parsing robots.txt, whereas C# has an ecosystem that probably seems pretty luxurious if you’re comparing it to nothing, but is well short of what OSS programmers are accustomed to.

So yeah, it’s not a purely fair language-design comparison, but it’s a perfectly fair “how easy is it to get stuff done in this language” comparison. And at a certain point it stops being a mere convenience and becomes a whole new area of computation (something like numpy or pytorch) that’s simply impossible in C# without a whole research project devoted to implementing it. That said, I’m sure there are areas (especially in heavily business-oriented fields like airline or medical backends) where it’s the other way around, and you have C#-specific stuff for that domain that would be really difficult to replicate in some other environment. I’m not trying to say that side doesn’t exist, just describing what’s generally applicable in my experience.

So I’m not being critical of C# because of language features (it seems perfectly fine and functional; I get what people are saying when they say they get work done every day in it). But I also think it’s relevant that it’s missing some big advantages if you’re trying to go beyond the “it doesn’t actively punish you for using it” stage.
aspnet core is the library you want
Simple and just works
Well, with your mild surprise on the line, the stakes couldn’t be higher.
I know C# works; it pays my bills, so that’s good enough for me.
Here’s a good place to see what it can do:
https://learn.microsoft.com/en-us/aspnet/core/tutorials/first-web-api?view=aspnetcore-8.0&tabs=visual-studio
I am familiar.
Not saying don’t pay your bills with it; that part sounds great. I was just confused by this guy’s enthusiasm for it, that’s all.
You left out the hundreds of lines from the library you’re importing. Where’s all the code for robotparser?
You can import libraries with C# too. That says nothing about the differences between languages.
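(For reference: the stdlib parser the snippet leans on can be exercised offline in a few lines. The rules and user-agent string below are made up; `parse()` accepts robots.txt content as a list of lines, so no HTTP request is needed.)

```python
import urllib.robotparser

# Feed robots.txt rules directly via parse() instead of fetching them.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("MyCrawler/1.0", "https://example.com/index.html"))  # True
print(rp.can_fetch("MyCrawler/1.0", "https://example.com/private/x"))   # False
```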