patch(report_news): add peef.dev as source#124
Conversation
add peef.dev as sourcepeef.dev as source
200c5ae to
9b1b85f
Compare
9b1b85f to
ab0e011
Compare
peef.dev as sourcepeef.dev as source
| continue | ||
|
|
||
| link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/")) | ||
| pub_date = lastmod.text |
There was a problem hiding this comment.
what's the purpose of creating this variable
|
|
||
| articles = [] | ||
| for url in xml_raw.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url')[-MAX_ARTICLES:]: | ||
| loc = url.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc') |
There was a problem hiding this comment.
Can you secure each iteration with a try/except ?
There was a problem hiding this comment.
I updated, I think now only HTTP errors are remaining.
There was a problem hiding this comment.
I am not sure why we use this MAX_ARTICLES=10 limit here, can you elaborate ?
I left a comment on the other PR regarding what I think we should do here,
| link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/")) | ||
| pub_date = lastmod.text | ||
|
|
||
| if not link.startswith("https://peef.dev/post/"): |
There was a problem hiding this comment.
I would like to add here that not all content from /post/ are article
- We have comments or simple post for the early version :
/post/<slug>where slug is autogenerated - And articles are from
/post/<username>/<slug>where slug comes from the article's title
So, i think normal /post/<slug> should be excluded and use only /post/<username>/<slug>
| pub_date = lastmod.text | ||
|
|
||
| # A peef article in under the form https://peef.dev/post/<author>/<slug> | ||
| if not link.startswith("https://peef.dev/post/") or link.strip("/").count("/") != 5: |
There was a problem hiding this comment.
Oh nice, but why not making it very strict ? If we need only articles, then it should be strict, so using AND instead of OR is better
Doing that, every URL should pass both checks : start with https://peef.dev/post/ and have exactly 5 "/"
There was a problem hiding this comment.
@abdounasser202, I think, it is doing exactly what you said.
If I am wrong, please, give me more details for me to understand your point.
There was a problem hiding this comment.
Yep, you're right, I did a test, and it's doing what I was explaining
so it's fine ! 👍
No description provided.