Skip to content

patch(report_news): add peef.dev as source#124

Open
pythonbrad wants to merge 4 commits into
osscameroon:mainfrom
pythonbrad:patch_peef
Open

patch(report_news): add peef.dev as source#124
pythonbrad wants to merge 4 commits into
osscameroon:mainfrom
pythonbrad:patch_peef

Conversation

@pythonbrad

Copy link
Copy Markdown
Member

No description provided.

@pythonbrad pythonbrad changed the title report_news: add peef.dev as source report_news: add peef.dev as source Nov 7, 2024
@pythonbrad pythonbrad marked this pull request as draft November 7, 2024 17:55
@pythonbrad pythonbrad marked this pull request as ready for review November 7, 2024 19:41
@pythonbrad pythonbrad changed the title report_news: add peef.dev as source patch(report_news): add peef.dev as source Nov 7, 2024
continue

link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/"))
pub_date = lastmod.text

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of creating this variable

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sanix-Darker, which variable?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub_date


articles = []
for url in xml_raw.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url')[-MAX_ARTICLES:]:
loc = url.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you secure each iteration with a try/except ?

@pythonbrad pythonbrad Nov 7, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated, I think now only HTTP errors are remaining.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why we use this MAX_ARTICLES=10 limit here, can you elaborate ?
I left a comment on the other PR regarding what I think we should do here,

Comment thread .github/workflows/report_news/main.py Outdated
link = re.sub(r'^http://', 'https://', urlquote(loc.text, safe=":/"))
pub_date = lastmod.text

if not link.startswith("https://peef.dev/post/"):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to add here that not all content from /post/ are article

  1. We have comments or simple post for the early version : /post/<slug> where slug is autogenerated
  2. And articles are from /post/<username>/<slug> where slug comes from the article's title

So, i think normal /post/<slug> should be excluded and use only /post/<username>/<slug>

@pythonbrad pythonbrad Nov 8, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdounasser202, I updated 👌

pub_date = lastmod.text

# A peef article in under the form https://peef.dev/post/<author>/<slug>
if not link.startswith("https://peef.dev/post/") or link.strip("/").count("/") != 5:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice, but why not making it very strict ? If we need only articles, then it should be strict, so using AND instead of OR is better

Doing that, every URL should pass both checks : start with https://peef.dev/post/ and have exactly 5 "/"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abdounasser202, I think, it is doing exactly what you said.

If I am wrong, please, give me more details for me to understand your point.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right, I did a test, and it's doing what I was explaining
so it's fine ! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants