@alicemy478 and I are interested in investigating this issue. After reviewing the latest commits, it appears that the problem is still present. We could work on a solution that checks for the presence of a query string in the URL before appending the path.
While integrating with linkedin_scraper, I've come across a potential issue where paths are directly appended to URLs without checking for the presence of query strings. This leads to malformed URLs if the original URL contains a query string.
Current Behavior:
When appending a path to a URL that already has a query string, the result is a malformed URL.
For example, appending details/experience to https://www.linkedin.com/in/douglas-b-b23472b/?trk=people-guest_people_search-card results in a URL that still points at https://www.linkedin.com/in/douglas-b-b23472b/?trk=people-guest_people_search-card (the appended path lands after the query string), instead of the desired https://www.linkedin.com/in/douglas-b-b23472b/details/experience?trk=people-guest_people_search-card.
Suggested Fix:
Before appending the path, the package should check whether the URL contains a query string. If it does, the path should be added to the URL's path component, and the query string should then be reattached after it. Python's urllib.parse module (urlparse and urlunparse) can split the URL into its components and rebuild it.
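A minimal sketch of what this could look like, using only the standard library. The helper name append_path is hypothetical and is not part of linkedin_scraper; it only illustrates the idea of editing the path component and leaving the query string untouched.

```python
from urllib.parse import urlparse, urlunparse


def append_path(url: str, path: str) -> str:
    """Append `path` to the path component of `url`, keeping any query string.

    Hypothetical helper for illustration; not part of linkedin_scraper.
    """
    parts = urlparse(url)
    # Join the existing path and the new segment with exactly one slash.
    base = parts.path.rstrip("/")
    new_path = f"{base}/{path.lstrip('/')}"
    # Rebuild the URL; scheme, host, query, and fragment are unchanged.
    return urlunparse(parts._replace(path=new_path))


if __name__ == "__main__":
    profile = (
        "https://www.linkedin.com/in/douglas-b-b23472b/"
        "?trk=people-guest_people_search-card"
    )
    print(append_path(profile, "details/experience"))
```

With the URL from this report, the helper returns https://www.linkedin.com/in/douglas-b-b23472b/details/experience?trk=people-guest_people_search-card. URLs without a query string go through the same code path, so no separate branch is needed.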
Impact:
This change will ensure that the URLs constructed by linkedin_scraper are always correctly formatted and valid, reducing potential issues for downstream users and systems.
I believe this fix would greatly enhance the robustness of URL handling in the package. Please let me know if more information or context is needed, and I'd be happy to help further!