14 Commits

Author SHA1 Message Date
dependabot[bot]
0f7d648975
Bump tornado from 6.2 to 6.3.2 in /apps/web-crawl-q-and-a (#459)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.2 to 6.3.2.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.2.0...v6.3.2)

---
updated-dependencies:
- dependency-name: tornado
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-11 16:05:37 -07:00
DevilsWorkShop
39b62a6c09
Catch the exception thrown by the With.Open and continue with the queue (#155)
Co-authored-by: Ashok Manghat <amanghat@rmplc.net>
2023-09-11 15:55:57 -07:00
ys64
7cda7e2df7
add the last chunk to the list of chunks in web-qa.ipynb (#691)
2023-09-11 14:54:40 -07:00
Simón Fishman
b2ca4d395c
Revert "File name sanitization (#630)" (#668)
This reverts commit 169f5e02c8ab13372bb066263424f9ddb31f7f9f.
2023-08-29 17:45:47 -07:00
Safa Asgar
169f5e02c8
File name sanitization (#630)
* File name sanitization

URL containing reserved characters blocks file name creation.

* Regular Expression fix for Sanitized URL

Co-authored-by: Simón Fishman <simonpfish@gmail.com>

---------

Co-authored-by: Simón Fishman <simonpfish@gmail.com>
2023-08-29 10:49:23 -07:00
Tomas Dulka
4fd2b1a6d2
replace eval with safer literal_eval (#561)
2023-07-17 16:40:54 -07:00
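Illustrative note on the commit above: a minimal sketch of why `ast.literal_eval` is the safer choice when parsing a serialized list. The function name `parse_embedding` and the sample strings are hypothetical and not taken from the repository.

```python
import ast


def parse_embedding(text):
    """Parse a string such as "[0.1, -0.2, 0.3]" into a list of floats.

    ast.literal_eval only accepts Python literals (numbers, strings, lists,
    tuples, dicts, sets, booleans, None), so function calls or attribute
    access in the input are rejected instead of being executed.
    """
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise ValueError("expected a list literal")
    return [float(x) for x in value]


# Hypothetical serialized value, for example a cell read back from a CSV file.
print(parse_embedding("[0.1, -0.2, 0.3]"))

# An expression that eval() would execute is refused by literal_eval.
try:
    parse_embedding("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", type(exc).__name__)
```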
Darshan Panchal
e66613331a
Update requirements.txt
removed html since it was not required
2023-05-11 09:21:34 +05:30
Alexander Khapaev
ee9b6268d4
Updated the get_domain_hyperlinks function to include handling of tel: links in addition to mailto: links, to exclude them from the clean links list.
2023-04-07 18:28:44 +03:00
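Illustrative note on the commit above: a rough sketch of excluding `mailto:` and `tel:` links while collecting same-domain URLs. This is not the repository's `get_domain_hyperlinks`; the function name `clean_links`, its parameters, and the choice of `https://` for relative links are assumptions made for the example.

```python
from urllib.parse import urlparse


def clean_links(hrefs, local_domain):
    """Return de-duplicated links that stay on local_domain.

    Fragment-only, mailto: and tel: links are skipped, as are links
    that point to another domain or use a non-HTTP scheme.
    """
    cleaned = []
    for href in hrefs:
        # Skip in-page anchors, email links, and telephone links.
        if href.startswith(("#", "mailto:", "tel:")):
            continue
        parsed = urlparse(href)
        if parsed.scheme in ("http", "https"):
            if parsed.netloc != local_domain:
                continue  # absolute link to another domain
            link = href
        elif parsed.scheme == "" and parsed.netloc == "":
            # Relative link: resolve against the local domain (https assumed).
            link = "https://" + local_domain + "/" + href.lstrip("/")
        else:
            continue  # other schemes (e.g. javascript:) or protocol-relative URLs
        link = link.rstrip("/")
        if link not in cleaned:
            cleaned.append(link)
    return cleaned


sample = ["/docs", "mailto:a@b.com", "tel:+123", "https://example.com/x/",
          "https://other.com/y", "#top"]
print(clean_links(sample, "example.com"))
```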
fabiofranco85
5a80ef2571
Improve regex
2023-03-27 07:38:35 -03:00
William Buck
ca9b9d485d
remove duplicate import of distances_from_embeddings
2023-03-20 13:02:37 -07:00
Sung Kim
3210b38e35
Add handling for last chunk in split_into_sentences function
I have added handling for the last chunk in the split_into_sentences function. Previously, the function did not account for the last chunk, which could lead to incomplete sentences in the output.

To solve this, I added a conditional statement to check if the last chunk is non-empty. If it is, I append it to the list of chunks with a period to ensure the last sentence is complete.

This change improves the accuracy of the split_into_sentences function and ensures that all sentences in the input text are properly segmented. Please review and let me know if you have any feedback or concerns.
2023-02-19 11:00:27 +09:00
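Illustrative note on the commit above: a minimal sketch of the last-chunk handling the message describes. It is not the repository's code; the name `split_into_chunks`, the `max_words` parameter, and the use of word counts as a stand-in for token counts are assumptions made to keep the example self-contained.

```python
def split_into_chunks(text, max_words=6):
    """Group sentences into chunks of at most max_words words each.

    A single sentence longer than max_words still becomes its own chunk.
    """
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    chunks = []
    current = []   # sentences collected for the chunk being built
    count = 0      # words in the chunk being built

    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk when adding this sentence would exceed the limit.
        if current and count + n_words > max_words:
            chunks.append(". ".join(current) + ".")
            current = []
            count = 0
        current.append(sentence.rstrip("."))
        count += n_words

    # The step the commit describes: without this check, whatever is left
    # in `current` after the loop would be dropped silently.
    if current:
        chunks.append(". ".join(current) + ".")

    return chunks


print(split_into_chunks("One two three. Four five six. Seven eight nine."))
```

Removing the final `if current:` block reproduces the reported symptom: the trailing sentences never reach the output.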
Logan Kilpatrick
3826607431
Add comment on where to learn about rate limits
2023-02-17 06:16:14 -06:00
Daniel Zhukovsky
be9877edbf
Redefinition of unused 'pd'
2023-02-16 15:05:04 +00:00
isafulf
daf8e0d011
rename web crawl q and a
2023-02-11 16:37:29 -08:00