DownThemAll! works. Use the 'Links' tab, as the 'Media' tab will just download the thumbnails.

Thanks so much again for this man, any chance you've gotten it to work well for Coomer? When I try it, all it does is get the lower-quality preview images, and it doesn't get any video files or images from the multi-photo posts. Is it required with something like that to open each and every post page before running DTA?

I'd never even heard of that website before I read your post. Another member just described it as 'beyond illegal' so now I'm interested!

Where'd you see this member describing it as such? Curious as I'd like to pick their brain.
First of all, thank you mason2371 for sharing this great little tool. The issue I need some help with is this:

Code:
main.py: error: argument -o/--output-directory: Not a directory:

I'm wondering what the best approach is to automatically create the OUTPUT_DIRECTORY if the script finds that the folder does not already exist. I'd like to keep my scraped threads separated nicely, but I'd like to avoid browsing to my download location and creating folders manually each time.

As I understand it, the folder check happens when the argument is parsed here:

Python:
parser.add_argument(
    "-o",
    "--output-directory",
    type=file_validator(directory=True),
    default=".",
    help="Directory to download media files to. Default is '.'",
)

I'm trying to understand if I can just comment out the validation check and then add a folder creation command like this: https://www.geeksforgeeks.org/how-to-create-directory-if-it-does-not-exist-using-python/
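For reference, the folder-creation approach that kind of article describes boils down to a pathlib one-liner. This is just a sketch; the path below is a made-up example, not one of the script's real paths:

```python
import pathlib

# Create the directory and any missing parents; do nothing if it already exists.
output_directory = pathlib.Path("downloads/some-thread")  # example path
output_directory.mkdir(parents=True, exist_ok=True)

print(output_directory.is_dir())  # True
```

Note that even with exist_ok=True, mkdir still raises FileExistsError if the path already exists as a regular file rather than a directory, so some validation around it is still worthwhile.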
Personally, as a Unix nerd, my solution would be to create that directory first before running the scraper. You could also write a script that ensures the output directory is created before running the scraper. However, I understand that people have different requirements, and some of us are even forced to do terrible things, like use MS Windows. So for that, yes, you can remove the

Code:
type=file_validator(directory=True),

line and it should work fine. However, that line is there to ensure you didn't mess up your command and type the wrong directory, or worse yet, type the name of an existing file. You could also add the code from that article to the script if you'd like. I would recommend you add it to the file_validator function so you can still detect errors such as trying to save into an existing file. That would look something like this:

Python:
import argparse
import pathlib
from typing import Callable


def file_validator(directory: bool) -> Callable[[str], pathlib.PurePath]:
    """
    Convert a path string to a Path object. If directory is True, create the
    directory (and any missing parents) when it does not exist, and raise
    ArgumentTypeError if the path exists but is not a directory. Otherwise,
    raise ArgumentTypeError if the path is not an existing file. Meant to be
    used as the type argument for an argparse arg.
    """
    def _validator(pathstr: str):
        path = pathlib.Path(pathstr)
        if directory:
            try:
                # exist_ok=True ignores an existing directory, but an existing
                # regular file with this name still raises FileExistsError.
                path.mkdir(parents=True, exist_ok=True)
            except FileExistsError:
                raise argparse.ArgumentTypeError(f"Not a directory: {pathstr}")
        else:
            if not path.is_file():
                raise argparse.ArgumentTypeError(f"File not found: {pathstr}")
        return path.resolve()
    return _validator
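To see the modified validator in action, here is a self-contained sketch. The parser mirrors the script's -o/--output-directory argument; the temp paths are purely for demonstration:

```python
import argparse
import pathlib
import tempfile
from typing import Callable


def file_validator(directory: bool) -> Callable[[str], pathlib.PurePath]:
    # Same validator as above: creates missing directories instead of rejecting them.
    def _validator(pathstr: str):
        path = pathlib.Path(pathstr)
        if directory:
            try:
                path.mkdir(parents=True, exist_ok=True)
            except FileExistsError:
                raise argparse.ArgumentTypeError(f"Not a directory: {pathstr}")
        else:
            if not path.is_file():
                raise argparse.ArgumentTypeError(f"File not found: {pathstr}")
        return path.resolve()
    return _validator


parser = argparse.ArgumentParser()
parser.add_argument("-o", "--output-directory",
                    type=file_validator(directory=True), default=".")

# A nested path that does not exist yet gets created during argument parsing.
base = pathlib.Path(tempfile.mkdtemp())
target = base / "threads" / "some-thread"
args = parser.parse_args(["-o", str(target)])
print(args.output_directory.is_dir())  # True
```

One detail worth knowing: argparse turns the ArgumentTypeError into a usage error and exits (SystemExit), so passing the name of an existing file still aborts with the "Not a directory" message instead of clobbering the file.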
Thank you very much! This modification works like a charm. Didn't consider combining the path.mkdir command with the file validator so this is even better.

While my PC is running Windows, I'm actually hosting the script on my NAS, which is running Linux. Since my last post, I went down a bit of a rabbit hole: I was able to import the script into script-server on my NAS, so now I can use it from my desktop browser without needing to open a terminal each time. I included the script-server config file below in case anyone would like to use it.

Learning a lot this week, all in the name of boobs... hah...

JSON:
{
    "name": "Tits In Tops Thread Scraper Tool",
    "script_path": "python3 main.py",
    "working_directory": "/app/scripts/titsintops",
    "description": "By mason2371",
    "group": "scrapers",
    "output_format": "terminal",
    "parameters": [
        {
            "name": "Dry run",
            "param": "--dry-run",
            "no_value": true,
            "description": "Do not write downloaded media to disk"
        },
        {
            "name": "Username",
            "required": true,
            "param": "--username",
            "type": "text",
            "default": "CHANGE ME",
            "constant": true
        },
        {
            "name": "Password",
            "required": true,
            "param": "--password",
            "type": "text",
            "default": "CHANGE ME",
            "constant": true
        },
        {
            "name": "Verbose",
            "param": "--verbose",
            "no_value": true,
            "description": "Print more detailed diagnostic messages"
        },
        {
            "name": "Quiet",
            "param": "--quiet",
            "type": "text",
            "no_value": true,
            "description": "Do not print standard output messages. Does not affect --verbose diagnostic messages"
        },
        {
            "name": "Overwrite",
            "param": "--clobber",
            "no_value": true,
            "description": "Overwrite files that already exist in output directory. Default is to save files with a new name instead."
        },
        {
            "name": "Link List",
            "param": "--links",
            "no_value": false,
            "default": "links.txt",
            "constant": true,
            "description": "Write links posted in thread to a file"
        },
        {
            "name": "Archive Enabled",
            "param": "--archive",
            "type": "text",
            "default": "archive.txt",
            "constant": true,
            "description": "Record downloaded media to a file, and skip media files already listed in the archive"
        },
        {
            "name": "Paginate",
            "required": false,
            "param": "--paginate",
            "no_value": true,
            "description": "Store files in directories for each thread page. Useful for extremely large threads as thousands of files in the same directory tends to cause performance issues."
        },
        {
            "name": "Output Folder",
            "required": true,
            "param": "--output-directory",
            "type": "text",
            "default": "unsorted",
            "description": "Directory to download media files to. Default is '.'"
        },
        {
            "name": "Thread",
            "required": true,
            "type": "text",
            "description": "URL of the thread to download"
        }
    ]
}

Nice. Tits are of course the best motivator. And thanks for that github link, never heard of that project but I might have to set that up on my own server now.
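For anyone curious how a config like the one above turns into an actual command line, here is a rough sketch of the mapping as I understand it (the param, no_value, and default fields, plus a positional parameter with no "param"). This is my own illustration, not script-server's actual code, and it only uses a subset of the parameters:

```python
# Sketch: how a script-server-style parameter list could expand into argv.
# Illustration only, not script-server's real implementation.

def build_argv(script, parameters, values):
    """Build an argv list from parameter specs and chosen values."""
    argv = list(script.split())
    positional = []
    for spec in parameters:
        value = values.get(spec["name"], spec.get("default"))
        if value in (None, False, ""):
            continue  # parameter not set and no default
        flag = spec.get("param")
        if flag is None:
            positional.append(str(value))      # e.g. the Thread URL
        elif spec.get("no_value"):
            argv.append(flag)                  # boolean switch, e.g. --dry-run
        else:
            argv.extend([flag, str(value)])    # flag that takes a value
    return argv + positional

params = [
    {"name": "Dry run", "param": "--dry-run", "no_value": True},
    {"name": "Output Folder", "param": "--output-directory", "default": "unsorted"},
    {"name": "Thread"},  # positional: no "param" key
]
argv = build_argv("python3 main.py",
                  params,
                  {"Dry run": True, "Thread": "https://example.com/thread"})
print(argv)
# ['python3', 'main.py', '--dry-run', '--output-directory', 'unsorted',
#  'https://example.com/thread']
```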
Getting SSL certificate failed error.

Code:
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests/adapters.py", line 517, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='titsintops.com', port=443): Max retries exceeded with url: /phpBB2/index.php?login/ (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1006)')))

Something got messed up with the site. For a while it said it was not available to me when I tried to visit today, and I had to clear out my cookies and site data and then explicitly tell my browser to proceed after it blocked me. The certificate of the site might just need updating.
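The traceback pins it down: "certificate has expired". For anyone who wants to verify that independently, the stdlib can convert a certificate's notAfter timestamp (the field getpeercert() reports on a verified connection) into epoch seconds for comparison. The date below is a stand-in, not the site's actual certificate:

```python
import ssl
import time

# 'notAfter' is how getpeercert() reports a certificate's expiry date.
# The value below is an example, not taken from titsintops.com.
not_after = "Jun 1 12:00:00 2023 GMT"

# Convert the certificate timestamp to epoch seconds and compare with now.
expired = ssl.cert_time_to_seconds(not_after) < time.time()
print(expired)  # True, since this date is in the past
```

You can also check from a terminal with openssl s_client piped into openssl x509 -noout -enddate, if you prefer not to touch Python at all.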
I thought it was just me. I'm still getting multiple errors and have to clear browsing data just to get access again. It took me 30 minutes to type a comment earlier.
Unfortunately it does break the script. Hoping whatever changed/broke is resolved soon. This is such a great utility.
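Until the certificate gets renewed, one possible stopgap (insecure, so treat it as strictly temporary) is to disable TLS verification on the requests session. I'm assuming the script keeps a requests.Session around, as the traceback suggests; something like:

```python
import requests
import urllib3

# TEMPORARY workaround while the site certificate is expired. This removes
# protection against man-in-the-middle attacks, so revert it once fixed.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
session.verify = False  # every request on this session now skips certificate checks

# Example (not run here): session.get("https://titsintops.com/phpBB2/index.php?login/")
```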
It's working again. The site certificate issue is resolved.
It would be great if this worked with cyberdrop.

There already is one that does cyberdrop.

I'm too dumb to know how to get this to work.
Might be some server issues happening. I haven't had that same error again, but there are times when the site is really slow to load or some links just won't open. Even the alerts dropdown will take a long time sometimes.