I have some sewing patterns that I would like to share (and hopefully swap) but all of the PDFs have a

“This was purchased by John Doe john.doe@email.com #ordernumber - if you are not John Doe, please dob in the person you got this from to company@example.com so we can sic our lawyers on them”

sorta footer on every single page.

Obviously for privacy reasons (and because I don’t actually want lawyers sicced on me), I need to remove this footer.

These are often complex PDFs with more than a hundred pages and multiple layers.

I managed to remove the editing password (not the user/viewing password; the files just can’t be edited without it) with qpdf --decrypt. But removing that footer has left me at a dead end. I even tried manually removing every single instance of the footer using Master PDF Editor, but saving the file flattened it, so you can no longer show/hide layers, which is essential for correct printing. (Please don’t ask me how many different PDF editors I have tried, because it has been so, so, SO many that I have lost count.)

Not that I really want to have to manually edit this out on what could amount to over a thousand pages but searching for a command to remove a certain phrase has come up empty. Even Master PDF Editor doesn’t seem to have a bulk remove or search and replace function (just search).

I use Linux btw.

  • dysprosium@lemmy.dbzer0.com
    6 days ago

    Perhaps just blocking it with a white box and then converting it into an image-only pdf. This makes the file much bigger but will keep the exact same layout while also removing the text blocked by the white box.

    Or does it have any interactive elements? What do you mean by “layers”?

  • Auster@thebrainbin.org
    6 days ago

    Iirc, I tested this out quite a few years ago, and I had to use a program that would both decompile and recompile the PDF. While it was decompiled, I had to remove the repeating pattern I didn’t want with something like Notepad++. The recompiled file came out a bit over 50% bigger iirc, maybe because of different compression methods, but the pages themselves didn’t seem affected.

    Sadly I can’t remember the name of the program I used for decompiling and recompiling, only that it did both and that I found it while looking for how to remove watermarks from PDFs. Also, the program was certainly offline.

    • Auster@thebrainbin.org
      6 days ago

      Found a few candidate tools, though I can’t test any of them right now: mutool (part of the mupdf tools), PDFtk, qpdf, and pdf2txt (the name sounds familiar, though it might be my memory playing tricks).

      If any of those could be found as a single portable exe around 2020, chances are it is the tool I used for it.

      • I Cast Fist@programming.dev
        4 days ago

        There’s also GhostScript, which feels like advanced tooling for dealing with PDFs. I used it to scale down the images in some game PDFs I have and save as a copy, so that my old phones could actually open them. The Warcrow free PDF from Corvus Belli went from a whopping 623MB to around 111MB. Still way too fucking large for my phones. The PDF for Mantic Firefight went from 65MB to ~20MB.

  • Grumpy@sh.itjust.works
    6 days ago

    My best answer at this point would be that you need to make your own program to find and remove by content. Because no other manual pdf editor would reasonably have such a feature since it’s so niche.

    Also vibe coding with AI tends to be very good for singular tasks like this.

    I wouldn’t recommend converting the pdf to anything else since that would remove the layer info unless it’s to more complicated formats like EPS, illustrator, etc.

  • cecilkorik@piefed.ca
    6 days ago

    Just because the visible footer gets removed doesn’t mean there isn’t other unique tracking information hidden deep in the PDF that could still get the lawyers sicced on you. Depending on how valuable this information is to the company, and how litigious they are, you have to judge how far they might’ve gone and might yet go to protect it.

    Unfortunately, that’s why this kind of copy protection can actually be an effective tactic to prevent individuals from sharing their copies. While there might be ways to strip this kind of hidden data on simpler PDFs… even resorting to methods like screenshotting or printing and scanning still cannot give you absolute confidence that there isn’t some subtle unique identifier invisibly hidden in the layout or through subtle inconspicuous variations, especially if you’re doing this regularly and they start targeting you and your account for identification. And on complex PDFs there are so many more ways they could hide this information digitally if they know where to look for it and you don’t. 99% of the time it’s going to be pretty obvious to strip out, but are you willing to take that risk even if you do find a technical method of removing the visible footers? If it’s a one-off, maybe you can get away with it, but in the long term this strategy is not viable and is a trap for rookies.

    The only truly safe way to share digitally watermarked content like this is to buy it with a burner account and full opsec in the first place. Nobody to sic lawyers on if it’s a hacked paypal or a stolen/prepaid credit card or an untraceable email and IP, or in a jurisdiction with no enforcement. Smash and grab, get the data anonymously and get out. Don’t share stuff from your personal account that’s literally got your name and banking information attached to it unless you can confirm it’s bit-for-bit indistinguishable from other innocent copies with something like a checksum.
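    For what the checksum comparison mentioned above could look like, here is a minimal sketch using only the Python standard library. The function names and file paths are made up for illustration. It only tells you whether two files are byte-for-byte identical; it says nothing about where or how two differing copies differ.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(path_a: str, path_b: str) -> bool:
    """True only if both files have identical SHA-256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

    Usage would be something like same_bytes("copy_a.pdf", "copy_b.pdf") (hypothetical file names). If two separately obtained copies of the same file hash differently, at least one of them carries something unique.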

    • Pup Biru@aussie.zone
      5 days ago

      to really hammer home this “many ways to hide”: the PDF is kinda just like a container… it contains other things like images (the patterns for example)… these patterns are probably vector graphics (made up of lines rather than pixels)… this means you can magnify them basically infinitely… and they can contain transparent lines and all sorts of things. they could easily embed that same text in the SVG image, at tiny scale (less than a pixel at 100% scale), and make it transparent… no PDF editor is going to touch the image data: it simply doesn’t really understand it to that degree - it’s an image; not a PDF after all… so that information will remain even after you’ve removed all visible/reasonable marks

      this is just 1 example of practically infinite places it could be - and remember, this text is just lines in an image! it’s not like you can ctrl+f for the text necessarily… you’d have to go through every image manually and inspect every single line, and even then there are no guarantees (perhaps they encoded that information like morse code in bumps in some lines that are only barely visible at 1000% magnification)

  • Tolookah@discuss.tchncs.de
    6 days ago

    The MaM IRC or forums might be able to help with that if you’re a member; they deal with PDFs and such all the time.

      • Tolookah@discuss.tchncs.de
        6 days ago

        Two replies there that came to my attention, while I’m unable to get back to sleep at 5am. An old one mentioning https://github.com/kanzure/pdfparanoia which seems to be an old tool that removes watermarks, hasn’t been updated in 5 years, but neither has the PDF spec?

        The other is this paste of text:

        if it helps anyone, here’s what I do to prepare a pattern for uploading to make sure it is ‘clean’:

        1. Check over the PDF files for any reference to my name/email address (usually this is in a footer on each page, and not every pattern company does this)
        2. If my personal details are present, I unlock the files using a site like ILovePDF - There are other sites but this one has no daily limits
        3. Open the unlocked PDF in Adobe Professional or another PDF editor of your choice and delete the footer box. You can just delete the box on the first page it appears, or the first page it is a standalone box, then save the pdf, close and reopen it - usually it will now be gone from all pages.
        4. Repeat for any other PDF files (obviously)
        5. Run PDF and jpeg files through an exif cleaner
        6. Double check and upload.
        
  • queerlilhayseed@piefed.blahaj.zone
    6 days ago

    You might be able to do a find and replace with https://github.com/pymupdf/PyMuPDF . I’m not an expert on PDFs, so I’m not sure if it can be done in a way that preserves all the important formatting, but if you feel comfortable DMing me the PDF (or one of similar complexity) I could try to write a script that replaces all instances of the target text in a way that preserves the rest of the document.

  • Hackerpunk1@lemmy.dbzer0.com
    6 days ago

    You need to remove the edit password. Had success using Passware Kit to remove password. From there convert the pdf to word and remove what you need.

    • Thorned_Rose@sh.itjust.worksOP
      6 days ago

      I was already able to remove the edit password with qpdf --decrypt. Most of the PDF editors I used changed the PDF too much (e.g. added margins/padding), which ruined the very specific layout needed for the patterns to work. There can be no changes to the PDFs apart from removing the ‘footer’ text :/

  • frongt@lemmy.zip
    6 days ago

    Can you just drop a white box over it?

    Edit: if you’re sharing the PDF I suppose not

    • Onomatopoeia@lemmy.cafe
      6 days ago

      You could do this with a PDF editor, then print to PDF so it’s a new file.

      Or use a PDF to Word converter (or similar), which would enable removing such things. Though that can be tricky

      • Thorned_Rose@sh.itjust.worksOP
        6 days ago

        I tried print to PDF but it flattened the resulting PDF, so the layers were lost. Almost all of the software I tried for converting altered the PDF layout in some way, and patterns must not change at all, otherwise they get messed up :/

  • istdaslol@feddit.org
    6 days ago

    Adobe Acrobat can censor (redact) PDFs. Afaik you can choose between black and white, so maybe this could be a manual route.

  • mub@lemmy.ml
    3 days ago

    Honestly there are so many ways pdfs can hold unique information that retaining the same format is probably not going to work.

    The 2 options I suggest are . . .

    1. Use a tool to extract the text into a plain text format, then paste it into Word along with screenshots of any images you need.
    2. You could try converting each page to a JPG image using something like GIMP. You could then desaturate it or convert it to pure black and white so it reveals any hidden text, which you can paint over. If you want the images in colour, just crop them out from the original file and paste them into the new b&w copy. This is a crap description but hopefully it makes the point.