I have a program that requires all keywords to be in a single paragraph, most of the time separated by commas.
For example:
I have these terms:
1-Term
1.1-Term
2-Term
3-Term
4-Term
That I collected and organized into groups and subgroups with titles and subtitles:
Title
-
1-Term
-
1.1-Term
-
2-Term
- Sub-Title
- 3-Term
- 4-Term
- Sub-Title
But then I want to turn them into:
1-Term, 1.1-Term, 2-Term, 3-Term, 4-Term
Removing certain marked words (titles and subtitles), any empty/blank space, and line breaks, while adding commas between the terms. I want to keep certain dashes “-” (like the ones inside words):
1-Term,1.1 -Term,2-Term,3-Term,4-Term
Looks like the perfect domain for Awk in my eyes…
Your description is too vague to really get a good answer. In general, if you’re doing complex string manipulation, you’ll use a full-fledged programming language with regex support, like Python, Perl or Awk, possibly piped into each other and/or other tools like Sed or Cut. I can’t be more specific than that without a more specific description where you describe the actual data and criteria.
Are you starting with the first or second example? Why do the prefix numbers change between examples? How do you tell text and title/subtitle apart?
Why do the prefix numbers change between examples?
My bad, I fixed it.
I want to show that two terms are related, e.g. Star Wars and Jedi, by grouping them together:
Franchises
Stars wars
Jedi
Transformers
Also, I am not able to add line breaks between bullet points in Markdown, so instead I get this:
Franchises
-
Stars wars
-
Jedi
-
Transformers
So I can't show the grouping in Lemmy here. I would also have liked the list I make to be Markdown-compatible, but I guess that's a separate issue.
-
Basically, I collect keywords (e.g. transformers, A Deep dive, Harry Potter The worst, Xbox, stars worst, Jedi) from videos on my YouTube home page and organize them into lists:
-
YouTuber terms:
- A Deep Dive
- The Worst
- Franchises:
- Star wars
- Jedi
- Harry Potter
- Transformers
-
Companies:
- Xbox
And turn it into:
A Deep Dive,The Worst, Star wars, Jedi, Harry Potter, Transformers,Xbox
Removing the titles and subtitles.
How do you tell text and title/subtitle apart
I was thinking of putting a symbol, like “#” for example, in front of the title:
# - YouTuber terms:
so the script knows to ignore that whole line, like a comment in general programming.
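For what it's worth, here is a minimal Python sketch of that idea. It assumes every title or subtitle line starts with “#” and that terms may or may not carry a leading “-” bullet. The function name `terms_to_csv` and the sample text are made up for illustration, not taken from any real tool.

```python
def terms_to_csv(text, marker="#", sep=", "):
    """Join the terms in `text` into one line, skipping marked title lines."""
    terms = []
    for line in text.splitlines():
        line = line.strip()
        # Skip title lines such as "# - YouTuber terms:".
        if line.startswith(marker):
            continue
        # Drop a leading bullet dash, but keep dashes inside a term ("1-Term").
        if line.startswith("-"):
            line = line[1:].strip()
        # Skip blank lines and lines that held only a bullet.
        if line:
            terms.append(line)
    return sep.join(terms)


# Hypothetical input, modelled on the list in the post.
sample = """\
# - YouTuber terms:
- A Deep Dive
- The Worst
# - Franchises:
- Star wars
- Jedi
- Harry Potter
- Transformers
-
# - Companies:
- Xbox
"""

print(terms_to_csv(sample))
# A Deep Dive, The Worst, Star wars, Jedi, Harry Potter, Transformers, Xbox
```

Only the first dash of a line is treated as a bullet, so a term that itself begins with a dash would lose it. Pass `sep=","` if the target program wants no space after the commas.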
This is not difficult to achieve at all with tools like sed or awk. But unless you provide a concrete example input file or files, all we can do is point to those tools.
Something like this?
- Franchise(Title):
  - Harry potter
  - Perfect Blue
  - Jurassic world
  - Jurassic Park
  - Jedi
  - Star wars
  - The clone wars
  - MCU
  - Cartoons(Sub-Title):
    - Gumball
    - Flapjack
    - Steven Universe
    - Stars vs. the forces of Evil
    - Wordgril
    - Flapjack

Turned into
Harry potter,Perfect Blue,Jurassic world,Flapjack,Jedi,Star wars,The clone wars,MCU,Gumball,Flapjack,Steven Universe,Stars vs. the forces of Evil
Both “Franchise” and “Cartoons” were removed/not included with the other words.
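If the headings are tagged with “(Title)” and “(Sub-Title)” as in this example, a regular expression can pick them out. This is a rough Python sketch under that assumption. The names `HEADING` and `strip_headings` and the shortened sample list are hypothetical.

```python
import re

# Assumption: a heading line ends with "(Title):" or "(Sub-Title):".
HEADING = re.compile(r"\((?:Sub-)?Title\)\s*:?\s*$", re.IGNORECASE)


def strip_headings(lines):
    """Return the terms from `lines`, without bullets and heading lines."""
    terms = []
    for line in lines:
        # Remove indentation and the leading bullet only; inner dashes stay.
        term = re.sub(r"^\s*-\s*", "", line).strip()
        if term and not HEADING.search(term):
            terms.append(term)
    return terms


# Shortened, hypothetical version of the list in the post.
lines = [
    "- Franchise(Title):",
    "  - Harry potter",
    "  - Star wars",
    "  - Cartoons(Sub-Title):",
    "    - Gumball",
    "    - Steven Universe",
]

print(",".join(strip_headings(lines)))
# Harry potter,Star wars,Gumball,Steven Universe
```

Duplicates are kept as they are. Deduplicating would be an extra step, and it isn't clear from the example whether that is wanted.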
This is technically YAML, I think: a list (with one entry) of lists that contain mostly single items but also one other list. You should be able to parse this with a YAML parser such as PyYAML for Python (a third-party package; Python's standard library does not include a YAML parser).
Note that YAML is picky about the syntax though, so it wouldn't be able to handle deviations.
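A hedged sketch of the YAML route in Python, assuming the list is indented consistently so that it really is valid YAML. `flatten_terms` and `terms_from_yaml` are hypothetical helper names. `yaml.safe_load` comes from the third-party PyYAML package, and the `parsed` literal shows the structure it returns for a nested list of this shape, so the flattening step can be tried without installing anything.

```python
def flatten_terms(node):
    """Collect the plain string items of a parsed YAML tree.

    Mapping keys (the titles and sub-titles) are dropped; only the
    string items below them are kept, in document order.
    """
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        terms = []
        for child in node:
            terms.extend(flatten_terms(child))
        return terms
    return []  # None (an empty "-" item), numbers, etc. are ignored here


def terms_from_yaml(text):
    import yaml  # PyYAML, third-party: pip install pyyaml
    return flatten_terms(yaml.safe_load(text))


# Shape produced by yaml.safe_load for a list such as
# "- Franchise(Title):" followed by indented "- item" lines (shortened).
parsed = [
    {
        "Franchise(Title)": [
            "Harry potter",
            "MCU",
            {"Cartoons(Sub-Title)": ["Gumball", "Flapjack"]},
        ]
    }
]

print(",".join(flatten_terms(parsed)))
# Harry potter,MCU,Gumball,Flapjack
```

One caveat of this sketch: YAML reads a bare item like `1984` as a number, so it would be skipped unless quoted in the file.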