Changeset c185ca9


Timestamp:
Feb 12, 2024, 1:09:10 PM
Author:
Peter A. Buhr <pabuhr@…>
Branches:
master
Children:
e7b04a3
Parents:
77bc259
Message:

more documentation on stream input

File:
1 edited

  • doc/user/user.tex

    r77bc259 rc185ca9  
    1111%% Created On       : Wed Apr  6 14:53:29 2016
    1212%% Last Modified By : Peter A. Buhr
    13 %% Last Modified On : Tue Jan 30 09:02:41 2024
    14 %% Update Count     : 6046
     13%% Last Modified On : Mon Feb 12 11:50:26 2024
     14%% Update Count     : 6199
    1515%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    1616
     
    41774177The \CFA header file for the I/O library is \Indexc{fstream.hfa}.
    41784178
     4179
     4180\subsubsection{Stream Output}
     4181
    41794182For implicit formatted output, the common case is printing a series of variables separated by whitespace.
    41804183\begin{cquote}
     
    42554258Note, \CFA stream variables ©stdin©, ©stdout©, ©stderr©, ©exit©, and ©abort© overload C variables ©stdin©, ©stdout©, ©stderr©, and functions ©exit© and ©abort©, respectively.
    42564259
     4260
     4261\subsubsection{Stream Input}
     4262
    42574263For implicit formatted input, the common case is reading a sequence of values separated by whitespace, where the type of each input value must match the type of its input variable.
    42584264\begin{cquote}
    42594265\begin{lrbox}{\myboxA}
    42604266\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    4261 int x;   double y   char z;
     4267char c;   int i;   double d;
    42624268\end{cfa}
    42634269\end{lrbox}
     
    42664272\multicolumn{1}{c@{\hspace{2em}}}{\textbf{\CFA}}        & \multicolumn{1}{c@{\hspace{2em}}}{\textbf{\CC}}       & \multicolumn{1}{c}{\textbf{Python}}   \\
    42674273\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    4268 sin | x | y | z;
     4274sin | c | i | d;
    42694275\end{cfa}
    42704276&
    42714277\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    4272 cin >> x >> y >> z;
     4278cin >> c >> i >> d;
    42734279\end{cfa}
    42744280&
    42754281\begin{cfa}[aboveskip=0pt,belowskip=0pt]
    4276 x = int(input());  y = float(input());  z = input();
     4282c = input();   i = int(input());   d = float(input());
    42774283\end{cfa}
    42784284\\
    42794285\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
    4280 ®1® ®2.5® ®A®
     4286®A® ®1® ®2.5®
    42814287
    42824288
     
    42844290&
    42854291\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
    4286 ®1® ®2.5® ®A®
     4292®A® ®1® ®2.5®
    42874293
    42884294
     
    42904296&
    42914297\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
     4298®A®
    42924299®1®
    42934300®2.5®
    4294 ®A®
    42954301\end{cfa}
    42964302\end{tabular}
     
    43094315For floating-point types, any number of decimal digits, optionally preceded by a sign (©+© or ©-©), optionally containing a decimal point, and optionally followed by an exponent, ©e© or ©E©, with signed (optional) decimal digits.
    43104316Floating-point values can also be written in hexadecimal format preceded by ©0x© or ©0X© with hexadecimal digits and exponent denoted by ©p© or ©P©.
    4311 In all cases, all whitespace characters are skipped until an appropriate value is found.
    4312 \Textbf{If an appropriate value is not found, the exception ©missing_data© is raised.}
    4313 
    4314 For the C-string type, there are two input forms: any number of \Textbf{non-whitespace} characters or a quoted sequence containing any characters except the closing quote, \ie there is no escape character supported in the string..
    4315 In both cases, the string is null terminated ©'\0'©.
    4316 For the quoted string, the start and end quote characters can be any character and do not have to match \see{\ref{XXX}}.
    4317 
    4318 \VRef[Figure]{f:IOStreamFunctions} shows the I/O stream operations for interacting with files other than ©cin©, ©cout©, and ©cerr©.
     4317In all cases, whitespace characters are skipped until an appropriate value is found.
     4318\begin{cfa}[belowskip=0pt]
     4319char ch;  int i;  float f; double d;  _Complex double cxd;
     4320sin | ch | i | f | d | cxd;
     4321X   42   1234.5     0xfffp-2    3.5+7.1i
     4322\end{cfa}
     4323It is also possible to scan and ignore specific strings and whitespace using a string format.
     4324\begin{cfa}[belowskip=0pt]
     4325sin | "abc def";                                                §\C{// space matches arbitrary whitespace (2 blanks, 2 tabs)}§
     4326\end{cfa}
     4327\begin{cfa}[showspaces=true,showtabs=true,aboveskip=0pt,belowskip=0pt]
     4328®abc            def®
     4329\end{cfa}
     4330Each non-whitespace format character is compared with the next input character; if they are equal, the input character is discarded and matching continues with the next format character.
     4331Note, a single whitespace in the format string matches \Textbf{any} quantity of whitespace characters from the stream (including none).
     4332
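The whitespace rules above closely resemble those of the C ©scanf© family, where a blank in the format also matches any amount of input whitespace, including none. As a rough C analogue (not \CFA code; the helper names ©read_values© and ©match_literal© are hypothetical), the following sketch reads the first four values of the ©sin© example above, omitting the ©_Complex© value because ©scanf© has no conversion for complex numbers, and then matches the format string ©"abc def"©.

```c
#include <assert.h>
#include <stdio.h>

/* Rough C analogue of:  sin | ch | i | f | d;
   %d, %f and %lf skip leading whitespace themselves; %c does not,
   so the format starts with a blank to skip whitespace before ch.
   Returns the number of values converted (4 on success). */
int read_values(const char *input, char *ch, int *i, float *f, double *d) {
    return sscanf(input, " %c %d %f %lf", ch, i, f, d);
}

/* Rough C analogue of:  sin | "abc def";
   A blank in the format matches any amount of whitespace, including none.
   Returns the number of characters consumed, or -1 if the text does not match. */
int match_literal(const char *input) {
    int consumed = -1;
    sscanf(input, "abc def%n", &consumed);   /* %n is reached only on a full match */
    return consumed;
}
```

Unlike \CFA, which raises ©missing_data©, C reports a mismatch only through the conversion count or, as in ©match_literal©, an unset ©%n© result.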
     4333For the C-string type, the default input format is any number of \Textbf{non-whitespace} characters.
     4334There is no escape character supported in an input string, but any Latin-1 character can be typed directly in the input string.
     4335For example, if the following output, which contains both whitespace and non-whitespace control characters, is redirected into a file by the shell:
     4336\begin{cfa}[belowskip=0pt]
     4337sout | "\n\t\f\0234\x23";
     4338\end{cfa}
     4339its non-whitespace characters can be read back from the file by redirecting the file as input using:
     4340\begin{cfa}[belowskip=0pt]
     4341char s[64];
     4342sin | wdi( sizeof(s), s );                              §\C{// must specify string size}§
     4343\end{cfa}
     4344The input string is always null terminated ©'\0'© in the input variable.
     4345Because reading a C string can overrun its buffer, C strings may only be read using input manipulators \see{\VRef{s:InputManipulators}}.
     4346In addition, several input manipulators scan more complex string formats, \eg a quoted character or string.
     4347
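The round trip above can be approximated in C by reading from a string instead of a redirected file. In this sketch (C rather than \CFA; the helper ©read_token© is hypothetical), a width-limited ©%s© conversion plays the role of ©wdi( sizeof(s), s )©: it skips the leading whitespace, stops at the next whitespace character, and always null terminates. In C, the escape ©\0234© is the octal character ©\023© followed by ©'4'©, and ©\x23© is ©'#'©.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one whitespace-delimited token from input into s, writing at most
   size bytes: up to size - 1 characters plus the null terminator.
   Leading whitespace is skipped. Returns the length of the token read. */
size_t read_token(const char *input, char *s, size_t size) {
    assert(size >= 2);
    char fmt[32];
    snprintf(fmt, sizeof(fmt), "%%%zus", size - 1);   /* e.g. "%63s" when size is 64 */
    s[0] = '\0';                                      /* stays empty if nothing is read */
    sscanf(input, fmt, s);
    return strlen(s);
}
```

Building the width from ©size© mirrors the \CFA requirement to pass ©sizeof(s)©: the width can never exceed the buffer, whatever the input contains.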
     4348\Textbf{In all cases, if a valid data value for the type is not found, or the format string does not match, the exception ©missing_data© is raised and the input variable is unchanged.}
     4349For example, when reading an integer and the string ©"abc"© is found, the exception ©missing_data© is raised to ensure the program does not proceed erroneously.
     4350If a valid data value is found but does not fit in the input variable's type, \eg an integer too large for an ©int©, the result of the read is undefined.
     4351
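C has no ©missing_data© exception; a common C idiom for the same check is ©strtol© with an end pointer. The following sketch (C rather than \CFA; the helper ©read_int© is hypothetical) mimics the rule that a failed read leaves the input variable unchanged. It deliberately differs for over-large values: \CFA leaves such reads undefined, whereas ©strtol© reports them through ©errno©, so the sketch rejects them.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Read a decimal int from *input; strtol skips leading whitespace itself.
   On success, store the value in *v, advance *input past it, and return true.
   If no digits are found (where CFA raises missing_data), or the value does not
   fit in an int (undefined in CFA, rejected here), return false and leave
   *v and *input unchanged. */
bool read_int(const char **input, int *v) {
    char *end;
    errno = 0;
    long val = strtol(*input, &end, 10);
    if (end == *input) return false;                       /* e.g. "abc": no digits */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) return false;
    *v = (int)val;
    *input = end;
    return true;
}
```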
     4352
     4353\subsubsection{Stream Files}
     4354
     4355\VRef[Figure]{f:IOStreamFunctions} shows the I/O stream operations for interacting with files other than ©sin©, ©sout©, and ©serr©.
    43194356\begin{itemize}[topsep=4pt,itemsep=2pt,parsep=0pt]
    43204357\item
     
    49324969
    49334970\subsection{Input Manipulators}
    4934 
    4935 The following \Index{manipulator}s control scanning of input values (reading), and only affect the format of the argument.
    4936 
    4937 Certain manipulators support a \newterm{scanset}, which is a simple regular expression, where the matching set contains any Latin-1 character (8-bits) or character ranges using minus.
     4971\label{s:InputManipulators}
     4972
     4973A string variable \emph{must} be large enough to contain the input sequence.
     4974To force programmers to consider buffer overruns for C-string input, C-strings may only be read with a width field, which should specify a size less than or equal to the C-string size, \eg:
     4975\begin{cfa}
     4976char line[64];
     4977sin | wdi( ®sizeof(line)®, line );              §\C{// must specify string size}§
     4978\end{cfa}
     4979
     4980Certain input manipulators support a \newterm{scanset}, which is a simple regular expression, where the matching set contains any Latin-1 character (8-bits) or character ranges using minus.
    49384981For example, the scanset \lstinline{"a-zA-Z -/?§"} matches any number of characters between ©'a'© and ©'z'©, between ©'A'© and ©'Z'©, between space and ©'/'©, and characters ©'?'© and (Latin-1) ©'§'©.
    49394982The following string is matched by this scanset:
    49404983\begin{cfa}
    4941 !&%$  abAA () ZZZ  ??  xx§\S\S\S§
    4942 \end{cfa}
    4943 To match a minus, put it as the first character, ©"-0-9"©.
    4944 Other complex forms of regular-expression matching are not supported.
    4945 
    4946 A string variable \emph{must} be large enough to contain the input sequence.
    4947 To force programmers to consider buffer overruns for C-string input, C-strings can only be read with a width field, which should specify a size less than or equal to the C-string size, \eg:
    4948 \begin{cfa}
    4949 char line[64];
    4950 sin | wdi( ®sizeof(line)®, line ); // must specify size
    4951 \end{cfa}
    4952 
    4953 Currently, there is no mechanism to detect if a value read exceeds the capwhen Most types are finite sized, \eg integral types only store value that fit into their corresponding storage, 8, 16, 32, 64, 128 bits.
    4954 Hence, an input value may be too large, and the result of the read is often considered undefined, which leads to difficlt to locate runtime errors.
    4955 All reads in \CFA check if values do not fit into the argument variable's type and raise the exception
    4956 All types are
     4984!&%$  abAA () ZZZ  ??§\S§  xx§\S\S§
     4985\end{cfa}
     4986To match a minus, make it the first character in the set, \eg ©"©{\color{red}\raisebox{-1pt}{\texttt{-}}}©0-9"©.
     4987Other complex forms of regular-expression matching are unsupported.
     4988
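The scanset rules can be made concrete with a small matcher. The following C sketch (the functions ©in_scanset© and ©scan_span© are hypothetical, not part of the \CFA library) treats ©x-y© as an inclusive range and a leading minus as a literal, comparing single bytes by value; the test string is the example above without its Latin-1 section-sign characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if character c belongs to scanset set, where "x-y" is the
   inclusive range x..y and a minus in the first position is a literal '-'. */
bool in_scanset(const char *set, unsigned char c) {
    size_t n = strlen(set), k = 0;
    if (n > 0 && set[0] == '-') {                  /* leading minus is literal */
        if (c == '-') return true;
        k = 1;
    }
    while (k < n) {
        unsigned char lo = (unsigned char)set[k];
        if (k + 2 < n && set[k + 1] == '-') {      /* range lo-hi */
            unsigned char hi = (unsigned char)set[k + 2];
            if (lo <= c && c <= hi) return true;
            k += 3;
        } else {                                   /* single character */
            if (c == lo) return true;
            k += 1;
        }
    }
    return false;
}

/* Length of the longest prefix of input whose characters are all in set,
   i.e. how many characters a scanset match would consume. */
size_t scan_span(const char *set, const char *input) {
    size_t len = 0;
    while (input[len] != '\0' && in_scanset(set, (unsigned char)input[len])) len += 1;
    return len;
}
```

For the set ©"a-zA-Z -/?"©, the range between space and ©'/'© is what admits characters such as ©'&'©, ©'('©, and ©')'©.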
     4989The following \Index{manipulator}s control scanning of input values (reading) and only affect the format of the argument.
    49574990
    49584991\begin{enumerate}
    49594992\item
    4960 \Indexc{skip}( scanset )\index{manipulator!skip@©skip©}, ©skip©( $N$ )
    4961 The first form uses a scanset to skip matching characters.
    4962 The second form skips the next $N$ characters, including newline.
    4963 If the match successes, the input characters are discarded, and input continues with the next character.
     4993\Indexc{skip}( \textit{scanset} )\index{manipulator!skip@©skip©}, ©skip©( $N$ )
     4994consumes either the \textit{scanset} or the next $N$ characters, including newlines.
     4995If the match succeeds, the input characters are ignored, and input continues with the next character.
    49644996If the match fails, the input characters are left unread.
    49654997\begin{cfa}[belowskip=0pt]
    4966 char sk[§\,§] = "abc";
    4967 sin | "abc " | skip( sk ) | skip( 5 ); // match input sequence
     4998char scanset[§\,§] = "abc";
     4999sin | "abc§\textvisiblespace§" | skip( scanset ) | skip( 5 ); §\C{// match and skip input sequence}§
    49685000\end{cfa}
    49695001\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
    4970 ®abc   ®
    4971 ®abc  ®
    4972 ®xx®
    4973 \end{cfa}
    4974 
    4975 \item
    4976 \Indexc{wdi}( maximum, variable )\index{manipulator!wdi@©wdi©}
    4977 For all types except ©char *©, whitespace is skipped until an appropriate value is found for the specified variable type.
    4978 maximum is the maximum number of characters read for the current operation.
     5002®abc   abc  xxx®
     5003\end{cfa}
     5004Again, the blank in the format string ©"abc©\textvisiblespace©"© matches any number of whitespace characters.
     5005
     5006\item
     5007\Indexc{wdi}( \textit{maximum}, ©T & v© )\index{manipulator!wdi@©wdi©}
     5008For all types except ©char *©, whitespace is skipped and the longest sequence of non-whitespace characters matching a value of type ©T© is read, converted into its internal form, and written into ©v©.
     5009\textit{maximum} is the maximum number of characters read for the current value rather than the longest sequence.
    49795010\begin{cfa}[belowskip=0pt]
    49805011char ch;   char ca[3];   int i;   double d;   
     
    49855016\end{cfa}
    49865017Here, ©ca[0]© is type ©char©, so the width reads 3 characters \Textbf{without} a null terminator.
     5018If an input value is not found for a variable, the exception ©missing_data© is raised, and the input variable is unchanged.
    49875019
    49885020Note, input ©wdi© cannot be overloaded with output ©wd© because both have the same parameters but return different types.
     
    49905022
    49915023\item
    4992 \Indexc{wdi}( maximum size, ©char s[]© )\index{manipulator!wdi@©wdi©}
    4993 For type ©char *©, maximum is the maximum number of characters read for the current operation.
    4994 Any number of non-whitespace characters, stopping at the first whitespace character found. A terminating null character is automatically added at the end of the stored sequence
     5024\Indexc{wdi}( $maximum\ size$, ©char s[]© )\index{manipulator!wdi@©wdi©}
     5025For type ©char *©, whitespace is skipped and the longest sequence of non-whitespace characters is read, without conversion, and written into the string variable (null terminated).
     5026$maximum\ size$ is the maximum number of characters in the string variable.
     5027If the sequence of non-whitespace input characters is longer than $maximum\ size - 1$ (one position is reserved for the null terminator), the exception ©cstring_length© is raised.
    49955028\begin{cfa}[belowskip=0pt]
    4996 char cstr[10];
    4997 sin | wdi( sizeof(cstr), cstr );
     5029char cs[10];
     5030sin | wdi( sizeof(cs), cs );
    49985031\end{cfa}
    49995032\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
    5000 ®abcd1233.456E+2®
    5001 \end{cfa}
    5002 
    5003 \item
    5004 \Indexc{wdi}( maximum size, maximum read, ©char s[]© )\index{manipulator!wdi@©wdi©}
    5005 For type ©char *©, maximum is the maximum number of characters read for the current operation.
     5033®012345678®
     5034\end{cfa}
     5035Nine non-whitespace characters are read, and the null terminator is added to make ten.
     5036
     5037\item
     5038\Indexc{wdi}( $maximum\ size$, $maximum\ read$, ©char s[]© )\index{manipulator!wdi@©wdi©}
     5039This manipulator is the same as the previous one, except $maximum\ read$ is the maximum number of characters read for the current value rather than the longest sequence, where $maximum\ read \le maximum\ size$.
    50065040\begin{cfa}[belowskip=0pt]
    5007 char ch;   char ca[3];   int i;   double d;   
    5008 sin | wdi( sizeof(ch), ch ) | wdi( sizeof(ca), ca[0] ) | wdi( 3, i ) | wdi( 8, d );  // c == 'a', ca == "bcd", i == 123, d == 345.6
     5041char cs[10];
     5042sin | wdi( sizeof(cs), 9, cs );
    50095043\end{cfa}
    50105044\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
    5011 ®abcd1233.456E+2®
    5012 \end{cfa}
    5013 
    5014 \item
    5015 \Indexc{ignore}( reference-value )\index{manipulator!ignore@©ignore©}
    5016 For all types, the data is read from the stream depending on the argument type but ignored, \ie it is not stored in the argument.
     5045®012345678®9
     5046\end{cfa}
     5047The exception ©cstring_length© is not raised, because reading stops after nine characters, leaving ©9© unread.
     5048
     5049\item
     5050\Indexc{getline}( $wdi\ manipulator$, ©const char delimiter = '\n'© )\index{manipulator!getline@©getline©}
     5051consumes the scanset ©"[^D]D"©, where ©D© is the ©delimiter© character, which reads all characters from the current input position to the delimiter character into the string (null terminated), and consumes and ignores the delimiter.
     5052If the delimiter character is omitted, it defaults to ©'\n'© (newline).
    50175053\begin{cfa}[belowskip=0pt]
    5018 double d;
    5019 sin | ignore( d );  // d is unchanged
     5054char cs[10];
     5055sin | getline( wdi( sizeof(cs), cs ) );
     5056sin | getline( wdi( sizeof(cs), cs ), 'X' ); §\C{// X is the line delimiter}§
    50205057\end{cfa}
    50215058\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
    5022 ®  -75.35e-4® 25
    5023 \end{cfa}
    5024 
    5025 \item
    5026 \Indexc{incl}( scanset, wdi-input-string )\index{manipulator!incl@©incl©}
    5027 For C-string types only, the scanset matches any number of characters \emph{in} the set.
    5028 Matching characters are read into the C input-string and null terminated.
     5059®abc ?? #@%®
     5060®abc ?? #@%X® w
     5061\end{cfa}
     5062The same value is read for both input strings.
     5063
     5064\item
     5065\Indexc{quoted}( ©char & ch©, ©const char Ldelimiter = '\''©, ©const char Rdelimiter = '\0'© )\index{manipulator!quoted@©quoted©}
     5066consumes the string ©"LCR"©, where ©L© is the left ©delimiter© character, ©C© is the value in ©ch©, and ©R© is the right delimiter character, which skips whitespace, consumes and ignores the left delimiter, reads a single character into ©ch©, and consumes and ignores the right delimiter (3 characters).
     5067If the delimiters are omitted, the left delimiter defaults to ©'\''© (single quote), and the right delimiter (©'\0'©) defaults to the left delimiter.
    50295068\begin{cfa}[belowskip=0pt]
    5030 char s[10];
    5031 sin | incl( "abc", s );
     5069char ch;
     5070sin | quoted( ch );   sin | quoted( ch, '"' );   sin | quoted( ch, '[', ']' );
     5071\end{cfa}
     5072\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
     5073®   'a'  "a"[a]®
     5074\end{cfa}
     5075
     5076\item
     5077\begin{sloppypar}
     5078\Indexc{quoted}( $wdi\ manipulator$, ©const char Ldelimiter = '"'©, ©const char Rdelimiter = '\0'© )\index{manipulator!quoted@©quoted©}
     5079consumes the scanset ©"L[^R]R"©, where ©L© is the left ©delimiter© character and ©R© is the right delimiter character, which skips whitespace, consumes and ignores the left delimiter, reads characters until the right-delimiter into the string variable (null terminated), and consumes and ignores the right delimiter.
     5080If the delimiters are omitted, the left delimiter defaults to ©'"'© (double quote), and the right delimiter (©'\0'©) defaults to the left delimiter.
     5081\end{sloppypar}
     5082\begin{cfa}[belowskip=0pt]
     5083char cs[10];
     5084sin | quoted( wdi( sizeof(cs), cs ) ); §\C[3in]{// " is the start/end delimiter}§
     5085sin | quoted( wdi( sizeof(cs), cs ), '\'' ); §\C{// ' is the start/end delimiter}§
     5086sin | quoted( wdi( sizeof(cs), cs ), '[', ']' ); §\C{// [ is the start and ] is the end delimiter}\CRT§
     5087\end{cfa}
     5088\begin{cfa}[showspaces=true]
     5089®   "abc"  'abc'[abc]®
     5090\end{cfa}
     5091
     5092\item
     5093\Indexc{incl}( scanset, $wdi\ manipulator$ )\index{manipulator!incl@©incl©}
     5094consumes the scanset, which reads all the scanned characters into the string variable (null terminated).
     5095\begin{cfa}[belowskip=0pt]
     5096char cs[10];
     5097sin | incl( "abc", cs );
    50325098\end{cfa}
    50335099\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
     
    50365102
    50375103\item
    5038 \Indexc{excl}( scanset, wdi-input-string )\index{manipulator!excl@©excl©}
    5039 For C-string types, the scanset matches any number of characters \emph{not in} the set.
    5040 Non-matching characters are read into the C input-string and null terminated.
     5104\Indexc{excl}( scanset, $wdi\ manipulator$ )\index{manipulator!excl@©excl©}
     5105consumes the \emph{complement} of the scanset, which reads all the scanned characters into the string variable (null terminated).
    50415106\begin{cfa}[belowskip=0pt]
    5042 char s[10];
    5043 sin | excl( "abc", s );
     5107char cs[10];
     5108sin | excl( "abc", cs );
    50445109\end{cfa}
    50455110\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
     
    50475112\end{cfa}
    50485113
    5049 \Indexc{quoted}( char delimit, wdi-input-string )\index{manipulator!quoted@©quoted©}
    5050 Is an ©excl© with scanset ©"delimit"©, which consumes all characters up to the delimit character.
    5051 If the delimit character is omitted, it defaults to ©'\n'© (newline).
    5052 
    5053 \item
    5054 \Indexc{getline}( char delimit, wdi-input-string )\index{manipulator!getline@©getline©}
    5055 Is an ©excl© with scanset ©"delimit"©, which consumes all characters up to the delimit character.
    5056 If the delimit character is omitted, it defaults to ©'\n'© (newline).
     5114\item
     5115\Indexc{ignore}( ©T & v© or ©const char cs[]© or $string\ manipulator$ )\index{manipulator!ignore@©ignore©}
     5116consumes the appropriate characters for the type and ignores them, so the input variable is unchanged.
     5117\begin{cfa}
     5118double d;
     5119char cs[10];
     5120sin | ignore( d );                                              §\C{// d is unchanged}§
     5121sin | ignore( cs );                                             §\C{// cs is unchanged, no wdi required}§
     5122sin | ignore( quoted( wdi( sizeof(cs), cs ) ) ); §\C{// cs is unchanged}§
     5123\end{cfa}
     5124\begin{cfa}[showspaces=true,aboveskip=0pt,belowskip=0pt]
     5125®  -75.35e-4 25 "abc"®
     5126\end{cfa}
    50575127\end{enumerate}
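The two ©wdi© forms for ©char *© differ only in how an over-long token is handled. The following C sketch models those rules as described in the list above, under stated assumptions: ©read_cstr©, ©READ_OK©, ©MISSING_DATA©, and ©CSTRING_LENGTH© are hypothetical names, an error code stands in for each \CFA exception, and a ©max_read© of zero selects the form without a maximum read.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes standing in for the CFA exceptions. */
enum { READ_OK = 0, MISSING_DATA = 1, CSTRING_LENGTH = 2 };

/* Model of  wdi( size, s )  when max_read == 0, and of  wdi( size, max_read, s ).
   Skips whitespace, then copies non-whitespace characters into s (null terminated).
   On an error, s and *input are left unchanged. */
int read_cstr(const char **input, size_t size, size_t max_read, char *s) {
    assert(size >= 1);
    const char *p = *input;
    while (isspace((unsigned char)*p)) p += 1;              /* skip whitespace */
    size_t len = 0;
    while (p[len] != '\0' && !isspace((unsigned char)p[len])) len += 1;
    if (len == 0) return MISSING_DATA;                      /* no token found */
    if (max_read != 0 && len > max_read) len = max_read;    /* stop after max_read characters */
    if (len > size - 1) return CSTRING_LENGTH;              /* no room for the null terminator */
    memcpy(s, p, len);
    s[len] = '\0';
    *input = p + len;                                       /* rest of the token stays unread */
    return READ_OK;
}
```

With a ten-character buffer, the input ©012345678© is accepted by the first form and ©0123456789© is rejected, while a maximum read of nine accepts ©012345678© and leaves ©9© in the input, matching the ©wdi© examples in the list above.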
    50585128
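For readers translating existing C code, most of these manipulators have a rough counterpart among ©scanf© conversions. The correspondences in the following C sketch are approximations chosen for illustration (the ©*_demo© helpers are hypothetical), not a description of how \CFA implements the manipulators; for example, a C scanset conversion fails when it matches no characters, and C signals failure through return values rather than exceptions. A 16-character buffer is used so the ©getline© input fits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* sin | "abc " | skip( "abc" ) | skip( 5 );  ~  literal text, %*[abc], %*5c.
   Returns the number of characters consumed, or -1 if the match fails. */
int skip_demo(const char *in) {
    int consumed = -1;
    sscanf(in, "abc %*[abc]%*5c%n", &consumed);
    return consumed;
}

/* getline( wdi( 16, cs ), delim )  ~  %15[^delim] then discard the delimiter.
   Assumes delim is not ']' or '^', which would change the scanset's meaning. */
int getline_demo(const char *in, char cs[16], char delim) {
    char fmt[16];
    snprintf(fmt, sizeof(fmt), "%%15[^%c]%%*c", delim);   /* e.g. "%15[^X]%*c" */
    return sscanf(in, fmt, cs);
}

/* quoted( ch1 ) then quoted( ch2, '[', ']' )  ~  " '%c' [%c]" */
int quoted_char_demo(const char *in, char *ch1, char *ch2) {
    return sscanf(in, " '%c' [%c]", ch1, ch2);
}

/* quoted( wdi( 16, cs ), '"' )  ~  skip whitespace, '"', up to 15 non-'"' chars, '"' */
int quoted_str_demo(const char *in, char cs[16]) {
    return sscanf(in, " \"%15[^\"]\"", cs);
}

/* incl( "abc", wdi( 16, cs ) ) and excl( "abc", wdi( 16, cs ) ) */
int incl_demo(const char *in, char cs[16]) { return sscanf(in, "%15[abc]", cs); }
int excl_demo(const char *in, char cs[16]) { return sscanf(in, "%15[^abc]", cs); }

/* ignore( d )  ~  %*lf: read a double without storing it.
   Returns the number of characters consumed, or -1 if no number was found. */
int ignore_demo(const char *in) {
    int consumed = -1;
    sscanf(in, "%*lf%n", &consumed);
    return consumed;
}
```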